
Fixing broken accents

Mike Bobbitt

Administrator
Staff member
Owner
Directing Staff
Reaction score
88
Points
960
My apologies for posting in English here, I can only suggest that for me to post in French would likely embarrass us all!

As most of you probably have noticed, many of the French accents were wrecked with the last forum upgrade. I have repaired some, but others still remain as "garbage characters" in older posts. I can fix the remaining accents in existing posts if I know what the original accent and the new "garbage characters" are. (Then it's a simple database search and replace.)

If anyone notices accents in older posts that are not showing correctly, please post a link and what the corrected accent should be and I'll attempt to fix them.

Sorry for the inconvenience.


Thanks
Mike
 

Zell_Dietrich

Full Member
Reaction score
0
Points
0
Please forgive me in advance - I'm a computer programmer and I saw this issue. If you are able to execute SQL statements on the database here, why not do a massive search/replace? I had to do this when one company I worked for "upgraded" to Windows.

[Translated from French:]
Please pardon my bad French.
If you can use SQL, can you use Find/Replace to update everything at once?
I did this once - damn Windows. Thanks. I'm trying to practise my French. ::)
 

jo-dionne

Jr. Member
Reaction score
0
Points
0
-- Quote --
My apologies for posting in English here, ...
-- Quote --


Mike, since you 0wn the box ... you don't have to apologize.
However, considering all the time, effort and money you spend on this site… we all have to thank you for providing us with this exceptional forum!

-- Quote --
I can only suggest that for me to post in French would likely embarrass us all!
-- Quote --


Don’t care… I embarrass myself each time I post/reply in English!

Dionne, J
 

Michael OLeary

Army.ca Fixture
Subscriber
Donor
Reaction score
1
Points
410
Zell_Dietrich said:
Please forgive me in advance - I'm a computer programmer and I saw this issue. If you are able to execute SQL statements on the database here, why not do a massive search/replace? I had to do this when one company I worked for "upgraded" to Windows.

[Translated from French:]
Please pardon my bad French.
If you can use SQL, can you use Find/Replace to update everything at once?
I did this once - damn Windows. Thanks. I'm trying to practise my French. ::)

Zell, Mike can do the search and replace, but he needs someone to let him know exactly what to search for and what to replace it with. He needs help in building the table of character strings that currently appear, and what they should be.

Merci.
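
For the common case, the table described above can be generated rather than collected post by post. The following is only a sketch in Python, assuming the garbage strings are UTF-8 bytes that were read back as Latin-1 (ISO-8859-1). The character list and the name `mojibake_table` are illustrative assumptions, not anything taken from the forum's database.

```python
# Characters to cover. This list is an assumption for illustration;
# extend it with whatever accents actually appear in the posts.
ACCENTS = "àâçèéêëîïôùûÀÇÈÉÊ"


def mojibake_table(chars: str = ACCENTS) -> dict[str, str]:
    """Map each garbled form (UTF-8 bytes read as Latin-1) to the intended character."""
    return {ch.encode("utf-8").decode("latin-1"): ch for ch in chars}


if __name__ == "__main__":
    # repr() is used because some garbled forms contain invisible control characters.
    for garbled, fixed in mojibake_table().items():
        print(f"{garbled!r} -> {fixed!r}")
```

Each key is what to search for and each value is what to replace it with. If the damage came from a different pair of encodings, the two codec names would need to change accordingly.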
 

jo-dionne

Jr. Member
Reaction score
0
Points
0
Mike,

If you start the mysqld service from a 4.1.x (or 5.x) distribution with data created by MySQL 4.0 or older, you should either start the server with the same character set and collation as before, or convert the character set of the affected columns.

Warning: this may result in data loss!

Ex: [latin1|utf8|...]

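-- Step 1: switch to a binary type of the same size so the stored bytes are left untouched.
-- (A bare BINARY means BINARY(1) and would truncate the data.)
ALTER TABLE smf_messages MODIFY subject TINYBLOB;
-- Step 2: declare the character set the bytes are actually stored in (utf8 in this example).
ALTER TABLE smf_messages MODIFY subject TINYTEXT CHARACTER SET utf8;
...
ALTER TABLE smf_messages MODIFY body BLOB;
ALTER TABLE smf_messages MODIFY body TEXT CHARACTER SET utf8;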
...
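
Why go through a binary type first? A rough illustration in Python, assuming the bytes on disk are really UTF-8 while the column is labelled latin1. The function names `relabel` and `transcode` are made up for this sketch and do not correspond to anything in MySQL.

```python
def relabel(raw: bytes, actual_charset: str) -> str:
    """Binary step, then a new charset: bytes stay as-is and are simply read correctly."""
    return raw.decode(actual_charset)


def transcode(raw: bytes, declared_charset: str, target_charset: str) -> bytes:
    """Direct charset change: bytes are read with the (wrong) old label and re-encoded."""
    return raw.decode(declared_charset).encode(target_charset)


if __name__ == "__main__":
    on_disk = "é".encode("utf-8")  # two bytes: C3 A9

    print(relabel(on_disk, "utf-8"))  # bytes untouched, read as UTF-8
    doubled = transcode(on_disk, "latin-1", "utf-8")
    print(doubled, doubled.decode("utf-8"))  # four bytes, displayed as two garbage characters
```

The second path is the double encoding that produces the garbage characters, which is why the binary detour matters when the old label was wrong.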

MySQL Ref:

10.10. Upgrading Character Sets from MySQL 4.0
http://dev.mysql.com/doc/refman/4.1/en/charset-upgrading.html

10.10.2. Converting 4.0 Character Columns to 4.1 Format
http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html


Dionne, J
Quebec City
 

Zell_Dietrich

I had this trouble when another tech at the office installed SQL Server onto an NT box and “copied” the data over. Redefining the tables wouldn't have helped, because the data itself had been altered. I see that accents now work here, so the tables can handle the data. In my case I just changed the improperly converted text back to the proper characters with Replace/Stuff. I wrote a simple script that looped through all the tables, and it worked like a charm. (I made a big deal about having to find the bad strings manually.) Yes, I wrote it in VB… yes, I'm ashamed, but it took me 20 minutes and it worked.
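
A loop of that kind might look roughly like the sketch below. It is written in Python against an in-memory SQLite database purely so that it is self-contained; it is not the VB script mentioned above, and the table layout, the function name `repair_all_text_columns` and the replacement pairs are all assumptions for illustration.

```python
import sqlite3


def repair_all_text_columns(conn: sqlite3.Connection, replacements: dict[str, str]) -> int:
    """Apply every garbled -> fixed pair to every TEXT column of every table.

    Returns the number of row updates performed.
    """
    changed = 0
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')
                   if row[2].upper() == "TEXT"]
        for column in columns:
            for garbled, fixed in replacements.items():
                cursor = conn.execute(
                    f'UPDATE "{table}" SET "{column}" = REPLACE("{column}", ?, ?) '
                    f'WHERE INSTR("{column}", ?) > 0',
                    (garbled, fixed, garbled))
                changed += cursor.rowcount
    conn.commit()
    return changed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER, subject TEXT, body TEXT)")
    conn.execute("INSERT INTO messages VALUES (1, 'CafÃ©', 'TrÃ¨s bien, merci')")
    count = repair_all_text_columns(conn, {"Ã©": "é", "Ã¨": "è"})
    print(count, conn.execute("SELECT subject, body FROM messages").fetchall())
```

The `WHERE INSTR(...)` clause keeps untouched rows out of the update count. On a real database, a backup beforehand would be prudent, since a wrong pair silently rewrites text.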

I thought that because this site was so complicated and so involved, it might have been commercial software that was subtly tweaked (and therefore some parts can't be changed). If this entire site was written from scratch, I'm standing (okay, sitting) in awe. Either way I'm impressed and very glad that I stumbled upon this site. I've a much better idea of what I'm getting myself into. Thanks for all the hard work on this site. (It has helped me.)
 

Mike Bobbitt

All,

Thanks for the feedback. In essence, the problem *should* be solved for future upgrades/restores, but it couldn't be solved retroactively by changing the collation (i.e. the data was backed up in a non-binary format, which didn't lend itself to being restored to a UTF-8 database). I believe I have resolved this particular issue should it come up again, and we're now using ISO-8859-1 across the board.

In the meantime, I can do a search and replace on the DB for any accents that may still be trashed. I've done a number already but I'm sure there are more left to fix up. If anyone comes across some, please let me know and I can take care of those too.


Cheers
Mike

P.S. Nice to see some technical folks here, I appreciate the follow-up info. The forum software isn't developed in-house, but the rest of the site (quotes, etc.) is a pet project of mine, always in need of improvement.
 

jo-dionne

Last night I updated the SMF forum on one of my servers and the same problem occurred.

I fixed some accents from the mysql client, but some faulty characters can't be typed in the ssh console and it's way too slow, so this morning I decided to write a simple PHP script to fix the accents.

Both ways work fine; however, it's much faster with the script.


Warning: this may result in data loss!


To replace the faulty characters from the script:

[root@pe800-1-3 ~]# /usr/bin/php /var/www/*smf_path*/repair_characters.phps
http://64.66.185.182/repair_characters.phps
    or
http://64.66.185.182/repair_characters.zip    (MD5: 3a29d58aa549b11fe28921a8ffc04902)


Character-set (Latin1, UTF-1, UTF-8, UTF-7,5, UTF-7, ...)
http://64.66.185.182/character-set.html


To replace the faulty characters manually from mysql client:
...
$db_name = 'smf_db';
$db_prefix = 'smf_';
...

[root@pe800-1-3 ~]# mysql
...
mysql> connect smf_db
...
mysql> UPDATE LOW_PRIORITY smf_messages SET subject = REPLACE(subject,'Ã©','é');
mysql> UPDATE LOW_PRIORITY smf_messages SET body = REPLACE(body,'Ã©','é');
...
mysql> UPDATE LOW_PRIORITY smf_messages SET subject = REPLACE(subject,'Ã¨','è');
mysql> UPDATE LOW_PRIORITY smf_messages SET body = REPLACE(body,'Ã¨','è');
...
...
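
Typing one pair of statements per accent gets tedious, so the statements themselves can be generated. A small sketch in Python, assuming the table and column names are trusted identifiers (they are not escaped here); the helper names `sql_quote` and `update_statements` are hypothetical and this is not the PHP script linked above.

```python
def sql_quote(value: str) -> str:
    """Quote a string literal for MySQL: escape backslashes and double single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def update_statements(table: str, columns: list[str], replacements: dict[str, str]) -> list[str]:
    """Build one UPDATE ... REPLACE() statement per (replacement pair, column)."""
    return [
        f"UPDATE LOW_PRIORITY {table} SET {col} = "
        f"REPLACE({col}, {sql_quote(bad)}, {sql_quote(good)});"
        for bad, good in replacements.items()
        for col in columns
    ]


if __name__ == "__main__":
    pairs = {"Ã©": "é", "Ã¨": "è"}
    for statement in update_statements("smf_messages", ["subject", "body"], pairs):
        print(statement)
```

The output can be reviewed by eye and then pasted into the mysql client, which sidesteps the problem of typing the faulty characters in the console.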


Edit: I have fixed a bug in the .phps (Sorry!)


Dionne, J
Quebec City

 

Mike Bobbitt

Hi Dionne,

That's fantastic work. I've been using the manual method you describe above and it works, but it's time consuming (and I didn't know what all the patterns/replacements were). I've downloaded your script with wget and at first glance it looks like the accents didn't survive:

Code:
$search = array (
        '/Ã<88>/',
        '/Ã<89>/',
        '/Ã<8a>/',
        '/Ã<8b>/',
        '/è/',
        '/é/',
        '/ê/',
        '/ë/',
        '/Ã<80>/',
        '/à/',
        '/â/',
        '/Ã<87>/',
        '/ç/',
        '/Ã<9b>/',
        '/û/',
        '/é/',
        '/î/',
        '/Ã<8e>/',
        '/î/',
        '/Ã<8f>/',
        '/ï/',
        '/Ã<94>/',
        '/Ã'/',
        '/Ã<82>Ã/',
        '/â<82>¬/'
);

$replace = array (
        'Ã',
        'Ã',
        'Ã',
        'Ã',
        'è',
        'é',
        'ê',
        'ë',
        'Ã',
        'à',
        'â',
        'Ã',
        'ç',
        'Ã',
        'û',
        '©',
        '®',
        '\'',
        'Ã',

Could you mail it to me to see if that helps? (It could be the viewer I'm using - vi - but I want to be sure I don't cause any problems with improperly formatted characters.)
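
One way to tell a damaged download from a viewer problem is to look at the bytes directly instead of at how a terminal renders them. A rough sketch in Python; the function name `describe_encoding` is an assumption for illustration, and it only distinguishes plain ASCII, valid UTF-8, and everything else.

```python
def describe_encoding(raw: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', or 'other' (for example Latin-1)."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


if __name__ == "__main__":
    print(describe_encoding(b"plain text"))
    print(describe_encoding("é".encode("utf-8")))
    print(describe_encoding("é".encode("latin-1")))
```

It could be run on the downloaded file with something like `describe_encoding(open("repair_characters.phps", "rb").read())`. If the file turns out to be valid UTF-8 but still looks wrong on screen, the terminal or editor settings are the more likely culprit.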

Thanks again, this is a very useful script! (And well written by the looks of it, you know your PHP...)


Cheers
Mike
 

jo-dionne

-- Quote --
It could be the viewer I'm using - vi ...
-- Quote --


I just opened the file with vi; it trashes most of the characters.

-- Quote --
Thanks again, this is a very useful script!
-- Quote --


It's a pleasure!

-- Quote --
And well written by the looks of it, you know your PHP...
-- Quote --


You may find some bugs; it took me less than 45 minutes to write.

I'm not a PHP expert, but it’s a pretty good scripting language (well designed for fast string manipulation, like Perl). Still, I will always prefer a real programming language… nothing will replace C and assembly!


Cheers,
Jonathan Dionne
 

Mike Bobbitt

Many thanks to Jonathan for the script, which has fixed most - or all - of the broken accents with basically no work on my part. If anyone sees any remaining problems, please let me know.

Cheers
Mike
 