You are here

Encoding! eek

Posted at 11:09 on Thu, 6th September 2007 in mysql.

I recently went through hell and back with migrating a clients website over to a new web host: Basically, the lesson learnt here is, when importing data into mysql, make sure your character set's are set properly - client, server, connection, browser - they can all be set to different character sets.

In more detail, the clients database was in utf8 format and since it contains both english and french text, it uses a lot of diacritic marks (otherwise known as accent's). Basically utf8 is Unicode, which is an encoding that can have up to 4 bytes for one character. This allows a very large number of characters to be represented in one encoding. For more information on utf8 see the utf8 wikipedia page

So if you have a database that has utf8 data, and you're on a machine thats default might be latin1, or ISO-8859-1, if you try do a mysql import, you'll find all the 4 bytes characters being converted into individual characters - in other words.. characters (pun) - gibberish characters!

The simple fix is to tell mysql to import the data using the right character set, so:
$ mysql -u root -D mydatabase --default-character-set=utf8 > myexporteddata.sql

In my instance, because the database was so large, I couldn't import via phpmyadmin, and because the web hosting company didn't allow remote mysql connections, they had to do the import themselves - for some reason, after dozens of tickets back and forth, they didn't think to try this.

Anyway, I am quite relieved to say the least, since this was a fairly important site/client - I just hope my notes above can save someone else all this trouble - check your encodings!

A few sites that were helpful throughout the ordeal:

You might ask, why not host the site yourself? - unfortunately, which web host a client uses, is not my decision, and since the site's visitors are mostly in Europe, the site needs to be hosted somewhere close by.

If you're looking for a web host that has good uptimes, and features isn't too bad, but don't expect them to make an effort to assist with fixing such issues - they will only go so far.