HARRYWEBSTER
Fixing mixed encoding (latin1 to utf8)

After recently migrating to an environment that uses to a UTF-8 MySQL database connection we found a large number of invalid characters even though the tables “should” have already been converted. Examples: They’re should be They’re frappé should be frappe 65º should be 65º – should be –
I first came across this problem in an early version of WordPress however the fix I used then was catastrophic in this instance, data simply vanished mid way through the record with no errors reported.
Although the following method is a little laborious (depending on the size/complexity of you database) I found it to be the best solution.
Firstly we need to export the data which looks broken, we’re going to be using this to manually verify the fix has worked.
echo "select id, broken_text_in_here from table_with_broken_cols where broken_text_in_here !=''" | mysql database_name > backup_pre
In a nutshell we’re going to write a PHP script to…
- Connect to the database using Latin1 charset
- Fetch all the broken data
- Convert this data to HEX
- Update the connection charset to UTF-8
- Update the broken records with the unhex-ed data
Lets get started!
$mysql_user = “username”;
$mysql_pass = “password";
$mysql_host = “localhost”;
$mysql_db = “database_name”;
$broken_table = “broken_table_name”;
$broken_col = “broken_column_name”;
$link = mysql_connect($mysql_host, $mysql_user, $mysql_pass) or die('Error: ' . mysql_error());
echo 'Connected successfully';
mysql_select_db( $mysql_db ) or die('Could not select database');
mysql_query("SET NAMES latin1");
$result = mysql_query("SELECT id, $broken_col FROM $broken_table WHERE $broken_col != ‘’") or die('Query failed: ' . mysql_error());
while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
$hex = bin2hex($line[$col]);
foreach ($line as $col_value) {
echo "- $col_value";
mysql_query("SET NAMES utf8");
mysql_query("UPDATE $broken_table SET $broken_col=UNHEX('$hex') WHERE id=".$line["id"]);
echo "\n";
}
}
mysql_free_result($result);
mysql_close($link);
After this script has been run we should re-run the original export we did to ensure no data has been lost!
echo "select id, broken_text_in_here from table_with_broken_cols where broken_text_in_here != ''" | mysql database_name > backup_post
Now using VIMDIFF verify the output:
$ vimdiff backup_pre backup_post