Fixing Corrupt Database Encoding
Contents
Background
In 0.21-fixes and below, MythTV required that the database be set up such that MySQL thought that text columns were encoded using the latin1 encoding and that all communications with the database be done using the latin1 encoding. Some distros (notably, Gentoo) and some users have misconfigured their MySQL servers incorrectly with the effect that all 8-bit characters are corrupted in the database.
The end result of this misconfiguration is that any user with a misconfigured database can not successfully upgrade to post-r16789 SVN trunk (which means that upgrades to 0.22, when released, will also fail). Before attempting to upgrade to post-r16789 SVN trunk or to 0.22, those users who have been running MythTV against a misconfigured database must fix the data encoding as described below.
Note that the use of the latin1 encoding does not prevent MythTV from using non-Latin characters. Instead, MythTV actually writes UTF-8 characters into the column and does all required conversions when reading/writing database data. The fact that the database uses latin1 encoding simply means that the MySQL server is unable to understand the data in the database--i.e. any 8-bit characters will be encoded in such a way that the MySQL server or any mysql client will see gibberish rather than the expected character. However, MythTV programs (including mythfrontend's GUI and OSD and MythWeb) will be able to interpret the data properly, meaning the user will see the proper characters through MythTV programs. For example, in a misconfigured database, the character, "å", would appear as shown when using MySQL programs, such as the mysql command-line client. When properly encoded, the character would be, "Ã¥". Note that properly encoded, the character (which is represented as a 2-byte character in UTF-8) seems to MySQL programs to be 2 characters. Similarly, 3-byte UTF-8 characters would be represented as 3 characters when using MySQL programs.
Determining if your database is misconfigured
To determine if your database is misconfigured, execute the following command from the shell (fixing the path to the mysql command-line client and/or the username/database name as required) and type in the password when prompted:
echo 'status;' | mysql -umythtv -p mythconverg
Misconfigured server
A misconfigured MySQL server will show output such as:
-------------- mysql Ver 14.12 Distrib 5.0.60, for pc-linux-gnu (i486) using readline 5.2 Connection id: 14 Current database: mythconverg Current user: mythtv@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 5.0.60-log Gentoo Linux mysql-5.0.60-r1 Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8 Db characterset: latin1 Client characterset: utf8 Conn. characterset: utf8 UNIX socket: /var/run/mysqld/mysqld.sock Uptime: 4 min 45 sec Threads: 10 Questions: 45369 Slow queries: 0 Opens: 125 Flush tables: 1 Open tables: 107 Queries per second avg: 159.189
Note the Server characterset
, Client characterset
, and Conn. characterset
are utf8
.
Properly-configured server
A properly configured MySQL server should show output as below:
-------------- mysql Ver 14.12 Distrib 5.0.60, for pc-linux-gnu (i686) using readline 5.2 Connection id: 8162 Current database: mythconverg Current user: mythtv@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 5.0.60 Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: latin1 Db characterset: latin1 Client characterset: latin1 Conn. characterset: latin1 UNIX socket: /var/run/mysqld/mysql.sock Uptime: 14 days 22 hours 24 min 56 sec Threads: 5 Questions: 15125446 Slow queries: 42 Opens: 9904 Flush tables: 1 Open tables: 64 Queries per second avg: 11.722
Note the Server characterset
, Db characterset
, Client characterset
, and Conn. characterset
are latin1
.
Recognizing the difference
The important difference between the output of a misconfigured versus a properly-configured server is that some or all of the character sets in a misconfigured database will be utf8. For a MythTV 0.21-fixes or below system, the character sets will be latin1.
Note that once upgrade to post-r16789
Fixing the database corruption
Note that the approach described here will only work on a database that was corrupted due to a misconfigured MySQL server, as described above. If you have any other type of corruption, applying the procedure outlined below will further corrupt your database--likely breaking it.
To fix the broken encoding, you will need to create a database backup.