[mythtv-users] Database encodings
Michael T. Dean
mtdean at thirdcontact.com
Mon Mar 30 14:50:11 UTC 2009
On 03/30/2009 04:08 AM, Glenn Sommer wrote:
> I saw at http://www.mythtv.org/wiki/Fixing_Corrupt_Database_Encoding
> that mythtv 0.22 can only handle latin1 connections to the MySQL
> database - but uses UTF8 internally (Actually it writes UTF8 into the
> database).
You've got your versions wrong.
MythTV 0.21-fixes and below uses UTF-8. MythTV 0.21-fixes and below
stores UTF-8 in the database. MythTV 0.21-fixes and below tells MySQL
that the text columns are actually latin1. MythTV 0.21-fixes and below
does /not/ use latin1.
MythTV trunk uses UTF-8. MythTV trunk stores UTF-8 in the database.
MythTV trunk tells MySQL that the text columns are actually UTF-8.
MythTV trunk does /not/ use latin1.
In other words, the /only/ difference is that MythTV 0.21-fixes and
below doesn't tell MySQL what encoding is actually in use.
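A minimal Python sketch of that point (an illustration, not MythTV code): the bytes written are identical in both versions, and only the charset MySQL is told changes how MySQL reads them. The `mysql_latin1_decode` helper and the example title are hypothetical; the helper approximates MySQL's latin1, which the MySQL reference manual describes as cp1252 with the five bytes cp1252 leaves undefined mapped to the matching U+0080 to U+009F code points.

```python
def mysql_latin1_decode(raw: bytes) -> str:
    """Hypothetical helper: approximate how MySQL reads bytes from a latin1 column.

    MySQL's latin1 is cp1252, except that the five bytes cp1252 leaves
    undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) become U+0081, U+008D, etc.
    """
    chars = []
    for b in raw:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))  # one of the five undefined cp1252 bytes
    return "".join(chars)


title = "Müller"                # hypothetical example value
stored = title.encode("utf-8")  # the bytes both 0.21-fixes and trunk store

print(stored)                       # b'M\xc3\xbcller'
print(mysql_latin1_decode(stored))  # MÃ¼ller: what MySQL sees when told latin1 (0.21-fixes)
print(stored.decode("utf-8"))       # Müller: what MySQL sees when told UTF-8 (trunk)
```

The stored bytes never change; only MySQL's interpretation of them does.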
> I don't understand why MythTV doesn't use UTF8 all the way - so no
> encoding/decoding is required when talking to the database?
> Also, putting UTF8 text in a latin1 database is in my opinion wrong...
It does. It used to store UTF-8 data in MySQL without /allowing/ MySQL
to know that the data inside was UTF-8. When most of the data is
actually latin1 (as it is for a /large/ number of users), that keeps the
database columns and indices significantly smaller than in a database
where MySQL knows the data is UTF-8, because MySQL reserves 3 bytes per
character for UTF-8 index keys and fixed-length columns. Also, MythTV
had to wait until MySQL supported sufficiently long columns and indices,
and we've only recently started /requiring/ versions of MySQL that do.
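A back-of-the-envelope Python sketch of the index-size argument. The column lengths are hypothetical, not MythTV's schema, and the sketch assumes two documented MySQL limits of that era: the 3-byte "utf8" charset reserves 3 bytes per character in an index key, and MyISAM caps a key at 1000 bytes by default. The length-prefix bytes MySQL also counts are ignored.

```python
# MySQL's 3-byte "utf8" charset reserves 3 bytes per character in an index key;
# latin1 needs 1. Length-prefix bytes MySQL also counts are ignored here.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}
MYISAM_MAX_KEY_BYTES = 1000  # MyISAM's default maximum key length


def index_key_bytes(column_lengths, charset):
    """Reserved key size for an index over VARCHAR columns of these lengths."""
    return sum(n * BYTES_PER_CHAR[charset] for n in column_lengths)


hypothetical_index = [255, 128]  # hypothetical two-column index, not MythTV's schema

for charset in ("latin1", "utf8"):
    size = index_key_bytes(hypothetical_index, charset)
    verdict = "fits" if size <= MYISAM_MAX_KEY_BYTES else "too long"
    print(f"{charset}: {size} bytes ({verdict})")
```

The same index that fits comfortably as latin1 (383 bytes) overshoots the limit as utf8 (1149 bytes); on a real MyISAM table MySQL would typically refuse it with "Specified key was too long; max key length is 1000 bytes".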
> Other clients will be unable to read the data correctly (like
> phpMyAdmin for example).
Well, the /only/ clients that should be using the MythTV database are
MythTV itself or other clients designed for use with MythTV (and,
therefore, aware of the encoding). That being said, if you knew what you
were doing, you could actually make it work rather easily even in "other"
clients that didn't realize what was going on.
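One way such a client could cope: keep MySQL from converting at all (for example with `SET NAMES latin1`, so the latin1 column's bytes arrive unchanged) and decode those bytes as UTF-8. If the client has already received the column converted to UTF-8, a sketch like the hypothetical `repair_latin1_view` below can undo that extra step. It assumes MySQL's cp1252-based latin1 as described in the MySQL reference manual and is illustrative only, not part of MythTV.

```python
def repair_latin1_view(text: str) -> str:
    """Hypothetical helper: recover the original text when a client read a
    latin1-declared column (really holding UTF-8 bytes) over a UTF-8
    connection, so MySQL turned each stored byte into its own character.

    Raises ValueError or UnicodeDecodeError if text did not arise that way.
    """
    raw = bytearray()
    for ch in text:
        try:
            # Turn each character back into the byte MySQL's latin1 read it from.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Characters cp1252 cannot encode are taken as their code point.
            # This covers the five bytes cp1252 leaves undefined, which MySQL
            # maps to U+0081, U+008D, U+008F, U+0090 and U+009D; a character
            # above U+00FF that cp1252 lacks makes append raise ValueError.
            raw.append(ord(ch))
    return raw.decode("utf-8")


print(repair_latin1_view("MÃ¼ller"))                 # Müller
print(repair_latin1_view("\u00e2\u201a\u00ac"))      # € (seen by the client as "â‚¬")
```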
> In my opinion latin1 text is for latin1 databases - and UTF8 text is
> for UTF8 databases...
>
> Surely I must be missing something here?
Yes. You're missing an understanding of what that page actually said. :)
> What is the reason for breaking the database - instead of fixing MythTV?
Again, re-read that page. We're simply telling people who have
completely broken data (because they had configurations where they told
MySQL to ignore the database schema's defined charset, so MySQL did
character-set conversions it should /not/ have done) that they cannot
successfully upgrade their databases until they fix the data.
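A hypothetical way to spot such rows (an illustrative heuristic, not the fix procedure described on the wiki page): read a column's raw bytes, for example over a latin1 connection so MySQL leaves them alone, and check whether they are still valid UTF-8. If MySQL converted incoming UTF-8 text to latin1, a character such as ü ends up as the single byte 0xFC, which can never appear in valid UTF-8.

```python
def classify_column_bytes(raw: bytes) -> str:
    """Illustrative heuristic for bytes read from a latin1-declared column.

    Returns "ascii" if there is nothing to tell apart, "utf-8" if the bytes
    still decode as UTF-8, and "converted" if they do not (for example, a
    character MySQL converted to a single latin1 byte such as 0xFC for "ü").
    """
    if raw.isascii():
        return "ascii"  # identical in latin1 and UTF-8
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "converted"
    return "utf-8"


print(classify_column_bytes("Müller".encode("utf-8")))    # utf-8
print(classify_column_bytes("Müller".encode("latin-1")))  # converted
```

A "utf-8" result is not proof the row is intact: double-encoded text still decodes as UTF-8, it just decodes to the wrong characters.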
Mike