[mythtv-users] UTF-8 Support in MythTV still looks messed up to me

Michael T. Dean mtdean at thirdcontact.com
Tue Sep 23 22:54:34 UTC 2008


On 09/22/2008 01:16 AM, Dave M G wrote:
> A long time back, I had trouble getting Japanese characters to display 
> on MythTV. So, after asking around on this list and the developers list, 
> I found I had to make alterations to the database to get UTF-8 support 
> to work.
>   

Doing so did not fix Myth's handling of UTF-8 characters, it simply
broke the data in your database.

> I wrote my instructions into the MythTV Wiki:
>
> http://www.mythtv.org/wiki/index.php/Utf8_Text_in_OSD
>
> I've just completely reinstalled MythTV, and went back to the same 
> instructions I wrote on the MythTV wiki. Turns out, someone wrote in a 
> warning saying that the instructions are completely wrong and should no 
> longer be used.
>   

Actually, "no longer be," shouldn't be part of that sentence.  It should
really be, "should never have been."  The proper fix would have been
submitting patches that fixed Myth's improper handling of UTF-8
characters in the first place, or--if nothing else--submitting tickets
that described where you saw issues and how to reproduce those issues. 
Unfortunately, too many people fixed the symptom of the problem (by
changing the database) rather than fixing--or even reporting--the
problem, so Myth devs never got notified of where there were problems to
fix.

> Fine, I guess, so long as MythTV is actually doing a better job of UTF-8 
> now. A little more looking around on the net and it seems that the 
> database is supposed to be in UTF-8 and "just work".
>   

No.  Only the MythTV trunk database schema uses UTF-8; 0.21 does not.

> However, near as I can tell, the database still encodes information in 
> "latin1_swedish_ci", and converts all my Japanese text into ASCII 
> gibberish when trying to use mythfilldatabase.
>   

MythTV 0.21 is still telling MySQL that the data is latin1, yes.

MythTV trunk has taken all of the data that MySQL thought was
latin1-encoded and, working around the conversions MySQL would have done,
changed the schema so that MySQL now knows that the data is UTF-8. 
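To make the latin1 vs. UTF-8 distinction concrete, here is a rough Python sketch (not MythTV or MySQL code). It assumes a latin1-declared column that actually holds raw UTF-8 bytes, and it uses Python's "latin-1" codec as a stand-in for MySQL's latin1 (which is really closer to cp1252, so a few byte values differ). A charset conversion reads every stored byte as a latin1 character and re-encodes it, doubling each non-ASCII byte into gibberish; relabelling the same bytes as UTF-8 gives back the original text, which is roughly what working around MySQL's conversion has to achieve.

```python
# Sketch only: models a latin1-declared column that actually holds UTF-8 bytes.
# Python's "latin-1" codec approximates MySQL's latin1 here.

def convert_latin1_to_utf8(raw: bytes) -> bytes:
    """Charset conversion: treat each stored byte as one latin1 character,
    then re-encode that character as UTF-8 (double-encodes non-ASCII data)."""
    return raw.decode("latin-1").encode("utf-8")


def relabel_as_utf8(raw: bytes) -> bytes:
    """Relabel: keep the stored bytes exactly as they are; only the
    declared character set changes, so nothing is rewritten."""
    return raw


title = "日本語のテレビ番組"        # example text, not real listings data
stored = title.encode("utf-8")    # the bytes the application actually wrote

converted = convert_latin1_to_utf8(stored)
relabelled = relabel_as_utf8(stored)

assert len(converted) == 2 * len(stored)       # every non-ASCII byte doubled
assert converted.decode("utf-8") != title      # conversion mangles the text
assert relabelled.decode("utf-8") == title     # relabelling preserves it

# The damage is reversible while it is only one layer deep:
assert converted.decode("utf-8").encode("latin-1") == stored
```

The last check is why relabelling (rather than converting) is the safe direction: as long as the bytes were never reinterpreted, the original text is still there.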
However, for anyone who changed table encodings in their MySQL database
(i.e. following the instructions on the wiki page), the data in the
database is in a format that will almost definitely cause conversion
problems (and duplicate keys and ...) meaning that it will be "extremely
difficult" to convert their database data to work in trunk/0.22.

So, chances are when 0.22 is released, anyone who did change their
database schema to "fix" UTF-8 problems will be unable to upgrade their
DBs.  If some enterprising individual would like to help those people
(or himself, as the case may be), he could spend some time figuring out
how to undo the changes suggested by that wiki page--and how to fix the
data in the table so that undoing the changes is possible--so that the
MythTV database upgrade will succeed.  If no one does that, those who
broke their databases will probably be on their own (having to either
start fresh or manually fix the broken data--which is likely to be a
very tedious process).
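A first step for such a person might be a scan like the following Python sketch. It rests on an assumption, not on anything in the MythTV code: that changing the table encodings left UTF-8 text double-encoded (UTF-8 bytes read as latin1 and re-encoded). undo_double_encoding and find_collisions are hypothetical helpers. The heuristic can misfire on genuine latin1 text that happens to form valid UTF-8 (for example "Ã©"), so its output would need review, and the collision check shows how repaired values could turn into the duplicate keys mentioned above.

```python
from collections import defaultdict


def undo_double_encoding(text: str) -> str:
    """Hypothetical repair helper: if `text` looks like UTF-8 that was misread
    as latin1 and re-encoded, strip that extra layer; otherwise return it as-is."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character outside latin1 (already proper Unicode) or the
        # underlying bytes are not valid UTF-8 (genuine latin1 text): leave it.
        return text


def find_collisions(values):
    """Group values by their repaired form and report groups with more than
    one member: these would become duplicate keys after the repair."""
    groups = defaultdict(list)
    for value in values:
        groups[undo_double_encoding(value)].append(value)
    return {fixed: originals for fixed, originals in groups.items() if len(originals) > 1}


original = "日本語"
mangled = original.encode("utf-8").decode("latin-1")  # double-encoded form

assert undo_double_encoding(mangled) == original
assert undo_double_encoding(original) == original  # already correct: untouched
assert undo_double_encoding("café") == "café"      # real latin1 text: untouched
assert find_collisions([mangled, original, "News"]) == {original: [mangled, original]}
```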

> On this thread, it says something vague about using some fromutf8() 
> command when moving strings to and from the database, but no 
> instructions on where to apply the command

If we knew where, it would be there.  What we need/ed/ were reports of
where the data handling breaks so that we could inspect the code and
find the missing (or, in some cases, possibly redundant) conversions. 
Unfortunately, now that trunk has completely changed its database
character encoding/handling, I'm guessing the motivation for tracking
down/testing character-handling fixes in 0.21-fixes is very low.  I
wouldn't be surprised if the motivation for simply testing
character-handling fixes submitted as patches is very low at this point
(though patches submitted for 0.21-fixes are far more likely to get
applied by some generous-with-his-time dev than problems that still have
to be tracked down, especially if 0.22 is a long
way off).  (Basically, any fix for the 0.21 version is only relevant
while 0.21 is still in use.)

>  or how it relates to 
> mythfilldatabase, or, well, anything:
>
> http://www.gossamer-threads.com/lists/mythtv/users/344332
>
> So... I'm willing to believe the "ugly hack" I wrote to get Japanese 
> characters before is no longer the way to do things.
>
> But I don't see what is the proper way to do things now. My Japanese 
> characters are not working with the default setup.
>   

IMHO, the proper way of dealing with it in 0.21-fixes is to live with
the problems/mangled text.

> Any advice would be much appreciated. Thanks.
>
> (PS: Oh, and the fonts that ship with MythTV don't support Japanese 
> characters fully, so that does need to be changed)

You mean you plan to submit patches to the FreeFont project to add
glyphs for Japanese characters or do you know of an
appropriately-licensed (for a GPL project) font that supports the
characters you want as well as all those supported by the FreeFont fonts?

Mike

