[mythtv-users] MythTV, phpMyAdmin and German umlauts

Michael T. Dean mtdean at thirdcontact.com
Thu Jun 26 21:28:54 UTC 2008


On 06/26/2008 05:10 PM, Torsten Crass wrote:
> German umlauts seem to be a recurring issue with MythTV... Actually, 
> everything works fine for me (DVB-T EPG data, display in frontend etc.) 
> -- until I try to edit some mythconverg.recorded entry containing 
> umlauts in the title/subtitle/description field using phpMyAdmin.

This will /not/ work.  MythTV uses a character encoding that differs 
from the one MySQL thinks is in use, so phpMyAdmin will misinterpret 
and/or mis-encode the data.

Therefore, Brad's advice is spot-on:  "Don't edit those fields with 
phpMyAdmin."
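
To illustrate the "funny double-characters", here is a minimal Python 
sketch (not MythTV or phpMyAdmin code) of what happens when UTF-8 bytes 
are read by a client that trusts a latin1 label.  It assumes the client 
decodes with cp1252, which is what MySQL calls latin1; the helper name 
misread_as_latin1 is made up for this example.

```python
# Hypothetical illustration; misread_as_latin1 is not a real MythTV or
# phpMyAdmin function.
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as cp1252 (MySQL's latin1)."""
    return text.encode("utf-8").decode("cp1252")


original = "Die größte Waldlandschaft der Erde..."
garbled = misread_as_latin1(original)
print(garbled)  # each umlaut becomes two characters

# As long as nothing re-encodes the bytes, the misreading is reversible.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == original)
```

The bytes themselves are untouched in this case; only the 
interpretation is wrong, which is why the MySQL command-line client can 
still show the text correctly while phpMyAdmin does not.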

>  
> Although the umlauts show fine in the MySQL command-line client, in 
> phpMyAdmin they show up as funny double-characters. Example 
> (though I don't know whether this will survive e-mailing):
>
> "Die größte Waldlandschaft der Erde..."
>
> shows up as
>
> "Die grÃ¶ÃŸte Waldlandschaft der Erde..."
>
> I tried changing the affected column's and/or the connection's collation 
> to utf8 (which is the default locale on all my systems),

Doing that will likely mean that the data will be corrupted when you 
upgrade to 0.22 (if the process you used to change it didn't already 
corrupt it)...

>  but the only 
> change was that little boxes were displayed instead of funny characters...
>
> Someone once said something like "At the first glance, character 
> encoding may look rather difficult. But in reality, it's even worse". I 
> tend to agree.
>
> So any ideas, anyone?
>   

The entire mechanism used for character encoding in MythTV is changing 
significantly between 0.21 and 0.22, so the development version of Myth 
no longer has the issue you have (though it may have other character 
encoding issues, as the conversion is still occurring).  That means that 
if you really want it fixed, you'll probably need to fix it yourself 
since fixing a small UI bug in the already released version that doesn't 
apply to the development version is probably low priority for any devs.

Note, however, that the changes you made to your DB mean that if you 
"fix" it without first fixing your DB, your fix will be incorrect.  
Therefore, you need to change the database schema back /and/ fix the 
data that was in the columns you changed (i.e. TRUNCATE TABLE ... is the 
best way, so I hope it's only short-term data, like in program).

In MythTV 0.21, the entire MySQL DB /must/ be in latin1 encoding (table 
names, column names, and VARCHAR and other text columns).  However, the 
data within the DB should be proper UTF-8 encoding.  Myth encodes the 
data before inserts and decodes it after queries.  Likely the part you're 
using just forgets to do the appropriate conversion (and it's as simple 
as adding a toUtf8() somewhere or vice versa).
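
The storage scheme described above can be modelled roughly as follows.  
This is only a sketch in Python: plain bytes stand in for a 
latin1-labelled column, all function names are invented for the 
example, and the "transcode" step assumes the server converts the 
column contents according to the latin1 label when the column's 
character set is changed to utf8.

```python
# Sketch only: Python bytes stand in for a latin1-labelled MySQL column.
COLUMN_CHARSET = "cp1252"  # what MySQL calls latin1


def store(title: str) -> bytes:
    """Writer in the style described for 0.21: UTF-8 bytes go into the column."""
    return title.encode("utf-8")


def read_as_app(raw: bytes) -> str:
    """Reader that knows the stored bytes are really UTF-8."""
    return raw.decode("utf-8")


def read_trusting_label(raw: bytes) -> str:
    """Reader that believes the column's declared character set."""
    return raw.decode(COLUMN_CHARSET)


def transcode_column_to_utf8(raw: bytes) -> bytes:
    """A charset conversion that believes the latin1 label (assumption)."""
    return raw.decode(COLUMN_CHARSET).encode("utf-8")


raw = store("größte")
print(read_as_app(raw))          # correct text
print(read_trusting_label(raw))  # double characters

double = transcode_column_to_utf8(raw)
print(len(raw), len(double))     # the stored data has actually changed
print(read_as_app(double))       # now garbled even for the UTF-8 reader
```

The point of the last step is that a conversion acting on the latin1 
label rewrites the stored bytes, so a reader that previously got the 
right text by decoding UTF-8 no longer does.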

The reason that these bugs still existed in 0.21 is that users who 
aren't using latin character sets didn't report them as bugs, so the 
devs--most of whom are using latin character sets--didn't notice them.  
Unfortunately, many users instead followed the instructions on broken 
wiki pages, blogs, or forum posts (such as 
http://www.mythtv.org/wiki/index.php/Utf8_Text_in_OSD ).

The worst part is that all the data in those users' databases is likely 
to be corrupted during the upgrade to 0.22--likely in a way that 
prevents the database upgrade for 0.22 from occurring.  That means those 
users will probably have to start their MythTV databases over from 
scratch (losing all sorts of info, like previous recordings, recording 
rules, ...).  If they're lucky, they'll be able to get by with a "new 
host" database restore (though they'll have to follow a /very/ strict 
upgrade procedure to make that work).

If nothing else, 0.22 should fix a bunch of the issues (and those that 
remain are far more likely to be reported).

Anyway, lots of "extra" info, just to let those users who have broken 
their database schemas start planning for problems with 0.22.

Mike

