Utf8 Text in OSD
Warning: The information on this page is UNOFFICIAL and UNSUPPORTED, the mysql changes used are ENTIRELY INCORRECT. If characters display incorrectly in MythTV then please submit a bug report and only follow the advice below at your own risk!
Following the mysql portion of this guide will make it difficult or impossible to upgrade to 0.22 or trunk! In short, don't do it!
This howto will instruct you on getting MythTV to properly display non-Latin characters in your program guide and On Screen Display ("OSD"). The examples here are for getting Japanese characters to display, but hopefully this will work as a model for getting other languages to work as well.
For the sake of completeness and to avoid confusion, this howto will cover not just the specifics of font display and encoding issues, but the whole process of getting program information.
Step One - first things first: what you need to know before beginning
This tutorial assumes you have MythTV set up and working. This tutorial only covers how to view program information, not how to view programs.
It's assumed that you know some Linux basics, like how to move files around, create symbolic links, and work in "super user" mode when necessary. I've tried to write it to be understandable to the novice user wherever possible.
This tutorial and all its examples were done on Ubuntu 6.10 (Edgy Eft), with MySQL 5.0 and MythTV version 0.20. The instructions are not meant to be specific to Ubuntu, so hopefully you should have no trouble with other flavours of Linux. However, it's strongly recommended that you use version 0.20 of MythTV or later, and MySQL 5.0 or later. Results with other versions could vary a great deal.
This tutorial also assumes you have access to the MySQL database server that supports your MythTV installation, and that either you understand MySQL well enough to tinker with it, or you are at least foolhardy enough to run the MySQL commands and hope they will work for you (the latter is the case with me!). If you can use phpMyAdmin, or a similar database manager interface, to edit your MySQL databases, that might save you a little work.
It's also good if you understand what utf8 encoding is, although a deep understanding isn't critical. It's enough to get that "utf8" encoding means text that supports characters from many languages, and that "latin1" encoding covers basically just the characters used in English and other Western European languages. Fonts that can display utf8 text can also display latin1 text, but fonts that only cover latin1 can't display the rest of utf8.
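To make the difference a little more concrete, here is a minimal sketch that counts how many bytes a piece of text occupies when it is stored as utf8. The name utf8_bytes is made up for this example (it is not part of MythTV or MySQL), and the sketch assumes your terminal and the script file are themselves utf8.

```shell
# utf8_bytes: print the number of bytes in the given string.
# (Hypothetical helper name, for illustration only.)
utf8_bytes() {
    # wc -c counts bytes, not characters; tr strips the padding
    # that some versions of wc put in front of the number.
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Example calls:
#   utf8_bytes 'TV'       # two plain ASCII letters
#   utf8_bytes 'テレビ'    # three katakana characters
```

On a utf8 system the first call should report 2 and the second 9, because each of these katakana takes three bytes. latin1 only ever has one byte per character, which is why it has no room for them.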
Step Two - downloading stuff you need
The program data for your city, town, or whatever kind of area you live in, is made available from web sites that carry program information in a form of data known as XML. If you don't know what XML is, don't worry, it's just a format for storing data. The process of getting the data you want from this source is made easy by a thing called "XMLTV".
XMLTV is just a collection of scripts, each of which is designed for a particular geographical region. You make some simple configurations, like deciding which channels you want information for, and then it will go to the right place on the right web site for you and get the TV program information you want.
You download it here: http://sourceforge.net/project/showfiles.php?group_id=39046
It's also available in most Linux repositories. If you get it from your repositories, it will probably put all the files in your home directory, in a directory called ".xmltv".
UPDATE: On a more recent install of MythTV, I found the above instructions were insufficient. I had to install XMLTV and some supporting software with the following command. This works on Ubuntu. If you use an RPM based Linux distribution, you will have to work out what the equivalent command is:
sudo apt-get install libtext-kakasi-perl xmltv
Once you have XMLTV installed, then what you need to do is extract the one script file that is for your area, and put it somewhere on your system where you'll access it. So I recommend getting it from the link above, taking the one file for your area, and putting it in your ".mythtv" directory, which you probably already have. But wherever you put it, make sure it's somewhere that it will stay permanently. It will be accessed regularly, as you get new program information every week or so, either manually or automatically.
In my case, the data is available from a web site called "ontvjapan.org", and the script that handles it for me is called "tv_grab_jp". It's not really important to know the name of the web site that serves the data, actually, as the script handles everything.
The other thing you will want to get is a utf8 capable font for your system. MythTV comes with "freesans.ttf", which is, in theory, a fully capable utf8 font that can display non-Latin character sets like Japanese. However, my personal experience in setting up MythTV was that, for whatever reason, it did not display the Japanese characters correctly.
If you use your computer for other things besides MythTV, and have your system set up to input and read your non-latin language, then you probably already have some good fonts for that language on your system. On my system they were located in:
/usr/share/fonts/truetype/
Your system may differ, but you can find them with the following command:
locate '*.ttf'
If you can't find a font on your local system which will handle your non-Latin language to your satisfaction, then head out on the internet. Searching for "free utf8 true type font Japanese" gave me some pages with freely available utf8 Japanese fonts. I took one I liked called "osaka.unicode.ttf".
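If locate isn't installed, or its index is out of date, find can do the same job. This is only a sketch, and the function name list_ttf is made up for this example:

```shell
# list_ttf: list every TrueType font file under the given directory.
# (Hypothetical helper name, for illustration only.)
list_ttf() {
    # -iname matches .ttf and .TTF alike; errors from directories
    # you are not allowed to read are thrown away.
    find "$1" -type f -iname '*.ttf' 2>/dev/null | sort
}

# Example call:
#   list_ttf /usr/share/fonts
```

The quotes around '*.ttf' matter: without them the shell may expand the pattern itself if there happen to be .ttf files in the directory you are standing in.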
You may want to pick up some extra fonts in different styles, because there will be options later for using different fonts for different screens.
Once you have a font, or fonts, you will need to put them in your /usr/share/mythtv directory. This requires super user privileges. You can also create a symbolic link if you want to use a font that's already on your system somewhere else.
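Here is a rough sketch of the symbolic link approach. The function name link_font and the font path in the example call are hypothetical, and on a real system you would have to run it as super user, since /usr/share/mythtv normally belongs to root.

```shell
# link_font: make a symbolic link to a font inside a target directory.
# (Hypothetical helper name, for illustration only.)
# $1 = full (absolute) path to the font file, $2 = directory to link it into
link_font() {
    src=$1
    dest_dir=$2
    # Refuse to create a dangling link if the font file is missing.
    [ -f "$src" ] || { echo "no such font: $src" >&2; return 1; }
    ln -s "$src" "$dest_dir/$(basename "$src")"
}

# Example call (the font path is made up):
#   link_font /usr/share/fonts/truetype/myfont.ttf /usr/share/mythtv
```

Use an absolute path for the font, because a relative path would be resolved from the directory the link lives in and would most likely end up broken.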
Got everything? Good. Now onto setting things up.
Step Three - MySQL set up: the hardest part
You'll want to set up your MySQL to properly handle utf8 encoding before anything else. Otherwise, it could get cluttered up with a bunch of question marks or ASCII code gibberish in place of where the program names and descriptions should be. That's what happened to me on my first try. As a result, anywhere in your MythTV interface where you expect to see names of programs or descriptions, you'll see either question marks, little squares in place of characters, or just plain gibberish.
The problem is that the makers of MythTV left the encoding of its MySQL database to be "latin1". This also shows up in some MySQL interfaces, such as phpMyAdmin, as "latin1_swedish_ci". Trying to put utf8 encoded text into a latin1 encoded database might work if you're really lucky. But it didn't for me, and I'm assuming you're here because you anticipate that not working for you.
First, make sure your installation of MySQL is ready for utf8 encoding. This means editing your my.cnf file. The location of this file may vary between distributions and versions of Linux. For me, the file is here:
/etc/mysql/my.cnf
You will need super user permissions to edit this file. Open it in your favourite text editor, and locate the section that starts with this text:
[mysqld]
Add these lines under that heading:
# utf8
init-connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake
Save and close "my.cnf", and then restart the MySQL server:
sudo /etc/init.d/mysql restart
Note that this makes your MySQL server do all its transactions in utf8. If you work with other encodings regularly, this may cause issues in other areas. However, if you work with encodings, you probably already know a lot more than me about encoding settings, and are way above this tutorial anyway!
For me, this was a necessary step in getting utf8 to work both in MythTV and other settings, since I only ever deal in utf8. If you have concerns about this step, you're advised to consult with people who have expertise with MySQL.
Warning: When I first set about doing this, I changed only the tables that had "program" and "record" in their name to use utf8. This made some Japanese text store correctly, but at the same time I noticed that I was unable to schedule recordings. I thought these were different issues, until one time I checked my logs and found this:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
When I went in and changed all the rest of the tables to use utf8, this error went away, and I was able to schedule recordings.
So you have been warned. Changing collation settings is not to be taken too lightly. On the up side, though, once you make everything consistent, you should be free from hassles.
Previously on this page there was a long tedious set of instructions on how to change your MythTV database to be utf8 encoded. Fortunately I've worked out a simpler way.
Open a command line if you haven't already. Then type in:
mysqldump -u"root" -p"rootPassword" mythconverg --result-file=mythconverg.sql
Obviously, you need to replace "rootPassword" with your root password.
Now you have a text file, called mythconverg.sql which holds your entire MythTV database in it.
Open up that with your favourite text editor. I use gedit. But any text editor with a find and replace function will do.
Using the search and replace function, change every instance of "latin1" to "utf8".
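If you'd rather not open a large dump in a text editor, the same find and replace can be sketched with sed. The function name convert_dump and the output file name are made up for this example. Note that, exactly like the text editor method, this blindly changes every occurrence of "latin1", including any that happen to sit inside your stored data.

```shell
# convert_dump: copy a dump file, changing every "latin1" to "utf8".
# (Hypothetical helper name, for illustration only.)
# $1 = original dump, $2 = new file to write
convert_dump() {
    # The original file is only read, so it stays around as a backup.
    sed 's/latin1/utf8/g' "$1" > "$2"
}

# Example call (the second file name is made up):
#   convert_dump mythconverg.sql mythconverg-utf8.sql
```

If you go this way, the key index changes described next have to be made in the new file, and that is the file you would import later.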
Unfortunately, there's a flaw in MySQL about how it handles utf8 on key indexes... but I won't bore you with the details. All you need to know is that on the following tables, you need to change the numerical value in the key index declaration of some "varchar" types. For example, where it says:
CREATE TABLE `jumppoints`
You will find a line underneath that says:
PRIMARY KEY (`destination`,`hostname`)
You need to change it like so:
PRIMARY KEY (`destination`(64),`hostname`(128))
I've put the whole lines you need to replace under the table names here, so you can just copy and paste:
TABLE jumppoints: PRIMARY KEY (`destination`(64),`hostname`(128))
TABLE profilegroups: UNIQUE KEY `name` (`name`(64),`hostname`(128)),
TABLE displayprofilegroups: PRIMARY KEY (`name`(64),`hostname`(128)),
TABLE settings: KEY `value` (`value`(64),`hostname`(128))
Note that the above instructions took care of all the problematic tables in my install of MythTV, which I only use for watching TV and videos. If you're running more plugins, you'll have more tables, and potentially more problems. All I can tell you is that if you get errors on specific tables in reference to "key" values, then try dividing the number by 3 and see if that helps. The above changes do not change the length of any field (since that might cause problems elsewhere in MythTV), but instead only use the first X characters of each field in the key indexes.
But, let's go on assuming the above worked for you. Now your dump file is ready for importing back into MySQL.
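The "divide by 3" rule of thumb makes more sense with a little arithmetic. The sketch below assumes that MySQL sets aside 3 bytes for every utf8 character in an index, and that the whole key must fit within 1000 bytes; that limit is my understanding for MyISAM tables, so treat it as an assumption and check it for your own setup. The name key_bytes and the column sizes 128 and 255 in the example are hypothetical.

```shell
# key_bytes: worst-case size in bytes of an index over utf8 columns,
# given how many characters of each column go into the index.
# (Hypothetical helper name, for illustration only.)
key_bytes() {
    total=0
    for chars in "$@"; do
        total=$((total + chars * 3))   # assumes 3 bytes per utf8 character
    done
    echo "$total"
}

# Example calls:
#   key_bytes 128 255   # two whole columns (hypothetical sizes)
#   key_bytes 64 128    # the shortened prefixes used above
```

With those numbers the first call gives 1149, which is over the assumed 1000 byte limit, while the second gives 576, which fits comfortably.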
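Given the warning earlier about mixing latin1 and utf8 tables, it may be worth a quick look at the edited file first to confirm that only one character set is left in it. This sketch assumes the dump marks each table with text of the form "CHARSET=something", which is what I'd expect from mysqldump but is still an assumption. The name dump_charsets is made up.

```shell
# dump_charsets: list each distinct CHARSET=... setting found in a dump file.
# (Hypothetical helper name, for illustration only.)
dump_charsets() {
    # -o prints only the matching text, sort -u removes the repeats.
    grep -o 'CHARSET=[A-Za-z0-9_]*' "$1" | sort -u
}

# Example call:
#   dump_charsets mythconverg.sql
```

If everything was replaced consistently, you should see a single line mentioning utf8. More than one line suggests that some tables would still come back in a different character set.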
Before you do, you need to go into mysql:
mysql -u"root" -p
Then:
USE mythconverg
(command to truncate table here)
Then this:
ALTER DATABASE `mythconverg` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
You should get back:
Query OK, 1 row affected (0.00 sec)
You might mistake that command for meaning the whole database has been turned into utf8, and wonder why all the bother with the other steps. Actually, that command just makes the top level of the database set to use utf8, not the tables and columns. What's the top level? Don't worry about it, it's just good to set it to utf8.
(command to upload modified sql file here)
Leave the MySQL interface by typing:
quit
Step Four - getting program data
Open up a terminal, and go to where you've put your XMLTV "grab" file. Mine is called "tv_grab_jp", so that's what I've used in the example below. Substitute your file name if you're not in Japan. Here's what I typed in:
tv_grab_jp --config-file tv_grab_jp.conf --configure
When you run your "grab" file for the first time, you need to run the --configure option. Otherwise, it won't know what channels you want. It stores the information you're about to select in a separate file, which here I've specifically named as "tv_grab_jp.conf" by using the "--config-file tv_grab_jp.conf" option. You can call your configuration file anything you like.
You can also run the above command without the "--config-file" option, but then it will default to creating the configuration file in your home directory, in a subdirectory called ".xmltv". Maybe that's okay for you. But for me, by specifying the file name, it creates it in the same directory as the tv_grab_jp script, which I have in my ".mythtv" directory. I figure it makes sense to have the script and its configuration in the same place.
I got a whole bunch of error output when I ran the above command. I don't know if these errors are created for all languages that require utf8, or just for Japanese, or just for some "grab" script. In any case, you might see a bunch of error messages that look like this:
Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xed) at /usr/share/perl5/Date/Manip.pm line 7221.
These look scary, but you can safely ignore them. They won't affect anything.
The script will first search for the web site it needs and download the relevant channel information:
getting regions: ##################################################
The script will then ask you to select the channels you want information for.
In the case of Japan (your region may differ), it first asked me what area of Japan I was in:
Select your region:
0: hokkaidou
1: toukyou
... (trimmed for brevity)
I'm in Tokyo, so I selected "1", which was the default anyway. Then it gathers the channels for your region:
getting channels: ##################################################
Then it asks exactly which channels. Not all the ones it offered to me for my area are actually available, and it might be the same situation for you. Just hit "Enter" to select the channels you want, and type in "no" followed by "Enter" for the channels you don't need.
add NHKsougou(NHK)? [yes,no,all,none (default=yes)]
add NHKkyouiku(ETV)? [yes,no,all,none (default=yes)]
... (trimmed for brevity)
One last step for getting the data. Now that your "grab" script has a configuration file, we want to actually go grab some program information. You do so by running the "grab" script with the "--output" parameter, and storing the information into a file.
For me, the command looks like this:
/home/dave/.mythtv/tv_grab_jp --config-file tv_grab_jp.conf --output jtv.xml
"jtv.xml" is the name I use, but you can use any name you like. You have to specify the local configuration file with the "--config-file" option, otherwise it will look for it in the default location.
If all goes well, you will see the script download some information, and then create the file with your chosen name containing that data right there in the same directory.
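Before loading the file into MythTV, you can do a rough sanity check that it actually contains some listings. This sketch assumes the file follows the usual XMLTV layout, with one <programme> element per show. The function name count_programmes is made up.

```shell
# count_programmes: count the <programme> elements in an XMLTV file.
# (Hypothetical helper name, for illustration only.)
count_programmes() {
    # grep -o prints one line per match, so wc -l gives the total
    # even when several elements share a single line of the file.
    grep -o '<programme[ >]' "$1" | wc -l | tr -d ' '
}

# Example call:
#   count_programmes jtv.xml
```

A result of 0 would suggest the grab did not really work, even though a file was created.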
Run the following command:
mythfilldatabase --file 1 -1 jtv.xml
"1" refers to the ID of the video source that the listings belong to (the source your capture card's channels are attached to). For you it may be different, but you'll have to work that out with instructions elsewhere. "-1" is the offset argument, and this value tells mythfilldatabase to replace the existing program data with what is in the file. "--file" means to read the data from a file instead of running a grabber. Replace "jtv.xml" with whatever you called your file when you created it above.
Assuming all went well, we can finally enter MythTV and change the settings there.
Step Five - MythTV settings
Start mythfrontend however you usually do. I have my remote control set to open it, but in any case, you can start it by typing at a command prompt:
mythfrontend
Select "Setup", then "TV Settings", then "Playback". Keep clicking "Next" until you get to the screen titled "On-screen display".
There are two drop down menus related to fonts. One is called "OSD font:" and the other is "CC font:". The "OSD font" is the only one that really matters, I think, but I changed both of them anyway.
If you've placed some utf8 fonts that you like as described in the first section of this howto, then you should see them listed here in these drop down menus. If you don't see them listed here, then they might not be in the right directory, or perhaps they're malformed somehow.
Assuming they are listed, select them.
Before you click "Next" and move on, take note of the "OSD theme" listed at the top of this page. You might want to choose one you like if you haven't already. In any case, the important thing is to be sure of its name for the next step.
Now click next until you leave. You will notice on other screens there are other font settings, like "ATSC caption fonts". I set these to use my utf8 Japanese font as well, but I don't think it's critical for the objectives of this howto.
Step Six - finishing up: themes
If you were to watch television at this point to check on your on screen display, you would see that your program guide is displaying the right characters. Great!
But, you might notice something odd about the On Screen Display.
Let's say you are in "Browse" mode, which is the default behaviour for MythTV. This means that to change channels, you first use the arrow buttons on your remote or keyboard to look at descriptions of shows before you actually hit "OK" or "enter" to go to that channel. If you try it now, you'll see that as you "browse" along, your utf8 characters are displaying fine.
But, then when you actually change the channel, your characters go away, and are replaced by little boxes or question marks. What's going on there?
You might think, as I did, that setting the "OSD font" in the mythfrontend set up would mean that *all* On Screen Displays would use that font.
But it turns out that the "OSD font" setting in the mythfrontend setup only applies to the default font. If the theme you use has any font settings, and just about all of them do, they will override the default font with their own.
Oddly enough, I found that with the three or four themes I experimented with, the default font applied before changing channels, and the theme font applied after changing channels. That was kind of counterintuitive, as I would expect the default to get overridden either in both cases, or in neither case.
I still find that a little odd, but, fortunately, it's easily solved. All that needs to be done is to edit the OSD theme that you want to use, and that can be accomplished without too much hassle.
Hopefully you took note, as instructed in the last step, of the name of your "OSD theme". You need to go to the themes directory for your MythTV installation. For me, it was here:
/usr/share/mythtv/themes
There's a directory for every theme. Find the one that matches the name of your current OSD theme. Inside that directory you'll see a file called "osd.xml". Open that file with your favorite text editor.
Inside, there's a lot of code that may look confusing. Don't worry about it. If you do a search for ".ttf", you'll find every mention of the fonts being used.
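The same search can be done from the command line. This sketch assumes font file names are made up only of letters, digits, dots, hyphens and underscores. The function name list_theme_fonts is hypothetical.

```shell
# list_theme_fonts: list each distinct .ttf file name mentioned in a theme file.
# (Hypothetical helper name, for illustration only.)
list_theme_fonts() {
    # -o prints only the matched font name, sort -u removes the repeats.
    grep -o '[A-Za-z0-9._-]*\.ttf' "$1" | sort -u
}

# Example call:
#   list_theme_fonts /usr/share/mythtv/themes/YOUR_THEME/osd.xml
```

Replace YOUR_THEME with the directory name of your OSD theme. Each line of output is one font that the theme refers to.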
There might be a lot of different references to fonts, depending on the theme and its complexity. And, if you wanted to get fancy, you could probably use different fonts of your own for various stylistic effects. Remember in the first section I said the opportunity existed to use multiple fonts? This is where you can if you want. Personally, I just wanted to get it working, so I replaced every font with my osaka.unicode.ttf font that I have.
The bottom line is that you want to switch every mention of fonts within this file to a utf8 capable font that you know works for your language. Once you see the name of the font being used in the file, you can use your text editor's "search and replace" function to replace all of them much more quickly than by hand. You may need to use the search and replace two or more times if the theme uses a few different fonts.
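If you prefer the command line to a text editor, the replacement could be sketched with sed like this. The function name replace_theme_font is made up, and the sketch assumes the font names contain nothing more exotic than letters, digits, dots, hyphens and underscores. You would need super user privileges to change files under /usr/share/mythtv.

```shell
# replace_theme_font: swap one font file name for another in a theme file,
# keeping the untouched original next to it with a .bak extension.
# (Hypothetical helper name, for illustration only.)
# $1 = font name to replace, $2 = font name to use instead, $3 = theme file
replace_theme_font() {
    old=$1
    new=$2
    file=$3
    # Escape the dots so that "FreeSans.ttf" cannot match "FreeSansXttf".
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    cp "$file" "$file.bak" || return 1
    sed "s/$old_re/$new/g" "$file.bak" > "$file"
}

# Example call (the font being replaced is only an example):
#   replace_theme_font FreeSans.ttf osaka.unicode.ttf osd.xml
```

Run it once for each different font the theme mentions. Be aware that a second run on the same file overwrites the .bak copy, so set the first backup aside if you want to keep the pristine theme.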
You'll need to do this for every theme you intend to use.
Once that's done, congratulations! You're finished!
Other notes
Just to give you an idea of what you are aspiring for, here are two screen shots showing my MythTV On Screen Displays using Japanese text. One is the Program Guide, and the other is the text that comes up when changing channels.
There's only one last thing, which I'm not necessarily recommending you do. Based on some information I got on the MythTV mailing list, I felt it might be good to edit this file:
/usr/share/mythtv/sql/mc.sql
Inside, it usually says:
ALTER DATABASE mythconverg DEFAULT CHARACTER SET latin1;
I changed it to:
ALTER DATABASE mythconverg DEFAULT CHARACTER SET utf8;
I'm not entirely sure when, why, how, or even if this file and the above MySQL command would be called. But I figured if it ever is, I don't want it to go in and switch my database back to latin1.
