[mythtv-users] Radio Times XMLTV failing
Neil Dunbar
neil.dunbar at pobox.com
Tue Oct 3 07:36:26 UTC 2006
On Monday 02 October 2006 23:20, malcolm torrent wrote:
> I'd like to echo Simon's thanks to Neil for the fix.
> I tried to diagnose this myself (unsuccessfully) so if possible I'd be
> interested in a short explanation as to how the problem was
> approached, resolved and why this fix works.
> Mal.
OK. The problem is corruption in the datafile 1961.dat, which corresponds to
the schedules for ITV4. Running mythfilldatabase from the command line shows
the error: the Unicode wide character \u0000 is not acceptable within an XML
document.
So I wget'ed the offending URL, looked at the file with a binary editor
(bvi), and searched for the sequence of null characters.
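For anyone without bvi to hand, the same search can be sketched in a few lines
of Python. This is only an illustration of the idea, not what I actually ran:
the function names find_null_runs and show_context are made up for this
example, and it assumes the .dat file has already been fetched to local disk.

```python
import re


def find_null_runs(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for every run of NUL bytes in data."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(rb"\x00+", data)]


def show_context(data: bytes, offset: int, length: int, width: int = 20) -> str:
    """Return a printable view of the bytes surrounding one NUL run."""
    start = max(0, offset - width)
    end = min(len(data), offset + length + width)
    return repr(data[start:end])


def report(path: str) -> None:
    """Print every NUL run in the file at path (hypothetical helper)."""
    with open(path, "rb") as fh:
        data = fh.read()
    for offset, length in find_null_runs(data):
        print(f"{offset:#x}: {length} NUL byte(s) near "
              f"{show_context(data, offset, length)}")
```

Calling report("1961.dat") on a downloaded copy would list the offset of each
run of nulls along with the bytes either side of it.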
My first thought was to stop the script dying (by commenting out the "croak"
instruction in the XMLTV code), but then it just died with an "unexpected
end-of-file" error instead. So I had to replace the offending text with
something else: I stuck a line into tv_grab_uk_rt which substitutes \u0000
with the text ".." (i.e., something harmless). Now, all of that said, there may
very well be a legitimate use of a sequence of two nulls in Unicode (e.g., for 3
or 4 byte wide characters), so this kludge can't stay in - it replaces the
nulls without regard for their context in the file.
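To make the "context" worry a bit more concrete, here is a small Python sketch
(the helper name null_bytes_per_encoding is invented for this example). As far
as I know, whether null bytes can legitimately appear depends on the encoding:
UTF-8 never uses a zero byte inside a multi-byte sequence, whereas UTF-16 and
UTF-32 contain zero bytes for perfectly ordinary characters. I haven't checked
which encoding the Radio Times feed declares, so treat this as background
rather than a diagnosis.

```python
def null_bytes_per_encoding(text: str) -> dict[str, int]:
    """Count zero bytes in text when encoded in a few Unicode encodings."""
    counts = {}
    for enc in ("utf-8", "utf-16-be", "utf-32-be"):
        counts[enc] = text.encode(enc).count(b"\x00")
    return counts


# "A" is one byte in UTF-8; the euro sign is three bytes in UTF-8.
print(null_bytes_per_encoding("A\u20ac"))
```

A blanket byte-level substitution of nulls is therefore harmless on UTF-8 data
but would mangle valid UTF-16 or UTF-32 text.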
In the end, I suspect it's just a bit of file corruption from Radio Times.
It's not happening anywhere else in the data feed, and it'll disappear from
the schedules on Saturday, and we can say goodbye to ugly kludges.
A longer-term fix would be for XMLTV to replace offending Unicode characters
with harmless ones, just to be a bit more robust when dealing with partially
corrupted data. I may have a look at this over the weekend.
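As a rough idea of what that could look like, here is a sketch in Python
rather than the Perl that XMLTV is written in. It is not XMLTV code and
sanitize_xml_text is a made-up name; the character class is taken from the
"Char" production of the XML 1.0 specification, and the "." placeholder is
just an arbitrary harmless choice. It assumes the data has already been
decoded to text, so it works on characters rather than raw bytes.

```python
import re

# Everything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml_text(text: str, placeholder: str = ".") -> str:
    """Replace each character that XML 1.0 forbids with placeholder."""
    return _XML_ILLEGAL.sub(placeholder, text)


# Two embedded NULs become two dots; tabs and newlines are left alone.
print(sanitize_xml_text("News\x00\x00 at Ten"))
```

Doing the replacement after decoding sidesteps the context problem above,
because only genuine U+0000 characters (and other forbidden ones) are touched.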
Cheers,
Neil