[mythtv-users] xmltv grab_uk_rt or EIT data, which is the best option?

Tue Oct 10 09:57:00 UTC 2006

On 10/10/06, Andrew Wilson <migmog at gmail.com> wrote:
> On 08/10/06, Simon Dyson <sdyson at themaelstrom.co.uk> wrote:
> >
> > I've also noticed that checking for duplicates by subtitle isn't always
> > accurate because the repeat often prefixes the subtitles with the
> > episode (e.g. 6/12) whilst the original airing doesn't.
> >
> > Simon
>
> I was thinking about this. There's scope here for a better duplicate
> finder by using some kind of fuzzy pattern matching...
> Say if 95% of the description is the same then it's a dup.... or would
> this give too many false +ve results?
> Maybe if the search eliminated anything in brackets first?
> Or with some known acronyms, eg my box sometimes records the SL (sign
> language) version

I was thinking about something similar once. A possible solution is to
use the Levenshtein distance algorithm
(http://en.wikipedia.org/wiki/Levenshtein_distance).
But there could be quite a processing overhead. I mean, where do you
place the algorithm? If it's in the scheduler then it would run
excessively. But if it was in the grab/mythfilldatabase stage, then you
have to make a decision about which description to use (probably
easier, e.g. use the first one).
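
To make the idea concrete, here is a rough Python sketch of such a
check. It is only an illustration, not MythTV code: the function names
(normalise, levenshtein, similarity, is_duplicate) are made up for this
example, the 0.95 threshold is just the 95% figure suggested above, and
the clean-up rules (drop anything in brackets, drop "6/12"-style
episode markers) are assumptions about what the listings look like.

```python
import re


def normalise(text):
    """Drop bracketed text and 'n/m' episode markers, then tidy whitespace."""
    # Assumption: tags such as "(SL)" or "[S]" carry no episode information.
    text = re.sub(r"[\(\[][^\)\]]*[\)\]]", " ", text)
    # Assumption: a leading "6/12." or "6/12:" is an episode counter.
    text = re.sub(r"\b\d+/\d+\b[.:]?", " ", text)
    return " ".join(text.lower().split())


def levenshtein(a, b):
    """Edit distance by dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Return 1.0 for identical normalised strings, towards 0.0 otherwise."""
    a, b = normalise(a), normalise(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(a, b, threshold=0.95):
    """Hypothetical duplicate test: similar enough after normalisation."""
    return similarity(a, b) >= threshold
```

Each comparison costs roughly len(a) * len(b) steps, so comparing every
pair of descriptions gets expensive quickly, which is the overhead
concern mentioned above. Whether 0.95 gives too many false positives
would need testing against real listings.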

Ant.