[mythtv] 'recommendations' section

Kirby Vandivort kvandivo at ks.uiuc.edu
Sun Sep 14 21:46:54 EDT 2003


I'm not against Bayesian filtering.  But everything I've seen says that myth's
database structure can't handle it yet.  Back in June I revisited

http://www.paulgraham.com/spam.html

(which is the original article on Bayesian spam filtering)

as well as

http://www.paulgraham.com/better.html

where he tweaked his ideas..

and I don't think we can apply it accurately until the ratings system
I mentioned in my last version of the recommendations code is in place.
One of the core requirements of Bayesian filtering is that you need both
a good and a bad corpus (refer to SpamAssassin's implementation notes:
http://www.mirror.ac.uk/sites/spamassassin.taint.org/spamassassin.org/doc/sa-learn.html#effective%20training
).  Once we have that, we can do all kinds of things.  Having said that,
what I'm doing right now is essentially a tweaked version of 'poor
man's Bayesian'.  I'm using the corpus of old recorded data to generate
signatures of what a good program looks like, and then comparing future
programs against these signatures to see which ones match best.
In addition, version 0.002 has a category multiplier built in, so
that it gives more weight to comedies if you watch a lot of comedies,
etc.  Once we get a ratings system in place in myth, THEN we can start
doing full Bayesian filtering and other cool things as well.
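The 'poor man's Bayesian' approach is only described in outline above, so the following is a minimal sketch under stated assumptions, not the actual code from version 0.002. It builds a word-frequency "signature" from previously recorded programs (title plus description), scores a new listing by how often its words appeared in past recordings, and scales that by a category multiplier of 1 plus the category's share of recordings. The `Program` record, the tokenizer, the averaging score and the multiplier formula are all hypothetical choices for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Program:
    # Hypothetical listing record; not MythTV's actual schema.
    title: str
    description: str
    category: str


def tokens(p: Program) -> set[str]:
    """Distinct lower-case words from a program's title and description."""
    return set(re.findall(r"[a-z0-9']+", (p.title + " " + p.description).lower()))


def build_signature(recorded: list[Program]) -> Counter:
    """For each word, count how many previously recorded programs contain it."""
    sig: Counter = Counter()
    for p in recorded:
        sig.update(tokens(p))
    return sig


def category_multipliers(recorded: list[Program]) -> dict[str, float]:
    """Boost a category in proportion to its share of past recordings
    (hypothetical formula: 1 + share), so comedy fans see comedies ranked higher."""
    counts = Counter(p.category for p in recorded)
    total = sum(counts.values())
    return {cat: 1.0 + n / total for cat, n in counts.items()}


def score(p: Program, sig: Counter, mult: dict[str, float], n_recorded: int) -> float:
    """Average fraction of recorded programs sharing each word,
    scaled by the category multiplier (1.0 for unseen categories)."""
    words = tokens(p)
    if not words or n_recorded == 0:
        return 0.0
    match = sum(sig[w] / n_recorded for w in words) / len(words)
    return match * mult.get(p.category, 1.0)


if __name__ == "__main__":
    recorded = [
        Program("The Simpsons", "Homer buys a boat", "Comedy"),
        Program("Futurama", "Fry and Bender go to the moon", "Comedy"),
        Program("Nova", "Volcanoes of the deep sea", "Documentary"),
    ]
    sig = build_signature(recorded)
    mult = category_multipliers(recorded)
    listings = [
        Program("Golf", "Final round", "Sports"),
        Program("King of the Hill", "Hank buys a boat", "Comedy"),
    ]
    ranked = sorted(listings, key=lambda p: score(p, sig, mult, len(recorded)), reverse=True)
    print([p.title for p in ranked])  # ['King of the Hill', 'Golf']
```

Note that this only ever looks at the "good" corpus, which is exactly why it is not real Bayesian filtering. Common words such as "the" also dominate the score here; a practical version would probably drop stopwords.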


On Sun, Sep 14, 2003 at 09:32:07PM -0400, Michael J. Pedersen wrote:
> On Sun, Sep 14, 2003 at 03:24:24PM -0500, Kirby Vandivort wrote:
> > It's not the worst idea I've ever heard.  As I was trying to implement
> > such a thing, I ran into a roadblock, though..  It's not quite the same
> > as email.  For email, you have a corpus that can clearly be defined as
> > 'this is spam' or 'this is not spam'.  We don't really have an equivalent
> > thing in (current) myth.  You have a 'this is a good show' and 'these
> > are other shows'.  We can't really say accurately that every show that is
> > on that wasn't recorded is something that the user doesn't want to see.
> 
> You're right, in that there is no corpus of things you don't want to
> see. Instead, you have an (admittedly smaller) corpus of things you DO
> want to see.
> 
> To try and phrase it differently (and assuming I'm understanding
> Bayesian filtering), Bayesian filtering gives you a probability that
> item A fits into bucket B.
> 
> So, what I'm getting at is this: look at the old recorded table, and use that
> to generate a listing (I'd base the filtering on title and description
> only, though), and make a new table which has the probabilities you want
> to store. Update it daily (make it part of the mythfilldatabase cron job).
> 
> Remember, with the application in email, you're filtering out things you
> DON'T want to see. That leaves the set you DO want to see. If the
> Bayesian filters are reflexive (if I'm using the word correctly), then
> you can construct the filter to look at what you DO want to see, based
> on the oldrecorded table. Oh, and make sure to include a bucket in the
> mix which says "I've suggested this to the user, and they
> rejected it."
> 
> I guess a (slightly better?) way of looking at it would be that the spam
> bucket is empty at the start, but the good bucket isn't. The good bucket
> being, of course, the oldrecorded table. Am I making any sense, or am I
> just babbling away at random? I'll be quiet now...
> 
> -- 
> Michael J. Pedersen
> My Jabber ID: pedersen at jabber.org.uk
> My GnuPG KeyID: 6CB0A96C       My Public Key Available At: www.keyserver.net
> My GnuPG Key Fingerprint: E8F0 920F EB2F 7FDE DF4E  23CC 2CEB 8E6F 6CB0 A96C



> _______________________________________________
> mythtv-dev mailing list
> mythtv-dev at mythtv.org
> http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-dev
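Michael's two-bucket idea (old recorded programs as the "wanted" corpus, rejected suggestions as the "unwanted" one) could be sketched in the style of Paul Graham's "A Plan for Spam": per-word probabilities clamped to [0.01, 0.99], with the 15 most "interesting" words combined as prod(p) / (prod(p) + prod(1 - p)). This is a minimal sketch under assumptions: the `RecommendationFilter` class and its methods are hypothetical, unseen words get a neutral 0.5 (Graham used 0.4 for spam), only title and description are tokenized, and no database or mythfilldatabase cron integration is shown.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    """Distinct lower-case words; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class RecommendationFilter:
    """Hypothetical two-bucket filter: wanted (oldrecorded) vs rejected suggestions."""

    def __init__(self) -> None:
        self.wanted: Counter = Counter()    # word -> wanted programs containing it
        self.rejected: Counter = Counter()  # word -> rejected programs containing it
        self.n_wanted = 0
        self.n_rejected = 0

    def add_wanted(self, title: str, description: str) -> None:
        self.wanted.update(tokens(title + " " + description))
        self.n_wanted += 1

    def add_rejected(self, title: str, description: str) -> None:
        self.rejected.update(tokens(title + " " + description))
        self.n_rejected += 1

    def word_prob(self, word: str) -> float:
        """P(wanted | word) from per-bucket frequencies, clamped to [0.01, 0.99]."""
        g = self.wanted[word] / self.n_wanted if self.n_wanted else 0.0
        r = self.rejected[word] / self.n_rejected if self.n_rejected else 0.0
        if g + r == 0:
            return 0.5  # unseen word: neutral (assumption)
        return min(0.99, max(0.01, g / (g + r)))

    def prob_wanted(self, title: str, description: str, n_interesting: int = 15) -> float:
        """Combine the most decisive word probabilities, Graham-style."""
        probs = [self.word_prob(w) for w in tokens(title + " " + description)]
        # Keep the words furthest from neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:n_interesting]
        if not probs:
            return 0.5
        # prod(p) / (prod(p) + prod(1-p)) computed in log space to avoid underflow.
        log_p = sum(math.log(p) for p in probs)
        log_q = sum(math.log(1.0 - p) for p in probs)
        return 1.0 / (1.0 + math.exp(log_q - log_p))


if __name__ == "__main__":
    f = RecommendationFilter()
    f.add_wanted("Nova", "Volcanoes of the deep sea")
    f.add_wanted("Nova", "Black holes and the early universe")
    f.add_rejected("Golf", "Final round of the open")
    print(f.prob_wanted("Nova", "Deep sea vents"))  # close to 1
    print(f.prob_wanted("Golf", "Final round"))     # close to 0
```

With an empty rejected bucket, every word that appears in oldrecorded clamps to 0.99, so nearly any listing sharing a word with past recordings looks "wanted". That is the practical form of Kirby's objection: without a bad corpus, the filter has nothing to discriminate against.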


-- 

Kirby Vandivort                      Theoretical and Computational Biophysics 
Email: kvandivo at ks.uiuc.edu          3051 Beckman Institute
http://www.ks.uiuc.edu/~kvandivo/    University of Illinois
Phone: (217) 244-5711                405 N. Mathews Ave
Fax  : (217) 244-6078                Urbana, IL  61801, USA

