[mythtv-users] Feature request: Detect existence of closed captioning

f-myth-users at media.mit.edu f-myth-users at media.mit.edu
Mon Jul 19 04:22:41 UTC 2010


    > Date: Mon, 19 Jul 2010 00:04:12 -0400
    > From: Chris Pinkham <cpinkham at bc2va.org>

    > * On Sun Jul 18, 2010 at 09:34:22PM -0400, f-myth-users at media.mit.edu wrote:
    > >     > Date: Mon, 19 Jul 2010 11:26:35 +1000 (EST)
    > >     > From: "Anthony Giggins" <seven at seven.dorksville.net>
    > > 
    > >     > Wasn't Closed Captions detection also suggested as a Commflagging
    > >     > detection mechanism?
    > > 
    > > If you're asking, "Can you use CC to indicate whether you're in a
    > > commercial or not?", I'd sadly have to say, "no."  I've seen some
    > > ads w/CC, and plenty of programs w/o CC.  (This is North American

    > I'll have to disagree here.  I believe the windows program called comskip
    > that was originally based off of mythcommflag source can do detection based
    > on CC.  I think it does it by using a dictionary of words to rate the
    > block.  When I considered adding CC detection to mythcommflag, this is
    > how I was also planning on doing it.

I was assuming no learning and no a priori knowledge of what words
might occur in a commercial, hence the most naive approach possible:
just checking whether CC exists at all.  (In other words, I was taking
the word "detection" literally, as "are captions present?" and not as
"what do the captions themselves say?")

If you're willing to make some assumptions (and presumably keep
updating the database of words that occur in commercials), sure, using
CC data could probably work.  I'm not sure whether it could work more
reliably than current commflagging, but I guess that's what running
the experiment would tell us...
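
For concreteness, here is a rough Python sketch of the dictionary
approach: rate a block of caption text by the average weight of its
words.  The word list, the weights, the threshold, and the function
names are all made up for illustration; I don't know how comskip
actually does it, and a real list would need the ongoing curation
mentioned above.

```python
import re

# Hypothetical weights for words assumed to be more common in ads.
AD_WORDS = {"call": 1.0, "now": 0.5, "offer": 1.0, "free": 1.0, "order": 1.0}

def score_block(caption_text, weights=AD_WORDS):
    """Return the average ad-word weight per word in a caption block."""
    words = re.findall(r"[a-z']+", caption_text.lower())
    if not words:
        # No caption text at all: nothing to rate.
        return 0.0
    return sum(weights.get(w, 0.0) for w in words) / len(words)

def looks_like_commercial(caption_text, threshold=0.2):
    """Flag a block whose score reaches an (arbitrary) threshold."""
    return score_block(caption_text) >= threshold
```

Note that a block with no captions scores zero, so this says nothing
about uncaptioned ads; it would have to be combined with the existing
commflagging methods rather than replace them.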

(If you're willing to do learning, you can learn that particular shows
have -zero- CC in them, for example, so any CC must be that of
commercials, though plenty of commercials have zero as well.  For
example, it's interesting that Mythbusters seems to have no CC data at
all in first-run episodes (but often has it in the repeats), even
though Dirty Jobs---on the same network---appears to always have CC.
[But Mythbusters doesn't have CC in its commercials, either, whereas
Dirty Jobs has CC in both the program and the commercials.  I say
this looking at a large database I have of all the CC information
from several years of each.])
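
A minimal sketch of that kind of per-show learning, assuming you
already have blocks labeled as program or commercial from some other
source: tally how often each kind of block carries CC, and only trust
CC presence as a signal for shows where the two rates differ a lot.
The function names, the input layout, and the 0.5 gap are assumptions
for illustration only.

```python
from collections import defaultdict

def learn_cc_rates(history):
    """history: iterable of (show, is_commercial, has_cc) for labeled blocks.

    Returns {(show, is_commercial): fraction of those blocks that had CC}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [blocks_with_cc, total_blocks]
    for show, is_commercial, has_cc in history:
        entry = counts[(show, is_commercial)]
        entry[0] += int(has_cc)
        entry[1] += 1
    return {key: with_cc / total for key, (with_cc, total) in counts.items()}

def cc_is_informative(rates, show, min_gap=0.5):
    """True if CC presence separates program from commercials for this show."""
    program = rates.get((show, False))
    commercial = rates.get((show, True))
    if program is None or commercial is None:
        return False  # not enough labeled data for this show
    return abs(program - commercial) >= min_gap
```

A show with CC in both the program and its ads (or in neither) comes
out as uninformative, which matches the two cases described above.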

Doing learning has a pile of other gotchas, of course---you can't just
learn that common phrases must be commercials, because some shows have
a relatively constant set of intro verbiage, etc.  It'd certainly make
an entertaining little AI project for someone to poke at.
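
One possible way around the intro-verbiage problem, sketched in Python
under the assumption that ads recur across many different shows while
intro text recurs only within one: count the distinct shows each
caption phrase appears in, and keep only phrases seen in several.  The
names, the 3-word phrase length, and the cutoff are hypothetical, and
the assumption itself is untested.

```python
from collections import defaultdict

def phrases_by_show(captions, n=3):
    """captions: iterable of (show, text). Map each n-word phrase to its shows."""
    seen = defaultdict(set)
    for show, text in captions:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            seen[" ".join(words[i:i + n])].add(show)
    return seen

def likely_ad_phrases(captions, n=3, min_shows=2):
    """Phrases appearing in at least min_shows distinct shows."""
    return {
        phrase
        for phrase, shows in phrases_by_show(captions, n).items()
        if len(shows) >= min_shows
    }
```

A phrase repeated in every episode of a single show never reaches the
cutoff, so it is not mistaken for an ad; an ad that only ever airs
during one show would be missed, though.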

