Difference between revisions of "Captions with HD-PVR"

From MythTV Official Wiki
(Added text to note that open-captioning is an alternative that some may find sufficient)
(Tweaked the script. Set the charset explicitly to UTF8 for special symbols in captions, and added code to handle what happens when the user manually stops a recording)

Revision as of 18:54, 31 August 2012

Author Christopher Neufeld
Description This set of scripts provides a way, given the right hardware, to record closed-caption data for HD-PVR recordings.
Supports


Because there is no defined standard for the transmission of closed-caption information over high-definition connections such as component or HDMI, it is not currently possible to obtain closed-caption data from recordings produced by a Hauppauge HD-PVR. One alternative that may be suitable for some users is to have the STB (set-top box) render the captions, so that the HD-PVR sees them as open captions. The disadvantage is that such captions cannot be selected at viewing time, because they are an inextricable part of the video data. The technique described here allows closed-caption recording, with caption text that can be turned on or off as desired while viewing a recording.

The technique described here allows recordings to be made with closed caption information, provided that the STB has the correct behaviour, and that you have a card capable of reading the VBI data from a composite or coaxial standard-definition stream. This procedure has been tested with a Hauppauge PVR-500.

Please note that this procedure includes post-processing after a recording is complete, so it does not work with live television, or with programs that are being watched while still being recorded. It will work for viewing recordings that have completed. The post-processing stage takes only seconds, so the show can be viewed almost immediately.

As with several other scripts, this works by dropping a .srt file into the storage directory. MythTV will automatically use such a file if it is found. The .srt file is obtained by scanning the standard-definition outputs of the STB.

Prerequisites

Before proceeding, first determine whether you have the hardware required.

  1. You must have a STB that has standard-definition outputs as well as the component outputs used by the HD-PVR. These can be composite or coaxial. If using a coaxial connection, you will have to make some changes to the script: choose the correct input number and set the tuner frequency.
  2. Your STB must produce output on the standard-definition outputs even when tuned to a high-definition channel.
  3. Your STB must include VBI data in the standard-definition outputs.

To test these requirements, simply connect your television set to the standard-definition outputs of the STB. Tune the STB to a high-definition channel, and then use the television's internal settings (not the STB settings) to select closed captions. If you see captions, then your STB is suitable for use with this technique. Note that not all programs will have captions, and sometimes commercials or promos don't have them, so you might have to check several high-definition channels to determine whether or not your STB transmits VBI data.

Next, you must have a hardware device capable of reading the VBI data from a standard-definition analogue stream. In my case, my backend has a PVR-500 card, which can do that. I connected the composite outputs of the STB to the composite inputs of the PVR-500. Note that you only have to connect one cable, on the video plug; the two audio plugs aren't necessary for this operation. I've plugged in all three only because I don't have individual cables, only triplet cables.

You must be using at least MythTV 0.23, because we are using system events.

You must have installed the CCExtractor program. I have tested this with ccextractor version 0.59.

Parameters to determine

You may have to modify some parameters in these scripts. They should all be adjustable by editing the hd-captions-common.sh script, not the other two. The required parameters are:

  1. The CardID of your HD-PVR. Mine is '1'. You can determine this by running the MySQL command "select cardid,cardtype from capturecard;", or simply modify the script to write cardid to a file and exit, then start a recording on the HD-PVR.
  2. The pathname of the device file for the VBI-extracting hardware. In my case, that's the second module of my PVR-500, and on my system that's /dev/pvr_500_2.
  3. The input number of the composite input on the VBI-extracting hardware. In my case, that's '2'.
  4. A working directory, writable by the UID that runs the mythbackend. I have a tmp directory in /myth, so I've set the working directory prefix to point under that. The script will create a new directory in which it will work, and will remove the directory when it completes.
  5. A bias can be set here, or it can be left at zero. If set to a positive number, captions will appear that many seconds earlier in the stream. This allows for the possibility that your particular hardware has an undetected systematic delay that requires correction.

There is also a set of binaries used by the scripts. You should verify that the pathnames are correct for your system. In particular, ccextractor might be installed somewhere other than in /usr/bin.
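
One way to verify the pathnames before the first recording is a small check like the one below. This is only a sketch: the function name check_bins is hypothetical and not part of the hd-captions scripts. It tests each path it is given and reports the ones that are not executable files.

```shell
#! /bin/sh
# Sketch only: check_bins is a hypothetical helper, not part of the
# hd-captions scripts.  It reports every path that is not an
# executable regular file, and returns non-zero if any were missing.
check_bins() {
    missing=0
    for bin in "$@"
    do
        if [ ! -f "$bin" ] || [ ! -x "$bin" ]
        then
            echo "missing: $bin"
            missing=1
        fi
    done
    return $missing
}

# Example: check two of the paths from hd-captions-common.sh
check_bins /usr/bin/ccextractor /usr/bin/ffprobe || echo "fix the paths listed above"
```

Any path reported as missing should be corrected in hd-captions-common.sh.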


Setting up

Copy all three scripts to the same directory. They should be made executable, and should be in a directory that is readable by the UID that runs the mythbackend.

In mythtv-setup, go to the screen "System Events". Add two new events. Under "Recording started", using the complete pathname, insert the script hd-captions-start.sh:

/SOME/DIR/hd-captions-start.sh "%CARDID%" "%CHANID%" "%STARTTIMEISOUTC%" "%ENDTIMEISOUTC%"

Under "Recording finished", insert the script hd-captions-finalize.sh:

/SOME/DIR/hd-captions-finalize.sh "%CARDID%" "%CHANID%" "%STARTTIMEISOUTC%" "%ENDTIMEISOUTC%" "%DIR%" "%FILE%"

Also in mythtv-setup, under "Input Connections", on the second page for the HD-PVR, create a new recording group for the HD-PVR, and add the VBI-decoding hardware to that recording group. That ensures that the backend will not try to schedule a recording on your VBI-decoding hardware while you're using it to extract captions for the HD-PVR. Note that the backend must be restarted for this recording group change to be noticed by the scheduler.


Finished

You can now restart your backend, and you should get recordings with closed captions, selectable at viewing time.


Script hd-captions-common.sh

This script sets up the variables used by the two other scripts.


#! /bin/sh
#

# Variables used by the hd-captions scripts

#########################################
#
# VERIFY THESE PATHS FOR YOUR SYSTEM
#
#########################################


CCEXTRACTOR=/usr/bin/ccextractor
FFPROBE=/usr/bin/ffprobe
V4L2_CTL=/usr/bin/v4l2-ctl
GREP=/bin/grep
AWK=/bin/awk
SED=/bin/sed
FUSER=/usr/bin/fuser
EXPR=/usr/bin/expr
DATE=/bin/date
PRINTF=/usr/bin/printf
CP=/bin/cp
RM=/bin/rm
TOUCH=/bin/touch
MKDIR=/bin/mkdir
SLEEP=/bin/sleep



#########################################
#
#  EDIT THESE PARAMETERS IF NECESSARY
#
#########################################



CAPTION_DEVICE=/dev/pvr_500_2
CAPTION_INPUT_NUM=2     # the composite input
HD_PVR_CARDID=1


workdir_prefix=/myth/tmp/captions_

delay_bias=0   # set this to any consistent value, a number of seconds
	       # earlier that you want to see all captions appearing



Recording start script

This script performs the following functions, in order:

  1. Verifies that this recording is being made on the HD-PVR
  2. Builds a pathname for its working directory
  3. Parses out the passed parameters to determine how long the recording is
  4. Verifies that our working directory doesn't exist
  5. Creates the working directory, and chdirs into it
  6. Spawns a subshell to do the work
  7. Makes sure that the VBI-decoding device isn't in use, and waits for it to be free (can happen if you have "end late" set to a negative number)
  8. Selects the appropriate input on the VBI-decoding device
  9. Records the time, in seconds since the epoch, when the caption extraction began, for use by the finalize script
  10. Runs ccextractor for the duration of the show, recording the results in its own binary format
  11. Records the PID of the ccextractor program in a file, so that the finalize script can kill it if the user manually requests an early stop to the recording.
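
The PID hand-off in the last step can be tried in isolation. The sketch below is not taken from the scripts: "sleep 30" stands in for the long-running ccextractor run, demo_pid.txt is a hypothetical file name, and both halves run in one script here, whereas in practice the kill is issued by the finalize script.

```shell
#! /bin/sh
# Sketch only: "sleep 30" stands in for the long-running ccextractor run,
# and demo_pid.txt is a hypothetical file name.
workdir=`mktemp -d`
pidfile=$workdir/demo_pid.txt

sleep 30 &                  # start the long-running job in the background
jobpid=$!
echo $jobpid > "$pidfile"   # publish its PID where another script can find it

# What the other script does on an early stop:
if [ -f "$pidfile" ]
then
    kill `cat "$pidfile"`
fi

# Back in the first script: wait returns once the job has exited
status=0
wait $jobpid || status=$?
rm -fr "$workdir"
echo "job ended with status $status"
```

A non-zero status from wait shows that the job was stopped early rather than running to completion. The complete start script is: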

#! /bin/sh
#

# Invoke with CARDID CHANID STARTTIMEISOUTC ENDTIMEISOUTC


. `dirname $0`/hd-captions-common.sh


cardid=$1
chanid=$2
starttime=$3
endtime=$4

# Only do this for the HD-PVR input
if [ $cardid -ne ${HD_PVR_CARDID} ]
then
    exit 0
fi

workdir=${workdir_prefix}${chanid}_${starttime}

epoch1=`${DATE} +%s`

subst=`echo $endtime | tr "T" " "`
epoch2=`${DATE} -u --date="$subst" +%s`

duration=`${EXPR} $epoch2 - $epoch1`
hours=`${EXPR} $duration / 3600`
rem=`${EXPR} $duration % 3600`
mins=`${EXPR} $rem / 60`
secs=`${EXPR} $rem % 60 `

if [ -e $workdir ]
then
    echo "Working directory name collision"
    exit 1
fi

${MKDIR} $workdir
cd $workdir
(
    # Avoid a potential race condition if the previous recording
    # hasn't finished reading captions (can happen if the end-late
    # time is negative).
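    while ${FUSER} ${CAPTION_DEVICE} >/dev/null 2>&1
    do
	${SLEEP} 1
	duration=`${EXPR} $duration - 1`
    done

    # Recompute the end time, since waiting may have shortened the duration
    hours=`${EXPR} $duration / 3600`
    rem=`${EXPR} $duration % 3600`
    mins=`${EXPR} $rem / 60`
    secs=`${EXPR} $rem % 60`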

    ${V4L2_CTL} -i ${CAPTION_INPUT_NUM} -d ${CAPTION_DEVICE}

    ${DATE} -u "+%s" > captions-start.txt

    ${CCEXTRACTOR} -s ${CAPTION_DEVICE} -utf8 -endat `${PRINTF} "%02d:%02d:%02d" $hours $mins $secs` -out=bin -o pass1.srtbin >/dev/null 2>&1 &

    ccpid=$!
    echo $ccpid > extractor_pid.txt
    wait $ccpid
    ${RM} extractor_pid.txt

) </dev/null >/dev/null 2>&1 &

exit 0
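
The conversion from a duration in seconds to the HH:MM:SS string given to ccextractor can be checked on its own. The sketch below uses a hypothetical helper, secs_to_hms, and POSIX shell arithmetic in place of the expr calls in the script; the arithmetic itself is the same.

```shell
#! /bin/sh
# Sketch only: secs_to_hms is a hypothetical helper that mirrors the
# hours/mins/secs arithmetic of the start script, using $(( )) rather
# than expr.
secs_to_hms() {
    duration=$1
    hours=$(( duration / 3600 ))
    rem=$(( duration % 3600 ))
    mins=$(( rem / 60 ))
    secs=$(( rem % 60 ))
    printf "%02d:%02d:%02d\n" $hours $mins $secs
}

secs_to_hms 5400    # a 90-minute recording; prints 01:30:00
```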


Recording finalize script

This script, executed once the recording has finished, performs the following steps, in order:

  1. Verifies that this was an HD-PVR recording
  2. Looks for the working directory, and exits if it wasn't found
  3. Enters the working directory
  4. Forks a subshell to do the work
  5. Uses ffprobe to parse out the duration of the HD-PVR recording. I find that it can take the HD-PVR up to dozens of seconds to start recording, so we can't assume that it is as long as the requested recording interval.
  6. Uses the end time of the recording (assumed correct) and the length of the recording to deduce the starting time of the HD-PVR stream
  7. Compares the starting time of the HD-PVR stream with that of the VBI stream, and computes the time offset
  8. Runs ccextractor on the previously-collected data, correcting for the time offset
  9. Kills the first ccextractor run from the startup script, if it's still running, to free up the device file for immediate use
  10. Builds the .srt filename
  11. Makes sure we don't clobber the recording file if something's wrong
  12. Copies the .srt file to its final position
  13. Removes the working directory and exits
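
The duration parsing in step 5 can be tried without a recording. In the sketch below, the sample line is a hypothetical example of the "Duration:" line that ffprobe prints; the grep and awk stages are the same as in the script, and fractional seconds are discarded by int().

```shell
#! /bin/sh
# Sketch only: the sample line is a hypothetical example of the
# "Duration:" line printed by ffprobe; no recording is read here.
sample='  Duration: 00:30:12.45, start: 0.000000, bitrate: 13500 kb/s'

# Split on ":" so that $2, $3 and $4 are hours, minutes and seconds
rec_duration=`echo "$sample" | grep '^  Duration: ' | \
    awk -F: ' { print $2 * 3600 + $3 * 60 + int($4) } '`

echo "$rec_duration"    # prints 1812
```

The complete finalize script is: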

#! /bin/sh

# Invoke with CARDID CHANID STARTTIMEISOUTC ENDTIMEISOUTC DIR FILE


. `dirname $0`/hd-captions-common.sh


cardid=$1
chanid=$2
starttime=$3
endtime=$4
dir=$5
file=$6

# Only do this for the HD-PVR input
if [ $cardid -ne ${HD_PVR_CARDID} ]
then
    exit 0
fi

workdir=${workdir_prefix}${chanid}_${starttime}

if [ ! -d $workdir ]
then
    exit 0
fi

# pushd is not available in every /bin/sh, so use plain cd
cd $workdir || exit 1

# Fork off a subshell to do this work
(
    rec_duration=`${FFPROBE} $dir/$file 2>&1 | ${GREP} '^  Duration: ' | \
	${AWK} -F: ' { print $2 * 3600 + $3 * 60 + int($4) } '`

# OK, it's a bit awkward here.  The captions might have started early,
# and might have ended early (recording past the end of the slot isn't
# passed in ENDTIMEISOUTC for recording start scripts).  The recording
# should have ended on time, at ENDTIMEISOUTC.  So, we can compute the
# start time of the recording

    subst=`echo $endtime | tr "T" " "`
    epoch2=`${DATE} -u --date="$subst" +%s`

    recording_start=`${EXPR} $epoch2 - $rec_duration`
    captions_start=`cat captions-start.txt`

    diff_start=`${EXPR} $recording_start - $captions_start + $delay_bias`
    diff_msecs=`${EXPR} $diff_start '*' 1000`

    diff_mins=`${EXPR} $diff_start / 60`
    diff_secs=`${EXPR} $diff_start % 60`

    ${CCEXTRACTOR} -utf8 -startat ${diff_mins}:${diff_secs} -delay -${diff_msecs} pass1.srtbin -o result.srt

    if [ -f extractor_pid.txt ]
    then
	kill `cat extractor_pid.txt`
    fi

    # Build the .srt name; the dot is escaped so only a literal ".mpg" matches
    ofile=`echo "$file" | ${SED} -e 's|\.mpg$|.srt|'`
    if [ "$ofile" = "$file" ]
    then
	exit 1
    fi

    ${CP} result.srt $dir/$ofile

    # Leave the working directory before removing it (popd is not in every /bin/sh)
    cd /
    ${RM} -fr $workdir

) </dev/null >/dev/null 2>&1 &

exit 0
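
The offset arithmetic in the middle of the finalize script is easier to follow with concrete numbers. The sketch below uses a hypothetical helper, caption_offset, with POSIX shell arithmetic in place of expr, and every number in the example call is made up for illustration. It prints the options that would be handed to ccextractor.

```shell
#! /bin/sh
# Sketch only: caption_offset is a hypothetical helper, and all the
# numbers in the example call are made up for illustration.
# Arguments: end-of-recording epoch, recording length in seconds,
#            caption-capture start epoch, bias in seconds.
caption_offset() {
    recording_start=$(( $1 - $2 ))
    diff_start=$(( recording_start - $3 + $4 ))
    diff_mins=$(( diff_start / 60 ))
    diff_secs=$(( diff_start % 60 ))
    diff_msecs=$(( diff_start * 1000 ))
    echo "-startat ${diff_mins}:${diff_secs} -delay -${diff_msecs}"
}

# The slot ends at 1346439600, the recording is 1785 s long, and caption
# capture began at 1346437800, 15 s before the HD-PVR produced video.
caption_offset 1346439600 1785 1346437800 0    # prints -startat 0:15 -delay -15000
```

In this example the first 15 seconds of caption data are skipped, and the remaining captions are shifted 15000 ms earlier so that they line up with the start of the HD-PVR recording.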