Captions with HD-PVR

From MythTV Official Wiki

Author: Christopher Neufeld
Description: This set of scripts provides a way, given the right hardware, to record closed-caption data for HD-PVR recordings.

Because there is no defined standard for transmitting closed-caption information over high-definition connections such as component or HDMI, it is not currently possible to obtain closed-caption data from recordings produced by a Hauppauge HD-PVR. One alternative that may suit some users is to have the STB (set-top box) render the captions, so that the HD-PVR sees them as open captions. The disadvantage is that such captions cannot be selected at viewing time, because they are an inextricable part of the video data. The technique described here instead records closed captions whose text can be turned on or off as desired while viewing a recording.

This technique works provided that the STB has the correct behaviour, and that you have a card capable of reading the VBI data from a composite or coaxial standard-definition stream. The procedure has been tested with a Hauppauge PVR-500.

Please note that this procedure includes post-processing after a recording is complete, so it does not work with live television, or with programs that are being watched while still being recorded. It will work for viewing recordings that have completed. The post-processing stage takes only seconds, so the show can be viewed almost immediately.

As with several other scripts, this works by dropping a .srt file into the storage directory. MythTV will automatically use such a file if it is found. The .srt file is obtained by scanning the standard-definition outputs of the STB.
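As a rough illustration of the naming convention, the sketch below derives the subtitle filename from a recording filename. It assumes recordings end in .mpg, as the finalize script further down does. The function name srt_name_for and the sample filename are hypothetical, not part of the original scripts.

```shell
# Hypothetical helper: map a recording filename to the matching .srt name.
# Prints nothing and returns 1 if the name does not end in .mpg, so that a
# caller can never end up overwriting the recording itself.
srt_name_for() {
    case $1 in
        *.mpg) printf '%s\n' "${1%.mpg}.srt" ;;
        *)     return 1 ;;
    esac
}

srt_name_for "1021_20110305200000.mpg"   # prints 1021_20110305200000.srt
```

Refusing any name that does not end in .mpg mirrors the "don't clobber the recording" check that the finalize script performs.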



  • The V4L2 kernel modules have tightened the rules on ioctls. The VBI-related ioctl calls must now go through the appropriate VBI device; it is an error to configure VBI through the video device. The scripts have been modified to account for this.
  • There is a command-line parsing bug present in some versions of v4l2-ctl, so the script has been changed to handle that.


  • Added a v4l2-ctl command to explicitly set the bitrate. There is some evidence that if the card's current bitrate (from either boot-time configuration or a previous use of the card) is too high, ccextractor may error out partway through with a message like "Error: Not enough memory. Please report this: 65536 bytes is not enough!"


  • Put in changes to kill a ccextractor that is still doing VBI reads from an earlier recording, if it is somehow still running.
  • Following some suggestions from stichnot:
    • Explicitly use /bin/bash instead of /bin/sh to ensure we have pushd/popd available
    • Change the delay_bias to be in milliseconds rather than in seconds
    • Add a configurable post-roll overrun to ensure that captions are recorded during that interval when it is being used


Before proceeding, first determine whether you have the hardware required.

  1. You must have an STB that has standard-definition outputs as well as the component outputs used by the HD-PVR. These can be composite or coaxial. If you use a coaxial connection, some changes will have to be made to the script: you will have to choose the correct input number and set the tuner frequency.
  2. Your STB must produce output on the standard-definition outputs even when tuned to a high-definition channel.
  3. Your STB must include VBI data in the standard-definition outputs.

To test these requirements, connect your television set to the standard-definition outputs of the STB. Tune the STB to a high-definition channel, and then use the television's internal settings (not the STB settings) to select closed captions. If you see captions, then your STB is suitable for use with this technique. Note that not all programs will have captions, and sometimes commercials or promos don't have them, so you might have to check several high-definition channels to determine whether or not your STB transmits VBI data.

Next, you must have a hardware device capable of reading the VBI stream from a standard-definition analogue signal. In my case, my backend has a PVR-500 card, which can do that. I connected the composite outputs of the STB to the composite inputs of the PVR-500. Note that you only have to connect one cable, on the video plug. The two audio plugs aren't necessary for this operation, but I've plugged in all three because I have only triplet cables, not individual ones.

You must be using at least MythTV 0.23, because we are using system events.

You must have installed the CCExtractor program. I have tested this with ccextractor versions 0.59 and 0.64.

You must be able to identify your video and VBI device nodes in a reliable fashion. I have udev rules to build symlinks to those devices, and use the symlinks to connect.
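Before relying on such symlinks, it can help to confirm that each one resolves to a character device, which is what V4L2 video and VBI nodes are. The following is only a sketch. The function name is_char_device is hypothetical, and the two paths are the author's example symlinks from the parameter list below, so substitute your own.

```shell
# Hypothetical check: succeed only if the path (or the target of a symlink)
# is a character device. The -c test follows symlinks.
is_char_device() {
    [ -c "$1" ]
}

# Example paths taken from this page; yours will differ.
for node in /dev/pvr_500_2 /dev/pvr_500_vbi_2; do
    if is_char_device "$node"; then
        echo "$node: ok"
    else
        echo "$node: missing or not a character device"
    fi
done
```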

Parameters to determine

You may have to modify some parameters in these scripts. They should all be adjustable by editing the variables script (the first script below), not the other two. The required parameters are:

  1. The CardID of your HD-PVR. Mine is '1'. You can determine this by running the MySQL command "select cardid,cardtype from capturecard;", or simply modify the script to write cardid to a file and exit, then start a recording on the HD-PVR.
  2. The pathname of the device file to the VBI-extracting hardware. In my case, that's the second module of my PVR-500, and on my system that's /dev/pvr_500_2.
  3. The pathname of the device file to the VBI control node on the hardware. On my system, that's /dev/pvr_500_vbi_2.
  4. The input number of the composite input on the VBI-extracting hardware. In my case, that's '2'.
  5. A working directory, writable by the UID that runs the mythbackend. I have a tmp directory in /myth, so I've set the working directory prefix to point under that. The script will create a new directory in which it will work, and will remove the directory when it completes.
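  6. A bias (delay_bias_ms), which can be left at zero. If set to a positive number, captions will appear that many milliseconds earlier in the stream. This allows for the possibility that there is an undetected systematic delay on your particular hardware, one that requires correction.
  7. A post-roll overrun (post_roll_seconds). If your recordings run past the scheduled end by some number of seconds, set this to at least that number so that captions are still captured during that interval.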

There is also a set of binaries used by the scripts. You should verify that the pathnames are correct for your system. In particular, ccextractor might be installed somewhere other than in /usr/bin.
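One way to check this up front is sketched below. The function name check_binaries is hypothetical, and the program names passed to it are simply the tools this page mentions. Pass whatever pathnames your copy of the variables script actually uses.

```shell
# Hypothetical sanity check for the helper programs the scripts call.
# Prints each program that cannot be found and returns non-zero if any
# of them is missing. Accepts bare names or full pathnames.
check_binaries() {
    missing=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "not found: $prog"
            missing=1
        fi
    done
    return $missing
}

check_binaries ccextractor v4l2-ctl ffprobe fuser \
    || echo "fix the pathnames in the variables script before continuing"
```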

Setting up

Copy all three scripts to the same directory. They should be made executable, and should be in a directory that is readable by the UID that runs the mythbackend.

In mythtv-setup, go to the screen "System Events". Add two new events. Under "Recording started", using the complete pathname, insert the script


Under "Recording finished", insert the script


Also in mythtv-setup, under "Input Connections", on the second page for the HD-PVR, create a new recording group for the HD-PVR, and add the VBI-decoding hardware to that recording group. That ensures that the backend will not try to schedule a recording on your VBI-decoding hardware while you're using it to extract captions for the HD-PVR. Note that the backend must be restarted for this recording group change to be noticed by the scheduler.


You can now restart your backend, and you should get recordings with closed captions, selectable at viewing time.


This script sets up the variables used by the two other scripts.

#! /bin/bash

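# Variables used by the hd-captions scripts

HD_PVR_CARDID=1                  # CardID of the HD-PVR
VIDEO_DEVICE=/dev/pvr_500_2      # device file of the VBI-extracting hardware
VBI_DEVICE=/dev/pvr_500_vbi_2    # VBI control node on that hardware

CAPTION_INPUT_NUM=2     # the composite input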

# set the following variable to "=1" if the command:
# 	v4l2-ctl -d ${VBI_DEVICE} --set-fmt-sliced-vbi=cc
# generates an error message:
#       "No value given to suboption <cc>"
# otherwise, leave it blank.
# This is a known command-line parsing bug that will be fixed.
V4L2_CTL_PARSE_BUG=


delay_bias_ms=0   # set this to any consistent value, a number of
		  # milliseconds earlier that you want to see all
		  # captions appearing
post_roll_seconds=0   # If you record past the end of a recording by
		      # some seconds, set this value to at least this
		      # number to avoid losing captions in this
		      # post-recording interval

Recording start script

This script performs the following functions, in order:

  1. Verifies that this recording is being made on the HD-PVR
  2. Builds a pathname for its working directory
  3. Parses out the passed parameters to determine how long the recording is
  4. Verifies that our working directory doesn't exist
  5. Creates the working directory, and chdirs into it
  6. Spawns a subshell to do the work
  7. Makes sure that the VBI-decoding device isn't in use, and waits for it to be free (can happen if you have "end late" set to a negative number)
  8. Selects the appropriate input on the VBI-decoding device
  9. Records the time, in seconds since the epoch, when the caption extraction began, for use by the finalize script
  10. Runs ccextractor for the duration of the show, recording the results in its own binary format
  11. Records the PID of the ccextractor program in a file, so that the finalize script can kill it if the user manually requests an early stop to the recording.

#! /bin/bash


. `dirname $0`/


# Only do this for the HD-PVR input
if [ "$cardid" -ne "${HD_PVR_CARDID}" ]
then
    exit 0
fi


epoch1=`${DATE} +%s`

subst=`echo $endtime | tr "T" " "`
epoch2=`${DATE} -u --date="$subst" +%s`

duration=`${EXPR} $epoch2 - $epoch1 + $post_roll_seconds`
hours=`${EXPR} $duration / 3600`
rem=`${EXPR} $duration % 3600`
mins=`${EXPR} $rem / 60`
secs=`${EXPR} $rem % 60 `

if [ -e "$workdir" ]
then
    echo "Working directory name collision"
    exit 1
fi

${MKDIR} "$workdir"
cd "$workdir"
# Spawn a subshell to do the work
(
    # Avoid a potential race condition if the previous recording
    # hasn't finished reading captions (can happen if the end-late
    # time is negative).
    ctr=0
    while ${FUSER} ${VIDEO_DEVICE} >/dev/null 2>&1
    do
        ${SLEEP} 1
        ctr=`${EXPR} $ctr + 1`
        # If it is taking too long, axe the ccextractor run that has
        # wedged the device.
        if [ $ctr -gt 5 ]
        then
            vidholder=`${FUSER} ${VIDEO_DEVICE} |& awk ' { print $2 } '`
            ccpid=`${FUSER} ${CCEXTRACTOR} |& awk ' { print $2 } '`
            # fuser appends "e" to the PID of a process executing the file
            if [ "${vidholder}e" = "${ccpid}" ]
            then
                kill ${vidholder}
            fi
        fi
    done

    # Enable VBI (thanks Jpoet)
    ${V4L2_CTL} -d ${VBI_DEVICE} --set-fmt-sliced-vbi=cc${V4L2_CTL_PARSE_BUG} --set-ctrl=stream_vbi_format=1

    # Set bitrate to something medium to avoid ccextractor overflows
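    ${V4L2_CTL} -d ${VIDEO_DEVICE} -c video_bitrate=4500000 -c video_peak_bitrate=6000000

    # Select the composite input on the VBI-decoding device
    ${V4L2_CTL} -d ${VIDEO_DEVICE} --set-input=${CAPTION_INPUT_NUM}

    ${DATE} -u "+%s" > captions-start.txt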

    ${CCEXTRACTOR} -s ${VIDEO_DEVICE} -utf8 -endat `${PRINTF} "%02d:%02d:%02d" $hours $mins $secs` -out=bin -o pass1.srtbin >/dev/null 2>&1 &
    ccpid=$!    # PID of the background ccextractor, for the finalize script

    echo $ccpid > extractor_pid.txt
    wait $ccpid
    ${RM} extractor_pid.txt

) </dev/null >/dev/null 2>&1 &

exit 0
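
The duration arithmetic in the script above (end time minus current time, plus the post-roll, formatted for ccextractor's -endat option) can be tried in isolation. The sketch below is not the author's code. The function name endat_string is hypothetical, it takes the current time as an argument so that the result is reproducible, and it assumes GNU date for parsing the date string.

```shell
# Hypothetical helper: time remaining until an ISO end time given in UTC,
# plus an optional post-roll, formatted as HH:MM:SS.
#   $1 current time (seconds since the epoch)
#   $2 end time, e.g. 2011-03-05T21:00:00
#   $3 post-roll in seconds (optional, default 0)
endat_string() {
    end_epoch=$(date -u -d "$(echo "$2" | tr 'T' ' ')" +%s)
    duration=$(( $end_epoch - $1 + ${3:-0} ))
    printf '%02d:%02d:%02d\n' \
        $(( $duration / 3600 )) $(( $duration % 3600 / 60 )) $(( $duration % 60 ))
}

# A one-hour slot starting at 20:00:00 UTC with a 30-second post-roll.
endat_string 1299355200 "2011-03-05T21:00:00" 30   # prints 01:00:30
```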

Recording finalize script

This script, executed once the recording has finished, performs the following steps, in order:

  1. Verifies that this was an HD-PVR recording
  2. Looks for the working directory, and exits if it wasn't found
  3. Enters the working directory
  4. Forks a subshell to do the work
  5. Uses ffprobe to parse out the duration of the HD-PVR recording. I find that it can take the HD-PVR up to dozens of seconds to start recording, so we can't assume that it is as long as the requested recording interval.
  6. Uses the end time of the recording (assumed correct) and the length of the recording to deduce the starting time of the HD-PVR stream
  7. Compares the starting time of the HD-PVR stream with that of the VBI stream, and computes the time offset
  8. Runs ccextractor on the previously-collected data, correcting for the time offset
  9. Kills the first ccextractor run from the startup script, if it's still running, to free up the device file for immediate use
  10. Builds the .srt filename
  11. Makes sure we don't clobber the recording file if something's wrong
  12. Copies the .srt file to its final position
  13. Removes the working directory and exits

#! /bin/bash


. `dirname $0`/


# Only do this for the HD-PVR input
if [ "$cardid" -ne "${HD_PVR_CARDID}" ]
then
    exit 0
fi


if [ ! -d "$workdir" ]
then
    exit 0
fi

pushd "$workdir"

# Fork off a subshell to do this work
(
    rec_duration=`${FFPROBE} $dir/$file 2>&1 | ${GREP} '^  Duration: ' | \
	${AWK} -F: ' { print $2 * 3600 + $3 * 60 + int($4) } '`

# OK, it's a bit awkward here.  The captions might have started early,
# and might have ended early (recording past the end of the slot isn't
# passed in ENDTIMEISOUTC for recording start scripts).  The recording
# should have ended on time, at ENDTIMEISOUTC.  So, we can compute the
# start time of the recording

    subst=`echo $endtime | tr "T" " "`
    epoch2=`${DATE} -u --date="$subst" +%s`

    recording_start=`${EXPR} $epoch2 - $rec_duration`
    captions_start=`cat captions-start.txt`

    # Positive when caption capture began before the HD-PVR stream
    diff_start=`${EXPR} $recording_start - $captions_start`
    diff_msecs=`${EXPR} $diff_start '*' 1000 + $delay_bias_ms`
    diff_start=`${EXPR} $diff_msecs / 1000`
    diff_mins=`${EXPR} $diff_start / 60`
    diff_secs=`${EXPR} $diff_start % 60`

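    if [ ${diff_start} -lt 0 ]
    then
	# The recording began first: delay the captions
	diff_msecs=`${EXPR} ${diff_msecs} / -1`
	${CCEXTRACTOR} -utf8 -delay ${diff_msecs} pass1.srtbin -o
    else
	# Caption capture began first: skip the lead-in and shift earlier
	${CCEXTRACTOR} -utf8 -startat ${diff_mins}:${diff_secs} -delay -${diff_msecs} pass1.srtbin -o
    fi
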
    if [ -f extractor_pid.txt ]
    then
	kill `cat extractor_pid.txt`
    fi

    ofile=`echo $file | ${SED} -e 's|\.mpg$|.srt|'`
    if [ "$ofile" = "$file" ]
    then
	exit 1
    fi

    ${CP} $dir/$ofile

    ${RM} -fr $workdir

) </dev/null >/dev/null 2>&1 &

exit 0
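
The timing logic of the finalize script can be exercised without any hardware. The sketch below is not the author's code. The function names duration_seconds and caption_offset are hypothetical, the sample Duration line is made up to match the format the script greps for, and the output is a plain summary, not an actual ccextractor invocation.

```shell
# Hypothetical helper: whole seconds from the "  Duration:" line of
# ffprobe's human-readable output, read on stdin.
duration_seconds() {
    grep '^  Duration: ' | awk -F: '{ print $2 * 3600 + $3 * 60 + int($4) }'
}

# Hypothetical helper: print the -delay value (in ms) and the -startat
# value that follow from the timestamps the finalize script uses.
#   $1 recording end (epoch s)           $2 recording length (s)
#   $3 caption capture start (epoch s)   $4 bias in ms (positive = earlier)
caption_offset() {
    rec_start=$(( $1 - $2 ))
    diff_ms=$(( ($rec_start - $3) * 1000 + ${4:-0} ))
    if [ "$diff_ms" -lt 0 ]; then
        # The recording began first: captions must be pushed later.
        echo "delay=$(( 0 - $diff_ms )) startat=none"
    else
        # Caption capture began first: skip the lead-in, pull captions earlier.
        skip=$(( $diff_ms / 1000 ))
        echo "delay=$(( 0 - $diff_ms )) startat=$(( $skip / 60 )):$(( $skip % 60 ))"
    fi
}

# The HD-PVR took 15 s to start; caption capture began 1 s into the slot.
len=$(printf '  Duration: 00:59:45.12, start: 0.000000, bitrate: 13000 kb/s\n' | duration_seconds)
caption_offset 1299358800 "$len" 1299355201 0   # prints delay=-14000 startat=0:14
```

Here the caption capture leads the HD-PVR stream by 14 seconds, so the first 14 seconds of captions are skipped and the remainder is shifted 14000 ms earlier.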