Captions with HD-PVR

From MythTV Official Wiki


Author Christopher Neufeld
Description This set of scripts provides a way, given the right hardware, to record closed-caption data for HD-PVR recordings.


Because there is no defined standard for the transmission of closed-caption information over high-definition connections such as component or HDMI, it is not currently possible to obtain closed-caption data from recordings produced by a Hauppauge HD-PVR. One alternative that may suit some users is to have the STB (set-top box) render the captions, so that the HD-PVR records them as open captions. The disadvantage is that such captions cannot be turned off at viewing time, because they are an inextricable part of the video data. The approach on this page instead records the captions as text that can be turned on or off as desired while viewing a recording.

The technique described here allows recordings to be made with closed caption information, provided that the STB has the correct behaviour, and that you have a card capable of reading the VBI data from a composite or coaxial standard-definition stream. This procedure has been tested with a Hauppauge PVR-500.

Please note that, unlike earlier versions of these scripts, this procedure now requires no post-processing after a recording is complete, so it works with live television and with programs that are being watched while still being recorded. The captions can take up to several seconds to be written to disc, so viewing will have to be at least several seconds behind the live broadcast, or captions will arrive too late to be displayed.

As with several other scripts, this works by dropping a .srt file into the storage directory. MythTV will automatically use such a file if it is found. The .srt file is obtained by scanning the standard-definition outputs of the STB.
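
As a minimal sketch of the naming convention, assuming (as the start script further down does) that recordings end in .mpg or .ts, the caption file name can be derived from the recording name as follows. The function name srt_name_for and the sample file name are hypothetical and are not part of the scripts on this page.

```shell
#!/bin/bash
# Hypothetical helper: map a recording file name to its .srt sidecar name.
# Prints nothing and returns 1 for any extension other than .mpg or .ts.
srt_name_for() {
    case "$1" in
        *.mpg) echo "${1%.mpg}.srt" ;;
        *.ts)  echo "${1%.ts}.srt" ;;
        *)     return 1 ;;
    esac
}

# Made-up recording name, for illustration only.
srt_name_for "1021_20150411200000.ts"
```

The sidecar file has to sit in the same storage directory as the recording for MythTV to pick it up.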

Changes

2015-04-11:

  • Following more suggestions from stichnot, made changes that would allow captions to be viewed while recording, or in live TV.

2014-10-27:

  • The V4L2 kernel modules have tightened the rules on ioctls. VBI-related ioctl calls must now go through the appropriate VBI device; it is an error to configure VBI through the video device. The scripts have been modified to account for this.
  • There is a command-line parsing bug present in some versions of v4l2-ctl, so the script has been changed to handle that.

2013-02-05:

  • Added a v4l2-ctl command to explicitly set the bitrate. There is some evidence that if the card's current bitrate (from either boot-time configuration or a previous use of the card) is too high, ccextractor may error out partway through with a message like "Error: Not enough memory. Please report this: 65536 bytes is not enough!"

2013-02-01:

  • Put in changes to kill a ccextractor doing VBI reads from an earlier recording if it somehow is still running
  • Following some suggestions from stichnot:
    • Explicitly use /bin/bash instead of /bin/sh to ensure we have pushd/popd available
    • Change the delay_bias to be in milliseconds rather than in seconds
    • Add a configurable post-roll overrun to ensure that captions are recorded during that interval when it is being used


Prerequisites

Before proceeding, first determine whether you have the hardware required.

  1. You must have an STB that has standard-definition outputs as well as the component outputs used by the HD-PVR. These can be composite or coaxial. If using a coaxial connection, you will have to change the script: choose the correct input number and set the tuner frequency.
  2. Your STB must produce output on the standard-definition outputs even when tuned to a high-definition channel.
  3. Your STB must include VBI data in the standard-definition outputs.

To test these requirements, connect your television set to the standard-definition outputs of the STB. Tune the STB to a high-definition channel, and then use the television's internal settings (not the STB settings) to select closed captions. If you see captions, then your STB is suitable for use with this technique. Note that not all programs will have captions, and sometimes commercials or promos don't have them, so you might have to check several high-definition channels to determine whether or not your STB transmits VBI data.

Next, you must have a hardware device capable of reading the VBI stream from a standard-definition, analogue signal. In my case, my backend has a PVR-500 card, which can do that. I connected the composite outputs of the STB to the composite inputs of the PVR-500. Note that only one cable, on the video plug, has to be connected; the two audio plugs aren't necessary for this operation. I've plugged in all three only because I have triplet cables rather than individual ones.

You must be using at least MythTV 0.27, because we are using the REC_STARTED_WRITING system event.

You must have installed the CCExtractor program, version 0.67 or later. I have tested this with ccextractor version 0.76.

You must be able to identify your video and VBI device nodes in a reliable fashion. I have udev rules to build symlinks to those devices, and use the symlinks to connect.
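
Before wiring anything into MythTV, it can help to confirm that the device names you plan to use actually resolve to character devices. The following is only a sketch: check_device_nodes is a hypothetical helper, and the two paths are the examples used on this page, so substitute your own symlink names.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that each path resolves to a
# character device (the -c test follows symlinks created by udev).
check_device_nodes() {
    status=0
    for node in "$@"
    do
        if [ -c "$node" ]
        then
            echo "ok: $node"
        else
            echo "missing or not a character device: $node"
            status=1
        fi
    done
    return $status
}

# Example paths from this page; yours will probably differ.
check_device_nodes /dev/pvr_500_2 /dev/pvr_500_vbi_2 \
    || echo "check your udev rules before continuing"
```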

If you have a version of MythTV before 0.27, or ccextractor before about 0.67, you will have to use the previous versions of the scripts, available here.

Parameters to determine

You may have to modify some parameters in these scripts. They should all be adjustable by editing the hd-captions-common.sh script, not the other two. The required parameters are:

  1. The CardID of your HD-PVR. Mine is '1'. You can determine this by running the MySQL command "select cardid,cardtype from capturecard;", or simply modify the script to write cardid to a file and exit, then start a recording on the HD-PVR.
  2. The pathname of the device file to the VBI-extracting hardware. In my case, that's the second module of my PVR-500, and on my system that's /dev/pvr_500_2
  3. The pathname of the device file to the VBI control node on the hardware. On my system, that's /dev/pvr_500_vbi_2.
  4. The input number of the composite input on the VBI-extracting hardware. In my case, that's '2'.
  5. A working directory, writable by the UID that runs the mythbackend. I have a tmp directory in /myth, so I've set the working directory prefix to point under that. The script will create a new directory in which it will work, and will remove the directory when it completes.
  6. A bias can be set here, or it can be left at zero. If set to a positive number, captions will appear that many milliseconds earlier in the stream. This is to allow for the possibility that there is an undetected systematic delay on your particular hardware, one that requires correction.
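
For the first parameter, the "write cardid to a file" approach could look roughly like the sketch below, temporarily used in place of hd-captions-start.sh. The function name log_event_args and the log file location are assumptions made for illustration. It records the arguments MythTV passes to the event and touches no hardware.

```shell
#!/bin/bash
# Hypothetical stand-in for hd-captions-start.sh: append the arguments
# of interest to a log file so the CardID can be read off afterwards.
log_event_args() {
    logfile=$1
    shift
    # Remaining arguments: CARDID CHANID STARTTIMEISOUTC ENDTIMEISOUTC DIR FILE
    {
        echo "cardid=$1"
        echo "chanid=$2"
        echo "file=$6"
    } >> "$logfile"
}

# Arbitrary, assumed log location.
log_event_args "${TMPDIR:-/tmp}/hd-captions-cardid.txt" "$@"
```

Start a short recording on the HD-PVR, read the cardid line from the log, then restore the real start script.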

There is also a set of binaries used by the scripts. You should verify that the pathnames are correct for your system. In particular, ccextractor might be installed somewhere other than in /usr/bin.
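
One way to verify the pathnames is a small loop like the following sketch. The helper name check_binaries is hypothetical, and the paths listed are a few of the defaults from hd-captions-common.sh, shown only as examples.

```shell
#!/bin/bash
# Hypothetical check: report every configured path that is not an
# executable regular file. Returns non-zero if any path fails.
check_binaries() {
    missing=0
    for bin in "$@"
    do
        if [ ! -f "$bin" ] || [ ! -x "$bin" ]
        then
            echo "not executable: $bin"
            missing=1
        fi
    done
    return $missing
}

# A few of the default paths; extend the list to cover every variable.
check_binaries /usr/bin/ccextractor /usr/bin/v4l2-ctl /usr/bin/fuser /bin/sed \
    || echo "adjust the paths in hd-captions-common.sh"
```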


Setting up

Copy all three scripts to the same directory. They should be made executable, and should be in a directory that is readable by the UID that runs the mythbackend.

In mythtv-setup, go to the screen "System Events". Add two new events. Under "Recording started writing", using the complete pathname, insert the script hd-captions-start.sh:

/SOME/DIR/hd-captions-start.sh "%CARDID%" "%CHANID%" "%STARTTIMEISOUTC%" "%ENDTIMEISOUTC%" "%DIR%" "%FILE%"

Under "Recording finished", insert the script hd-captions-finalize.sh:

/SOME/DIR/hd-captions-finalize.sh "%CARDID%" "%CHANID%" "%STARTTIMEISOUTC%" "%ENDTIMEISOUTC%" "%DIR%" "%FILE%"

Also in mythtv-setup, under "Input Connections", on the second page for the HD-PVR, create a new recording group for the HD-PVR, and add the VBI-decoding hardware to that recording group. That ensures that the backend will not try to schedule a recording on your VBI-decoding hardware while you're using it to extract captions for the HD-PVR. Note that the backend must be restarted for this recording group change to be noticed by the scheduler.


Finished

You can now restart your backend, and you should get recordings with closed captions, selectable at viewing time.


Script hd-captions-common.sh

This script sets up the variables used by the two other scripts.


#! /bin/bash
#

# Variables used by the hd-captions scripts

#########################################
#
# VERIFY THESE PATHS FOR YOUR SYSTEM
#
#########################################


CCEXTRACTOR=/usr/bin/ccextractor
FFPROBE=/usr/bin/ffprobe
V4L2_CTL=/usr/bin/v4l2-ctl
GREP=/bin/grep
AWK=/bin/awk
SED=/bin/sed
FUSER=/usr/bin/fuser
EXPR=/usr/bin/expr
DATE=/bin/date
PRINTF=/usr/bin/printf
CP=/bin/cp
RM=/bin/rm
TOUCH=/bin/touch
MKDIR=/bin/mkdir
SLEEP=/bin/sleep



#########################################
#
#  EDIT THESE PARAMETERS IF NECESSARY
#
#########################################



VIDEO_DEVICE=/dev/pvr_500_2
VBI_DEVICE=/dev/pvr_500_vbi_2

CAPTION_INPUT_NUM=2     # the composite input
HD_PVR_CARDID=1

# set the following variable to "=1" if the command:
# 	v4l2-ctl -d ${VBI_DEVICE} --set-fmt-sliced-vbi=cc
# generates an error message:
#       "No value given to suboption <cc>"
# otherwise, leave it blank.
#
# This is a known command-line parsing bug that will be fixed.
#
V4L2_CTL_PARSE_BUG="=1"

workdir_prefix=/myth/tmp/captions_

delay_bias_ms=0   # set this to any consistent value, a number of
		  # milliseconds earlier that you want to see all
		  # captions appearing
post_roll_seconds=0   # If you record past the end of a recording by
		      # some seconds, set this value to at least this
		      # number to avoid losing captions in this
		      # post-recording interval

Recording start script

This script performs the following functions, in order:

  1. Verifies that this recording is being made on the HD-PVR
  2. Builds a pathname for its working directory
  3. Parses out the passed parameters to determine how long the recording is
  4. Verifies that our working directory doesn't exist
  5. Creates the working directory, and chdirs into it
  6. Spawns a subshell to do the work
  7. Makes sure that the VBI-decoding device isn't in use, and waits for it to be free (can happen if you have "end late" set to a negative number)
  8. Selects the appropriate input on the VBI-decoding device
  9. Records the time, in seconds since the epoch, when the caption extraction began, for use by the finalize script
  10. Runs ccextractor for the duration of the show, recording the results in the .srt file in the correct place.
  11. Records the PID of the ccextractor program in a file, so that the finalize script can kill it if the user manually requests an early stop to the recording.
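
The time arithmetic behind steps 3 and 10 can be tried in isolation with the sketch below. It assumes GNU date and an end time of the form 2015-04-11T21:00:00Z. The function names are hypothetical, and a fixed start time stands in for the current time that the real script uses, so that the result is reproducible.

```shell
#!/bin/bash
# Sketch of the duration calculation that feeds ccextractor's -endat
# option. Assumes GNU date; names and sample times are illustrative.

# Convert an ISO UTC timestamp to seconds since the epoch.
iso_to_epoch() {
    date -u --date="$(echo "$1" | tr 'T' ' ')" +%s
}

# Format a number of seconds as HH:MM:SS.
seconds_to_hms() {
    printf '%02d:%02d:%02d\n' \
        $(( $1 / 3600 )) $(( ($1 % 3600) / 60 )) $(( $1 % 60 ))
}

start=$(iso_to_epoch "2015-04-11T20:00:00Z")  # the real script uses "now"
end=$(iso_to_epoch "2015-04-11T21:00:00Z")    # sample %ENDTIMEISOUTC% value
post_roll_seconds=0

seconds_to_hms $(( end - start + post_roll_seconds ))
```

In the script itself the duration is additionally reduced by one second for every second spent waiting for the VBI device to become free.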

#! /bin/bash
#

# Invoke with CARDID CHANID STARTTIMEISOUTC ENDTIMEISOUTC DIR FILE


. `dirname $0`/hd-captions-common.sh


cardid=$1
chanid=$2
starttime=$3
endtime=$4
dir=$5
file=$6

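# Derive the caption file name from the recording name.  Give up if the
# recording is neither a .mpg nor a .ts file.
ofile=`echo "$file" | ${SED} -e 's|\.mpg$|.srt|' -e 's|\.ts$|.srt|'`
if [ "$ofile" = "$file" ]
then
    exit 1
fi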


# Only do this for the HD-PVR input
if [ $cardid -ne ${HD_PVR_CARDID} ]
then
    exit 0
fi

workdir=${workdir_prefix}${chanid}_${starttime}

epoch1=`${DATE} +%s`

subst=`echo $endtime | tr "T" " "`
epoch2=`${DATE} -u --date="$subst" +%s`

duration=`${EXPR} $epoch2 - $epoch1 + $post_roll_seconds`

if [ -e $workdir ]
then
    echo "Working directory name collision"
    exit 1
fi

${MKDIR} $workdir
cd $workdir
touch starting.$$
(
    set -x

    ctr=0

    # Avoid a potential race condition if the previous recording
    # hasn't finished reading captions (can happen if the end-late
    # time is negative).

    # Enable VBI (thanks Jpoet)
    while ! ${V4L2_CTL} -d ${VBI_DEVICE} \
	    	--set-fmt-sliced-vbi=cc${V4L2_CTL_PARSE_BUG} \
		--set-ctrl=stream_vbi_format=1
    do
	${SLEEP} 1
	ctr=`${EXPR} $ctr + 1`
	
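	# If it is taking too long, axe the ccextractor run that has
	# wedged the device.  fuser marks a process that is executing
	# the named file as "<pid>e", hence the "e" appended to the
	# device holder's PID in the comparison below.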
	if [ $ctr -gt 5 ]
	then
	    vidholder=`fuser ${VIDEO_DEVICE} |& awk ' { print $2 } '`
	    ccpid=`fuser ${CCEXTRACTOR} |& awk ' { print $2 } '`
	    if [ "${vidholder}e" = "${ccpid}" ]
	    then
		kill ${vidholder}
	    fi
	fi

	duration=`${EXPR} $duration - 1`
    done

    hours=`${EXPR} $duration / 3600`
    rem=`${EXPR} $duration % 3600`
    mins=`${EXPR} $rem / 60`
    secs=`${EXPR} $rem % 60 `

    ${V4L2_CTL} -d ${VIDEO_DEVICE} -c video_bitrate=4500000 \
	-c video_peak_bitrate=6000000

    ${V4L2_CTL} -i ${CAPTION_INPUT_NUM} -d ${VIDEO_DEVICE}

    ${DATE} -u "+%s" > captions-start.txt

    ${CCEXTRACTOR} -s ${VIDEO_DEVICE} -delay ${delay_bias_ms} -utf8 \
		   -endat `${PRINTF} "%02d:%02d:%02d" $hours $mins $secs` \
		   --buffersize 1M -out=srt -o ${dir}/${ofile} 2>&1 &

    ccpid=$!
    echo $ccpid > extractor_pid.txt
    wait $ccpid
    ${RM} extractor_pid.txt

) >startup.txt 2>&1 < /dev/null &

exit 0


Recording finalize script

This script, executed once the recording has finished, performs the following steps, in order:

  1. Verifies that this was an HD-PVR recording
  2. Looks for the working directory, and exits if it wasn't found
  3. Enters the working directory
  4. Forks a subshell to do the work
  5. Kills the first ccextractor run from the startup script, if it's still running, to free up the device file for immediate use
  6. Removes the working directory and exits
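
Step 5 can be made slightly more defensive by checking that the recorded PID still belongs to a live process before signalling it. The following is a sketch of that idea, not what the script below does; kill_from_pidfile is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: stop the process whose PID is stored in a file,
# but only if a process with that PID still exists.
kill_from_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID is alive.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
    then
        kill "$pid"
    fi
    rm -f "$pidfile"
}

# Does nothing if the file is absent from the current directory.
kill_from_pidfile extractor_pid.txt
```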

#! /bin/bash

# Invoke with CARDID CHANID STARTTIMEISOUTC ENDTIMEISOUTC DIR FILE


. `dirname $0`/hd-captions-common.sh


cardid=$1
chanid=$2
starttime=$3
endtime=$4
dir=$5
file=$6

# Only do this for the HD-PVR input
if [ $cardid -ne ${HD_PVR_CARDID} ]
then
    exit 0
fi

workdir=${workdir_prefix}${chanid}_${starttime}

if [ ! -d $workdir ]
then
    exit 0
fi

pushd $workdir

# Fork off a subshell to do this work
(
    if [ -f extractor_pid.txt ]
    then
	kill `cat extractor_pid.txt`
    fi

    popd
    ${RM} -fr $workdir

) </dev/null >finalize.txt 2>&1 &

exit 0