Transcoding Preserving Captions

From MythTV Official Wiki
Jump to: navigation, search


Author Christopher Neufeld
Description This framework supplies the means for a user job to do a lossless cut on an MPEG4 recording, while still retaining the ability of MythTV to display closed captions. It also provides methods for transcoding the video to H.264 format, with a valid seek table, while preserving captions.
Supports


A common request is to perform a lossless transcode without losing the closed caption data in the ivtv data stream. For technical reasons, this is a difficult problem. The technique outlined here allows the preservation of captions during a lossless transcode. This is done not by preserving the caption data in the transcoded stream, but by extracting it into a .srt file, which is a file that MythTV automatically detects and makes available if caption displaying is enabled. Once the .srt file is produced, it can be used with the stream however it is transcoded, so converting it to H.264 format is also supported, and still preserves the captions.

This technique makes use of the pull-captions.pl script shown at Closed_captioning, which in turn depends on the ccextractor project (see that page for details).

To use this script, first put the following .pm file into your Perl library path. The name I've used is MythXCode.pm, but you can rename that with a global search/replace if it collides with something in your namespace. To view the documentation for MythXCode.pm, simply run:

pod2man MythXCode.pm | groff -man -Tascii | less

Library Module

Now, here is MythXCode.pm:

#! /usr/bin/perl

# A class to handle transcoding, for people with unusual requirements.
#

package MythXCode;

=pod

=head1 NAME

MythXCode Perl package - Support for transcoding and preserving captions

=head1 SYNOPSIS

    use MythXCode;
    my $worker = MythXCode->new();
    $worker->set_db_parms("mythconverg", "mythtv", "myth-db-password");
    $worker->set_program_identifier($chanid, $starttime, $jobid);
    $worker->init();
    $worker->cut_and_caption();
    $worker->transcode_to_h264();
    undef $worker;


=head1 DESCRIPTION

The C<MythXCode> package provides configurable support for certain
operations connected to transcoding files in MythTV.  Its primary
function is to preserve closed-caption data as recordings are
manipulated, for instance when cutting to remove commercials, or when
creating DVDs.  It has been tested on NTSC recordings with ivtv
captions, and on North American ATSC digital OTA captions.

This package allows the user to convert MPEG recordings to H.264.  The
resulting transcoded recordings will have valid seek tables and their
closed captioning data will be intact.

The settings for H.264 transcodes are hard-coded in the file, and must
be edited if they're not deemed appropriate for the circumstances.
Standard definition recordings are transcoded to 2000 kbits/s.  720p/i
recordings are transcoded to 4000 kbits/s, and 1080p/i recordings are
transcoded to 6000 kbits/s.  Edit the method
C<perform_transcoding_to_h264()> if you want to adjust these.

The MythXCode objects have only been tested while working on a single
recording.  If multiple recordings are to be processed, each must be
done in its own MythXCode object, rather than reusing the object.

This package depends on the F<pull-captions.pl> script to extract
captions from recordings with cutlists.  That script can be found at
L<http://www.mythtv.org/wiki/Closed_captioning>, and it depends, in
turn, on the ccextractor binary found at
L<http://ccextractor.sourceforge.net/>.

The following methods should be considered public methods.  Methods
not documented here are intended for internal use within MythXCode,
and should not be called by users of the package.  The user should
call these methods in order, though not all applications will have to
use all methods.

=cut



use strict;
use DBI;
use File::Basename;
use Fcntl qw(:flock);

# This script pokes around in the database.  Things might go really
# wrong if the database innards are changed.  So, we sanity check
# against that.  This script has been tested on schema 1264.
my $schema_var = "DBSchemaVer";
my $verified_schema = 1264;

my $pull_captions_pathname = "/home/mythtv/pull-captions.pl";
my $ffmpeg_pathname = "/usr/bin/ffmpeg";
my $ffprobe_pathname = "/usr/bin/ffprobe";
my $mplayer_pathname = "/usr/bin/mplayer";

my $ffmpeg_containeropts = "-f matroska";

# you must have "recordedprogram" here, other parts of the code use
# it.
my @tables_to_backup = ( "recorded", "recordedprogram", "recordedseek",
			 "recordedmarkup" );


sub new {
    my $class = shift;
    my $self = {};
    bless $self, $class;
    $self->check_dependent_executables();
    return $self;
}


sub check_dependent_executables {
    my $self = shift;

    die "Unable to locate pull-captions.pl" if ! -x "$pull_captions_pathname";
    die "Unable to locate ffmpeg" if ! -x "$ffmpeg_pathname";
    die "Unable to locate ffprobe" if ! -x "$ffprobe_pathname";

    if ( ! -x "$mplayer_pathname" ) {
	print "NOTICE:  unable to locate mplayer, no cropping will be done.\n";
	$self->{ cannot_crop } = 1;
    }
}



sub get_debug_level {

    my $self = shift;
    if ( $self->{ debug_level } ) {
	return $self->{ debug_level };
    } else {
	return 0;
    }
}



sub connect_to_db {

    my $self = shift;

    die "Failed to set database parameters" if ! $self->{ parms_set };

    my $dsn;

    if ( $self->{ dbhost } ) {
	if ( $self->{ dbportnum } ) {
	    $dsn = "DBI:mysql:database=$self->{dbname};host-$self->{dbhost};port=$self->{dbport}";
	} else {
	    $dsn = "DBI:mysql:database=$self->{dbname};host-$self->{dbhost}";
	}
    } else {
	$dsn = "DBI:mysql:database=$self->{dbname}";
    }

    if ( @_ && @_[0]->{ readonly } ) {

	if ( $self->get_debug_level() > 0) {
	    print "Opening read-only connection to database.\n";
	}

	$self->{ dbhandle } = DBI->connect($dsn, $self->{ dbuser }, $self->{ dbpasswd }, { RaiseError => 1, AutoCommit => 0, ReadOnly => 1 });
    } else {

	if ( $self->get_debug_level() > 0) {
	    print "Opening read-write connection to database.\n";
	}

	$self->{ dbhandle } = DBI->connect($dsn, $self->{ dbuser }, $self->{ dbpasswd }, { RaiseError => 1, AutoCommit => 0, ReadOnly => 0 });
    }

    die "Failed to connect to the database: $DBI::errstr" if ! $self->{ dbhandle };
}

sub disconnect_db {
    my $self = shift;

    if ( $self->get_debug_level() > 0) {
	print "Disconnecting from database\n";
    }

    $self->{ dbhandle }->disconnect;
    $self->{ dbhandle } = undef();
}

sub check_database_schema {
    my $self = shift;

    $self->connect_to_db( { readonly => 1 } );
    my $qstr = "SELECT data FROM settings WHERE value=?";
    my $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($schema_var);

    my @onerow = $query->fetchrow_array();
    $query->finish();
    $self->disconnect_db();

    die "Unverified database schema.  This script must be carefully inspected for the new version." if $onerow[0] != $verified_schema;
}


sub read_recording_data {
    my $self = shift;
    $self->connect_to_db( { readonly => 1 } );

    my @dirs;
    my @onerow;
    
    my $query = $self->{ dbhandle }->prepare("SELECT dirname FROM storagegroup");
    $query->execute();
    while ( @onerow = $query->fetchrow_array ) {
	push @dirs, $onerow[0];
    }

    die "No storage group directories located" if $#dirs == -1;

    $query = $self->{ dbhandle }->prepare("SELECT basename FROM recorded WHERE chanid=? AND starttime=?");
    $query->execute( $self->{ chanid }, $self->{ starttime } );
    @onerow = $query->fetchrow_array;
    die "Failed to locate recording basename" if $#onerow != 0;
    $self->{ orig_basename } = $onerow[0];

    @onerow = $query->fetchrow_array;
    die "Corrupted database, found multiple recordings at chanid and starttime" if $#onerow != -1;

    for ( @dirs ) {
	my $candidate = $_ . "$self->{ orig_basename }";
	if ( -e "$candidate" ) {
	    $self->{ orig_pathname } = $candidate;
	    last;
	}
    }
    
    $self->{ backups } = {};
    for ( @tables_to_backup ) {
	@{$self->{ backups }->{ $_ }} = $self->retrieve_table ( $_ );
    }

    $self->disconnect_db();
}


sub retrieve_table {
    my $self = shift;
    my $tablename = shift;

    my $qstr = "SELECT * from $tablename WHERE chanid=? AND starttime=?";
    my $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($self->{ chanid }, $self->{ starttime });
    my $names = $query->{NAME};

    my @retval;
    push @retval, $names;

    while ( my @res = $query->fetchrow_array ) {
	push @retval, \@res;
    }

    return @retval;
}


sub generate_caption_filename {
    my $self = shift;
    
    my $localcopy = $self->{ orig_pathname };
    $localcopy =~ s/\.[^.]+$/.srt/;

    return $localcopy;
}


sub captions_already_extracted {
    my $self = shift;

    return -s $self->generate_caption_filename();
}


sub has_cutlist {
    my $self = shift;

    $self->connect_to_db( { readonly => 1 } );

    my $qstr = "SELECT * FROM recordedmarkup WHERE chanid=? AND starttime=? AND ( type=0 OR type=1 )";

    my $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($self->{ chanid }, $self->{ starttime });
    my @onerow = $query->fetchrow_array;
    $query->finish();
    $self->disconnect_db();

    return $#onerow != -1;
}


sub clear_cutlist {
    my $self = shift;

    $self->connect_to_db();

    my $query;
    my $qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=? AND type=0";

    if ( $self->get_debug_level() > 0 ) {
	print "DBExecute:  $qstr;   args= '$self->{ chanid }','$self->{ starttime }'\n";
    } else {
	$query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($self->{ chanid }, $self->{ starttime });
    }

    $qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=? AND type=1";

    if ( $self->get_debug_level() > 0 ) {
	print "DBExecute:  $qstr;   args= '$self->{ chanid }','$self->{ starttime }'\n";
    } else {
	$query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($self->{ chanid }, $self->{ starttime });
    }

    $qstr = "UPDATE recorded SET cutlist=0 WHERE chanid=? AND starttime=?";
    if ( $self->get_debug_level() > 0 ) {
	print "DBExecute:  $qstr;   args= '$self->{ chanid }','$self->{ starttime }'\n";
    } else {
	$query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($self->{ chanid }, $self->{ starttime });
    }

    $qstr = "UPDATE recorded SET commflagged=0 WHERE chanid=? AND starttime=?";
    if ( $self->get_debug_level() > 0 ) {
	print "DBExecute:  $qstr;   args= '$self->{ chanid }','$self->{ starttime }'\n";
    } else {
	$query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($self->{ chanid }, $self->{ starttime });
    }

    if ( $self->get_debug_level() > 0 ) {
	print "COMMIT !!!\n";
    } else {
	$self->{ dbhandle }->commit || die "Database update failed clearing cutlist.\n";
    }

    $self->disconnect_db();
}


sub get_recordedprogram_table_data {
    my $self = shift;

    my $names = $self->{ backups }->{ "recordedprogram" }[0];
    my $res = $self->{ backups }->{ "recordedprogram" }[1];
    
    my %hashret;

    for (0 .. $#$res) {
	$hashret{$$names[$_]} = $$res[$_];
    }

    return %hashret;
}


sub safe_to_cut_before_captions {
    my $self = shift;

    my %alldat = $self->get_recordedprogram_table_data();
    return $alldat{ "videoprop" } =~ "HDTV,(720|1080)";
}

sub program_may_need_cropping {
    my $self = shift;
    
    my %alldat = $self->get_recordedprogram_table_data();
    return $alldat{ "videoprop" } !~ "HDTV,(720|1080)";
}

sub high_definition {
    my $self = shift;
    
    my %alldat = $self->get_recordedprogram_table_data();
    return $alldat{ "videoprop" } =~ "HDTV,(720|1080)";
}


sub extract_captions {
    my $self = shift;

    my $command = "$pull_captions_pathname $self->{ chanid } '" . $self->{ starttime } . "' " . $self->generate_caption_filename() . " >/dev/null 2>&1";

    print "EXECUTE:  $command\n";
    if ( $self->get_debug_level() < 2 ) {
	$self->set_job_status("Extracting captions");
	if ( system($command) != 0) {
	    $self->set_job_status("Failed while extracting captions");
	    die "Failed to pull the captions from the recording.";
	}
	$self->set_job_status("Finished extracting captions");
    }
}


sub apply_cutlist_to_recording
{
    my $self = shift;

    my $tmpname = $self->{ orig_pathname } . ".tmp";
    
    my $command = "mythtranscode -m -c $self->{ chanid } -s '$self->{ starttime }' -l -o $tmpname >/dev/null 2>&1";
    print "EXECUTE:  $command\n";

    if ( $self->get_debug_level() == 0 ) {
	$self->set_job_status("Performing lossless cut on the recording.");
	if ( system($command) != 0 ) {
	    $self->set_job_status("Failed while applying cutlist");
	    die "Failed to apply cutlist to the recording.";
	}

	if ( ! -s $tmpname ) {
	    $self->set_job_status("Failed while applying cutlist");
	    die "Failed to apply cutlist to the recording.";
	}

	my $newlen = -s $tmpname;
	if ( $newlen <= 0 ) {
	    $self->set_job_status("Failed while applying cutlist");
	    die "Failed to apply cutlist to the recording.";
	}

	$self->set_job_status("Finished applying cutlist to recording.");

	$self->connect_to_db();

	my $qstr = "DELETE FROM recordedseek WHERE chanid=? AND starttime=?";
	my $query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($self->{ chanid }, $self->{ starttime });

	$qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=?";
	$query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($self->{ chanid }, $self->{ starttime });

	$qstr = "UPDATE recorded SET filesize=? WHERE chanid=? AND starttime=?";
	$query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($newlen, $self->{ chanid }, $self->{ starttime });

	my $deletename = $self->{ orig_pathname } . ".DELETEME";
	if ( ! rename $self->{ orig_pathname }, $deletename ) {
	    $self->{ dbhandle }->rollback();
	    unlink $tmpname;
	    die "Failed to move aside the original recording.";
	}

	if ( ! rename $tmpname, $self->{ orig_pathname } ) {
	    rename $deletename, $self->{ orig_pathname };
	    $self->{ dbhandle }->rollback();
	    die "Failed to move the recording after cutting.";
	}
	
	if ( ! $self->{ dbhandle }->commit() ) {
	    unlink $self->{ orig_pathname };
	    rename $deletename, $self->{ orig_pathname };
	    die "Database update failed after cutlist";
	}

	$self->disconnect_db();
	unlink $deletename;
	unlink $tmpname . ".map";

	$command = "mythtranscode --buildindex -c $self->{ chanid } -s '$self->{ starttime }' >/dev/null 2>&1";

	$self->set_job_status("Rebuilding keyframe index");
	if ( system($command) != 0 ) {
	    print "Failed to rebuild keyframe index.\n";
	}

	$self->set_job_status("Finished lossless cut on the recording.");

	$self->clear_cutlist();
    }
}


sub examine_source_media {
    my $self = shift;


    print "EXECUTE:  $ffprobe_pathname $self->{ orig_pathname } 2>&1\n";

    my $fh;
    open ($fh, "$ffprobe_pathname $self->{ orig_pathname } 2>&1 |") 
	|| die "Unable to run ffprobe on $self->{ orig_pathname }";


    my $saw_5_1;
    while (<$fh>) {
	if ( /Duration: ([0-9]+):([0-9]+):([0-9]+\.[0-9][0-9]), start: / ) {
	    $self->{ duration_secs } = $1 * 3600 + $2 * 60 + $3;
	    print "Media duration is $self->{ duration_secs } seconds.\n";
	}
	if ( /Stream #([0-9]+)[.:]([0-9]+)\[.* Video:.*(1920x1080|1280x720|720x480)/ ) {
	    $self->{ video_map } = "-map $1:$2";
	    if ( $self->{ map_dot } ) {
		$self->{ video_map } =~ s/:/./g;
	    }

	    if ( $3 =~ "1920x1080" ) {
		$self->{ video_res } = 1080;

	    } elsif ( $3 =~ "1280x720" ) {

		$self->{ video_res } = 720;
		
	    } elsif ( $3 =~ "720x480" ) {
		$self->{ video_res } = 480;
	    } else {
		die "Unable to determine video resolution";
	    }
	}

	if ( /Stream #([0-9]+)[.:]([0-9]+)\[.* Audio: / ) {

	    my $mcopy = "-map $1:$2";
	    if ( $self->{ map_dot } ) {
		$mcopy =~ s/:/./g;
	    }

	    if ( ! $saw_5_1 ) {

		if ( /, 5\.1, / ) {
		    $saw_5_1 = 1;
		    $self->{ audio_map } = $mcopy;
		}

		if ( m/, stereo, / ||  m/, 2 channels, / ) {

		    $self->{ audio_map } = $mcopy;
		}
	    }
	}
    }
    close $fh;

    die "Can't find video information" if ! $self->{ video_map } || ! $self->{ video_res };
    die "Can't find audio information" if ! $self->{ audio_map };
}


sub get_cropping_argument {
    my $self = shift;

    $self->set_job_status("Determining cropping values");

    # We'll go 5 minutes into the recording, to avoid any issues with
    # opening credits being in a strange shape.  Then, draw 10 samples
    # at 15 second intervals.  We choose the most commonly reported
    # answer.

    my %votehash;

    for (0 .. 10) {
	my $time_delay = 300 + $_ * 15;
	my @oneshot = $self->probe_cropping_data($self->{ orig_pathname }, $time_delay);
	for (@oneshot) {
	    if ( ! $votehash{$_} ) {
		$votehash{$_} = 1;
	    } else {
		$votehash{$_}++;
	    }
	}
    }

    my $beststring = "";
    my $bestcount = -1;
    foreach my $key (keys %votehash) {
	if ( $votehash{$key} > $bestcount ) {
	    $beststring = $key;
	    $bestcount = $votehash{$key};
	}
    }
    return $beststring;
}



sub probe_cropping_data {
    my $self = shift;
    my $filename = shift;
    my $delay = shift;

    my $fh;
    open ($fh, "$mplayer_pathname $filename -ss $delay -identify -frames 10 -vo md5sum:outfile=/dev/null -ao null -nocache -vf pp=lb,cropdetect=20:16 2>/dev/null |") || die "Unable to probe for cropping settings.";

    my @result;
    while (<$fh>) {
	if ( /CROP.*X:.*Y:.*\((-vf crop=[0-9]+:[0-9]+:[0-9]+:[0-9]+)\)/ ) {
	    push @result, $1;
	}
    }
    close $fh;
    return @result;
}


sub set_job_status {
    my $self = shift;
    my $text = shift;

    die "Job status string too long" if length($text) > 127;

    if ( $self->get_debug_level() == 0 && $self->{ jobid } ) {
	$self->connect_to_db();

	my $qstr = "UPDATE jobqueue SET comment=? WHERE id=?";
	my $query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($text, $self->{ jobid });
	$self->{ dbhandle }->commit();
	
	$self->disconnect_db();
    } else {
	print "STATUS:  $text\n";
    }
}


# This resubmits the job as another job with a later time.  Then it
# erases its own entry, which will make noise in the error log for the
# backend, but otherwise is safe.
sub resubmit_self_and_exit {
    my $self = shift;
    
    my $jobtype = $self->my_job_type();

    $self->connect_to_db();

# This copies my entry into the job queue, but submitted for "now".
    my $qstr = "INSERT INTO jobqueue (chanid,starttime,inserttime,type,cmds,flags,status,hostname,args,schedruntime) SELECT chanid,starttime,current_timestamp(),type,cmds,flags,1,hostname,args,current_timestamp() FROM jobqueue WHERE id=?";

    my $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($self->{ jobid });

    $qstr = "DELETE FROM jobqueue WHERE id=?";
    $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($self->{ jobid });

    # and this rewrites all the other queued MythXCode jobs, so they don't
    # have to do that on their own.

    $qstr = "UPDATE jobqueue SET inserttime=current_timestamp() WHERE type=? AND status=1";
    $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($jobtype);

    $self->{ dbhandle }->commit();

    $self->disconnect_db();

    exit 0;
}


sub my_job_type {
    my $self = shift;

    $self->connect_to_db( { readonly => 1 } );
    my $jobtype;

    my $qstr = "SELECT type FROM jobqueue WHERE id=?";
    my $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($self->{ jobid });
    my @onerow = $query->fetchrow_array;
    
    if ( @onerow ) {
	$jobtype = $onerow[0];
    }

    $query->finish();
    $self->disconnect_db();
    return $jobtype;
}


# We might want to let other jobs go first, if we're blocking waiting
# for another MythXCode process to complete.  This determines if there are
# any others waiting.
sub other_jobs_queued {
    my $self = shift;
    my $rv = undef;

    my $jobtype = $self->my_job_type;

    if ( $jobtype ) {

	$self->connect_to_db( { readonly => 1 } );
    
	# Now, find other queued jobs, not of the same type
	my $qstr = "SELECT COUNT(id) FROM jobqueue WHERE type != ? AND status=1";
	my $query = $self->{ dbhandle }->prepare($qstr);
	$query->execute($jobtype);
	my @onerow = $query->fetchrow_array;
	
	if ( $onerow[0] != 0 ) {
	    $rv = 1;
	}

	$query->finish();
	$self->disconnect_db();
    }


    return $rv;
}


sub wait_on_lock {
    my $self = shift;

    if ( $self->{ lockfile } ) {

	if ( ! $self->{ jobid } ) {
	    die "Asked for locking, but without a job ID.";
	}

	# We'll auto-close this on exit, releasing the lock.
	open $self->{ lockhandle }, '>', $self->{ lockfile } || die "Failed to create lockfile.";

	# Try a non-blocking, exclusive lock
	while ( ! flock $self->{ lockhandle}, LOCK_EX | LOCK_NB ) {

	    if ( $self->other_jobs_queued() ) {
		$self->resubmit_self_and_exit();
	    }

	    # wait for 5 minutes and check again.
	    sleep 301;
	}
    }
}


sub perform_transcoding_to_h264 {
    my $self = shift;

    $self->examine_source_media();

    my $cropping_arg;

    if ( ! $self->{ cannot_crop } && $self->program_may_need_cropping() ) {
	$cropping_arg = $self->get_cropping_argument();
    }

    my $bitrate;
    if ( $self->{ video_res } == 1080 ) {
	$bitrate = "6000k";
    } elsif ( $self->{ video_res } == 720 ) {
	$bitrate = "4000k";
    } elsif ( $self->{ video_res } == 480 ) {
	$bitrate = "2000k";
    } else {
	die "Unable to determine bitrate for h.264 transcoding";
    }

    my $xcoded_name = $self->{ orig_pathname };
    $xcoded_name =~ s/\.[^.]+$/.mkv/;

    # Don't re-transcode the same file
    if ( $self->{ orig_pathname } eq $xcoded_name ) {
	return;
    }

    # Before we get to .mkv we'll pass through a .tmp file and a .ts file
    my $tmpname1 = $xcoded_name . ".tmp";
    my $tmpname2 = $xcoded_name . ".ts";

    if ( -e $tmpname1 ) {
	unlink $tmpname1;
    }
    if ( -e $tmpname2 ) {
	unlink $tmpname2;
    }

    # we use the h264_mp4toannexb filter so that we can rebuild the
    # seek table.  That's convenient.

    my $command;
    my $localname1;
    my $localname2;

    my $audioopts;
    if ( ! $self->high_definition() ) {
	$audioopts = "-acodec libfaac -ab 256k -ac 2";
    } else {
	$audioopts = "-acodec copy";
    }

    if ( $self->{ ssh_string } ) {
	$localname1 = basename($tmpname1);
	$localname2 = basename($tmpname2);
	$command = "$self->{ ssh_string } $ffmpeg_pathname -y -i - -threads $self->{ ssh_threads } -b $bitrate $cropping_arg -vcodec libx264 -vpre fast -vpre baseline -vf yadif $self->{ video_map } $audioopts $self->{ audio_map } $ffmpeg_containeropts $localname1 2\\>\\&1 < $self->{ orig_pathname }";
    } else {
	$command = "$ffmpeg_pathname -i $self->{ orig_pathname } -b $bitrate $cropping_arg -vcodec libx264 -vpre fast -vpre baseline -vf yadif $self->{ video_map } $audioopts $self->{ audio_map } $ffmpeg_containeropts $tmpname1 2>&1 </dev/null";
    }

    print "EXECUTE:  $command\n";
    if ( $self->get_debug_level() > 1 ) {

	print "Convert h264 to TS.\n";
	print "Rebuild keyframe index.\n";
	print "rename $tmpname2 $xcoded_name\n";
	print "unlink $self->{ orig_pathname }\n";
	print "Update database to reflect change\n";

    } else {

	$self->set_job_status("Beginning h.264 transcode");

	my $oldIRS = $/;
	$/ = "\r";

	my $fh;
	
	open ( $fh, "$command |") || die "Failed to execute ffmpeg for transcoding";

	my $last_percent = 0;
	while (<$fh>) {

	    my $timesecs;
	    my $fps;

	    if ( /^frame= *[0-9]+ +fps= *([0-9]+) +q=.*time= *([0-9]+\.[0-9]+) / ) {
		$fps = $1;
		$timesecs = $2;
	    }

	    if ( /^frame= *[0-9]+ +fps= *([0-9]+) +q=.*time= *([0-9]+):([0-9]+):([0-9]+\.[0-9]+) / ) {
		$fps = $1;
		$timesecs = $2 * 3600 + $3 * 60 + $4;
	    }

	    if ( $timesecs ) {
	    
		my $frac_done = $timesecs / $self->{ duration_secs };
		my $this_percent = int( $frac_done * 100 );

		if ($this_percent > $last_percent) {
		    $self->set_job_status("H.264 transcode $this_percent\% done.  $1 FPS");
		    $last_percent = $this_percent;
		}
	    }
	}

	close $fh;
	$/ = $oldIRS;

	if ( $? != 0 ) {
	    $self->set_job_status("H.264 transcode failed");
	    die "Transcoding to h.264 failed.";
	}

	$self->set_job_status("Running through mp4toannexb to allow rebuilding the keyframe index");

	if ( $self->{ ssh_string } ) {
	    $command = "$self->{ ssh_string} $ffmpeg_pathname -y -i $localname1 -vcodec copy -vbsf h264_mp4toannexb -acodec copy $localname2 >/dev/null 2>&1";
	} else {
	    $command = "$ffmpeg_pathname -i $tmpname1 -vcodec copy -vbsf h264_mp4toannexb -acodec copy $tmpname2 >/dev/null 2>&1";
	}
	print "EXECUTE:  $command\n";
	if ( system($command) != 0 ) {
	    $self->set_job_status("H.264 transcode failed");
	    die "Transcoding to h.264 failed.";
	}


	if ( $self->{ ssh_string } ) {
	    $command = "$self->{ ssh_string } cat $localname2 > $tmpname2";
	    if ( system($command) != 0 ) {
		$self->set_job_status("H.264 transcode failed");
		die "Copying file back from remote engine failed.";
	    }
	    $command = "$self->{ ssh_string} /bin/rm -f $localname1 $localname2";
	    if ( system($command) != 0 ) {
		print "Filed to unlink file on remote engine.";
	    }
	} else {
	    unlink $tmpname1;
	}


	$self->set_job_status("Finished h.264 transcode");

	rename $tmpname2, $xcoded_name;
	my $newlen = -s $xcoded_name;

	$self->connect_to_db();

	my $qstr = "DELETE FROM recordedseek WHERE chanid=? AND starttime=?";
	if ( $self->get_debug_level() == 0 ) {
	    my $query = $self->{ dbhandle }->prepare($qstr);
	    $query->execute($self->{ chanid }, $self->{ starttime });
	} else {
	    print "$qstr\n";
	}

	$qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=?";
	if ( $self->get_debug_level() == 0 ) {
	    my $query = $self->{ dbhandle }->prepare($qstr);
	    $query->execute($self->{ chanid }, $self->{ starttime });
	} else {
	    print "$qstr\n";
	}

	$qstr = "UPDATE recorded SET basename=?,filesize=?,transcoded='1' WHERE chanid=? AND starttime=?";
	if ( $self->get_debug_level() == 0 ) {
	    my $query = $self->{ dbhandle }->prepare($qstr);
	    $query->execute($xcoded_name, $newlen, $self->{ chanid }, $self->{ starttime });
	} else {
	    print "$qstr\n";
	}

	if ( $self->get_debug_level() == 0 ) {
	    if ( ! $self->{ dbhandle }->commit() ) {
		unlink $xcoded_name;
		die "Failed to move transcoded file into database.";
	    }

	    unlink $self->{ orig_pathname };

	    $self->disconnect_db();
	}

	# Rebuild the seek table, but don't worry if it fails.
	$self->set_job_status("Rebuilding seek table after transcode");
	my $command = "mythcommflag --rebuild -c $self->{ chanid } -s '$self->{ starttime }'";
	print "EXECUTE:  $command\n";
	if ( $self->get_debug_level() == 0 ) {
	    system($command);
	}
	$self->set_job_status("Finished rebuilding seek table");
    }
}

sub set_queue_status {
    my $self = shift;
    my $code = shift;

    $self->connect_to_db();

    my $qstr = "UPDATE jobqueue SET status=? WHERE id=?";
    my $query = $self->{ dbhandle }->prepare($qstr);
    $query->execute($code, $self->{ jobid } );

    $self->{ dbhandle }->commit();
    $self->disconnect_db();
}


#########################################################################
#
# The following methods are expected to be used by the consumer.  The
# methods above this line are internal, consider them 'private'.
#
#########################################################################

=head2 C<set_db_parms(DBNAME, DBUSER, DBPASSWD [, DBHOST [, DBPORT ] ])>

This method is required and must be called before C<init()>.  

This method informs the MythXCode object of the parameters needed to
access the MythTV database.  At least I<DBNAME>, I<DBUSER>, and
I<DBPASSWD> are required parameters.  I<DBHOST> is optional, and
declares that the database is running on a remote host, which can be
either a resolvable hostname or an IP number.  I<DBPORT> is the
non-standard port number on the remote host.  To use a non-standard
port number on the local host, set I<DBHOST> to the IP number of an
interface on the local machine that is listening for connections.

This method ABENDs if the three required parameters are not supplied.
Otherwise, it will always succeed.  It does not validate the
connection parameters.

=cut

sub set_db_parms {
    my $self = shift;
    my $dbname = shift;
    my $dbuser = shift;
    my $dbpasswd = shift;
    my $dbhost = shift;
    my $dbport = shift;

    die "Required database name not supplied." if ! $dbname;
    die "Required database user not supplied." if ! $dbuser;
    die "Required database password not supplied." if ! $dbpasswd;

    $self->{ dbname } = $dbname;
    $self->{ dbuser } = $dbuser;
    $self->{ dbpasswd } = $dbpasswd;
    $self->{ dbhost } = $dbhost;
    $self->{ dbport } = $dbport;
    $self->{ parms_set } = 1;
}

=head2 C<set_backup_directory(DIRNAME)>

This method is optional, and must be run before C<init()>.

This method informs the MythXCode object that we want a backup made of
the working files before processing begins.  A new directory will be
created, with filename F<DIRNAME/transcode_PID> where PID is the
process ID of the process running MythXCode.  If that directory
already exists, it is an error, and execution terminates.

The directory must have sufficient free space to hold the recording
file, plus a relatively small amount of database information.

=cut

sub set_backup_directory {
    my $self = shift;
    my $backup_dir = shift;

    $self->{ backup_dir } = $backup_dir;
}

=head2 C<set_debug_level(NUM)>

This method is optional, and must be run before C<init()>.

MythTV systems are not all alike.  The operations that this package
performs are potentially destructive.  While every attempt has been
made to detect and gracefully handle error conditions, it is
recommended that the user test the procedure first.  The debug level
should be set first to 2, the script executed, and the log examined
for unexpected or undesirable behaviour.  In debug level 2, no
database writes are performed, and no files are created or modified.

If everything looks safe at debug level 2, the debug level should be
reduced to 1 and the procedure repeated.  At debug level 1, no
database writes are performed, but transcoded/cut files are produced
and a .srt file with captions information is created.  These files
will occupy space on disk and will not be deleted by MythTV, which has
no knowledge of them, so they should be manually deleted after
debugging is complete.  Again, the log should be examined for any
unexpected or undesirable behaviour.

Finally, the debug level should be set to 0.  Files are created,
original recording files are deleted and replaced by the new files,
and database entries are changed to reflect this.  This is the normal
operation mode.

=cut

sub set_debug_level {
    my $self = shift;
    my $debug_level = shift;

    $self->{ debug_level } = $debug_level;
}


=head2 C<set_lock_file(FILENAME)>

This method is optional, and must be run before C<init()>.

A transcode job can run for a long time.  It is not hard to queue up
days of transcodes.  The user may want to allow other jobs, such as
commercial flagging runs, to execute as new recordings are created,
without waiting days for the queue of transcode jobs to empty.

To do this, supply a common absolute pathname, F<FILENAME>, to all
MythXCode objects using this method.  The MythTV job execution queue
must be set to allow at least two concurrent jobs.  Also, the
C<set_program_identifier()> method must supply the I<JOBID> argument
because the locking method must interact with the MythTV job
scheduler.

Implementation is via an advisory lock on the supplied filename.  When
the C<init()> method is called, an attempt is made to acquire the
lock.  If it succeeds, the C<init()> method returns.  If it fails, it
sleeps for 5 minutes and tries again.  Before sleeping, it checks the
job queue to see if there are any user jobs of a different type
waiting in the queue.  If there are, it finds all queued jobs of the
same type and resets their submission times to now.  It then resubmits
itself, and exits, freeing up the job queue entry.

The net effect, if MythTV is configured to allow two concurrent jobs
to run, is that There will initially be two running MythXCode jobs,
but one will not be making progress and will not be consuming CPU.
If, later, another job of a different type, such as commercial
flagging, appears on the queue, then the blocked MythXCode job will,
within 5 minutes, detect it, and reorder the queue so that the
commercial flagging job moves to the front of the line.  This allows
the commercial flagging job to run concurrently with the MythXCode, so
it can be done in a timely manner.

=cut

sub set_lock_file {
    my $self = shift;
    $self->{ lockfile } = shift;
}


=head2 C<set_remote_ffmpeg_engine(SSH_INVOCATION_STRING [, NUMTHREADS])>

This method is required and must be called before C<init()>.  

The MythTV backend may not have the most computing resources available
on your network.  It may, therefore, be desirable to offload the
transcoding to H.264 to another computer.  In that case, this method
can be used.

The I<SSH_INVOCATION_STRING> must be a string that allows
password-free access to an account on the remote machine.  An example
might be C<S<ssh -l guest -i /home/mythtv/.ssh/key bigbox>>.  The second
argument is optional, and should be the integer number of threads to
use, passed to the C<-threads> option of F<ffmpeg>.  The default
number of threads is 1.

There must be enough space in the home directory of the remote machine
to hold the transcoded file, but the original file will not be copied
over, it is streamed to the C<stdin> of F<ffmpeg>.  Once the file has
been copied back, it will be deleted from the remote machine,
releasing the disk space.

=cut

sub set_remote_ffmpeg_engine {
    my $self = shift;
    my $ssh_string = shift;
    my $num_threads = shift;

    if ( ! $num_threads ) {
	$num_threads = 1;
    }

    die "Bad thread number" if $num_threads < 1;

    $self->{ ssh_string } = $ssh_string;
    $self->{ ssh_threads } = $num_threads;
}

=head2 C<ffmpeg_map_uses_colon()>

This method is optional, and must be called before C<init()>.

The command-line invocation for stream mapping in F<ffmpeg> has
changed recently.  Older versions use a syntax like C<-map 0.1>, while
newer ones use C<-map 0:1>.  This method asks that the colon separator
be used in F<ffmpeg> invocations.  This is also the default, so this
method need not be used, but is recommended in case the F<ffmpeg>
invocation changes again in the future.

=cut

sub ffmpeg_map_uses_colon {
    my $self = shift;
    undef $self->{ map_dot };
}

=head2 C<ffmpeg_map_uses_colon()>

This method is optional, and must be called before C<init()>.

The command-line invocation for stream mapping in F<ffmpeg> has
changed recently.  Older versions use a syntax like C<-map 0.1>, while
newer ones use C<-map 0:1>.  This method asks that the dot separator
be used in F<ffmpeg> invocations.  Use this method if your F<ffmpeg>
does not recognize the newer format.

=cut

sub ffmpeg_map_uses_dot {
    my $self;
    $self->{ map_dot } = 1;
}


=head2 C<set_program_identifier(CHANID, STARTTIME [, JOBID])>

This method is required, and must be called before C<init()>.

It is used to set the channel ID and starting time of the recording to
be processed.  The optional third parameter holds the job ID, and must
be supplied if locking is being used (see C<set_lock_file>).

The expected formats for these three parameters are simply the strings
that MythTV supplies to user jobs when the pseudo-parameters
B<%CHANID%>, B<%STARTTIMEISO%>, and B<%JOBID%> are used in the command
line invocation.

=cut

sub set_program_identifier {
    my $self = shift;
    my $chanid = shift;
    my $starttime = shift;
    my $jobid = shift;

    die "It is not legal to call set_program_identifier more than once on a MythXCode object" if $self->{ chanid };

    die "Failed to supply a channel ID" if ! $chanid;
    die "Failed to supply a starttime" if ! $starttime;
    
    $self->{ chanid } = $chanid;
    $self->{ starttime } = $starttime;
    $self->{ jobid } = $jobid;

}



=head2 C<init()>

This method is required.  It validates some of the configurations set
up by the previously-described methods.  It verifies that the database
schema is one that has been tested and is known to work.  If locking
is enabled, it acquires the lock.

This method will never return an error, but it may not return.  If the
configuration settings supplied earlier are wrong, it may ABEND at
this time.  If the process has to move out of the way to allow a
commercial flagging job to move up in the queue, it will do a quiet
exit and not return.

=cut

sub init {
    my $self = shift;

    $self->check_database_schema();
    $self->wait_on_lock();
    $self->{ initialized } = 1;
    $self->read_recording_data();
}


=head2 C<perform_backup()>

This method is optional, and must be called after C<init()>.  

If a backup directory was set with C<set_backup_dir()>, the backup is
performed at this time.  See C<set_backup_dir()> for details of the
location of the backups.  The backup holds the original MPEG recording
file, a file named F<recovery> that shows how to copy this file back
to its original location, and several MySQL command files suitable for
streaming directly into F<mysql> to restore the relevant database
entries for the recording.

B<NOTE>: no backups will be performed unless this method is explicitly
called.

=cut

sub perform_backup {
    my $self = shift;
    my $newdir = $self->{ backup_dir };

    die "Failed to call init()" if ! $self->{ initialized };

    if ( ! $newdir ) {
	return;
    }

    $newdir = $newdir . "/transcode_$$";
    die "Backup directory already exists" if -d $newdir;
    mkdir $newdir,0700 || die "Failed to create backup directory.";

    if ( $self->get_debug_level() > 0 ) {
	print "cp $self->{ orig_pathname } $newdir\n";

    } else {
	$self->set_job_status("Backing up original file.");
	if (system( "cp", $self->{ orig_pathname }, $newdir ) != 0 ) {
	    die "Failed to make the backup copy.";
	}
	my $hints;
	open $hints, ">$newdir/recovery";
	print $hints "cp $newdir/$self->{ orig_basename} $self->{ orig_pathname }\n";
	close $hints;
    }

    for ( @tables_to_backup ) {
	
	my $tablename = $_;

	my $hints;
	open $hints, ">$newdir/$tablename";
	print $hints "DELETE FROM $_ WHERE chanid='$self->{ chanid }' AND starttime='$self->{ starttime }';\n";
	my @onetable = @{$self->{ backups }->{ $tablename }};
	for ( my $i = 1; $i <= $#onetable; $i++ ) {

	    print $hints "INSERT INTO $_ (", join(',', @{$onetable[0]}), " ) VALUES ('", join ("','", map { my $t = $_; $t =~ s|'|\\'|g; $t; } @{$onetable[$i]}), "');\n";
	}
	if ( ! close $hints ) {
	    die "Failed to write backup data file $tablename";
	}
    }

    if ( $self->get_debug_level() == 0 ) {
	$self->set_job_status("");
    }
}


=head2 C<cut_and_caption()>

This method is optional, and must be called after C<init()>.

This method extracts the captions from the recording and applies the
cutlist (if any).  Unless running in debugging mode (see
C<set_debug_level()>), this operation makes permanent changes to the
recording on disk and to the database entries related to it.

This method should be called before C<transcode_to_h264()>, as
otherwise captions and cutlist data will be lost.

=cut

sub cut_and_caption {
    my $self = shift;

    die "Failed to call init()" if ! $self->{ initialized };

    if ( $self->safe_to_cut_before_captions() ) {

	if ( $self->has_cutlist() ) {

	    $self->apply_cutlist_to_recording();

	}

	if ( ! $self->captions_already_extracted() ) {

	    $self->extract_captions();

	}

    } else {
	
	if ( ! $self->captions_already_extracted() ) {

	    $self->extract_captions();

	}

	if ( $self->has_cutlist() ) {

	    $self->apply_cutlist_to_recording();

	}
    }
}



=head2 C<transcode_to_h264()>

This method is optional, and must be called after C<init()>.

If a cutlist exists, or if captions are to be preserved,
C<cut_and_caption()> should be called before this method.

This method converts an MPEG recording to an H.264 recording.  It will
use the remote F<ffmpeg> host, if requested (see
C<set_remote_ffmpeg_engine()>).

The resulting recording will have a valid seek table.

Unless running in debugging mode (see C<set_debug_level()>), this
operation makes permanent changes to the recording on disk and to the
database entries related to it.

=cut

sub transcode_to_h264 {
    my $self = shift;

    die "Failed to call init()" if ! $self->{ initialized };

    $self->perform_transcoding_to_h264();
}


=head1 AUTHOR

Christopher Neufeld.  Copyright 2012, released under the GPL version 3.

=cut

Invocation Script

The .pm file describes a class. A separate script is needed to invoke the methods defined in the class. Here is the one I use, xcode_to_h264.pl

#! /usr/bin/perl

# xcode_to_h264.pl is a user job.  It should be given the three parameters  %CHANID% "%STARTTIMEISO% %JOBID%

use lib '/home/mythtv';     # My MythXCode.pm is located here

use MythXCode;
use strict;

umask 0022;

my $dbname = "mythconverg";
my $dbuser = "USER";
my $dbpasswd = "PASSWORD";

# my $backup_dir = "/myth/tmp";
# my $lock_file = $backup_dir . "/transcode_lock";

my $dbglevel = 0;



my $chanid = $ARGV[0];
my $starttime = $ARGV[1];
my $jobid = $ARGV[2];

die "Bad channel ID" if ! $chanid;
die "Bad start time" if ! $starttime;

$starttime =~ tr/T/ /;


my $worker = MythXCode->new();

$worker->set_db_parms($dbname, $dbuser, $dbpasswd);

# $worker->set_backup_directory($backup_dir);

$worker->set_debug_level($dbglevel);

# $worker->set_lock_file($lock_file);
# $worker->set_remote_ffmpeg_engine("ssh -i /home/mythtv/.ssh/bigboxkey bigbox", 3);

$worker->set_program_identifier($chanid, $starttime, $jobid);

$worker->init();

# $worker->perform_backup();

$worker->cut_and_caption();
$worker->transcode_to_h264();

Read the documentation for MythXCode.pm for more details. You can set up a locking scheme that allows transcode jobs to move out of the way of commercial flagging jobs, in the event that your transcoding runs take a very long time. You can (and probably should) use the backup facility to produce files capable of exactly restoring the state of the recording in case something goes wrong. You can use a remote ffmpeg offload engine (I do, as a 64-bit box is much faster than my 32-bit backend on transcodes).