Transcoding Preserving Captions
Author | Christopher Neufeld |
Description | This framework supplies the means for a user job to do a lossless cut on an MPEG4 recording, while still retaining the ability of MythTV to display closed captions. It also provides methods for transcoding the video to H.264 format, with a valid seek table, while preserving captions. |
Supports |
A common request is to perform a lossless transcode without losing the closed caption data in the ivtv data stream. For technical reasons, this is a difficult problem. The technique outlined here allows the preservation of captions during a lossless transcode. This is done not by preserving the caption data in the transcoded stream, but by extracting it into a .srt file, which is a file that MythTV automatically detects and makes available if caption displaying is enabled. Once the .srt file is produced, it can be used with the stream however it is transcoded, so converting it to H.264 format is also supported, and still preserves the captions.
This technique makes use of the pull-captions.pl script shown at Closed_captioning, which in turn depends on the ccextractor project (see that page for details).
To use this script, first put the following .pm file into your Perl library path. The name I've used is MythXCode.pm, but you can rename that with a global search/replace if it collides with something in your namespace. To view the documentation for MythXCode.pm, simply run:
pod2man MythXCode.pm | groff -man -Tascii | less
Library Module
Now, here is MythXCode.pm:
#! /usr/bin/perl # A class to handle transcoding, for people with unusual requirements. # package MythXCode; =pod =head1 NAME MythXCode Perl package - Support for transcoding and preserving captions =head1 SYNOPSIS use MythXCode; my $worker = MythXCode->new(); $worker->set_db_parms("mythconverg", "mythtv", "myth-db-password"); $worker->set_program_identifier($chanid, $starttime, $jobid); $worker->init(); $worker->cut_and_caption(); $worker->transcode_to_h264(); undef $worker; =head1 DESCRIPTION The C<MythXCode> package provides configurable support for certain operations connected to transcoding files in MythTV. Its primary function is to preserve closed-caption data as recordings are manipulated, for instance when cutting to remove commercials, or when creating DVDs. It has been tested on NTSC recordings with ivtv captions, and on North American ATSC digital OTA captions. This package allows the user to convert MPEG recordings to H.264. The resulting transcoded recordings will have valid seek tables and their closed captioning data will be intact. The settings for H.264 transcodes are hard-coded in the file, and must be edited if they're not deemed appropriate for the circumstances. Standard definition recordings are transcoded to 2000 kbits/s. 720p/i recordings are transcoded to 4000 kbits/s, and 1080p/i recordings are transcoded to 6000 kbits/s. Edit the method C<perform_transcoding_to_h264()> if you want to adjust these. The MythXCode objects have only been tested while working on a single recording. If multiple recordings are to be processed, each must be done in its own MythXCode object, rather than reusing the object. This package depends on the F<pull-captions.pl> script to extract captions from recordings with cutlists. That script can be found at L<http://www.mythtv.org/wiki/Closed_captioning>, and it depends, in turn, on the ccextractor binary found at L<http://ccextractor.sourceforge.net/>. The following methods should be considered public methods. Methods not documented here are intended for internal use within MythXCode, and should not be called by users of the package. The user should call these methods in order, though not all applications will have to use all methods. =cut use strict; use DBI; use File::Basename; use Fcntl qw(:flock); # This script pokes around in the database. Things might go really # wrong if the database innards are changed. So, we sanity check # against that. This script has been tested on schema 1264. my $schema_var = "DBSchemaVer"; my $verified_schema = 1264; my $pull_captions_pathname = "/home/mythtv/pull-captions.pl"; my $ffmpeg_pathname = "/usr/bin/ffmpeg"; my $ffprobe_pathname = "/usr/bin/ffprobe"; my $mplayer_pathname = "/usr/bin/mplayer"; my $ffmpeg_containeropts = "-f matroska"; # you must have "recordedprogram" here, other parts of the code use # it. my @tables_to_backup = ( "recorded", "recordedprogram", "recordedseek", "recordedmarkup" ); sub new { my $class = shift; my $self = {}; bless $self, $class; $self->check_dependent_executables(); return $self; } sub check_dependent_executables { my $self = shift; die "Unable to locate pull-captions.pl" if ! -x "$pull_captions_pathname"; die "Unable to locate ffmpeg" if ! -x "$ffmpeg_pathname"; die "Unable to locate ffprobe" if ! -x "$ffprobe_pathname"; if ( ! -x "$mplayer_pathname" ) { print "NOTICE: unable to locate mplayer, no cropping will be done.\n"; $self->{ cannot_crop } = 1; } } sub get_debug_level { my $self = shift; if ( $self->{ debug_level } ) { return $self->{ debug_level }; } else { return 0; } } sub connect_to_db { my $self = shift; die "Failed to set database parameters" if ! $self->{ parms_set }; my $dsn; if ( $self->{ dbhost } ) { if ( $self->{ dbportnum } ) { $dsn = "DBI:mysql:database=$self->{dbname};host-$self->{dbhost};port=$self->{dbport}"; } else { $dsn = "DBI:mysql:database=$self->{dbname};host-$self->{dbhost}"; } } else { $dsn = "DBI:mysql:database=$self->{dbname}"; } if ( @_ && @_[0]->{ readonly } ) { if ( $self->get_debug_level() > 0) { print "Opening read-only connection to database.\n"; } $self->{ dbhandle } = DBI->connect($dsn, $self->{ dbuser }, $self->{ dbpasswd }, { RaiseError => 1, AutoCommit => 0, ReadOnly => 1 }); } else { if ( $self->get_debug_level() > 0) { print "Opening read-write connection to database.\n"; } $self->{ dbhandle } = DBI->connect($dsn, $self->{ dbuser }, $self->{ dbpasswd }, { RaiseError => 1, AutoCommit => 0, ReadOnly => 0 }); } die "Failed to connect to the database: $DBI::errstr" if ! $self->{ dbhandle }; } sub disconnect_db { my $self = shift; if ( $self->get_debug_level() > 0) { print "Disconnecting from database\n"; } $self->{ dbhandle }->disconnect; $self->{ dbhandle } = undef(); } sub check_database_schema { my $self = shift; $self->connect_to_db( { readonly => 1 } ); my $qstr = "SELECT data FROM settings WHERE value=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($schema_var); my @onerow = $query->fetchrow_array(); $query->finish(); $self->disconnect_db(); die "Unverified database schema. This script must be carefully inspected for the new version." if $onerow[0] != $verified_schema; } sub read_recording_data { my $self = shift; $self->connect_to_db( { readonly => 1 } ); my @dirs; my @onerow; my $query = $self->{ dbhandle }->prepare("SELECT dirname FROM storagegroup"); $query->execute(); while ( @onerow = $query->fetchrow_array ) { push @dirs, $onerow[0]; } die "No storage group directories located" if $#dirs == -1; $query = $self->{ dbhandle }->prepare("SELECT basename FROM recorded WHERE chanid=? AND starttime=?"); $query->execute( $self->{ chanid }, $self->{ starttime } ); @onerow = $query->fetchrow_array; die "Failed to locate recording basename" if $#onerow != 0; $self->{ orig_basename } = $onerow[0]; @onerow = $query->fetchrow_array; die "Corrupted database, found multiple recordings at chanid and starttime" if $#onerow != -1; for ( @dirs ) { my $candidate = $_ . "$self->{ orig_basename }"; if ( -e "$candidate" ) { $self->{ orig_pathname } = $candidate; last; } } $self->{ backups } = {}; for ( @tables_to_backup ) { @{$self->{ backups }->{ $_ }} = $self->retrieve_table ( $_ ); } $self->disconnect_db(); } sub retrieve_table { my $self = shift; my $tablename = shift; my $qstr = "SELECT * from $tablename WHERE chanid=? AND starttime=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); my $names = $query->{NAME}; my @retval; push @retval, $names; while ( my @res = $query->fetchrow_array ) { push @retval, \@res; } return @retval; } sub generate_caption_filename { my $self = shift; my $localcopy = $self->{ orig_pathname }; $localcopy =~ s/\.[^.]+$/.srt/; return $localcopy; } sub captions_already_extracted { my $self = shift; return -s $self->generate_caption_filename(); } sub has_cutlist { my $self = shift; $self->connect_to_db( { readonly => 1 } ); my $qstr = "SELECT * FROM recordedmarkup WHERE chanid=? AND starttime=? AND ( type=0 OR type=1 )"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); my @onerow = $query->fetchrow_array; $query->finish(); $self->disconnect_db(); return $#onerow != -1; } sub clear_cutlist { my $self = shift; $self->connect_to_db(); my $query; my $qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=? AND type=0"; if ( $self->get_debug_level() > 0 ) { print "DBExecute: $qstr; args= '$self->{ chanid }','$self->{ starttime }'\n"; } else { $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); } $qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=? AND type=1"; if ( $self->get_debug_level() > 0 ) { print "DBExecute: $qstr; args= '$self->{ chanid }','$self->{ starttime }'\n"; } else { $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); } $qstr = "UPDATE recorded SET cutlist=0 WHERE chanid=? AND starttime=?"; if ( $self->get_debug_level() > 0 ) { print "DBExecute: $qstr; args= '$self->{ chanid }','$self->{ starttime }'\n"; } else { $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); } $qstr = "UPDATE recorded SET commflagged=0 WHERE chanid=? AND starttime=?"; if ( $self->get_debug_level() > 0 ) { print "DBExecute: $qstr; args= '$self->{ chanid }','$self->{ starttime }'\n"; } else { $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); } if ( $self->get_debug_level() > 0 ) { print "COMMIT !!!\n"; } else { $self->{ dbhandle }->commit || die "Database update failed clearing cutlist.\n"; } $self->disconnect_db(); } sub get_recordedprogram_table_data { my $self = shift; my $names = $self->{ backups }->{ "recordedprogram" }[0]; my $res = $self->{ backups }->{ "recordedprogram" }[1]; my %hashret; for (0 .. $#$res) { $hashret{$$names[$_]} = $$res[$_]; } return %hashret; } sub safe_to_cut_before_captions { my $self = shift; my %alldat = $self->get_recordedprogram_table_data(); return $alldat{ "videoprop" } =~ "HDTV,(720|1080)"; } sub program_may_need_cropping { my $self = shift; my %alldat = $self->get_recordedprogram_table_data(); return $alldat{ "videoprop" } !~ "HDTV,(720|1080)"; } sub high_definition { my $self = shift; my %alldat = $self->get_recordedprogram_table_data(); return $alldat{ "videoprop" } =~ "HDTV,(720|1080)"; } sub extract_captions { my $self = shift; my $command = "$pull_captions_pathname $self->{ chanid } '" . $self->{ starttime } . "' " . $self->generate_caption_filename() . " >/dev/null 2>&1"; print "EXECUTE: $command\n"; if ( $self->get_debug_level() < 2 ) { $self->set_job_status("Extracting captions"); if ( system($command) != 0) { $self->set_job_status("Failed while extracting captions"); die "Failed to pull the captions from the recording."; } $self->set_job_status("Finished extracting captions"); } } sub apply_cutlist_to_recording { my $self = shift; my $tmpname = $self->{ orig_pathname } . ".tmp"; my $command = "mythtranscode -m -c $self->{ chanid } -s '$self->{ starttime }' -l -o $tmpname >/dev/null 2>&1"; print "EXECUTE: $command\n"; if ( $self->get_debug_level() == 0 ) { $self->set_job_status("Performing lossless cut on the recording."); if ( system($command) != 0 ) { $self->set_job_status("Failed while applying cutlist"); die "Failed to apply cutlist to the recording."; } if ( ! -s $tmpname ) { $self->set_job_status("Failed while applying cutlist"); die "Failed to apply cutlist to the recording."; } my $newlen = -s $tmpname; if ( $newlen <= 0 ) { $self->set_job_status("Failed while applying cutlist"); die "Failed to apply cutlist to the recording."; } $self->set_job_status("Finished applying cutlist to recording."); $self->connect_to_db(); my $qstr = "DELETE FROM recordedseek WHERE chanid=? AND starttime=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); $qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=?"; $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); $qstr = "UPDATE recorded SET filesize=? WHERE chanid=? AND starttime=?"; $query = $self->{ dbhandle }->prepare($qstr); $query->execute($newlen, $self->{ chanid }, $self->{ starttime }); my $deletename = $self->{ orig_pathname } . ".DELETEME"; if ( ! rename $self->{ orig_pathname }, $deletename ) { $self->{ dbhandle }->rollback(); unlink $tmpname; die "Failed to move aside the original recording."; } if ( ! rename $tmpname, $self->{ orig_pathname } ) { rename $deletename, $self->{ orig_pathname }; $self->{ dbhandle }->rollback(); die "Failed to move the recording after cutting."; } if ( ! $self->{ dbhandle }->commit() ) { unlink $self->{ orig_pathname }; rename $deletename, $self->{ orig_pathname }; die "Database update failed after cutlist"; } $self->disconnect_db(); unlink $deletename; unlink $tmpname . ".map"; $command = "mythtranscode --buildindex -c $self->{ chanid } -s '$self->{ starttime }' >/dev/null 2>&1"; $self->set_job_status("Rebuilding keyframe index"); if ( system($command) != 0 ) { print "Failed to rebuild keyframe index.\n"; } $self->set_job_status("Finished lossless cut on the recording."); $self->clear_cutlist(); } } sub examine_source_media { my $self = shift; print "EXECUTE: $ffprobe_pathname $self->{ orig_pathname } 2>&1\n"; my $fh; open ($fh, "$ffprobe_pathname $self->{ orig_pathname } 2>&1 |") || die "Unable to run ffprobe on $self->{ orig_pathname }"; my $saw_5_1; while (<$fh>) { if ( /Duration: ([0-9]+):([0-9]+):([0-9]+\.[0-9][0-9]), start: / ) { $self->{ duration_secs } = $1 * 3600 + $2 * 60 + $3; print "Media duration is $self->{ duration_secs } seconds.\n"; } if ( /Stream #([0-9]+)[.:]([0-9]+)\[.* Video:.*(1920x1080|1280x720|720x480)/ ) { $self->{ video_map } = "-map $1:$2"; if ( $self->{ map_dot } ) { $self->{ video_map } =~ s/:/./g; } if ( $3 =~ "1920x1080" ) { $self->{ video_res } = 1080; } elsif ( $3 =~ "1280x720" ) { $self->{ video_res } = 720; } elsif ( $3 =~ "720x480" ) { $self->{ video_res } = 480; } else { die "Unable to determine video resolution"; } } if ( /Stream #([0-9]+)[.:]([0-9]+)\[.* Audio: / ) { my $mcopy = "-map $1:$2"; if ( $self->{ map_dot } ) { $mcopy =~ s/:/./g; } if ( ! $saw_5_1 ) { if ( /, 5\.1, / ) { $saw_5_1 = 1; $self->{ audio_map } = $mcopy; } if ( m/, stereo, / || m/, 2 channels, / ) { $self->{ audio_map } = $mcopy; } } } } close $fh; die "Can't find video information" if ! $self->{ video_map } || ! $self->{ video_res }; die "Can't find audio information" if ! $self->{ audio_map }; } sub get_cropping_argument { my $self = shift; $self->set_job_status("Determining cropping values"); # We'll go 5 minutes into the recording, to avoid any issues with # opening credits being in a strange shape. Then, draw 10 samples # at 15 second intervals. We choose the most commonly reported # answer. my %votehash; for (0 .. 10) { my $time_delay = 300 + $_ * 15; my @oneshot = $self->probe_cropping_data($self->{ orig_pathname }, $time_delay); for (@oneshot) { if ( ! $votehash{$_} ) { $votehash{$_} = 1; } else { $votehash{$_}++; } } } my $beststring = ""; my $bestcount = -1; foreach my $key (keys %votehash) { if ( $votehash{$key} > $bestcount ) { $beststring = $key; $bestcount = $votehash{$key}; } } return $beststring; } sub probe_cropping_data { my $self = shift; my $filename = shift; my $delay = shift; my $fh; open ($fh, "$mplayer_pathname $filename -ss $delay -identify -frames 10 -vo md5sum:outfile=/dev/null -ao null -nocache -vf pp=lb,cropdetect=20:16 2>/dev/null |") || die "Unable to probe for cropping settings."; my @result; while (<$fh>) { if ( /CROP.*X:.*Y:.*\((-vf crop=[0-9]+:[0-9]+:[0-9]+:[0-9]+)\)/ ) { push @result, $1; } } close $fh; return @result; } sub set_job_status { my $self = shift; my $text = shift; die "Job status string too long" if length($text) > 127; if ( $self->get_debug_level() == 0 && $self->{ jobid } ) { $self->connect_to_db(); my $qstr = "UPDATE jobqueue SET comment=? WHERE id=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($text, $self->{ jobid }); $self->{ dbhandle }->commit(); $self->disconnect_db(); } else { print "STATUS: $text\n"; } } # This resubmits the job as another job with a later time. Then it # erases its own entry, which will make noise in the error log for the # backend, but otherwise is safe. sub resubmit_self_and_exit { my $self = shift; my $jobtype = $self->my_job_type(); $self->connect_to_db(); # This copies my entry into the job queue, but submitted for "now". my $qstr = "INSERT INTO jobqueue (chanid,starttime,inserttime,type,cmds,flags,status,hostname,args,schedruntime) SELECT chanid,starttime,current_timestamp(),type,cmds,flags,1,hostname,args,current_timestamp() FROM jobqueue WHERE id=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ jobid }); $qstr = "DELETE FROM jobqueue WHERE id=?"; $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ jobid }); # and this rewrites all the other queued MythXCode jobs, so they don't # have to do that on their own. $qstr = "UPDATE jobqueue SET inserttime=current_timestamp() WHERE type=? AND status=1"; $query = $self->{ dbhandle }->prepare($qstr); $query->execute($jobtype); $self->{ dbhandle }->commit(); $self->disconnect_db(); exit 0; } sub my_job_type { my $self = shift; $self->connect_to_db( { readonly => 1 } ); my $jobtype; my $qstr = "SELECT type FROM jobqueue WHERE id=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ jobid }); my @onerow = $query->fetchrow_array; if ( @onerow ) { $jobtype = $onerow[0]; } $query->finish(); $self->disconnect_db(); return $jobtype; } # We might want to let other jobs go first, if we're blocking waiting # for another MythXCode process to complete. This determines if there are # any others waiting. sub other_jobs_queued { my $self = shift; my $rv = undef; my $jobtype = $self->my_job_type; if ( $jobtype ) { $self->connect_to_db( { readonly => 1 } ); # Now, find other queued jobs, not of the same type my $qstr = "SELECT COUNT(id) FROM jobqueue WHERE type != ? AND status=1"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($jobtype); my @onerow = $query->fetchrow_array; if ( $onerow[0] != 0 ) { $rv = 1; } $query->finish(); $self->disconnect_db(); } return $rv; } sub wait_on_lock { my $self = shift; if ( $self->{ lockfile } ) { if ( ! $self->{ jobid } ) { die "Asked for locking, but without a job ID."; } # We'll auto-close this on exit, releasing the lock. open $self->{ lockhandle }, '>', $self->{ lockfile } || die "Failed to create lockfile."; # Try a non-blocking, exclusive lock while ( ! flock $self->{ lockhandle}, LOCK_EX | LOCK_NB ) { if ( $self->other_jobs_queued() ) { $self->resubmit_self_and_exit(); } # wait for 5 minutes and check again. sleep 301; } } } sub perform_transcoding_to_h264 { my $self = shift; $self->examine_source_media(); my $cropping_arg; if ( ! $self->{ cannot_crop } && $self->program_may_need_cropping() ) { $cropping_arg = $self->get_cropping_argument(); } my $bitrate; if ( $self->{ video_res } == 1080 ) { $bitrate = "6000k"; } elsif ( $self->{ video_res } == 720 ) { $bitrate = "4000k"; } elsif ( $self->{ video_res } == 480 ) { $bitrate = "2000k"; } else { die "Unable to determine bitrate for h.264 transcoding"; } my $xcoded_name = $self->{ orig_pathname }; $xcoded_name =~ s/\.[^.]+$/.mkv/; # Don't re-transcode the same file if ( $self->{ orig_pathname } eq $xcoded_name ) { return; } # Before we get to .mkv we'll pass through a .tmp file and a .ts file my $tmpname1 = $xcoded_name . ".tmp"; my $tmpname2 = $xcoded_name . ".ts"; if ( -e $tmpname1 ) { unlink $tmpname1; } if ( -e $tmpname2 ) { unlink $tmpname2; } # we use the h264_mp4toannexb filter so that we can rebuild the # seek table. That's convenient. my $command; my $localname1; my $localname2; my $audioopts; if ( ! $self->high_definition() ) { $audioopts = "-acodec libfaac -ab 256k -ac 2"; } else { $audioopts = "-acodec copy"; } if ( $self->{ ssh_string } ) { $localname1 = basename($tmpname1); $localname2 = basename($tmpname2); $command = "$self->{ ssh_string } $ffmpeg_pathname -y -i - -threads $self->{ ssh_threads } -b $bitrate $cropping_arg -vcodec libx264 -vpre fast -vpre baseline -vf yadif $self->{ video_map } $audioopts $self->{ audio_map } $ffmpeg_containeropts $localname1 2\\>\\&1 < $self->{ orig_pathname }"; } else { $command = "$ffmpeg_pathname -i $self->{ orig_pathname } -b $bitrate $cropping_arg -vcodec libx264 -vpre fast -vpre baseline -vf yadif $self->{ video_map } $audioopts $self->{ audio_map } $ffmpeg_containeropts $tmpname1 2>&1 </dev/null"; } print "EXECUTE: $command\n"; if ( $self->get_debug_level() > 1 ) { print "Convert h264 to TS.\n"; print "Rebuild keyframe index.\n"; print "rename $tmpname2 $xcoded_name\n"; print "unlink $self->{ orig_pathname }\n"; print "Update database to reflect change\n"; } else { $self->set_job_status("Beginning h.264 transcode"); my $oldIRS = $/; $/ = "\r"; my $fh; open ( $fh, "$command |") || die "Failed to execute ffmpeg for transcoding"; my $last_percent = 0; while (<$fh>) { my $timesecs; my $fps; if ( /^frame= *[0-9]+ +fps= *([0-9]+) +q=.*time= *([0-9]+\.[0-9]+) / ) { $fps = $1; $timesecs = $2; } if ( /^frame= *[0-9]+ +fps= *([0-9]+) +q=.*time= *([0-9]+):([0-9]+):([0-9]+\.[0-9]+) / ) { $fps = $1; $timesecs = $2 * 3600 + $3 * 60 + $4; } if ( $timesecs ) { my $frac_done = $timesecs / $self->{ duration_secs }; my $this_percent = int( $frac_done * 100 ); if ($this_percent > $last_percent) { $self->set_job_status("H.264 transcode $this_percent\% done. $1 FPS"); $last_percent = $this_percent; } } } close $fh; $/ = $oldIRS; if ( $? != 0 ) { $self->set_job_status("H.264 transcode failed"); die "Transcoding to h.264 failed."; } $self->set_job_status("Running through mp4toannexb to allow rebuilding the keyframe index"); if ( $self->{ ssh_string } ) { $command = "$self->{ ssh_string} $ffmpeg_pathname -y -i $localname1 -vcodec copy -vbsf h264_mp4toannexb -acodec copy $localname2 >/dev/null 2>&1"; } else { $command = "$ffmpeg_pathname -i $tmpname1 -vcodec copy -vbsf h264_mp4toannexb -acodec copy $tmpname2 >/dev/null 2>&1"; } print "EXECUTE: $command\n"; if ( system($command) != 0 ) { $self->set_job_status("H.264 transcode failed"); die "Transcoding to h.264 failed."; } if ( $self->{ ssh_string } ) { $command = "$self->{ ssh_string } cat $localname2 > $tmpname2"; if ( system($command) != 0 ) { $self->set_job_status("H.264 transcode failed"); die "Copying file back from remote engine failed."; } $command = "$self->{ ssh_string} /bin/rm -f $localname1 $localname2"; if ( system($command) != 0 ) { print "Filed to unlink file on remote engine."; } } else { unlink $tmpname1; } $self->set_job_status("Finished h.264 transcode"); rename $tmpname2, $xcoded_name; my $newlen = -s $xcoded_name; $self->connect_to_db(); my $qstr = "DELETE FROM recordedseek WHERE chanid=? AND starttime=?"; if ( $self->get_debug_level() == 0 ) { my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); } else { print "$qstr\n"; } $qstr = "DELETE FROM recordedmarkup WHERE chanid=? AND starttime=?"; if ( $self->get_debug_level() == 0 ) { my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($self->{ chanid }, $self->{ starttime }); } else { print "$qstr\n"; } $qstr = "UPDATE recorded SET basename=?,filesize=?,transcoded='1' WHERE chanid=? AND starttime=?"; if ( $self->get_debug_level() == 0 ) { my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($xcoded_name, $newlen, $self->{ chanid }, $self->{ starttime }); } else { print "$qstr\n"; } if ( $self->get_debug_level() == 0 ) { if ( ! $self->{ dbhandle }->commit() ) { unlink $xcoded_name; die "Failed to move transcoded file into database."; } unlink $self->{ orig_pathname }; $self->disconnect_db(); } # Rebuild the seek table, but don't worry if it fails. $self->set_job_status("Rebuilding seek table after transcode"); my $command = "mythcommflag --rebuild -c $self->{ chanid } -s '$self->{ starttime }'"; print "EXECUTE: $command\n"; if ( $self->get_debug_level() == 0 ) { system($command); } $self->set_job_status("Finished rebuilding seek table"); } } sub set_queue_status { my $self = shift; my $code = shift; $self->connect_to_db(); my $qstr = "UPDATE jobqueue SET status=? WHERE id=?"; my $query = $self->{ dbhandle }->prepare($qstr); $query->execute($code, $self->{ jobid } ); $self->{ dbhandle }->commit(); $self->disconnect_db(); } ######################################################################### # # The following methods are expected to be used by the consumer. The # methods above this line are internal, consider them 'private'. # ######################################################################### =head2 C<set_db_parms(DBNAME, DBUSER, DBPASSWD [, DBHOST [, DBPORT ] ])> This method is required and must be called before C<init()>. This method informs the MythXCode object of the parameters needed to access the MythTV database. At least I<DBNAME>, I<DBUSER>, and I<DBPASSWD> are required parameters. I<DBHOST> is optional, and declares that the database is running on a remote host, which can be either a resolvable hostname or an IP number. I<DBPORT> is the non-standard port number on the remote host. To use a non-standard port number on the local host, set I<DBHOST> to the IP number of an interface on the local machine that is listening for connections. This method ABENDs if the three required parameters are not supplied. Otherwise, it will always succeed. It does not validate the connection parameters. =cut sub set_db_parms { my $self = shift; my $dbname = shift; my $dbuser = shift; my $dbpasswd = shift; my $dbhost = shift; my $dbport = shift; die "Required database name not supplied." if ! $dbname; die "Required database user not supplied." if ! $dbuser; die "Required database password not supplied." if ! $dbpasswd; $self->{ dbname } = $dbname; $self->{ dbuser } = $dbuser; $self->{ dbpasswd } = $dbpasswd; $self->{ dbhost } = $dbhost; $self->{ dbport } = $dbport; $self->{ parms_set } = 1; } =head2 C<set_backup_directory(DIRNAME)> This method is optional, and must be run before C<init()>. This method informs the MythXCode object that we want a backup made of the working files before processing begins. A new directory will be created, with filename F<DIRNAME/transcode_PID> where PID is the process ID of the process running MythXCode. If that directory already exists, it is an error, and execution terminates. The directory must have sufficient free space to hold the recording file, plus a relatively small amount of database information. =cut sub set_backup_directory { my $self = shift; my $backup_dir = shift; $self->{ backup_dir } = $backup_dir; } =head2 C<set_debug_level(NUM)> This method is optional, and must be run before C<init()>. MythTV systems are not all alike. The operations that this package performs are potentially destructive. While every attempt has been made to detect and gracefully handle error conditions, it is recommended that the user test the procedure first. The debug level should be set first to 2, the script executed, and the log examined for unexpected or undesirable behaviour. In debug level 2, no database writes are performed, and no files are created or modified. If everything looks safe at debug level 2, the debug level should be reduced to 1 and the procedure repeated. At debug level 1, no database writes are performed, but transcoded/cut files are produced and a .srt file with captions information is created. These files will occupy space on disk and will not be deleted by MythTV, which has no knowledge of them, so they should be manually deleted after debugging is complete. Again, the log should be examined for any unexpected or undesirable behaviour. Finally, the debug level should be set to 0. Files are created, original recording files are deleted and replaced by the new files, and database entries are changed to reflect this. This is the normal operation mode. =cut sub set_debug_level { my $self = shift; my $debug_level = shift; $self->{ debug_level } = $debug_level; } =head2 C<set_lock_file(FILENAME)> This method is optional, and must be run before C<init()>. A transcode job can run for a long time. It is not hard to queue up days of transcodes. The user may want to allow other jobs, such as commercial flagging runs, to execute as new recordings are created, without waiting days for the queue of transcode jobs to empty. To do this, supply a common absolute pathname, F<FILENAME>, to all MythXCode objects using this method. The MythTV job execution queue must be set to allow at least two concurrent jobs. Also, the C<set_program_identifier()> method must supply the I<JOBID> argument because the locking method must interact with the MythTV job scheduler. Implementation is via an advisory lock on the supplied filename. When the C<init()> method is called, an attempt is made to acquire the lock. If it succeeds, the C<init()> method returns. If it fails, it sleeps for 5 minutes and tries again. Before sleeping, it checks the job queue to see if there are any user jobs of a different type waiting in the queue. If there are, it finds all queued jobs of the same type and resets their submission times to now. It then resubmits itself, and exits, freeing up the job queue entry. The net effect, if MythTV is configured to allow two concurrent jobs to run, is that There will initially be two running MythXCode jobs, but one will not be making progress and will not be consuming CPU. If, later, another job of a different type, such as commercial flagging, appears on the queue, then the blocked MythXCode job will, within 5 minutes, detect it, and reorder the queue so that the commercial flagging job moves to the front of the line. This allows the commercial flagging job to run concurrently with the MythXCode, so it can be done in a timely manner. =cut sub set_lock_file { my $self = shift; $self->{ lockfile } = shift; } =head2 C<set_remote_ffmpeg_engine(SSH_INVOCATION_STRING [, NUMTHREADS])> This method is required and must be called before C<init()>. The MythTV backend may not have the most computing resources available on your network. It may, therefore, be desirable to offload the transcoding to H.264 to another computer. In that case, this method can be used. The I<SSH_INVOCATION_STRING> must be a string that allows password-free access to an account on the remote machine. An example might be C<S<ssh -l guest -i /home/mythtv/.ssh/key bigbox>>. The second argument is optional, and should be the integer number of threads to use, passed to the C<-threads> option of F<ffmpeg>. The default number of threads is 1. There must be enough space in the home directory of the remote machine to hold the transcoded file, but the original file will not be copied over, it is streamed to the C<stdin> of F<ffmpeg>. Once the file has been copied back, it will be deleted from the remote machine, releasing the disk space. =cut sub set_remote_ffmpeg_engine { my $self = shift; my $ssh_string = shift; my $num_threads = shift; if ( ! $num_threads ) { $num_threads = 1; } die "Bad thread number" if $num_threads < 1; $self->{ ssh_string } = $ssh_string; $self->{ ssh_threads } = $num_threads; } =head2 C<ffmpeg_map_uses_colon()> This method is optional, and must be called before C<init()>. The command-line invocation for stream mapping in F<ffmpeg> has changed recently. Older versions use a syntax like C<-map 0.1>, while newer ones use C<-map 0:1>. This method asks that the colon separator be used in F<ffmpeg> invocations. This is also the default, so this method need not be used, but is recommended in case the F<ffmpeg> invocation changes again in the future. =cut sub ffmpeg_map_uses_colon { my $self = shift; undef $self->{ map_dot }; } =head2 C<ffmpeg_map_uses_colon()> This method is optional, and must be called before C<init()>. The command-line invocation for stream mapping in F<ffmpeg> has changed recently. Older versions use a syntax like C<-map 0.1>, while newer ones use C<-map 0:1>. This method asks that the dot separator be used in F<ffmpeg> invocations. Use this method if your F<ffmpeg> does not recognize the newer format. =cut sub ffmpeg_map_uses_dot { my $self; $self->{ map_dot } = 1; } =head2 C<set_program_identifier(CHANID, STARTTIME [, JOBID])> This method is required, and must be called before C<init()>. It is used to set the channel ID and starting time of the recording to be processed. The optional third parameter holds the job ID, and must be supplied if locking is being used (see C<set_lock_file>). The expected formats for these three parameters are simply the strings that MythTV supplies to user jobs when the pseudo-parameters B<%CHANID%>, B<%STARTTIMEISO%>, and B<%JOBID%> are used in the command line invocation. =cut sub set_program_identifier { my $self = shift; my $chanid = shift; my $starttime = shift; my $jobid = shift; die "It is not legal to call set_program_identifier more than once on a MythXCode object" if $self->{ chanid }; die "Failed to supply a channel ID" if ! $chanid; die "Failed to supply a starttime" if ! $starttime; $self->{ chanid } = $chanid; $self->{ starttime } = $starttime; $self->{ jobid } = $jobid; } =head2 C<init()> This method is required. It validates some of the configurations set up by the previously-described methods. It verifies that the database schema is one that has been tested and is known to work. If locking is enabled, it acquires the lock. This method will never return an error, but it may not return. If the configuration settings supplied earlier are wrong, it may ABEND at this time. If the process has to move out of the way to allow a commercial flagging job to move up in the queue, it will do a quiet exit and not return. =cut sub init { my $self = shift; $self->check_database_schema(); $self->wait_on_lock(); $self->{ initialized } = 1; $self->read_recording_data(); } =head2 C<perform_backup()> This method is optional, and must be called after C<init()>. If a backup directory was set with C<set_backup_dir()>, the backup is performed at this time. See C<set_backup_dir()> for details of the location of the backups. The backup holds the original MPEG recording file, a file named F<recovery> that shows how to copy this file back to its original location, and several MySQL command files suitable for streaming directly into F<mysql> to restore the relevant database entries for the recording. B<NOTE>: no backups will be performed unless this method is explicitly called. =cut sub perform_backup { my $self = shift; my $newdir = $self->{ backup_dir }; die "Failed to call init()" if ! $self->{ initialized }; if ( ! $newdir ) { return; } $newdir = $newdir . "/transcode_$$"; die "Backup directory already exists" if -d $newdir; mkdir $newdir,0700 || die "Failed to create backup directory."; if ( $self->get_debug_level() > 0 ) { print "cp $self->{ orig_pathname } $newdir\n"; } else { $self->set_job_status("Backing up original file."); if (system( "cp", $self->{ orig_pathname }, $newdir ) != 0 ) { die "Failed to make the backup copy."; } my $hints; open $hints, ">$newdir/recovery"; print $hints "cp $newdir/$self->{ orig_basename} $self->{ orig_pathname }\n"; close $hints; } for ( @tables_to_backup ) { my $tablename = $_; my $hints; open $hints, ">$newdir/$tablename"; print $hints "DELETE FROM $_ WHERE chanid='$self->{ chanid }' AND starttime='$self->{ starttime }';\n"; my @onetable = @{$self->{ backups }->{ $tablename }}; for ( my $i = 1; $i <= $#onetable; $i++ ) { print $hints "INSERT INTO $_ (", join(',', @{$onetable[0]}), " ) VALUES ('", join ("','", map { my $t = $_; $t =~ s|'|\\'|g; $t; } @{$onetable[$i]}), "');\n"; } if ( ! close $hints ) { die "Failed to write backup data file $tablename"; } } if ( $self->get_debug_level() == 0 ) { $self->set_job_status(""); } } =head2 C<cut_and_caption()> This method is optional, and must be called after C<init()>. This method extracts the captions from the recording and applies the cutlist (if any). Unless running in debugging mode (see C<set_debug_level()>), this operation makes permanent changes to the recording on disk and to the database entries related to it. This method should be called before C<transcode_to_h264()>, as otherwise captions and cutlist data will be lost. =cut sub cut_and_caption { my $self = shift; die "Failed to call init()" if ! $self->{ initialized }; if ( $self->safe_to_cut_before_captions() ) { if ( $self->has_cutlist() ) { $self->apply_cutlist_to_recording(); } if ( ! $self->captions_already_extracted() ) { $self->extract_captions(); } } else { if ( ! $self->captions_already_extracted() ) { $self->extract_captions(); } if ( $self->has_cutlist() ) { $self->apply_cutlist_to_recording(); } } } =head2 C<transcode_to_h264()> This method is optional, and must be called after C<init()>. If a cutlist exists, or if captions are to be preserved, C<cut_and_caption()> should be called before this method. This method converts an MPEG recording to an H.264 recording. It will use the remote F<ffmpeg> host, if requested (see C<set_remote_ffmpeg_engine()>). The resulting recording will have a valid seek table. Unless running in debugging mode (see C<set_debug_level()>), this operation makes permanent changes to the recording on disk and to the database entries related to it. =cut sub transcode_to_h264 { my $self = shift; die "Failed to call init()" if ! $self->{ initialized }; $self->perform_transcoding_to_h264(); } =head1 AUTHOR Christopher Neufeld. Copyright 2012, released under the GPL version 3. =cut
Invocation Script
The .pm file describes a class. A separate script is needed to invoke the methods defined in the class. Here is the one I use, xcode_to_h264.pl
#! /usr/bin/perl # xcode_to_h264.pl is a user job. It should be given the three parameters %CHANID% "%STARTTIMEISO% %JOBID% use lib '/home/mythtv'; # My MythXCode.pm is located here use MythXCode; use strict; umask 0022; my $dbname = "mythconverg"; my $dbuser = "USER"; my $dbpasswd = "PASSWORD"; # my $backup_dir = "/myth/tmp"; # my $lock_file = $backup_dir . "/transcode_lock"; my $dbglevel = 0; my $chanid = $ARGV[0]; my $starttime = $ARGV[1]; my $jobid = $ARGV[2]; die "Bad channel ID" if ! $chanid; die "Bad start time" if ! $starttime; $starttime =~ tr/T/ /; my $worker = MythXCode->new(); $worker->set_db_parms($dbname, $dbuser, $dbpasswd); # $worker->set_backup_directory($backup_dir); $worker->set_debug_level($dbglevel); # $worker->set_lock_file($lock_file); # $worker->set_remote_ffmpeg_engine("ssh -i /home/mythtv/.ssh/bigboxkey bigbox", 3); $worker->set_program_identifier($chanid, $starttime, $jobid); $worker->init(); # $worker->perform_backup(); $worker->cut_and_caption(); $worker->transcode_to_h264();
Read the documentation for MythXCode.pm for more details. You can set up a locking scheme that allows transcode jobs to move out of the way of commercial flagging jobs, in the event that your transcoding runs take a very long time. You can (and probably should) use the backup facility to produce files capable of exactly restoring the state of the recording in case something goes wrong. You can use a remote ffmpeg offload engine (I do, as a 64-bit box is much faster than my 32-bit backend on transcodes).