[mythtv] [Resending] Voice control of MythTV

Sun Mar 23 20:53:22 EST 2003

I'm resending this in plain text; the copy that went to the archives looked
horrible.  Damn Outlook Express and its insistence on crappy formatting.
Anyway, if you've already read it, just ignore this.
Sorry.

Original message:
----------

This is my first posting to this list, but I have been following the
archives and development of MythTV closely for a while now.  I am writing to
inform the developers about a project I have been working on as part of my
senior design project for a Computer Engineering degree.  It revolves
around enabling voice control of various home devices, and we are
specifically building a voice-controlled set-top box as part of our design.
I'd like to let people know about the project to see if there is any
interest, and also to suggest some development ideas for MythTV.

There are three main components to our system: a voice recognition server,
a set-top box, and a voice remote control.  The set-top box is simply our
custom, lightweight Linux distribution with multimedia software, which is
obviously where MythTV is useful.  The voice remote control is a Palm
Tungsten T, which we chose for its integrated microphone and Bluetooth
networking capabilities.  Software running on the Palm sends any audio it
records over Bluetooth to the set-top box it is controlling.  The set-top
box then forwards that audio to our voice recognition server and waits for
a control code that signals what action to take.  A regular microphone
hooked up to the sound card can also be used, but it doesn't have as much
of a "wow factor."
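As a rough illustration of the set-top box's role in this flow, here is a
minimal Python sketch; it is not the project's actual code.  It assumes a
hypothetical recognize() call that takes one utterance's audio and returns
a control code string.  The Bluetooth transport and the Nuance recognizer
are replaced by a list of byte chunks and an in-process fake server, and
the "NOMATCH" code is likewise made up.

```python
from typing import Iterable, Protocol


class RecognitionServer(Protocol):
    """Hypothetical interface to the voice recognition server."""

    def recognize(self, audio: bytes) -> str:
        """Return the control code string for one utterance."""
        ...


def relay_utterance(chunks: Iterable[bytes], server: RecognitionServer) -> str:
    """Set-top box side: reassemble the audio the remote sent over
    Bluetooth, hand it to the server and wait for its control code."""
    audio = b"".join(chunks)
    if not audio:
        raise ValueError("no audio received from the remote")
    # In a real system this would be a blocking network request.
    return server.recognize(audio)


class FakeServer:
    """Stand-in for the recognition server: maps canned audio to codes."""

    def __init__(self, table: dict[bytes, str]) -> None:
        self.table = table

    def recognize(self, audio: bytes) -> str:
        return self.table.get(audio, "NOMATCH")


server = FakeServer({b"pause-audio": "PAUSE"})
print(relay_utterance([b"pause-", b"audio"], server))  # prints PAUSE
```

Keeping the set-top box this thin means it needs no recognition logic of
its own; all it learns about the speech is the string the server returns.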

The major part of our project is the voice recognition server itself.  We
chose a server approach because most home devices, including set-top boxes,
have limited processing power, and voice recognition may be too demanding
to embed directly in them.  This also complements MythTV's design, which is
split into a frontend and a backend: a single voice recognition server
running on the backend system could provide voice control for many
frontends.

The voice recognition server works in terms of what we call transactions.
Each transaction specifies one grammar and any number of optional phrases.
The grammar defines the words that are always considered possible
utterances in a given context.  For instance, "play," "pause," and "record"
are always things a user could say while watching timeshifted television.
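One way to model a transaction, sketched in Python under stated
assumptions: the Grammar and Transaction classes, the control code strings
such as "PAUSE", and the text-based match() are all hypothetical.  A real
recognizer scores audio against the grammar rather than comparing
transcribed text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grammar:
    """Words that are always valid in one context (e.g. watching TV),
    each mapped to the control code string the server returns."""
    name: str
    words: dict[str, str]


@dataclass
class Transaction:
    """One grammar plus any number of optional, dynamically loaded phrases."""
    grammar: Grammar
    phrases: dict[str, str] = field(default_factory=dict)

    def add_phrase(self, phrase: str, control_code: str) -> None:
        self.phrases[phrase.lower().strip()] = control_code

    def match(self, utterance: str) -> Optional[str]:
        """Return the control code for recognized text, or None.

        Grammar words are checked first, so a phrase cannot shadow them.
        Matching already-transcribed text keeps this sketch self-contained.
        """
        text = utterance.lower().strip()
        if text in self.grammar.words:
            return self.grammar.words[text]
        return self.phrases.get(text)


timeshift = Grammar("timeshift", {"play": "PLAY", "pause": "PAUSE", "record": "RECORD"})
tx = Transaction(timeshift)
print(tx.match("Pause"))  # prints PAUSE
```

Checking grammar words before optional phrases means a dynamic phrase can
never override a core command like "pause"; that precedence is a choice
made for this sketch, not something the server is known to do.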

Phrases are loaded dynamically to complement the main grammar where it
makes sense to do so.  Consider MythMusic and the list of audio tracks it
displays.  Those tracks could be loaded as phrases, enabling the user to
say "Play Radiohead, Karma Police."  Similarly, a TV guide could be
filtered down to just the airings of a single show by saying its name,
instead of typing it on a clunky onscreen keyboard.  Capabilities such as
these are what I would like to discuss.
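A sketch of how the MythMusic case might generate phrases from the tracks
being displayed.  The Track type, the normalize() helper and the
"PLAY_TRACK:<id>" control code format are assumptions for illustration,
not part of MythMusic or of the server.

```python
import re
from collections import Counter
from typing import NamedTuple


class Track(NamedTuple):
    track_id: int
    artist: str
    title: str


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so spoken and listed forms compare equal."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def phrases_for_tracks(tracks: list[Track]) -> dict[str, str]:
    """Build dynamic phrases for the tracks currently loaded, each mapped to
    a hypothetical "PLAY_TRACK:<id>" control code."""
    title_counts = Counter(normalize(t.title) for t in tracks)
    phrases: dict[str, str] = {}
    for t in tracks:
        code = f"PLAY_TRACK:{t.track_id}"
        phrases[normalize(f"play {t.artist} {t.title}")] = code
        # Offer the shorter "play <title>" form only when it is unambiguous.
        if title_counts[normalize(t.title)] == 1:
            phrases[normalize(f"play {t.title}")] = code
    return phrases


playlist = [Track(1, "Radiohead", "Karma Police"), Track(2, "Radiohead", "Airbag")]
phrases = phrases_for_tracks(playlist)
print(phrases[normalize("Play Radiohead, Karma Police.")])  # prints PLAY_TRACK:1
```

The title-only form is dropped when two tracks share a title, so the user
has to name the artist as well; otherwise one phrase would silently map to
the wrong track.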

MythTV is built around the assumption of a standard, everyday remote
control.  Voice control of standard commands such as "play" and "stop" is
fairly easy, but the really interesting features come when you can
dynamically select an item to play from anywhere on screen, or even items
that are not listed on screen, such as tracks in a long audio playlist.

Admittedly, I have not yet looked at the MythTV source code in much depth,
as I have been concentrating on the server and voice remote.  However, I am
beginning to, and I'd like feedback from the experienced developers on the
best way to implement an advanced control architecture such as this.  The
control codes returned by the server are strings defined in the main
grammar, and when a dynamic grammar is sent to the server it can specify
which control code to return.  Since MythTV stores its information in
MySQL, it may be feasible to add an extra field for the control code, or to
build a code out of the current fields.  The right TV show, MP3, movie, or
whatever could then be played based on the control code the server returns.
Once this system is in place, various interesting user interfaces could be
built around it.  Voice could be used for interfaces not possible with a
standard remote, or simply as an alternative to one.

Our group would like to open source as much of our project as possible.
Any changes that we make to Myth would be, quite obviously.  The Palm Voice
Remote software would be as well, and hopefully other projects that need
wireless voice input over greater ranges will find it useful.  The server
itself links against the Nuance libraries for its voice recognition
capabilities.  Its code could be open sourced, but it isn't of much use
without those libraries.  They used to be available free of charge for
development purposes from the Nuance Developer Network, but that program
has been discontinued.  Luckily, I obtained copies of all their software
while the program was still active.  I am currently looking into my options
regarding this situation.

Anyway, I'd really like to hear from the developers whether there is any
interest in this type of project.  Just from running this for personal use
and testing, I can say that it is incredibly cool, even for normal commands.
Dynamic grammar functionality could be a real area where open source
software can point and say "we beat the commercial guys to it."  If you have
any questions, feel free to email me or post them to the list.  I will
respond as soon as possible.

Thanks for the great software you guys have created,
Jared Hanson