[mythtv-users] Need ideas for auto-recovery of locked up backend

Simon Hobson linux at thehobsons.co.uk
Tue Apr 16 06:58:50 UTC 2013


Craig Huff wrote:
>I have had a couple of incidents recently (and a few more over the
>many years I've been using MythTV) in which something runs berserk
>(different days, different things) and my backend is left running but
>locked up doing nothing but burning electrons until I get home and
>kick it (hit the reset button).  Usually, in such cases, even ssh is
>not my friend since the system is so tied up in knots that it
>can't/won't allow an ssh session to connect.

Have you ever got a feel for whether "stuff" is still running or not?
A common option is a watchdog timer. While the system is running, it periodically resets the timer, and as long as it keeps doing that, everything runs normally. If the system locks up and the timer expires, the watchdog sinks its teeth into the hardware reset line and reboots the system. It's quite a common thing to find on systems designed for embedded use.
Now this is fine if you simply start a process that resets the watchdog timer and that process "dies" along with the rest of the system. But if the system goes berserk and kills some processes while the watchdog process is still running, then your reset won't happen. In principle, you can make the process that resets the watchdog check other stuff as well - for example, if the Myth backend isn't responding or SSH has died, it stops resetting the timer and the system reboots.
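On Linux, a minimal sketch of such a "smart" reset process might look like the following. It assumes the kernel's standard /dev/watchdog interface (opening the device arms it, any write resets the countdown) and uses "the backend accepts a TCP connection on its default protocol port 6543, and sshd answers on port 22" as the health test. The check list, interval and function names are illustrative assumptions, not anything MythTV itself provides.

```python
import socket
import time

# Hypothetical health checks: (host, port) pairs that must accept a TCP
# connection. 6543 is the default MythTV backend protocol port, 22 is SSH.
CHECKS = [("127.0.0.1", 6543), ("127.0.0.1", 22)]


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def system_healthy(checks):
    """True only if every configured service answers."""
    return all(port_is_open(host, port) for host, port in checks)


def feed_loop(device="/dev/watchdog", checks=CHECKS, interval=10.0):
    """Kick the watchdog only while every check passes; never returns."""
    # On most Linux drivers, opening the device is what arms the timer.
    with open(device, "wb", buffering=0) as wd:
        while True:
            if system_healthy(checks):
                wd.write(b"\0")  # any write resets the countdown
            # When a check fails we simply skip the write; if that goes on
            # past the hardware timeout, the watchdog resets the machine.
            # We never write the magic 'V' character, so on drivers that
            # support "magic close", this process dying does not disarm it.
            time.sleep(interval)
```

You'd run feed_loop() as a service started late in boot. The interval plus the worst-case time for all the checks needs to stay well below the watchdog timeout. A half-wedged box may still accept TCP connections, so a check that actually talks to the backend would be more convincing than a bare port probe.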

Using or implementing one isn't quite as simple as that. It needs to start in the inactive state; then, once the system has booted and the software is running, it can be activated. Otherwise there's a risk of the watchdog resetting the system before it finishes booting, especially if you've got large filesystems to fsck. On the other hand, if the watchdog has reset the system, it needs to stay active in case the system fails to boot the first time - but it probably needs a considerably longer timeout.
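The "start inactive, arm once running" half can be done from software on Linux, since most drivers typically only start counting once /dev/watchdog is opened. Below is a rough sketch. It assumes Python's fcntl module and the ioctl numbers from <linux/watchdog.h> as encoded on x86/ARM. The ready() callable, the timeouts and the choice to leave the watchdog disarmed if the system never comes up are all hypothetical. Staying armed through a failed boot can't be done this way, because no software is running yet; that part has to be the job of the hardware (or BIOS).

```python
import fcntl
import struct
import time

# ioctl numbers from <linux/watchdog.h>, built with the generic _IOC encoding
# used on x86 and ARM (assumption: PowerPC, MIPS etc. encode it differently).
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, type_char, nr, size):
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


WDIOC_GETBOOTSTATUS = _ioc(_IOC_READ, "W", 2, 4)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, 4)
WDIOF_CARDRESET = 0x0020  # "card previously reset the CPU"


def wait_until(check, deadline_s, poll_s=5.0):
    """Poll check() until it returns True or deadline_s seconds have passed."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if check():
            return True
        time.sleep(poll_s)
    return False


def arm_watchdog(ready, device="/dev/watchdog", timeout_s=60,
                 boot_grace_s=900, poll_s=5.0):
    """Open (and so arm) the watchdog only once ready() reports True.

    Returns the open device file, or None if the system never became ready,
    in which case the watchdog is deliberately left disarmed.
    """
    if not wait_until(ready, boot_grace_s, poll_s):
        return None
    wd = open(device, "wb", buffering=0)  # arms the timer on most drivers
    # SETTIMEOUT writes the requested value and reads back the one the
    # driver actually granted, which may be rounded to what it supports.
    try:
        buf = fcntl.ioctl(wd, WDIOC_SETTIMEOUT, struct.pack("i", timeout_s))
    except OSError:
        wd.write(b"V")  # magic close: ask the driver to disarm on close
        wd.close()
        raise
    granted = struct.unpack("i", buf)[0]
    try:
        buf = fcntl.ioctl(wd, WDIOC_GETBOOTSTATUS, struct.pack("i", 0))
        if struct.unpack("i", buf)[0] & WDIOF_CARDRESET:
            print("last reboot was caused by the watchdog")
    except OSError:
        pass  # not every driver reports boot status
    print(f"watchdog armed with a {granted}s timeout")
    return wd
```

Here ready is any callable that returns True once the backend is up. The file that arm_watchdog() returns then has to be written to regularly, or the machine will reboot when the timeout expires.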

