[mythtv-commits] Ticket #1882: Fix broken signal handling in housekeeper.c
MythTV
mythtv at cvs.mythtv.org
Sat May 27 22:25:58 UTC 2006
#1882: Fix broken signal handling in housekeeper.c
-----------------------------------+----------------------------------------
Reporter: mrsam at courier-mta.com | Owner: ijr
Type: patch | Status: new
Priority: minor | Milestone:
Component: mythtv | Version: 0.19
Severity: medium |
-----------------------------------+----------------------------------------
A series of events led me to scrutinize the code in housekeeper.c that
runs mythfilldatabase, and the mutual interaction between this code and
the myth_system() code in libmyth/util.cpp.
The series of events in question, was a fairly reproducible instance of my
job queue getting wedged. After about a day of running myth 0.19's
mythbackend, some mythcommflag or transcode job would inevitably stuck in
the "Running" stage. This happened every time. A closer examination
showed that the job apparently completed succesfully, but jobqueue never
got updated, so that this job got wedged in the "Running" state, and any
new jobs just kept piling on in the job queue, in the "Queued" state.
An even closer examination revealed that the thread responsible for
invoking myth_system("/usr/bin/mythcommflag [args]"), etcetera, was in no-
man's land. Attaching a debugger to that thread revealed the following
interesting call trace. Now, I did not, unfortunately, save the actual
gdb's backtrace, but the tail end of the stack frame was, in essence:
1. util.cpp:859 # the call to waitpid() in myth_system()
2. A few stack frames in glibc's waitpid() function
3. [signal handler]
4. housekeeper.cpp:26 # The call to wait(0) in reapChild() in
housekeeper.cpp
5. The wait() syscall
6. <---- You are here
And here, I was stuck.
So, now, how did I get from myth_system() to housekeeper.cpp?
Well, the answer can be found in the linuxthreads FAQ.
http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html#J
============================================================================
J.3: I've installed a handler on a signal. Which thread executes the
handler when the signal is received?
[ ... ]
If the signal is sent to a particular thread using pthread_kill(), then
that thread executes the handler.
[ ... ]
If the signal is sent via kill() or the tty interface (e.g. by pressing
ctrl-C), then the POSIX specs say that the handler is executed by any
thread in the process that does not currently block the signal. In other
terms, POSIX considers that the signal is sent to the process (the
collection of all threads) as a whole, and any thread that is not blocking
this signal can then handle it.
============================================================================
So, the here's what I believe is happening here.
A. The housekeeper thread calls "signal(SIGCHLD, &reapChild)", then forks
off and runs mythfilldatabase. The intent here is to set
"HouseKeeper_filldb_running = false;". Housekeeper's main loop checks if
mythfilldatabase is already running in the background, using this flag,
and, if so, that factors into some decisions. housekeeper does not wait()
for mythfilldatabase to end, it just sets the signal handler, forks it
off, and lets the SIGCHLD handler reset the "HouseKeeper_filldb_running"
flag when it's done.
But, the signal handler remains set forever. So:
B. At some point later, myth_system() forks off and runs mythcommflag, or
transcode
C. The child process does its thing, and exits.
D. The signal handler, reapChild(), that meant to be used when
mythfilldatabase exits, now gets invoked to handle the SIGCHLD that gets
generated when mythcommflag/transcode exits.
It looks like what's happening is that when a child process gets forked,
the new process is considered to be a child of the entire process, not
just the thread that forked it, so when the child process finishes, the
resulting SIGCHLD is *NOT* sent via the pthread_kill() mechanism. It gets
posted to the entire process, and, as per linuxthreads FAQ, any thread can
pick it up, and
run with it.
Now, which thread actually gets it, is not really germane. Whichever
thread gets chosen, it's going to invoke reapChild(), which the real
problem.
The fix is NOT to clear the signal handler when mythfilldatabase is done.
Consider this sequence of events:
1. myth_system() forks off and runs mythcommflag
2. housekeeper.cpp decides it's time to run mythfilldatabase.
3. housekeeper.cpp sets "HouseKeeper_filldb_running = true", sets up a
SIGCHLD handler, and forks off the mythfilldatabase
4. mythcommflag terminates, and generates a SIGCHLD.
5. Since there's a signal handler set for SIGCHLD, it gets invoked.
5. reapChild() sets "HouseKeeper_filldb_running = false", while
mythfilldatabase continues to chug along, doing its business.
6. Boom.
So, I think there's a problem here, that needs fixin'. Maybe it's already
fixed in SVN -- I'm looking at the 0.19 release -- so this message can be
ignored.
I believe that 99.9% of the time this does not actually end up causing
anything bad -- but this is definitely an issue that needs fixing
nevertheless -- and there are other reasons why, in my case, I can
reproduce this fairly consistently. Why it's happening to me it's another
story, but, to make a long story short, the positive side of me being able
to reproduce this badness also means that I can fix it. Which I did.
I've patched this, and I've been testing the fix for almost a week now,
and it completely eliminates this problem, the problem being: you should
either use a signal handler for all child processes, or not. You can't
reliably use a signal handler for some processes, but then rely on a
wait/waitpid for other child processes. You just
can't do it.
My fix is, essentially: ditch the signal handler, create a new detached
thread that forks() mythfilldatabase, waitpid()s for it, clears the
HouseKeeper_filldb_running flag, and goes away. Nice and tidy.
As I mentioned, for various reasons I can reliably reproduce the problems
caused by improper signal handling. After applying the following patch, no
problems have been observed.
This patch is against mythtv-0.19 release.
--
Ticket URL: <http://svn.mythtv.org/trac/ticket/1882>
MythTV <http://www.mythtv.org/>
MythTV
More information about the mythtv-commits
mailing list