[mythtv-commits] Ticket #10805: TFW - Taking too long to flush

Sun Jul 8 12:01:16 UTC 2012

#10805: TFW - Taking too long to flush
-----------------------------------------+----------------------------
 Reporter:  athroener@…                  |          Owner:  danielk
     Type:  Bug Report - General         |         Status:  infoneeded
 Priority:  minor                        |      Milestone:  unknown
Component:  MythTV - General             |        Version:  0.25-fixes
 Severity:  medium                       |     Resolution:
 Keywords:  Taking a long time to flush  |  Ticket locked:  0
-----------------------------------------+----------------------------

Comment (by jens@…):

 Since I upgraded from Ubuntu 11.10 (MythTV 0.24) to 12.04 (MythTV 0.25) I
 have been experiencing the same issue. When I try to record two or more
 channels simultaneously, the backend frequently locks up, sometimes with
 the "Taking a long time to flush" message and sometimes with the "Maximum
 buffer size exceeded" message.

 I'm not certain about this but to me it seems the lock-ups occur
 preferentially when recording on one channel finishes or switches from one
 show to the next. Otherwise I haven't found any way to reliably reproduce
 the effect. It seems to occur rather randomly from a few minutes to a
 couple of hours or more after recordings start.

 The issue does not seem to be related to excessive I/O load. I can shuffle
 files around on the disks heavily without triggering it, and in normal
 operation nothing else generates much load on the PC.

 I have attached logs from one incident.

 I'm recording to a RAID 5 array with 5 disks. After I reboot the PC
 following a lock-up, the RAID is in an inconsistent state and needs to
 resync.

 Switching to the mythbuntu repos (v0.25.1-58-g1d41f74) didn't help.

 I tried to attach gdb to the process with "gdb /usr/bin/mythbackend PID"
 but unfortunately I can't get a backtrace after the lock-up. Before the
 lock-up, gdb output looks as expected, but afterwards gdb can no longer
 stop the process because the I/O threads are stuck in "uninterruptible
 sleep" (see ps.txt).

 Trying to kill -9 the backend leaves a zombie behind with the I/O threads
 still in "uninterruptible sleep".

 Any idea how to get a backtrace or coredump anyway?
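
 Since gdb cannot stop threads in that state, one option might be to read
 the kernel-side wait state of each stuck thread from /proc instead. The
 following is a minimal sketch, assuming Linux procfs. The function names
 are hypothetical and not part of MythTV, and /proc/PID/task/TID/stack is
 typically readable only by root on kernels built with stack trace support:

```python
import os
import sys


def parse_stat_line(line):
    """Return (tid, comm, state) from one /proc/PID/task/TID/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    tid, _, comm = head.partition("(")
    return int(tid), comm, tail.split()[0]


def read_text(path):
    """Read a small procfs file; return '' if it is unreadable or gone."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_threads(pid, proc_root="/proc"):
    """Yield (tid, comm, wchan, kernel_stack) for threads in state 'D'."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    for name in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, name)
        stat = read_text(os.path.join(base, "stat"))
        if not stat:
            continue  # thread exited between listdir() and the read
        tid, comm, state = parse_stat_line(stat)
        if state == "D":  # 'D' is uninterruptible sleep
            wchan = read_text(os.path.join(base, "wchan"))
            stack = read_text(os.path.join(base, "stack"))
            yield tid, comm, wchan, stack


if __name__ == "__main__" and len(sys.argv) > 1:
    for tid, comm, wchan, stack in blocked_threads(int(sys.argv[1])):
        print(f"{tid} {comm} wchan={wchan or '?'}")
        print(stack or "  (kernel stack not readable)")
```

 Run as root with the backend's PID as the only argument. Writing "w" to
 /proc/sysrq-trigger should likewise dump all blocked tasks to the kernel
 log, provided SysRq is enabled on the machine.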

 As I don't want to risk corrupting my older recordings, and resyncing my
 RAID takes more than two hours, I have now set up a small extra RAID that
 I can use for tests. If you require more data, I can gather it here. Just
 tell me what you need.

 I have also recorded I/O utilization with iotop. At 23:27:35 in the log
 (iotop.txt) all threads stop writing. In the ps.txt log you can see that
 at the same time the threads get blocked. In the backend log I can't see
 anything extraordinary around that time. This is what always happens: The
 writing threads fall into uninterruptible sleep and stop writing.

 After reading Warpme's message 'Re: [mythtv-users] recent "TFW -- took a
 long time" warnings' in the mythtv-users mailing list I also logged the
 /proc/vmstat values (vmstat.txt). The nr_dirty value always stays way
 below the thresholds.
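
 For anyone who wants to collect comparable numbers, a minimal sketch of
 such a logger follows. It assumes the thresholds meant are the
 nr_dirty_threshold and nr_dirty_background_threshold counters that
 /proc/vmstat exposes next to nr_dirty; the function names are made up for
 illustration, and this is not the script that produced vmstat.txt:

```python
import os
import time


def parse_vmstat(text):
    """Parse 'name value' lines as found in /proc/vmstat into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def format_sample(counters, stamp):
    """Format the dirty-page counters (in pages) as one log line."""
    dirty = counters.get("nr_dirty")
    writeback = counters.get("nr_writeback")
    background = counters.get("nr_dirty_background_threshold")
    hard = counters.get("nr_dirty_threshold")
    return (
        f"{stamp} nr_dirty={dirty} nr_writeback={writeback} "
        f"background={background} threshold={hard}"
    )


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    # Take a single sample; call this once per second from a shell loop
    # or a cron job to build up a log over time.
    with open("/proc/vmstat") as f:
        sample = parse_vmstat(f.read())
    print(format_sample(sample, time.strftime("%H:%M:%S")))
```

 Counters missing on a given kernel show up as None in the output, so a
 log line is still written in that case.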

 I hope this helps to track down the cause. As said above, if you require
 more information or data, please let me know.

-- 
Ticket URL: <http://code.mythtv.org/trac/ticket/10805#comment:4>
MythTV <http://code.mythtv.org/trac>
MythTV Media Center