[mythtv-users] New Hardware Build Troubleshooting (Possible XFS Kernel Bug?)

Paul Gallaway pgallaway at gmail.com
Thu Jun 4 01:19:12 UTC 2009


I need some assistance/suggestions for debugging my system hardware.
It's a new build and fresh install. Problems have been the same
running Debian Lenny or Squeeze and with kernel 2.6.26 or 2.6.29, all
on the amd64 branch (aka x86_64). System has been crashing about once
a day since I put it together. These are hard locks as the system
stops responding to all inputs and is no longer available through
either SSH or the local PS/2 keyboard. It is not possible to restart
X/gdm, switch to a TTY, etc. Occasionally it results in a reboot. It does
not appear to be specific to MythTV, occurring when the system is or
is not running the backend (or frontend).

As it's the cheapest to troubleshoot (costing nothing but time...) I
have been trying to rule out software issues to the best of my
ability.
-Confirmed the backend was not running as root (it wasn't/isn't),
-Ran the system without mythfrontend running,
-Read the meaningful logs I could think of (kernel, mythtv, dmesg)
looking for errors and faults and haven't seen anything. There's
probably more I can do here,
-Tried mythtv from two different Debian repositories, and also tried
installing from source,
-Upgraded from Debian Lenny (stable) to Squeeze (testing),
-Upgraded from 2.6.26 to 2.6.29 kernel,
-Upgraded Nvidia driver from 173.x to 180.44,
-Attempted a backtrace on mythbackend (following instructions here:
http://www.mythtv.org/docs/mythtv-HOWTO-22.html#ss22.2). I understand
that a crash while running gdb indicates that something external to
mythbackend is causing the crash - the crash prevented the backtrace
from completing,
-Applied/recompiled/installed the patch found at ticket # 5733
(http://svn.mythtv.org/trac/ticket/5733), and
-Ran the system without mythtv processes running and it still crashed.
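
For the log-reading step in the list above, a minimal sketch of a scanner that greps the usual kernel logs for crash signatures. The log paths and the pattern list are assumptions (typical Debian locations and common kernel message strings), not anything confirmed for this system. A hard lock often leaves nothing behind because the last messages never reach disk, so an empty result does not rule out a kernel problem.

```python
import re
from pathlib import Path

# Assumed pattern list of common kernel trouble markers; extend as needed.
PATTERNS = re.compile(
    r"(Oops|\bBUG:|Call Trace|soft lockup|Machine Check|I/O error|XFS.*(error|corrupt))",
    re.IGNORECASE,
)

# Assumed Debian log locations; adjust for your syslog configuration.
LOG_FILES = ["/var/log/kern.log", "/var/log/syslog", "/var/log/messages"]


def scan_lines(lines):
    """Return (line_number, text) pairs for every line matching PATTERNS."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, 1)
        if PATTERNS.search(line)
    ]


def scan_file(path):
    """Scan one log file; missing or unreadable files yield no hits."""
    p = Path(path)
    try:
        with p.open(errors="replace") as fh:
            return scan_lines(fh)
    except OSError:
        return []


if __name__ == "__main__":
    for path in LOG_FILES:
        for n, text in scan_file(path):
            print(f"{path}:{n}: {text}")
```

Run it as root (or as a user in the adm group) so the log files are readable.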

Current setup is Debian Squeeze running kernel 2.6.29. I have compiled
and installed the most recent 0.21-fixes from SVN (as of Saturday
afternoon). Running the 180.44 Nvidia binary driver.

After all this there is still no change to the symptoms, and crashes
still occur at some time <24 hours after starting. It seems to crash
most often while the disks are being accessed. I have not noticed a
crash when the system is idle (i.e. not recording or transcoding).

Currently I am investigating whether a kernel bug with XFS could be
the cause but I haven't seen the error message reported on this list.
Here's what I can find on it:
http://bugzilla.kernel.org/show_bug.cgi?id=13375
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526406
The bug reports indicate this may be a regression between 2.6.28 and
2.6.29, but my problems existed with 2.6.26 (or maybe I have more than
one crash condition...). Before I reformat my XFS drives to ext3, has
anyone else on this list experienced similar problems that could be
related to this reported bug? Short of reformatting, I'm considering
running an idle system with my XFS drives unmounted to see if I can
make it >1 day.
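
For the unmounted-XFS uptime test, a minimal sketch of a heartbeat logger that appends a timestamp to a file and forces it to disk, so the last line after a hard lock shows roughly when the system died. The file path and the 60-second interval are hypothetical; the file should live on the non-XFS system drive so the test itself does not touch XFS.

```python
import os
import time
from datetime import datetime

HEARTBEAT_FILE = "/var/tmp/heartbeat.log"  # hypothetical path on a non-XFS disk
INTERVAL_SECONDS = 60                      # assumed interval between beats


def write_heartbeat(path, now=None):
    """Append one ISO timestamp and fsync so it survives a hard lock."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    return stamp


def last_heartbeat(path):
    """Return the most recent timestamp in the file, or None if empty."""
    with open(path) as fh:
        stamps = fh.read().split()
    return stamps[-1] if stamps else None


if __name__ == "__main__":
    while True:
        write_heartbeat(HEARTBEAT_FILE)
        time.sleep(INTERVAL_SECONDS)
```

After a crash and reboot, calling last_heartbeat(HEARTBEAT_FILE) before restarting the logger gives the approximate time of the lock, which can then be compared against recording schedules and disk activity.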

Barring that I feel I've pretty much done all I can on the software
side and need to move on to hardware. Here's what I've done so far:
-Ran Memtest86+ recently for 14 hours - no errors. Also ran this for
>24 hours prior to installing the OS.
-Ran SMART tests prior to installing the system. The drives were OK
prior to the install and have the latest firmware. You guessed it,
Seagate 7200.11 1TB drives - the system drive is a 7200.9 300GB.
Haven't checked since but...
-Ran the system without the UPS. The hard lock still occurred without
the UPS so ruling this out.
-Moved the PVR-150 card to the other PCI slot. The crash still occurred.

Anything obvious I've missed? I'm leaning toward a motherboard or PSU
problem since I can't, at this point, pin it on anything else and I
don't know of an error that would reliably implicate those pieces. I'm
not averse to pulling hardware and trying new parts but my other
systems are all >5 years old so it means putting more money into this
system. Hopefully there are some steps I can take to identify the
problem and RMA the problem part or have a good case for why a
particular piece of hardware is bad prior to buying a replacement. I'm
getting

Ideally there are updates tonight that will make the problem go away
but that wishful thinking hasn't helped so far.

-- 
~pAul.

        all good things. all in good time.

