[mythtv-users] OT Help: mdadm segfault

Tue Sep 18 05:46:01 UTC 2007

    > Date: Mon, 17 Sep 2007 22:22:19 -0700
    > From: "Steve MacLaren" <scram69 at gmail.com>

    > Out of desperation, I even booted off the liveCD, used Synaptic to install
    > mdadm in the liveCD environment, and then attempted to assemble the array
    > (sudo mdadm --assemble --scan).  Exact same result: "md0 assembled with 4
    > devices" followed 10 seconds later by a reboot...

I haven't exactly been following this thread closely, but I suddenly
had an idea---this smells of a power supply problem.

Try this hypothesis on for size:  Your PSU went marginal, which is
what caused the original crash.  The machine will now stay up as long
as you -don't- try to pull maximum power.  But what happens the
instant you start trying to assemble your four-disk array?  All the
heads start thrashing around and your drives go to maximum power
demand, causing your PSU to undervolt just enough to cause the CPU
to reset.

Farfetched?  You bet.  But if you can try putting those drives in
another machine, or try putting another PSU in the original machine,
you might be able to quickly eliminate that hypothesis.  (And no, this
doesn't explain the segfaults, either, unless wacky data got scribbled
somewhere during the original crash, so keep reading...)

While you're at it, make sure your CPU heatsink didn't come loose.
Though I'll bet just running something computationally intensive
could test that---does memtest86+ complete at least one full cycle
without either a memory error or a spontaneous reboot?

Once you've eliminated the physical components, you can go back to
figuring out what weird data pattern on your disks is causing mdadm
to segfault.
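If you want a read-only place to start poking at the on-disk data, mdadm --examine dumps a member's RAID superblock without assembling the array. The sketch below prints each member's event counter, so a member that disagrees with the others stands out. It makes several assumptions. The helper names events_from_examine and examine_members are made up. The partition names are placeholders. It also assumes the --examine output contains an "Events : N" line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm): list the superblock event
# counter for each RAID member without assembling the array.
# Assumes `mdadm --examine` prints a line of the form "Events : N".

# events_from_examine: read `mdadm --examine` output on stdin and print the
# value that follows "Events :", with leading blanks stripped.
events_from_examine() {
  awk -F':' '/^[ \t]*Events[ \t]*:/ { sub(/^[ \t]+/, "", $2); print $2 }'
}

# examine_members DEV...: print "DEV: EVENTS" for each member device,
# or "DEV: unknown" if no event counter could be read.
# --examine only reads the superblock; it does not start the array.
examine_members() {
  for dev in "$@"; do
    ev=$(mdadm --examine "$dev" 2>/dev/null | events_from_examine)
    printf '%s: %s\n' "$dev" "${ev:-unknown}"
  done
}

# Example (placeholder partition names; run as root from the LiveCD):
#   examine_members /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```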

P.S.  Can you try booting off a newer LiveCD?  Maybe you've hit some
bug that got fixed.

P.P.S.  Other things to try:  Boot from LiveCD and make sure the SMART
data for each drive looks sane.  Then, try dd'ing each drive to make
sure you can read from it; perhaps one drive is causing your disk
interface to go haywire and that resets the machine.  (E.g., for each
drive, try dd if=/dev/sdaN of=/dev/null bs=1M count=1000 and see if
you can read the first gig of the device; make -SURE- you don't get
those if's and of's reversed! :)  Maybe also try a full readability
test where you simply dd -all- of each drive to /dev/null, one drive at
a time.  That tests for weird resets, but it doesn't draw as much power
as all four heads thrashing at once, and it -also- won't catch, e.g.,
your disk controller spazzing out when several drives are seeking at
the same time.  (I had an old VIA-based mobo that ate a filesystem once
I'd filled 1.5 disks with it, even though the board had worked
flawlessly with only a single disk connected; fortunately that was a
backup server, so the data was instantly replaceable.)
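Here is a minimal sketch of those read tests, assuming a Linux dd that accepts bs=1M. The name read_test is a made-up helper, and /dev/sda through /dev/sdd are placeholders for your four drives. Wrapping dd this way means the device only ever appears after if=, so the reversed if/of accident can't happen. For the SMART check, smartctl -a /dev/sdX from smartmontools will dump the data, if your LiveCD has it.

```shell
#!/bin/sh
# read_test DEV [MB]: read MB mebibytes from DEV (or the whole device if MB
# is omitted) and throw them away. DEV is only ever used as the input
# (if=), so nothing is written to the drive.
# Returns dd's exit status, so a read error shows up as a failure.
read_test() {
  dev=$1
  mb=$2
  if [ -n "$mb" ]; then
    dd if="$dev" of=/dev/null bs=1M count="$mb"
  else
    dd if="$dev" of=/dev/null bs=1M
  fi
}

# Example (placeholder device names; run as root, one drive at a time):
# for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
#   echo "== $d"
#   read_test "$d" 1000 || echo "read FAILED on $d"
# done
```

Dropping the 1000 from the loop turns it into the whole-drive readability test, which can take hours on large disks.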

Maybe also try configuring a serial console and see if anything gets
blatted out just before the machine resets?  A long shot, but...
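The usual serial console setup makes a few assumptions: the first serial port, 115200 baud, and a null-modem cable to a second box. Append console=tty0 console=ttyS0,115200n8 to the kernel line in your bootloader config. Then watch the port from the other machine with something like screen /dev/ttyS0 115200. The has_serial_console helper below is a made-up name. It only confirms, after a reboot, that the running kernel actually picked the setting up.

```shell
#!/bin/sh
# has_serial_console [CMDLINE]: succeed if the kernel command line contains
# a console=ttyS* argument. Defaults to the running kernel's /proc/cmdline.
has_serial_console() {
  cmdline=${1-$(cat /proc/cmdline)}
  set -f   # don't glob-expand the command-line words
  for arg in $cmdline; do
    case $arg in
      console=ttyS*) set +f; return 0 ;;
    esac
  done
  set +f
  return 1
}

# Usage on the crashing machine, after rebooting with the new kernel line:
#   has_serial_console && echo "serial console active" || echo "not configured"
```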