Server hangs shortly after starting Parity-Sync/data-rebuild

tgggd86 · December 17, 2016

I recently updated my server from 6.1.8 to 6.2.4 and have had intermittent errors since, all of which end with the entire server hanging. The only consistently reproducible one is as follows.

Power on the server following an unclean shutdown caused by the previous hang. One of my drives (1.5TB Western Digital eSATA external) requires a data rebuild. Start the array, which then begins Parity-Sync/Data-Rebuild. Stop all my docker apps. About 1 min (0.5%) into Parity-Sync/Data-Rebuild, the server cannot be reached over the network or telnet, the GUI stops working, and the console directly on the server is also unresponsive.

Ran Memtest overnight and had no issues. HDD in question has no SMART errors and can be written/read fine. No hardware changes since update to 6.2.4. Syslog attached.

Hardware:

M/B: ASUSTeK Computer INC. - M4A89GTD-PRO/USB3

CPU: AMD Phenom™ II X4 910e @ 2600

HVM: Enabled

IOMMU: Disabled

Cache: 512 kB, 2048 kB, 6144 kB

Memory: 4 GB (max. installable capacity 8 GB)

Network: bond0: fault-tolerance (active-backup), mtu 1500

eth0: 1000 Mb/s, full duplex, mtu 1500

Kernel: Linux 4.4.30-unRAID x86_64

OpenSSL: 1.0.2j

tower-syslog-20161216-2155.zip

Frank1940 · December 17, 2016

One thing that you can try is this:

"* If the system crashes completely and there is no way to capture a final syslog, then start a tail on the unRAID console or Telnet session (tail -f /var/log/syslog)."

This comes from the first post in this thread:

http://lime-technology.com/forum/index.php?topic=39257.0

If you use a monitor, you will probably have to take a picture of what you get. (Watch out for flash reflections and focus if you do this.) If you use PuTTY, you can probably copy and paste into a message.

tgggd86 · December 20, 2016

I left my server on for a few days without starting the array and ran pre-clear on a new drive with no issues. When I started the array, the same issue occurred. Full syslogs capturing the hang are attached.

Specifically, the ICRC ABRT error seems to be the main culprit. The only problem is that it is recorded against a different ata device each time (ata7 twice and ata3 once). Both of these ata devices are on my Marvell SATA adapter (88SE63xx/64 BIOS: 3.1.0.15N). I'm assuming my Marvell SATA controller is the main problem here, but it has no issues running pre-clear or extended SMART tests on my drives. The only thing I can point to is that 6.2.x broke my SATA controller, so reverting to 6.1.8 might be my best option. Thoughts?

syslog1.txt

syslog2.txt

syslog3.txt
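To check which ata ports the ICRC errors land on across several syslogs, the lines can be tallied with a short script. This is a minimal sketch in Python, assuming the kernel logs these errors in the usual libata form `ataN.00: error: { ICRC ABRT }`. The function name `count_ata_errors` is hypothetical, and the sample lines are made up for illustration; they are not copied from the attached syslogs.

```python
import re
from collections import Counter

# Matches kernel lines such as "ata7.00: error: { ICRC ABRT }".
# Group 1 is the port (e.g. "ata7"), group 2 the flags inside the braces.
ERROR_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: error: \{([^}]*)\}")


def count_ata_errors(lines, flag="ICRC"):
    """Count, per ata port, the error lines whose flags include `flag`."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match and flag in match.group(2).split():
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines for illustration only (not from the attached syslogs).
SAMPLE = [
    "Dec 20 10:15:01 Tower kernel: ata7.00: failed command: READ DMA EXT",
    "Dec 20 10:15:01 Tower kernel: ata7.00: error: { ICRC ABRT }",
    "Dec 20 10:15:09 Tower kernel: ata7.00: error: { ICRC ABRT }",
    "Dec 20 10:16:30 Tower kernel: ata3.00: error: { ICRC ABRT }",
    "Dec 20 10:17:02 Tower kernel: ata1.00: error: { UNC }",
]

if __name__ == "__main__":
    for port, n in sorted(count_ata_errors(SAMPLE).items()):
        print(f"{port}: {n} ICRC error line(s)")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_ata_errors(open("syslog1.txt"))`. If the same port shows up in every log, that points at one drive or cable; errors spread across several ports on the same card point more toward the controller or something shared by those ports.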

Frank1940 · December 20, 2016

You might want to read through this thread.

http://lime-technology.com/forum/index.php?topic=40683

I don't know if this is part of your problem or not but it is something that you might want to explore.

Fireball3 · December 23, 2016

"Only thing I can point to is that 6.2.x broke my SATA controller, so reverting back to 6.1.8 might be my best option."

Any feedback based on Frank1940's link?

Have you checked for a new controller firmware?

Although it's kind of voodoo, I have read posts where issues were solved by simply re-flashing the controller.

tgggd86 · December 23, 2016

I flashed new firmware on my card (Supermicro AOC-SASLP-MV8) and even updated my mobo BIOS to the latest version, and still have the same problems. I also disabled virtualization and modified the config file as described in the recommended link, with no change. This leads me to believe my card is bad and/or the Marvell controller just no longer works with unRAID. Hopefully a new controller card fixes the problem.

tgggd86 · December 31, 2016

So I replaced the Marvell-based Supermicro card with a Dell H310 and flashed it to the P20 firmware. When I started my array and parity sync began, I encountered the exact same errors. Of note, while I was waiting for the Dell H310, I was running preclear on 2x new 8TB WD Red drives. The one attached to the motherboard SATA ports ran at around 100MB/s, but the one attached through the Supermicro/Marvell controller crawled at around 2MB/s. I also ran a SMART extended test on all drives with the new controller, and they all passed with no issues.
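To put the two preclear speeds in perspective, here is a rough calculation in Python of how long a single full pass over an 8TB drive takes at each rate. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant speed for the whole pass, and the helper name `hours_for_pass` is made up for this example.

```python
def hours_for_pass(capacity_tb, speed_mb_s):
    """Hours to read or write the whole drive once at a constant speed.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    capacity_bytes = capacity_tb * 1e12
    seconds = capacity_bytes / (speed_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Speeds are the approximate figures reported in the post above.
    cases = (
        ("motherboard SATA port", 100),
        ("Supermicro/Marvell controller", 2),
    )
    for label, speed in cases:
        hours = hours_for_pass(8, speed)
        print(f"{label}: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under those assumptions, one pass takes roughly 22 hours at 100MB/s and about 46 days at 2MB/s, so the slow port is not just sluggish but effectively unusable for an 8TB drive.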

My next thought is to replace the SATA breakout cables and see if that fixes the problem. Anyone have any thoughts?
