tgggd86 Posted December 17, 2016

I recently updated my server from 6.1.8 to 6.2.4 and have had intermittent errors, all of which end with the entire server hanging. The only error I can reproduce consistently is as follows:

Power on the server following an unclean shutdown caused by the previous hang. One of my drives (1.5TB Western Digital eSATA external) requires a data rebuild.
Start the array, which then begins Parity-Sync/Data-Rebuild.
Stop all my docker apps.
About 1 min (0.5%) into Parity-Sync/Data-Rebuild: unable to access the server over network/telnet, GUI no workie, and the command line directly on the server is also unresponsive.

Ran Memtest overnight and had no issues. The HDD in question has no SMART errors and can be written/read fine. No hardware changes since the update to 6.2.4. Syslog attached.

Hardware:
M/B: ASUSTeK Computer INC. - M4A89GTD-PRO/USB3
CPU: AMD Phenom™ II X4 910e @ 2600
HVM: Enabled
IOMMU: Disabled
Cache: 512 kB, 2048 kB, 6144 kB
Memory: 4 GB (max. installable capacity 8 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500
eth0: 1000 Mb/s, full duplex, mtu 1500
Kernel: Linux 4.4.30-unRAID x86_64
OpenSSL: 1.0.2j

Attachment: tower-syslog-20161216-2155.zip
Frank1940 Posted December 17, 2016

One thing that you can do is to try this:

"* If the system crashes completely and there is no way to capture a final syslog, then start a tail on the unRAID console or Telnet session (tail -f /var/log/syslog)."

This comes from the first post in this thread: http://lime-technology.com/forum/index.php?topic=39257.0

If you use a monitor, you will probably have to take a picture of what you get. (Watch out for flash reflections and focus if you do this.) If you use PuTTY, you can probably copy and paste into a message.
tgggd86 (Author) Posted December 20, 2016

Left my server on for a few days without starting the array and ran pre-clear on a new drive, and there were no issues. Started the array and got the same issue. Full syslogs capturing the hang are attached.

Specifically, the ICRC ABRT error seems to be the main culprit. The only problem is that it is recorded against a different ata device each time (ata7 twice and ata3 once). Both of these ata devices are on my Marvell SATA adapter (88SE63xx/64 BIOS: 3.1.0.15N).

I'm assuming my Marvell SATA controller is the main problem here, but it has no issues running pre-clear or extended SMART tests on my drives. The only thing I can point to is that 6.2.x broke my SATA controller, so reverting back to 6.1.8 might be my best option. Thoughts?

Attachments: syslog1.txt syslog2.txt syslog3.txt
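For anyone who wants to check which ports the ICRC errors land on across several captured syslogs, here is a minimal Python sketch that tallies them per ata port. It assumes the kernel logs the error in the usual libata form (`ataN.MM: error: { ICRC ABRT }`). The sample lines are illustrative only and are not taken from the attached syslogs, and the function name `count_icrc_by_port` is made up for this example.

```python
import re
from collections import Counter

# Matches libata error lines such as "ata7.00: error: { ICRC ABRT }"
# and captures the port name ("ata7"). The line format is an assumption.
ICRC_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: error: \{[^}]*\bICRC\b[^}]*\}")


def count_icrc_by_port(lines):
    """Return a Counter mapping each ata port to its number of ICRC error lines."""
    counts = Counter()
    for line in lines:
        match = ICRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines (hypothetical, not from the attached syslogs).
SAMPLE = [
    "Dec 20 10:15:01 Tower kernel: ata7.00: failed command: READ DMA EXT",
    "Dec 20 10:15:01 Tower kernel: ata7.00: error: { ICRC ABRT }",
    "Dec 20 10:15:01 Tower kernel: ata7.00: status: { DRDY ERR }",
    "Dec 20 10:16:40 Tower kernel: ata7.00: error: { ICRC ABRT }",
    "Dec 20 10:18:12 Tower kernel: ata3.00: error: { ICRC ABRT }",
    "Dec 20 10:18:13 Tower kernel: ata3: hard resetting link",
]

if __name__ == "__main__":
    for port, hits in sorted(count_icrc_by_port(SAMPLE).items()):
        print(port, hits)
```

To run it against real logs, pass the lines of each file instead of `SAMPLE`, for example `count_icrc_by_port(open("syslog1.txt", errors="replace"))`. Errors spread over several ports on the same adapter point more toward the controller, cabling, or power than toward one particular drive.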
Frank1940 Posted December 20, 2016

You might want to read through this thread: http://lime-technology.com/forum/index.php?topic=40683

I don't know if this is part of your problem or not, but it is something that you might want to explore.
Fireball3 Posted December 23, 2016

tgggd86 wrote: "Only thing I can point to is that 6.2.x broke my SATA controller, so reverting back to 6.1.8 might be my best option."

Any feedback based on Frank1940's link? Have you checked for new controller firmware? Although it's kind of voodoo, I have read posts where issues were solved by simply re-flashing the controller.
tgggd86 (Author) Posted December 23, 2016

Flashed new firmware on my card (Supermicro AOC-SASLP-MV8) and even updated my mobo BIOS to the latest version, and still the same problems. I disabled virtualization and also modded the config file as mentioned in the recommended link, and still the same issues. That leads me to believe my card is bad and/or the Marvell controller just no longer works with unRAID. Hopefully a new controller card fixes the problem.
tgggd86 (Author) Posted December 31, 2016

So I replaced the Supermicro card (the one with the Marvell controller) with a Dell H310 and flashed it to the P20 firmware. When I started my array and parity sync began, I encountered the exact same errors.

Of note, while I was waiting for the Dell H310, I was running preclear on 2x new 8TB WD Red drives. The one attached to the mobo SATA ports ran at around 100MB/s, but the one attached through the Supermicro/Marvell controller crawled at around 2MB/s. I also ran a SMART extended test on all drives with the new controller, and they all passed with no issues.

My next thought is to replace the SATA breakout cables and see if that fixes the problem. Anyone have any thoughts?