Extremely slow rebuild of drive - please help???


I had a power failure, and a UPS shutdown through the web interface of my unRAID server seems to have caused a problem. Suddenly 1 disk showed up with the red cross (disabled). I tried un-assigning and then re-assigning the disk in order to force a rebuild, but after a while that disk, together with some other disks, would stop responding. I finally decided to replace the initially failed disk with a new one, but the rebuild runs EXTREMELY slowly. It took many hours to do 40MB and then reported that rebuilding the 2TB disk (now replaced with a new 4TB disk) would take 100+ days at less than 1MB/s! So I decided that something else was wrong. I checked the syslog and it had the following entries every few seconds:

 

Mar 10 21:19:38 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Mar 10 21:19:38 Tower kernel: ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded

Mar 10 21:19:38 Tower kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out

Mar 10 21:19:38 Tower kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out

Mar 10 21:19:38 Tower kernel: ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded

Mar 10 21:19:38 Tower kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out

Mar 10 21:19:38 Tower kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out

Mar 10 21:19:38 Tower kernel: ata1.00: configured for UDMA/33

Mar 10 21:19:38 Tower kernel: ata1: EH complete

Mar 10 21:19:39 Tower kernel: ata1.00: exception Emask 0x50 SAct 0x0 SErr 0x4890800 action 0xe frozen

Mar 10 21:19:39 Tower kernel: ata1.00: irq_stat 0x0c400040, interface fatal error, connection status changed

Mar 10 21:19:39 Tower kernel: ata1: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }

Mar 10 21:19:39 Tower kernel: ata1.00: failed command: READ DMA EXT

Mar 10 21:19:39 Tower kernel: ata1.00: cmd 25/00:40:80:50:72/00:05:00:00:00/e0 tag 18 dma 688128 in

Mar 10 21:19:39 Tower kernel:        res 50/00:00:47:02:b6/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)

Mar 10 21:19:39 Tower kernel: ata1.00: status: { DRDY }

Mar 10 21:19:39 Tower kernel: ata1: hard resetting link

Mar 10 21:19:43 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Mar 10 21:19:43 Tower kernel: ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded

Mar 10 21:19:43 Tower kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out

Mar 10 21:19:43 Tower kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out

Mar 10 21:19:43 Tower kernel: ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded

Mar 10 21:19:43 Tower kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out

Mar 10 21:19:43 Tower kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out

Mar 10 21:19:43 Tower kernel: ata1.00: configured for UDMA/33

Mar 10 21:19:43 Tower kernel: ata1: EH complete

Mar 10 21:19:43 Tower kernel: ata1.00: exception Emask 0x50 SAct 0x0 SErr 0x4890800 action 0xe frozen

Mar 10 21:19:43 Tower kernel: ata1.00: irq_stat 0x0c400040, interface fatal error, connection status changed

Mar 10 21:19:43 Tower kernel: ata1: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }

Mar 10 21:19:43 Tower kernel: ata1.00: failed command: READ DMA EXT

Mar 10 21:19:43 Tower kernel: ata1.00: cmd 25/00:40:00:7f:72/00:05:00:00:00/e0 tag 9 dma 688128 in

Mar 10 21:19:43 Tower kernel:        res 50/00:00:c7:30:b6/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)

Mar 10 21:19:43 Tower kernel: ata1.00: status: { DRDY }

Mar 10 21:19:43 Tower kernel: ata1: hard resetting link
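
A repeating pattern like the excerpt above can be tallied per port to see whether one link or several are resetting. The following is only a rough sketch, assuming syslog lines in the same format as the excerpt; the function name `tally_ata_events` is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as
#   "... kernel: ata1.00: failed command: READ DMA EXT"
#   "... kernel: ata1: hard resetting link"
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

def tally_ata_events(lines):
    """Count failed commands and hard link resets per ATA port."""
    failed = Counter()
    resets = Counter()
    for line in lines:
        m = ATA_LINE.search(line)
        if not m:
            continue
        port, message = m.groups()
        if message.startswith("failed command"):
            failed[port] += 1
        elif message.startswith("hard resetting link"):
            resets[port] += 1
    return failed, resets

# A few lines copied from the excerpt above.
sample = [
    "Mar 10 21:19:39 Tower kernel: ata1.00: failed command: READ DMA EXT",
    "Mar 10 21:19:39 Tower kernel: ata1: hard resetting link",
    "Mar 10 21:19:43 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
    "Mar 10 21:19:43 Tower kernel: ata1.00: failed command: READ DMA EXT",
    "Mar 10 21:19:43 Tower kernel: ata1: hard resetting link",
]
failed, resets = tally_ata_events(sample)
print(dict(failed), dict(resets))
```

Run against a full syslog (for example `tally_ata_events(open("syslog"))`, where the file name is an assumption), it shows at a glance whether the resets are confined to one port.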

 

Complete syslog available at:

https://www.dropbox.com/s/ouy3kool74r8be8/tower-syslog-20160310-2119.zip?dl=0

 

So there seems to be a problem on the ata1 disk. First of all, how do I figure out which disk is ata1? I don't believe it's an actual disk problem, as the SMART reports for all disks are fine, but rather a mobo, controller, PSU or cable problem. I will change the cable and have already bought a new PSU that I'm going to install tonight. But I'd like to be able to identify the problem "port" that is linked to ata1.
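
One possible way to link an ataN name to a device letter is to resolve the /sys/block symlinks, because on reasonably recent Linux kernels the resolved path of a SATA disk usually contains an `ataN` component. This is a minimal sketch under that assumption, not something taken from the thread; both function names are hypothetical, and USB or other non-libata devices simply will not match.

```python
import os
import re

def ata_port_from_sysfs_path(path):
    """Return 'ataN' if the resolved sysfs path has such a component, else None."""
    m = re.search(r"/(ata\d+)/", path)
    return m.group(1) if m else None

def map_ata_ports(sys_block="/sys/block"):
    """Map ataN -> [sdX, ...] by resolving each /sys/block/sdX symlink."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for dev in sorted(os.listdir(sys_block)):
        if not dev.startswith("sd"):
            continue
        real = os.path.realpath(os.path.join(sys_block, dev))
        port = ata_port_from_sysfs_path(real)
        if port:
            mapping.setdefault(port, []).append(dev)
    return mapping

if __name__ == "__main__":
    for port, devs in sorted(map_ata_ports().items()):
        print(port, "->", ", ".join(devs))
```

The sdX letter it prints can then be compared with the device letter the unRAID web interface shows next to each array slot.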

 

Has anybody had similar issues before? How do you suggest I troubleshoot?

 

I'm VERY worried about losing data, as I have already tried rebuilding disk6 (I have 7 disks including the parity), so disk 6 no longer has its original data on it and needs to be rebuilt. So I need this rebuild to work before I am protected again, and if I have any other failures now I will lose data...

 

Thanks for the help

Johan

OK thanks.  I have added the new diagnostics now also:

https://www.dropbox.com/s/du3arypn3rejkmg/tower-diagnostics-20160311-1342.zip?dl=0

 

By the way, in the meantime disk 4 has also gone "missing" while the server was just standing there doing nothing (the array is even stopped). Then after a few minutes disk 4 came back online again. So I now suspect that the PSU or the motherboard/controller has some problem? It's funny that it worked for more than 3 years, and now suddenly this. But I guess that's the way electronics go...

Hi johnnie.black,

 

thanks for taking the time to try and help me.

 

Yes I think that disk 4 was probably offline when I pulled the diags.  I re-did the diags now while disk4 was online and it does contain a SMART for that disk now also:

 

https://www.dropbox.com/s/c99e651f7yesb8o/tower-diagnostics-20160311-1412.zip?dl=0

 

But from what I can see in the SMART report the disk itself seems fine?
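
When reading a SMART report with a link problem in mind, it can help to separate the attributes that usually reflect the media (5, 197, 198) from the one that usually reflects the SATA link (199, UDMA_CRC_Error_Count). The sketch below assumes the attribute table layout printed by `smartctl -A`; the function name is hypothetical and the sample values are invented for illustration, they are not taken from the diagnostics in this thread.

```python
# Attribute IDs to watch, tagged by what they usually point at.
WATCHED = {5: "media", 197: "media", 198: "media", 199: "link"}

def watched_attributes(smartctl_output):
    """Return {attribute_name: (kind, raw_value)} for the watched attribute IDs."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            # fields[1] is the attribute name, fields[9] the raw value.
            found[fields[1]] = (WATCHED[attr_id], fields[9])
    return found

# Invented sample in the smartctl -A table layout.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""
for name, (kind, raw) in watched_attributes(sample).items():
    print(name, kind, raw)
```

A CRC error count that keeps rising while the media attributes stay at zero would generally be more consistent with a cable, port or controller issue than with the disk surface.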

 

By the way, how did you figure out that ata1 in the syslog is physical disk 4? That's one of the things I couldn't figure out. Now that I know that, I can also go and swap cables and/or ports on that disk and try to figure out whether it's a cable or the controller. Does that sort of analysis and troubleshooting sound about right?

I have replaced the SATA data cable for disk 4 and that seemed to have sorted out disk 4.  Then I got the same issues on disk 1 and I have now also swapped disk 1's SATA cable.  Busy rebuilding and will post the results when done or when errors occur.

 

I have also replaced the PSU with an 850W single-rail unit, which should be fine for 7 disks?

 

 

Still the same error on disk 1, so I don't think it's a SATA cable problem: the drives still get read errors even with the new cables I swapped in.

 

I have now spread the power lines differently between the 7 disks.  Not that I think that would be the problem as I have the 850W single rail PSU and johnnie.black also reckons that is plenty.

 

I have then also swapped the SATA ports on the mobo between disk1 and disk5. If the SATA ports or the controller are to blame, then the read error should now show up on a different disk. That's my thinking anyway. If that happens, then as a last resort I will have to get a new mobo, CPU and RAM, as my CPU is still socket LGA1155 and I probably won't get a mobo with that socket anymore. :(

 


I hear you about the disk, but now I get the read error on disk5, so whichever disk I plug into that specific SATA port on the mobo gets the read errors.  Which tells me the mobo/SATA ports are shot?

 

 

Before swapping the board I would replace the disk; a healthy SMART report does not always mean a healthy disk.

Thanks a lot for all your time. That is exactly what I'm going to do: buy a card like you suggested. Since the problem seems to follow one of the ports, I can then replace that port and hopefully the other ports keep on working.

 

If the issue follows the port and not the disk, then yes, it's probably a bad SATA port. If you have any PCIe slots available you could buy a cheap 2-port SATA card, something like this; they are cheap and work great with Unraid.


Just a check: when you were moving the drives between ports, were you also moving the cable? I thought it was worth confirming that the SATA cable has been eliminated as a possible culprit.

 

I have replaced the cables as well with new ones.  So with different cables and different drives in different ports the only common denominator left is the specific port.
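
The "common denominator" reasoning can be written down as a small elimination over swap tests: keep whatever component was present in every failing combination and in no clean one. This is just an illustrative sketch; the port and cable labels (`P1`, `P2`, `old`, `new`) and the function name are made up, and the trial list is a simplified stand-in for the swaps described in the thread.

```python
def common_suspects(trials):
    """Return (kind, label) pairs present in every failing trial and in no passing one."""
    failing = [set(t["parts"].items()) for t in trials if t["errors"]]
    passing = [set(t["parts"].items()) for t in trials if not t["errors"]]
    if not failing:
        return set()
    suspects = set.intersection(*failing)
    for ok in passing:
        suspects -= ok
    return suspects

# Hypothetical record of swap tests: which port, cable and disk were combined,
# and whether read errors appeared.
trials = [
    {"parts": {"port": "P1", "cable": "old", "disk": "disk1"}, "errors": True},
    {"parts": {"port": "P1", "cable": "new", "disk": "disk1"}, "errors": True},
    {"parts": {"port": "P1", "cable": "new", "disk": "disk5"}, "errors": True},
    {"parts": {"port": "P2", "cable": "new", "disk": "disk1"}, "errors": False},
]
print(common_suspects(trials))  # {('port', 'P1')}
```

If the result came back empty, or contained more than one component, more swaps would be needed before blaming any single part.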

 

