Drive error: unable to view SMART data, replace?



My bad, I should have mentioned that as well. For now I have stopped the array and will restart it. However, is it normal for a drive to suddenly have connection errors?

Judging by the posts on this forum, connection issues are much more common than drive failures. People fiddle around inside their case and accidentally knock something loose, or they bundle their cables and wind up putting stress on a connector that works it loose or pulls it off square, and so on.

This is actually quite common. It is incredibly easy to skew a SATA cable when installing a new drive or troubleshooting an issue. Users always think it is the drive, but in reality a bad drive is the cause only a small portion of the time.

 

Locking cables are very highly recommended, although not all controllers support them. I also recommend drive cages. Once the server is thoroughly burned in, drive cages make it possible to do drive swaps without risk to the cabling.

 

Here is an all-too-familiar scenario. A drive appears to fail. The user opens the case, removes the drive, and replaces it with a new one, then starts a rebuild. Partway through, another drive drops offline (because something was knocked loose while replacing the first drive). Now two drives are down and there is no way to rebuild (unless the user has dual parity). All of this is caused by cabling issues.
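Why the second dropout is fatal comes down to simple arithmetic: single parity gives one equation per stripe, so it can solve for only one unknown drive. The following is a simplified sketch, assuming single parity is a plain byte-wise XOR across the data disks; it is an illustrative model, not unRAID's actual code, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column (toy parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(data: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recover at most one missing data block (marked None) from XOR parity."""
    missing = [i for i, block in enumerate(data) if block is None]
    if len(missing) > 1:
        # Two unknowns but only one parity equation: nothing can be recovered.
        raise ValueError(
            f"{len(missing)} drives missing; single parity can rebuild only one"
        )
    restored = list(data)
    if missing:
        # XOR of every surviving block plus parity equals the missing block.
        survivors = [block for block in data if block is not None]
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored
```

With `disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]` and `parity = xor_blocks(disks)`, calling `rebuild([disks[0], None, disks[2]], parity)` returns the original three blocks, while `rebuild([None, None, disks[2]], parity)` raises `ValueError`, which is the "two drives down" situation described above.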

I have rebooted and I think the drive connection should be OK now. I have not rebuilt the drive yet but am going to do it soon: I'll start the array without the disk and then re-add it. I have attached the logs from after the reboot, as requested.

 

And yeah, I agree too. Most of my problems seem to have been due to cable issues, and the cables lose their connection very easily for some reason. Interestingly, I'm also using drive cages, but I think the tight spaces and (extreme) bending of the cheaply made SATA cables are what's causing the connection problems. Or the cable is slowly sliding out... lol.

tower-diagnostics-20170119-2144.zip

Extreme bending sounds bad. The server vibrates with all the drives in there, which can lead to cables working themselves loose if they are under any tension. Old SATA cables also tend to get loose from repeated plugging and unplugging.

 

It's not so easy to have a server with lots of drives with good cabling throughout, and with drive cages all in place. But that's what you want!  :)

SMART looks fine, but there are some CRC errors, so the SATA cable should be replaced, or at least the attribute should be monitored for a few weeks:

 

Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0934650

199 UDMA_CRC_Error_Count    0x0032   200   198   000    Old_age   Always       -       36

 

This SATA cable should also be replaced:

 

Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N0224734

199 UDMA_CRC_Error_Count    0x0032   200   001   000    Old_age   Always       -       1331
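If you want to track this attribute with a script rather than by eye, the raw count is the last column of the attribute line. A minimal sketch, assuming the column layout of smartctl's attribute table as in the lines quoted above (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value); `parse_crc_count` is a hypothetical helper name.

```python
from typing import Optional

# Assumed column order of one smartctl attribute row:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
CRC_ATTRIBUTE_ID = "199"
RAW_VALUE_COLUMN = 9


def parse_crc_count(smart_line: str) -> Optional[int]:
    """Return the raw UDMA_CRC_Error_Count from one attribute row, else None."""
    fields = smart_line.split()
    if len(fields) <= RAW_VALUE_COLUMN or fields[0] != CRC_ATTRIBUTE_ID:
        return None  # not attribute 199, or not a full attribute row
    return int(fields[RAW_VALUE_COLUMN])
```

Fed the two lines above, it returns 36 for WD-WMC4N0934650 and 1331 for WD-WCC4N0224734. Note that it reads the raw count, not the normalized VALUE/WORST columns (200/198 and 200/001).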

 

Other disks have a few CRC errors too. They could be old errors or not, so they should be monitored; if the attribute increases by 2 or more, there's still a problem.
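The rule of thumb above (watch the count, and treat an increase of 2 or more as a live problem) is easy to apply to periodic snapshots. A minimal sketch, assuming you record each disk's raw UDMA_CRC_Error_Count by serial number; the function name and the snapshot dictionaries are hypothetical.

```python
from typing import Dict, List


def cables_to_check(
    previous: Dict[str, int], current: Dict[str, int], threshold: int = 2
) -> List[str]:
    """Return serials whose CRC count rose by at least `threshold` since last time."""
    flagged = []
    for serial, count in current.items():
        baseline = previous.get(serial)
        # A disk with no earlier reading has no baseline yet, so skip it.
        if baseline is not None and count - baseline >= threshold:
            flagged.append(serial)
    return sorted(flagged)
```

Using the numbers from this thread, `cables_to_check({"WD-WMC4N0934650": 36, "WD-WCC4N0224734": 1331}, {"WD-WMC4N0934650": 39, "WD-WCC4N0224734": 1331})` returns `["WD-WMC4N0934650"]`: a high but static count (1331) is not flagged, while a count that keeps climbing is.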

 

Since the disk looks fine, try rebuilding onto the same disk.


Thank you very much! I will certainly take note. In fact, the count has increased to 39 now, so I think I will replace the cable when the rebuild is done. As for the drive with 1331 errors, I think this happened before and I had to rebuild it then as well, but I didn't think much of it. I will definitely keep a record of it.

So it seems that whenever there is a cable error, the UDMA CRC error count increases? If so, there should be nothing to worry about, right, assuming we reseat the connection and the error no longer appears after the next boot?

Buy some locking and some non-locking cables (unless you are sure which ones you need). I'd go with Monoprice, as their stuff tends to be of consistent quality, and cheap. Get plenty of spares. Whenever I have a questionable cable, I pop in a brand new one from a batch I bought several years ago. I know they are good quality from personal experience.

 

Even controllers/MBs that don't take locking cables can sometimes be persuaded with some effort. The drive side always takes a locking cable, and drive cages vary. But if I can get a locking cable on, I'll use it, even if the connector is not designed for one and the fit is tight. YMMV, but once you get one of those cables attached with high friction, it will never lose connection. Just don't break anything!

 

In my last server build I used all locking cables and can't remember ever replacing one (or even opening the server, because everything is done with drive cages). My backup server is not as well built and does have occasional connection issues.

 

I'm not sure whether newer drive cages take SAS cables, which seem more secure than SATA. I couldn't find one when I bought mine, but they may exist today. A 4-in-3 cage would be a natural fit for a SAS cable; a 5-in-3 would need a SAS cable plus one SATA cable.

When inspecting existing cables and sourcing new ones, keep this WD article in mind.

http://support.wdc.com/knowledgebase/answer.aspx?ID=10477