CyberMew Posted January 18, 2017 tower-diagnostics-20170119-0043.zip
JorgeB Posted January 18, 2017 Disk dropped offline. Reboot or power down, check cables and start back up.
trurl Posted January 18, 2017 And you didn't even mention the disk had a red X next to it. You will have to rebuild it, even if you can get it connected again.
CyberMew Posted January 19, 2017 Author My bad, I should have mentioned that as well. For now I have stopped the array and will restart it. However, is it normal for a drive to suddenly just have connection errors?
JorgeB Posted January 19, 2017 Post new diags after rebooting. It can be a bad disk or a bad cable/power/controller. The logs point more to a cable/power issue, but the SMART info after rebooting can help diagnose it.
trurl Posted January 19, 2017 Judging by the posts on this forum, connection issues are much more common than drive failures. People fiddle around inside their case and accidentally knock something loose, or they try to bundle their cables and wind up putting stress somewhere that pulls a connector loose or off square, etc.
SSD Posted January 19, 2017 This is actually quite common. It is incredibly easy to skew a SATA cable when installing a new drive or troubleshooting an issue. Users always think it is the drive, but in reality bad drives are the cause only a small portion of the time. Locking cables are highly recommended, although not all controllers support them. I also recommend drive cages. Once the server is thoroughly burned in, drive cages make it possible to do drive swaps without risk to the cabling. Here is an all too familiar use case. A drive appears to fail. The user opens the case, removes the drive and replaces it with a new one. The rebuild begins. Part way through, another drive drops offline (because something was knocked loose while replacing the first drive). Now we have two drives down and no way to rebuild (unless the user has dual parity). All of this is caused by cabling issues.
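Why a second drive dropping mid-rebuild is fatal with single parity can be seen in a small model of XOR parity, the scheme single-parity arrays like Unraid's use. This is a minimal sketch under simplifying assumptions (each disk is modeled as an equal-length byte string, and the helper names are hypothetical), not Unraid's actual implementation: one missing disk is recoverable from the survivors plus parity, two are not.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(disks: List[bytes]) -> bytes:
    """Parity block = XOR of every data disk's block."""
    return reduce(xor_bytes, disks)


def rebuild(disks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Reconstruct at most one missing disk (None) from the survivors plus parity."""
    missing = [i for i, d in enumerate(disks) if d is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} disks missing; single parity can rebuild only one")
    restored = [d for d in disks]  # copy so the caller's list is untouched
    if missing:
        survivors = [d for d in disks if d is not None]
        # XOR of parity with all surviving disks leaves exactly the missing disk's data.
        restored[missing[0]] = reduce(xor_bytes, survivors, parity)
    return restored  # type: ignore[return-value]


# Three tiny "disks" and their parity.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(data)
print(rebuild([data[0], None, data[2]], parity) == data)  # True: one disk lost, recovered
try:
    rebuild([None, None, data[2]], parity)
except ValueError as err:
    print(err)  # 2 disks missing; single parity can rebuild only one
```

Dual parity adds a second, independently computed syndrome, which is what makes recovering two simultaneous losses possible; plain XOR alone cannot.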
CyberMew Posted January 19, 2017 Author I have rebooted and I think the drive connection should be OK now. I haven't rebuilt the drive yet but am going to do it soon (mount without the disk, then remount it). I have attached the post-reboot logs as requested. And yeah, I agree too: most of my problems seem to have been due to cable issues, and the cables lose connection very easily for some reason. Interestingly, I'm also using drive cages, but I think it's the tight space and (extreme) bending of the cheaply made SATA cables that's causing the connection problems. Or they're slowly sliding out... lol. tower-diagnostics-20170119-2144.zip
SSD Posted January 19, 2017 Extreme bending sounds bad. The server vibrates with all the drives in there, which can lead to cables working themselves loose if they are under any tension. Old SATA cables also tend to get loose from repeated plugging/unplugging. It's not easy to have a server with lots of drives, good cabling throughout, and drive cages all in place. But that's what you want!
JorgeB Posted January 19, 2017 SMART is disabled for that disk, you need to type:

smartctl -s on /dev/sde

Then post new diags or just that SMART report.
CyberMew Posted January 19, 2017 Author Well, extreme might be a little exaggerated, but yeah, I agree we all want good cabling with drive cages. Thank you, I have re-enabled SMART as per your command and uploaded the logs again. tower-diagnostics-20170119-2242.zip
SSD Posted January 19, 2017 New thread: "Replaced drive with bigger drive, then lost another drive." Any guesses?
CyberMew Posted January 19, 2017 Author I hate it when that happens
JorgeB Posted January 19, 2017 SMART looks fine. There are some CRC errors, so the SATA cable should be replaced, or at least monitor the attribute for a few weeks:

Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WMC4N0934650
199 UDMA_CRC_Error_Count 0x0032 200 198 000 Old_age Always - 36

This SATA cable should also be replaced:

Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N0224734
199 UDMA_CRC_Error_Count 0x0032 200 001 000 Old_age Always - 1331

There are other disks with a few CRC errors. They could be old errors or not, so they should be monitored; if the attribute increases by 2 or more, there's still a problem. Since the disk looks fine, try rebuilding to the same disk.
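The attribute rows quoted above follow smartctl's attribute-table layout: ID, name, flag, normalized VALUE, WORST, THRESH, type, updated, WHEN_FAILED and RAW_VALUE. For anyone who wants to track attribute 199 from saved reports, here is a minimal Python sketch that splits such a row into fields, assuming that ten-column layout. The names `SmartAttribute`, `parse_attribute_row` and `raw_count` are hypothetical, and only the leading integer of RAW_VALUE is used, since some attributes append extra text there.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flag: str
    value: int        # normalized current value
    worst: int        # lowest normalized value ever recorded
    thresh: int       # failure threshold for the normalized value
    attr_type: str    # Pre-fail / Old_age
    updated: str      # Always / Offline
    when_failed: str  # "-" if never failed
    raw: str          # RAW_VALUE, may contain spaces


def parse_attribute_row(line: str) -> SmartAttribute:
    """Split one row of the attribute table into its ten columns.

    RAW_VALUE is the last column and may contain spaces, so split at most 9 times.
    """
    parts = line.split(maxsplit=9)
    if len(parts) != 10:
        raise ValueError(f"unexpected attribute row: {line!r}")
    return SmartAttribute(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9],
    )


def raw_count(attr: SmartAttribute) -> int:
    """Leading integer of RAW_VALUE, e.g. '36' -> 36."""
    return int(attr.raw.split()[0])


rows = [
    "199 UDMA_CRC_Error_Count 0x0032 200 198 000 Old_age Always - 36",
    "199 UDMA_CRC_Error_Count 0x0032 200 001 000 Old_age Always - 1331",
]
for row in rows:
    attr = parse_attribute_row(row)
    print(attr.name, attr.worst, raw_count(attr))
```

The raw count is the number to watch here; the normalized VALUE/WORST columns are vendor-scaled and, as the second row shows, can sit far from the raw figure.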
CyberMew Posted January 19, 2017 Author Thank you very much! I will certainly take note. In fact, the number has increased to 39 now, so I think I will replace the cable when the rebuild is done. As for the drive with 1331, I think it happened previously and I had to rebuild it as well, but I didn't think much of it. Will definitely record it. It seems like whenever there is a cable error, the UDMA CRC error count increases? If so, there should be nothing to worry about, right? Assuming we reseat the connection and the errors no longer appear on the next boot?
JorgeB Posted January 19, 2017 Yes, they don't reset. Once in a while a single error is normal, but any increase of 2 or more is not.
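Because the counter never resets, the useful check is the change since the last recorded value, not the absolute number. A minimal Python sketch of that rule of thumb, assuming readings are kept per disk keyed by serial number (the function name and data shapes are hypothetical; the threshold of 2 comes from the advice above):

```python
from typing import Dict, List

CRC_ALERT_DELTA = 2  # an increase of 2 or more suggests an ongoing cable problem


def crc_alerts(baseline: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return serials whose UDMA_CRC_Error_Count rose by CRC_ALERT_DELTA or more.

    Compares against the last recorded value rather than zero, since the
    counter is cumulative. Disks missing from the baseline are skipped.
    """
    alerts = []
    for serial, count in current.items():
        previous = baseline.get(serial)
        if previous is not None and count - previous >= CRC_ALERT_DELTA:
            alerts.append(serial)
    return alerts


# Figures from this thread: 36 -> 39 on one disk, 1331 unchanged on the other.
baseline = {"WD-WMC4N0934650": 36, "WD-WCC4N0224734": 1331}
current = {"WD-WMC4N0934650": 39, "WD-WCC4N0224734": 1331}
print(crc_alerts(baseline, current))  # ['WD-WMC4N0934650']
```

A single-step increase (say 1331 to 1332) is deliberately not flagged, matching the point that an occasional lone error is normal.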
CyberMew Posted January 19, 2017 Author Looks like I need to change cables to be on the safe side. Any recommendations from Amazon?
RobJ Posted January 19, 2017 Monoprice, on Amazon or direct
SSD Posted January 19, 2017 Buy some locking and some non-locking (unless you are sure which ones you need). I'd go with Monoprice, as their stuff tends to be of consistent quality, and cheap. Get plenty of spares. Whenever I have a questionable cable, I pop in a brand new one from a bunch I bought several years ago. I know they are good quality from personal experience. Even controllers/MBs that don't take locking cables can sometimes be persuaded with some effort. The drive side always takes a locking cable, and drive cages vary. But if I can get a locking cable on, I'll use it, even if the port is not designed for one and the fit is tight. YMMV, but once you get one of those cables attached with high friction, it will never lose connection. Just don't break anything! On my last server build I used all locking cables and can't remember ever replacing one (or even opening the server, because everything is done with drive cages). My backup server is not as good and does have connection issues occasionally. Not sure if newer drive cages take SAS cables. They seem more secure than SATA. I couldn't find one when I bought mine, but they may exist today. A 4in3 would be a natural fit for a SAS cable. A 5in3 would need a SAS + one SATA.
CyberMew Posted January 20, 2017 Author Thanks guys! Will order some Monoprice cables direct. Going to get some 18inch SATA 6Gbps Cable w/Locking Latch - Red (Product ID: 8784) and 18inch SATA 6Gbps Cable w/Locking Latch (90 Degree to 180 Degree) - Blue (Product ID: 8783), just in case.
JonathanM Posted January 20, 2017 When inspecting existing cables and sourcing new ones, keep this WD article in mind. http://support.wdc.com/knowledgebase/answer.aspx?ID=10477