jpalm101 Posted April 1, 2016 So I have been running Unraid for about 2 weeks and things have not been the smoothest. When it's working I absolutely love it, but I feel like I am fixing it more than enjoying it, so I thought it was time to seek some help. I currently have 4 drives: one 3TB Toshiba (parity), two 3TB WD Reds, and an old 250GB WD cache drive that I pretty much just put in to play with that feature.

It all started a couple of weeks ago when a brand new WD Red failed, or at least Unraid told me it failed. I have no logs from this because I was really new to Unraid, and when I had a problem my first thought was to reboot. I still had backups of everything, so I decided to take the drive out, reformat it with another computer, put it back in and start over. It failed again, so I took it back out and ran some tests with WD's Lifeguard. It failed those too, so off to WD it went.

So I started over, fresh USB stick and everything. I discovered the pre-clear plugin and ran it twice on the WD Reds, including the brand new one from the WD RMA. Yesterday disk 2 (the brand new one that went through 2 pre-clears) alerted me that it was in an error state with a red X and the device was disabled. I got the logs and then read that if you don't believe it is a bad drive, you can stop the array, unassign the drive, start the array, stop it again, then reassign the drive and let it rebuild. I did that, it rebuilt, and I was back to all green by morning. I was happy, but not confident.

Tonight I got an alert that disk 1 was in an error state and was disabled, so I got the logs again. I'm at a loss; I don't know why this keeps happening. For the first failure, who knows what happened, and I don't have any logs, but I'm hoping someone can help me diagnose the last two errors.

I am running version 6.1.9.
M/B: ASUSTeK COMPUTER INC. - P8H61-I R2.0
CPU: Intel® Core™ i5-2400 CPU @ 3.10GHz
PSU: Corsair CX750M
Attachments: tower-diagnostics-20160330-2259.zip tower-diagnostics-20160331-2043.zip
Russ Uno Posted April 1, 2016 I'll have to let someone else look at the logs, but I see a lot of write errors on disk1 even though the SMART test looks OK as far as no reallocated or pending sectors. Have you checked the SATA and power cables? They can move and come loose when messing around inside.
trurl Posted April 1, 2016 Disk1, the WD drive with serial ending 7VASD, has dropped offline and has no SMART report in your diagnostics. More often than not, the problem is not with the drive but with the connections or cables to the drive, either SATA or power. It also wouldn't hurt to run memtest if you haven't.
jpalm101 Posted April 1, 2016 (Author) I haven't replaced the cables, but I did triple check them and everything is solid. I rebooted and ran a short SMART test on disk1 and it passed. I don't think I've done a memory test, so I suppose that can be my next step.
jpalm101 Posted April 1, 2016 (Author) So I attempted to rebuild the disk, just like I did with disk2 when it failed on Thursday, but it errored out during the rebuild. This morning I took the drive out, popped it in a toaster dock, and used Western Digital's Lifeguard tool to first run a SMART test (pass) and then do a quick zero write (the full one takes over a day). Then I put it back in the array and started the rebuild again. We will see how it goes. Am I the only one dealing with drive disconnects like this? Could it be hardware or something in my setup? I'm at a loss. Even if the rebuild works, I feel like it's only a matter of time before it does it again. I have zero confidence in it at the moment.
JorgeB Posted April 1, 2016 You have UDMA_CRC errors on both drives that were online. Both WD disks were offline, so check this attribute on both of them:
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 207
This is usually a sign of a bad SATA cable.
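For anyone who wants to check this attribute on several drives without reading each report by eye, here is a minimal Python sketch that pulls the raw UDMA_CRC_Error_Count out of a saved SMART attribute table. It assumes the usual `smartctl -A` column layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value), which matches the line quoted above. The function and class names are made up for illustration.

```python
from typing import NamedTuple, Optional


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    threshold: int
    raw_value: int


def parse_smart_attribute(line: str) -> Optional[SmartAttribute]:
    """Parse one attribute row; return None for headers or unparsable rows."""
    fields = line.split()
    # A data row has at least 10 columns and starts with a numeric ID.
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    try:
        return SmartAttribute(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            threshold=int(fields[5]),
            raw_value=int(fields[9]),  # first token of the raw column
        )
    except ValueError:
        # Some raw values are not plain integers; skip those rows.
        return None


def crc_error_count(smart_output: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if it is absent."""
    for line in smart_output.splitlines():
        attr = parse_smart_attribute(line)
        if attr is not None and attr.attr_id == 199:
            return attr.raw_value
    return None


# The attribute line quoted in the post above.
SAMPLE = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 207"
```

Calling `crc_error_count(SAMPLE)` returns 207 for the line in this thread. The raw count never resets, so what matters after swapping a cable is whether the number keeps climbing.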
itimpi Posted April 1, 2016 I had some WD 6TB Reds that would sometimes drop offline apparently at random. I disabled spin down for the problem drives and since then they have been rock-solid. It might be worth trying this with the drives causing you problems?
jpalm101 Posted April 1, 2016 (Author) Thanks, those are both things that I can try easily enough. I will pick up new SATA cables for all of the drives and disable spin down for the WD Red drives to start. So far so good on the rebuild of disk1. I'm at work and I'm just waiting for that dreaded email to come saying it failed.
jpalm101 Posted April 2, 2016 (Author) Alright... I'm about to give up. I replaced the SATA cables, disabled spin down for the WD drives, and rebuilt disk1, which completed successfully. So I thought I would run a parity check, which ran overnight, and when I woke up this morning disk1 had been booted out again. That brings my total of disabled disks to 4, on 3 separate hard drives, since I started using Unraid about 3 weeks ago. I don't know what else to try; I am out of ideas. I have attached the latest logs. I'm going to need another storage solution to house all my logs at this rate. tower-diagnostics-20160402-0800.zip
METDeath Posted April 2, 2016 Are these all plugged into the motherboard or a controller card? If it's the motherboard, perhaps a firmware update has been released for it?
jpalm101 Posted April 2, 2016 (Author) They are all plugged into the motherboard's SATA 3 ports. I am making sure the few files I put on in the last week are backed up, and then I will look at updating the firmware and triple checking all of my settings in the BIOS. I had also put an old graphics card in just to play with the passthrough feature for VMs, so I'm going to remove that as well, just to take out everything that isn't critical to what I need to do, which is store media files.
Frank1940 Posted April 2, 2016 You didn't tie the SATA data cables together to make the inside of your case look 'pretty', did you? The problem with doing this is crosstalk between the cables unless the cables are the shielded kind. (98%+ of all SATA cables are NOT shielded!) A second thought is that one of the hard drive makers (I think it was WD but I can't remember with certainty) changed the design of the data connector socket on their drives so that the locking type of SATA connectors did not make reliable contact with the connector on the drive. These drives should only be used with the non-locking type of cables!
METDeath Posted April 2, 2016 Third thought is maybe try a different power supply? I've heard bad things about the CX line of power supplies, at least from the gaming side of things. I've had good luck with single rail ThermalTake PSUs and when OCZ was briefly in the PSU game.
Frank1940 Posted April 3, 2016 Following up on my earlier point about locking SATA connectors not making reliable contact on some drives, see here: http://forums.anandtech.com/showthread.php?t=2404493 and http://www.tomshardware.com/answers/id-2784233/western-digital-2tb-black-drive-sata-cable-click-drive.html for two discussions of this issue.
Squid Posted April 3, 2016 With pictures: http://lime-technology.com/forum/index.php?topic=36065.msg335979#msg335979 and a link to WD's announcement of this "feature".
jpalm101 Posted April 4, 2016 (Author) Thank you for all of the suggestions. I went ahead and tried a bunch of things at once, which I probably shouldn't have, because if I do end up fixing the problem I might never know what the issue was.

On Saturday I rebuilt the drive again, which was successful. I ran a parity check, which was also successful. I removed the old graphics card and got rid of the cache drive that I wasn't really using anyway; I just had it there to play with that feature. I removed the two VMs I had set up to play with (Windows 10 and Server 2012 R2). I also moved disk1 to a different SATA port, the one the cache drive was in, to see if it was a bad port, even though two separate ports have given me issues at this point.

I checked my cables. I don't have them bundled together and they are separated fairly well, so I hope there aren't any interference issues. I do have some of the locking cables that were mentioned. Those weren't on the WD drives at the time of the failures, although they are now, because I wanted to replace the cables that were attached to the drives when they failed.

Long story short, I made some changes and I've been up for a day. I guess I'll just take it a day at a time and see how it goes. I would really love it to be stable because it is so nice when it's working. I suppose if it fails again, I will research how to test a power supply unit. Let's hope it doesn't come to that.
jpalm101 Posted April 7, 2016 (Author) So... I have been good for a few days, but this morning I had both drives go into an error state, so bye bye data. I had 95% of it backed up, so that isn't my concern right now. I just really would like to figure out what is causing this. I have...
- Replaced drives
- Replaced SATA cables (locking kind and non-locking kind)
- Removed VMs
- Removed unnecessary hardware (old graphics card)
- Removed unnecessary dockers
With all of this, I keep getting errors. I'm about to throw in the towel... I also can't attach the logs because they are too large. Is there something I can pick out of the logs to attach that would be helpful, to get around the size restriction? I'm just not having luck with anything.
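On the log size question, one option is to trim the syslog down to the lines that mention disk or link trouble before attaching it. Below is a rough Python sketch of that idea. The search patterns, the function name, and the file name in the usage comment are my own assumptions and not an official unRAID list, so adjust them to whatever your log actually contains.

```python
import re
from typing import Iterable, List, Sequence

# Assumed patterns: kernel I/O errors, SATA link resets, and any line
# tagged with an ATA port such as "ata3:" or "ata3.00:".
DEFAULT_PATTERNS: Sequence[str] = (
    r"I/O error",
    r"hard resetting link",
    r"\bata\d+(?:\.\d+)?:",
)


def filter_log_lines(
    lines: Iterable[str], patterns: Sequence[str] = DEFAULT_PATTERNS
) -> List[str]:
    """Keep only the lines that match at least one pattern (case-insensitive)."""
    regex = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


# Usage (hypothetical file name):
# with open("syslog.txt", encoding="utf-8", errors="replace") as fh:
#     for line in filter_log_lines(fh):
#         print(line)
```

Filtering throws away context, so it is probably worth keeping the full log on hand in case someone asks for the lines around a particular error.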
DoeBoye Posted April 7, 2016 Do you have access to another PSU you can test with? I would look there next...
jpalm101 Posted April 7, 2016 (Author) It was brand new. I suppose it has been under 30 days and I could exchange it for another model. My next question would be: how do I proceed from here? If I replace the PSU, my parity disk is now the only healthy disk and I have two disks with failures. Do I take them all out, reformat them, and put them back in as "new" disks?
DoeBoye Posted April 7, 2016

jpalm101 wrote:
It was brand new, I suppose it has been under 30 days and I could exchange it for another model.

It would be worth doing, even just to rule it out as the problem source. Definitely stay away from the CX series. Higher end Corsairs, EVGA and Seasonic are all fairly reliable brands that come to mind. Not an exhaustive list, but generally well regarded. Keep in mind, they all have their better and worse models. Poke around a bit before pulling the trigger to make sure the model is well regarded (or post back here when you find the one you want to buy). Also, as a previous user mentioned, single rail is definitely a must.

jpalm101 wrote:
So I started over, fresh usb stick and everything. Discovered the pre-clear plug in and ran it through twice on the WD Reds, including one brand new one from the WD RMA.

Once you have the new PSU, I would honestly run pre-clear again on your drives. There have been cases (including my own) where 1 or 2 passes showed no issues, but a third pass came up with a bad drive... And again, it goes without saying, make sure those cables are all well seated (power and data).

jpalm101 wrote:
My next question would be. How do I proceed from here? If I replace the PS, I now only have my Parity disk as the only health disk and two disk with failures. Do I take them all out and reformat them and put them back in as "new" disks?

You say "2 disks with failures". Just to be clear: 1 data disk is an original that passed pre-clear but has since red-balled, and 1 disk is an RMA from WD that has also passed a pre-clear cycle and since red-balled. Right? Or are we talking about drives that have actually failed with SMART errors? I wouldn't take them out and reformat them. After the pre-clears, assign them to the array, and the array will take care of everything else. No need to remove and format them externally.

Good luck! Problems like these are the worst. Uncertainty is never fun. It's worth staying the course though. When you finally have a stable system, you're going to love unRAID. Especially all the fancy new VM and docker features!
JorgeB Posted April 7, 2016 You can try a new config:
1. Take a screenshot of the current assignments.
2. Go to Tools and click New Config.
3. Reassign all disks.
4. Start the array to begin the parity sync.
If the server still has issues the parity sync may not complete, but this way you don't have to format the disks and copy the data again.
DoeBoye Posted April 7, 2016 Also, have you pre-cleared the parity disk to make sure it is not problematic? Might be worth doing if you are starting from scratch again...
jpalm101 Posted April 7, 2016 (Author) Thanks for the info. I definitely don't want to just exchange the PSU for the same model, so if I am able I think I will switch to a different brand. http://www.newegg.com/Product/Product.aspx?Item=9SIA1N83V01630&cm_re=evga_power_supply-_-17-438-017-_-Product This is one that I was looking at. Think it will fit the bill?

To answer the questions above: I have yet to see any SMART errors; Unraid just disables the drives. In total I have had this happen 5 different times with 3 different disks in a 3 week period. It has been just the WD Red drives that have been the issue. One of them had been working great in a Buffalo NAS for about 5 months before it was moved to Unraid. The other one was brand new and was RMA'd after the first failure. I then got the replacement, which also failed. And then as of last night they both failed. So I can't imagine it is the drives themselves.
itimpi Posted April 7, 2016 Note that unRAID disables a drive as soon as a write to it fails. The drive (and its data up to that point) is typically intact, as it is rare for a drive to physically fail. By far the most common causes are external factors such as cabling, badly seated controller cards, power, etc.
jpalm101 Posted April 7, 2016 (Author) So if I replace the power supply, I should be able to take a picture of my current setup, go to Tools, do a new config, and then reassign everything the way that I had it? Will this work even though 2 of my 3 drives have been disabled due to write errors? Currently the only one of the three with a green status is the parity drive.