
[SOLVED] Rough Start to Unraid (Disabled Drives)



So I have been running Unraid for about 2 weeks and things have not been the smoothest. When it's working I absolutely love it, but I feel like I am fixing it more than enjoying it, so I thought it was time to seek some help.

 

I currently have 4 drives: one 3TB Toshiba (parity), two 3TB WD Reds, and an old WD 250GB cache drive that I pretty much just put in to play with that feature.

 

It all started a couple of weeks ago when a brand new WD Red failed, or at least Unraid told me it failed. I have no logs from this because I was really new to Unraid, and when I had a problem my first thought was to reboot. I still had backups of everything, so I decided to take the drive out, reformat it with another computer, put it back in and start over. It failed again, so I took it back out and ran some tests with WD's Lifeguard. It failed those too, so off to WD it went.

 

So I started over, fresh USB stick and everything. I discovered the Preclear plugin and ran two passes on the WD Reds, including the brand new one from the WD RMA.

 

Yesterday Disk 2 (the brand new one that passed two preclears) alerted me that it was in an error state, with a red X, and the device was disabled. I got the logs and then read that if you don't believe it's a bad drive, you can stop the array, unassign the drive, start the array, stop it again, and then reassign it and let it rebuild. I did that, it rebuilt, and I was back to all green by morning. I was happy, but not confident.

 

Tonight I got an alert that Disk 1 was in an error state and was disabled. >:( So I got the logs again. I'm at a loss; I don't know why this keeps happening.

 

Who knows what happened with the first one, since I don't have any logs, but I'm hoping someone can help me diagnose the last two errors.

 

I am running Unraid 6.1.9.

 

M/B: ASUSTeK COMPUTER INC. - P8H61-I R2.0

CPU: Intel® Core™ i5-2400 CPU @ 3.10GHz

Corsair CX750M Power Supply

 

 

tower-diagnostics-20160330-2259.zip

tower-diagnostics-20160331-2043.zip


I'll have to let someone else look at the logs, but I see a lot of write errors on disk1, even though the SMART report looks OK as far as reallocated and pending sectors go (there are none).

Have you checked the SATA and power cables? They can shift and come loose when you're working inside the case.
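For anyone reading along, the SMART check described above can be sketched in a few lines. This is a minimal, hypothetical helper, assuming the classic ATA attribute table that smartmontools prints with `smartctl -A` (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). Besides the reallocated and pending sector counts mentioned above, it also reports `UDMA_CRC_Error_Count`, since a rising CRC count usually points at the cable or connection rather than the drive itself. The function names and the sample excerpt are illustrative, not taken from the poster's diagnostics.

```python
"""Sketch: pull a few SMART attributes out of `smartctl -A` text output.

Assumes the classic ATA attribute table layout printed by smartmontools:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
"""

# Reallocated/pending sectors (discussed in the thread) plus the interface CRC
# counter, which tends to climb with bad or loose SATA cables.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for the watched attributes found in text."""
    found: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]  # RAW_VALUE column
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


def summarize(attrs: dict[str, int]) -> list[str]:
    """Turn raw counts into short notes, in the order of WATCHED."""
    notes = []
    for name in WATCHED:
        value = attrs.get(name)
        if value is None:
            notes.append(f"{name}: not reported")
        elif value == 0:
            notes.append(f"{name}: 0 (ok)")
        else:
            notes.append(f"{name}: {value} (investigate)")
    return notes


if __name__ == "__main__":
    # Hypothetical excerpt for illustration only.
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""
    for note in summarize(parse_smart_attributes(sample)):
        print(note)
```

Zero reallocated and pending sectors with a non-zero CRC count would fit the "check your cables" theory; a drive that has dropped offline entirely will not return a SMART report at all, so run it after a reboot or from another machine.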


Disk1, the WD drive with serial ending 7VASD, has dropped offline, and there is no SMART report for it in your diagnostics.

 

More often than not, the problem is not with the drive but with the connections or cables to the drive, either SATA or power.

 

It also wouldn't hurt to run memtest if you haven't already.

 


So I attempted to rebuild the disk, just like I did with disk2 when it failed on Thursday, but it errored out during the rebuild. This morning I took the drive out, popped it in a toaster (external drive dock) and used Western Digital's Lifeguard tool to first run a SMART test (pass) and then do a quick zero-write (the full one takes over a day). Then I put it back in the array and started the rebuild again. We will see how it goes.

 

Am I the only one that is dealing with drive disconnects like this? Could it be hardware or something in my setup? I'm at a loss.

 

Even if the rebuild works, I feel like it's only a matter of time before it happens again. I have zero confidence in it at the moment.


Thanks, those are both things I can try easily enough. To start, I will pick up new SATA cables for all of the drives and disable spin-down for the WD Red drives. So far so good on the rebuild of disk1. I'm at work, just waiting for that dreaded email saying it failed. :(


Alright...I'm about to give up.

 

I replaced the SATA cables, disabled spin-down for the WD drives and rebuilt disk1, which completed successfully. Then I ran a parity check overnight, and when I woke up this morning disk1 had been booted out again.

 

That brings my total to 4 disabled disks, across 3 separate hard drives, since I started using Unraid about 3 weeks ago. I don't know what else to try; I am out of ideas. I have attached the latest logs. I'm going to need another storage solution just to house all my logs at this rate.

tower-diagnostics-20160402-0800.zip


They are all plugged into the motherboard's SATA 3 ports. I am making sure the few files I added in the last week are backed up, and then I will look at updating the firmware and triple-checking all of my BIOS settings.

 

I had also put in an old graphics card just to play with VM passthrough, so I'm going to remove that as well, to take out everything that isn't critical to what I actually need to do, which is store media files.


You didn't tie the SATA data cables together to make the inside of your case look "pretty", did you? The problem with doing this is crosstalk between the cables, unless the cables are the shielded kind. (98%+ of all SATA cables are NOT shielded!)

 

A second thought: one of the hard drive makers (I think it was WD, but I can't remember with certainty) changed the design of the data connector socket on their drives, so that locking SATA connectors did not make reliable contact with the drive's connector. These drives should only be used with non-locking cables!



 

See these two discussions of this issue:

        http://forums.anandtech.com/showthread.php?t=2404493

and

        http://www.tomshardware.com/answers/id-2784233/western-digital-2tb-black-drive-sata-cable-click-drive.html


Thank you for all of the suggestions.

 

I went ahead and tried a bunch of things at once, which I probably shouldn't have, because if I do end up fixing the problem I might never know what the issue was.

 

On Saturday I rebuilt the drive again, which was successful, and then ran a parity check, which was also successful. I removed the old graphics card and got rid of the cache drive, which I wasn't really using anyway; it was only there to play with that feature. I also removed the two VMs I had set up to play with (Windows 10 and Server 2012 R2). Finally, I moved disk1 to a different SATA port (the one the cache drive was in), just in case it was a bad port, even though two separate ports have given me issues at this point.

 

I checked my cables: they aren't bundled together and they are separated fairly well, so I hope there aren't any interference issues. I do have some of the locking cables that were mentioned. However, those weren't on the WD drives at the time of the failures. They are now, because I wanted to replace the cables that were attached to the drives when they failed.

 

Long story short, I made some changes and it's been up for a day. I guess I'll just take it a day at a time and see how it goes. I would really love it to be stable, because it is so nice when it's working.

 

I suppose if it fails again, I will research how to test a power supply. Let's hope it doesn't come to that.

 


So... I had been good for a few days, but this morning both drives went into an error state, so bye bye data. I had 95% of it backed up, so that isn't my concern right now.

 

I just really would like to figure out what is causing this.

 

I have...

  • Replaced drives
  • Replaced SATA cables (locking and non-locking kinds)
  • Removed VMs
  • Removed unnecessary hardware (old graphics card)
  • Removed unnecessary Docker containers

 

With all of this, I keep getting errors. I'm about to throw in the towel...

 

I also can't attach the logs because they are too large. Is there something I can pick out of the logs to attach that would be helpful and get around the size restriction? I'm just not having luck with anything.
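One way to get under the attachment limit is to keep only the syslog lines that usually matter for a drive dropping out, plus a little context before each one. The script below is a hypothetical sketch, not an Unraid tool: the file name `trim_syslog.py` and the match patterns (libata `ataN.NN:` messages, generic I/O errors, read/write errors, and `md:` lines naming a disk) are assumptions about typical Linux kernel output, so compare them against your own syslog and adjust before relying on the result.

```python
"""Hypothetical trim_syslog.py: shrink a syslog to lines likely relevant to disk drop-outs.

The regex patterns are assumptions about typical libata / md driver messages,
not a definitive list of what Unraid logs.
"""
import re
import sys
from collections import deque
from collections.abc import Iterable

PATTERNS = [
    re.compile(r"\bata\d+(\.\d+)?:"),               # e.g. "ata3.00: exception Emask ..."
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"\bmd: .*disk\d+", re.IGNORECASE),  # md driver lines naming a disk
    re.compile(r"\b(read|write) error", re.IGNORECASE),
]


def interesting(line: str) -> bool:
    """True if the line matches any of the assumed error patterns."""
    return any(p.search(line) for p in PATTERNS)


def filter_log(lines: Iterable[str], before: int = 2) -> list[str]:
    """Keep matching lines, plus up to `before` non-matching lines preceding each match."""
    kept: list[str] = []
    recent: deque = deque(maxlen=before)  # rolling window of recent non-matching lines
    for line in lines:
        if interesting(line):
            kept.extend(recent)  # context leading up to the match
            recent.clear()
            kept.append(line)
        else:
            recent.append(line)
    return kept


def trim(src_path: str, dst_path: str, before: int = 2) -> int:
    """Write the filtered lines to dst_path and return how many were kept."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        kept = filter_log(src, before)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
    return len(kept)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(f"kept {trim(sys.argv[1], sys.argv[2])} lines")
    else:
        print("usage: python trim_syslog.py SYSLOG OUTFILE", file=sys.stderr)
```

Keeping a couple of lines before each match is a trade-off: it preserves the link reset or timeout that usually precedes a write error without bringing back the whole log.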


It was brand new, and it has been under 30 days, so I suppose I could exchange it for another model.

 

My next question is: how do I proceed from here? If I replace the PSU, my parity disk is the only healthy disk, and the two data disks have failures. Do I take them all out, reformat them and put them back in as "new" disks?


 

It would be worth doing, even just to rule it out as the source of the problem. Definitely stay away from the CX series. Higher-end Corsairs, EVGA and Seasonic are all fairly reliable brands that come to mind. It's not an exclusive list, but they are generally well regarded. Keep in mind that they all have better and worse models, so poke around a bit before pulling the trigger to make sure the specific model is well regarded (or post back here when you find the one you want to buy). Also, as a previous user mentioned, single rail is definitely a must.

 


 

Once you have the new PSU, I would honestly run preclear on your drives again. There have been cases (including my own) where 1 or 2 passes showed no issues, but the third pass turned up a bad drive. And again, it goes without saying: make sure all the cables (power and data) are well seated. :)

 


 

You say "2 disks with failures". Just to be clear: one data disk is an original that passed preclear but has since red-balled, and the other is an RMA replacement from WD that also passed a preclear cycle and has since red-balled, right? Or are we talking about drives that have actually failed with SMART errors?

 

I wouldn't take them out and reformat them. After the pre-clears, assign them to the array, and the array will take care of everything else. No need to remove and format them externally.

 

Good luck! Problems like these are the worst. Uncertainty is never fun :(. It's worth staying the course though. When you finally have a stable system, you're going to love unRaid. Especially all the fancy new VM and docker features!


You can try a new config:

 

Take a screenshot of the current assignments.

Go to Tools and click New Config.

Reassign all disks.

Start the array to begin a parity sync.

If the server still has issues, the parity sync may not complete, but this way you don't have to format the disks and copy the data again.


Thanks for the info.

 

I definitely don't want to just exchange the PS, so if I am able I think I will try to switch to a different brand.

 

http://www.newegg.com/Product/Product.aspx?Item=9SIA1N83V01630&cm_re=evga_power_supply-_-17-438-017-_-Product

 

This is one that I was looking at. Think it will fit the bill?

 

To answer the questions above: I have yet to see any SMART errors; Unraid just disables the disks. In total this has happened 5 times with 3 different disks over a 3-week period, and it has only been the WD Red drives.

 

One of them has been working great in a Buffalo NAS for about 5 months before it was moved to unraid, the other one was brand new and was RMA'd after the first failure. I then got the replacement that also failed.

 

And then, as of last night, they both failed.

 

So I can't imagine it is the drives themselves.


Note that unRAID disables a drive as soon as a write to it fails. The drive (and its data up to that point) is typically intact, as it is rare for a drive to physically fail. By far the most common causes are external factors such as cabling, badly seated controller cards, power, etc.

So if I replace the power supply, I should be able to take a screenshot of my current setup, go to Tools, do a New Config and then reassign everything the way I had it?

 

Will this work even though 2 of my 3 drives have been disabled due to write errors? Currently the only one of the three with a green status is the parity drive.

 
