tential

Large Disk Failure Help

I suffered a large disk failure last year.  I kicked my server by accident while cleaning.  I thought I had suffered a MASSIVE data loss, and simply turned off my server, cried, and tried to forget it.

It's been a year, and I'm trying to figure out which drives are safe and which ones need to be trashed.

I opened my unraid server, and only 4 drives were shown missing.  The rest all had glowing smart reports. 

 

What steps can I take to get myself running again?

 

I imagine I need to test the drives and some other things, but I wanted to do it all right.

 

Any suggestions on how to get myself running again?  I did some googling and saw that I need to do something with the New Config to get myself running, but I imagine there is more I should do.

 

What are the necessary steps to thoroughly get myself running again?

 

I have a server rack in an enclosed safe cabinet this time around, so hopefully I won't make the same mistake!

 

Edit: There was no parity drive.  Not that one would have saved me from a 4-disk failure, and not that I shouldn't have had one.  This time around I will use 2.

Edited by tential

Share this post


Link to post

First, unless you legit kicked it downstairs while it was running, you probably knocked connections loose instead of completely crashing the heads. I'd start by individually evaluating the drives in a recovery environment, that is to say connecting only a single drive, and booting a recovery CD that won't try to automatically write to the disk. If the disk comes up readable, I'd try to copy any data you deem important, then after that is done, run a long smart test on the drive, and get another smart report.

 

If by chance all the drives are readable (except for the parity drive of course) and pass smart, then it's just a matter of putting them back in place. As long as NOTHING is written to the drives, then we can recover as many truly bad drives as you have good parity disks.
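To make the "as many bad drives as you have good parity disks" point concrete, here is a minimal sketch of single parity modelled as a bytewise XOR across the data disks. It is a toy model only: the byte strings stand in for whole disks, the function names are made up for illustration, and the calculation used for a second parity disk is different and not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity_block):
    """Rebuild the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# Toy "disks": three data disks and one parity disk (hypothetical contents).
disk1 = b"movies--"
disk2 = b"photos--"
disk3 = b"backups-"
parity = xor_blocks([disk1, disk2, disk3])

# Pretend disk2 died: XOR of everything that is left gives its contents back.
recovered = rebuild_missing([disk1, disk3], parity)
print(recovered == disk2)
```

With one parity block only one missing data block can be rebuilt this way, and only if nothing has overwritten the surviving disks in the meantime, which is why the advice above is to write NOTHING to the drives.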

 

Once you have inventoried the drives one by one, then come back and we can help you set up a plan to either reuse or replace or whatever is needed to get your unraid back in shape.

 

Share this post


Link to post
17 minutes ago, jonathanm said:

First, unless you legit kicked it downstairs while it was running, you probably knocked connections loose instead of completely crashing the heads. I'd start by individually evaluating the drives in a recovery environment, that is to say connecting only a single drive, and booting a recovery CD that won't try to automatically write to the disk. If the disk comes up readable, I'd try to copy any data you deem important, then after that is done, run a long smart test on the drive, and get another smart report.

 

If by chance all the drives are readable (except for the parity drive of course) and pass smart, then it's just a matter of putting them back in place. As long as NOTHING is written to the drives, then we can recover as many truly bad drives as you have good parity disks.

 

Once you have inventoried the drives one by one, then come back and we can help you set up a plan to either reuse or replace or whatever is needed to get your unraid back in shape.

 

 

I kicked it pretty hard I imagine.  It only moved a couple of inches.

It was still running for a couple of days, but then things got wonky and I couldn't access my files well anymore.  Then I couldn't start the array because drives were missing.  Everything is connected though; I checked the connections.

I don't have parity drives, so I'm out of luck on that front.

 

Is there a way to inventory the drives within UNRAID by connecting them one at a time?

If not, I'll use my Windows PC.  I just don't have drive bays or anything (I stripped the PC out and tossed the drive bays when I moved last weekend, as I just didn't have space for clutter), so I'll have to attach one bare drive at a time.

Edited by tential

Share this post


Link to post
4 hours ago, tential said:

Is there a way to inventory the drives within UNRAID by connecting them one at a time?

Yes, but what I was trying to accomplish was getting the drives isolated out of the unraid environment, to eliminate any other problems besides the drives. Using your windows PC is a good alternative, unplug the windows boot drive, connect the drive you want to evaluate, and boot the recovery CD.

 

You haven't given any sort of description of your hardware, so I am left to guess at what would knock 4 drives offline simultaneously.

Share this post


Link to post

My server is a Rosewill chassis (15 bays), with 8 bays full (a 9th bay had an HDD but no SATA cable attached; I must have forgotten to set up that drive)

z87 mobo

4 GB Ram

500w PSU

 

I kicked/tripped over my computer, while it was running, while running no parity drives, that's why the 4 drives knocked off right?

Should that not have happened?

 

I hadn't realized you linked this site "http://www.system-rescue-cd.org/"

For some reason I thought it was a forum ad.  I'll have to work through those drives 1 by 1.  Thanks for the advice to unplug the boot drive too.  It would have been annoying to constantly have to specify boot from USB and hope I don't get into the Windows environment by accident.

 

Edited by tential

Share this post


Link to post
24 minutes ago, tential said:

I kicked/tripped over my computer, while it was running, while running no parity drives, that's why the 4 drives knocked off right?

Should that not have happened?

Since it was running, yeah, it's possible you killed all 4 drives, but it's also possible something else is going on with connections or the motherboard.

 

It's best to deal with the drives one at a time in a different system so you know for sure what's good and what's bad. Also, with only a single drive spun up in an open case, any nasty mechanical clicking, screeching or scraping noises will be very obvious, instead of trying to differentiate what's happening with all the drives at once.

 

The beauty of unraid in this scenario is that you only lost the files on the drives that are completely bad, and even then if the files are priceless you can send the individual drive out for recovery at a MUCH lower cost than if you needed to recover a standard RAID array.

Share this post


Link to post
1 minute ago, jonathanm said:

Since it was running, yeah, it's possible you killed all 4 drives, but it's also possible something else is going on with connections or the motherboard.

 

It's best to deal with the drives one at a time in a different system so you know for sure what's good and what's bad. Also, with only a single drive spun up in an open case, any nasty mechanical clicking, screeching or scraping noises will be very obvious, instead of trying to differentiate what's happening with all the drives at once.

 

The beauty of unraid in this scenario is that you only lost the files on the drives that are completely bad, and even then if the files are priceless you can send the individual drive out for recovery at a MUCH lower cost than if you needed to recover a standard RAID array.

 

Oh yes, that's why I like UNRAID.  Even running my setup like a complete idiot, I should still be ok.  The files weren't priceless, hence why I was running it so stupidly.

 

I don't have drive bays in my case, so each drive will have to sit bare on my desk.

 

I was dreading this process because I had thought I had destroyed ALL of my harddrives.  I had 9 HDDs in there, and was so upset, I bought 10 HDDs to completely restart my build.  It was only today when I booted into unraid to start working on my server again after a long hiatus that I was surprised to see that I hadn't killed as many drives as I previously thought.  If I can get that data off those drives maybe, I'll be REALLY lucky.   But ya, I don't mind, I'm just happy to almost be up and running again.  My custom cabinet is almost done for my server rack, my shelf for my server for the rack just came today, once everything is put together this weekend, I doubt I'll ever have to open up my cabinet to look at my rack again. 

 

I gotta get a USB tomorrow and start the HDD checks.  I'm going to have a lot of fun tomorrow, thanks for the help, I'll be updating this with my progress.

Share this post


Link to post

Have you just tried to reseat memory, any addon boards, and all SATA/power connectors (and for the SATA connectors reseat both at the motherboard and at the drive)?

 

Giving the computer a real knock may have unseated cables or boards.

Share this post


Link to post
9 hours ago, pwm said:

Have you just tried to reseat memory, any addon boards, and all SATA/power connectors (and for the SATA connectors reseat both at the motherboard and at the drive)?

 

Giving the computer a real knock may have unseated cables or boards.

 

I'm pretty sure the drives are dead.  The server is usually being written to, and I was low on space, so it would be writing to the drives trying to spread out the little space I had left.  Most likely, the kick just unseated them, and I should have made sure they were all secure right afterward to prevent any rattling.

We'll see though, again, no big deal.

 

So when I'm checking these drives, how do I go about restarting my unraid build and adding my data back?  If the drive is good how do I add it back, vs if the drive is bad obviously nothing I can do about that.

Share this post


Link to post

It's possible to hurt the drives mechanically.

 

But it's seldom you also break the electronics in a way that you can't communicate with them.

 

If all cables/cards are reseated, then you would most likely be able to have the drives found during boot. And in that case, you should be able to get SMART statistics from them, telling you whether the drives have detected something wrong or not.
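As a rough illustration of what "SMART statistics" can tell you, the sketch below pulls a few commonly watched counters out of an ATA attribute table in the layout printed by `smartctl -A`. The sample text is invented for the example (it is not from the drives in this thread), the helper names are hypothetical, and a non-zero raw value is only a hint to look closer, not a verdict.

```python
# Attributes that are commonly checked first on ATA drives.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(report_text):
    """Return {attribute_name: raw_value} from a `smartctl -A` style table."""
    attributes = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the tenth column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attributes[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attributes


def worrying_attributes(report_text):
    """Return the watched attributes whose raw value is above zero."""
    attrs = parse_smart_attributes(report_text)
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


# Invented sample in the usual column layout (assumption, for illustration).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(worrying_attributes(SAMPLE))
```

On a real system the text would come from saving the output of `smartctl -A /dev/sdX` for each drive; the device name is a placeholder.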

Share this post


Link to post

Are you planning on using parity this time around?

 

You can do a "new config" in tools, and it will erase all memorized drive positions. Then as you add confirmed good drives to slots, they will be available on array start. As long as you don't have parity, you can continue to do this one drive at a time, setting a new config for each addition. Once you have valid parity in place, adding a drive to an unoccupied slot will zero the drive to maintain parity, unless you set a new config which will then recalculate parity using the data already on the drive.

 

I've always had parity maintained, so I'm unclear whether you can add drives to a non-protected array without doing a new config, but I wouldn't experiment with drives that have valuable data.

Share this post


Link to post
2 minutes ago, jonathanm said:

so I'm unclear whether you can add drives to a non-protected array without doing a new config

You can, just can't remove them without doing a new config.

Share this post


Link to post

2 hours ago, jonathanm said:

Are you planning on using parity this time around?

 

You can do a "new config" in tools, and it will erase all memorized drive positions. Then as you add confirmed good drives to slots, they will be available on array start. As long as you don't have parity, you can continue to do this one drive at a time, setting a new config for each addition. Once you have valid parity in place, adding a drive to an unoccupied slot will zero the drive to maintain parity, unless you set a new config which will then recalculate parity using the data already on the drive.

 

I've always had parity maintained, so I'm unclear whether you can add drives to a non-protected array without doing a new config, but I wouldn't experiment with drives that have valuable data.

Yes, I'm planning on using 2 Parity drives.  I learned my lesson lol.

 

It was my recollection that you can't add a drive with data on it?  Am I wrong in that?  I thought I had to transfer the data over to a newly formatted drive?

If I can add a drive with data on it to an array.... I've been doing this wrong for so long omg -.-

 

I already have all the drives necessary to fill my server this time around so once this puppy is in place, it's not moving anywhere.

Edited by tential

Share this post


Link to post
11 minutes ago, tential said:

It was my recollection that you can't add a drive with data on it?

You can if it was a drive previously used on unRAID after doing a new config, or if there's no parity, just by assigning it to a data slot.

Share this post


Link to post

Ok, so I can start a new config, add 2 new parity drives to get it started, then add in my new HDDs as I check them.

 

What test do I run with this systemrescuecd to ensure the HDDs are ok and are in good condition still?

Share this post


Link to post

I would first do a new config and add the data drives only. Look at the SMART info and run at least the short SMART test on all disks; if you like, grab and post the diagnostics after this. If all looks OK, start the array. You can then try to copy any irreplaceable data first, and after that either do a read check on all disks or add the parity disks and begin a parity sync.

Share this post


Link to post

Should I still do a read check on the disks outside of the unraid environment?  How do I perform that exactly?

Edited by tential

Share this post


Link to post

You can do it in unRAID. Without parity drives, the check parity function turns into a read check.
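For anyone wondering what a read check amounts to in principle, here is a minimal sketch of a read-only, start-to-finish pass over a file or device. It is not how unRAID implements its own check; the function name, chunk size, and error limit are assumptions for illustration, and the demo runs against a temporary file so nothing touches a real disk.

```python
import os
import tempfile


def read_check(path, chunk_size=1024 * 1024, max_errors=100):
    """Read `path` sequentially, read-only; return (bytes_read, error_offsets)."""
    bytes_read = 0
    error_offsets = []
    offset = 0
    # "rb" opens for reading only, so this pass never writes to the target.
    with open(path, "rb", buffering=0) as handle:
        while len(error_offsets) < max_errors:
            try:
                chunk = handle.read(chunk_size)
            except OSError:
                # Note where the read failed, then skip ahead one chunk.
                error_offsets.append(offset)
                offset += chunk_size
                handle.seek(offset)
                continue
            if not chunk:
                break  # end of file or device
            bytes_read += len(chunk)
            offset += len(chunk)
    return bytes_read, error_offsets


# Demo on a throwaway file instead of a real drive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3 * 1024 * 1024 + 17))
    demo_path = tmp.name
try:
    total, errors = read_check(demo_path)
    print(total, errors)
finally:
    os.remove(demo_path)
```

Pointing the same function at a whole-disk device node (for example `/dev/sdX` on Linux, a placeholder name) would need root privileges. An empty error list only means every chunk could be read, not that the files on the disk are intact.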

Share this post


Link to post
7 hours ago, tential said:

Ok, so I can start a new config, add 2 new parity drives to get it started, then add in my new HDDs as I check them.

If you set up parity, then the drives you add WILL be erased. You can only add drives without erasing them if you don't have parity.

 

Add the parity drives LAST, after all your data drives are confirmed and in place.
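The rules given in this thread for what happens to a drive's existing contents can be summed up in a small decision function. This is only a restatement of the advice above as code, with a hypothetical function name and wording; it does not call or describe any real unRAID interface.

```python
def adding_drive_outcome(parity_assigned, new_config):
    """Summarise what happens to a drive's existing data when it is added.

    parity_assigned: True if the array has valid parity in place.
    new_config:      True if the drive is added as part of a New Config.
    """
    if not parity_assigned:
        # No parity to keep consistent, so the drive is taken as it is.
        return "data kept; nothing to recalculate"
    if new_config:
        # Parity is rebuilt from whatever is already on the data drives.
        return "data kept; parity is recalculated"
    # Adding to an empty slot of a protected array clears the drive.
    return "drive is zeroed so existing parity stays valid"


for parity in (False, True):
    for fresh in (False, True):
        print(parity, fresh, "->", adding_drive_outcome(parity, fresh))
```

Only one of the four combinations erases the drive (parity in place and no New Config), which is why the safe order is data drives first and parity last.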

Share this post


Link to post

Attachments: Disk 4.zip, Disk 3.zip, Disk 2.zip, Disk 1.zip, Disk 10.zip, Disk 9.zip, Disk 8.zip, Disk 7.zip, Disk 6.zip, Disk 5.zip

 

So I connected each drive to sata 1 and each one turned on.  So yes Jonathan... you're right, my noob self didn't completely destroy any drive.

 

I have attached the SMART reports.  I'm not sure what's going on at all now.

 

Let me know if these are still ok, SMART says they are, that would be insane though if I got out of this with no data loss....

 

Share this post


Link to post

It would be a lot easier for us all if you would just go to Tools - Diagnostics and post the complete diagnostics zip. Then we will only have to download one file that will have all the SMART for each disk plus a lot more info.

Share this post


Link to post
6 minutes ago, trurl said:

It would be a lot easier for us all if you would just go to Tools - Diagnostics and post the complete diagnostics zip. Then we will only have to download one file that will have all the SMART for each disk plus a lot more info.

 

Will that work when I only have one drive in at a time?

I'll do it right now.

tower-diagnostics-20180115-0353.zip

Edited by tential

Share this post


Link to post
1 hour ago, tential said:

Will that work when I only have one drive in at a time?

No.

 

Now that you know each drive spins up on its own, CAREFULLY trace and unplug and replug each and every connection, both power and SATA. Also unplug and replug any PCIe cards.

 

After you've pretty much reassembled your server from scratch, then start it up and see how many drives are missing, if any.
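One way to answer "how many drives are missing" without relying on memory is to compare the serial numbers you wrote down during the one-by-one inventory against what the system detects after reassembly. The sketch below assumes the detected list comes from `lsblk -d -J -o NAME,SERIAL` on a Linux console; the serial numbers and helper names are made up for the example.

```python
import json


def detected_serials(lsblk_json):
    """Collect the serial numbers reported in `lsblk -J` style JSON text."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    # Some devices (e.g. certain USB sticks) report no serial; skip those.
    return {dev["serial"] for dev in devices if dev.get("serial")}


def missing_drives(expected_serials, lsblk_json):
    """Return the expected serials that the system did not detect, sorted."""
    return sorted(set(expected_serials) - detected_serials(lsblk_json))


# Invented sample output and inventory (assumptions, for illustration only).
SAMPLE_LSBLK = json.dumps({
    "blockdevices": [
        {"name": "sda", "serial": None},
        {"name": "sdb", "serial": "WD-EXAMPLE0001"},
        {"name": "sdc", "serial": "WD-EXAMPLE0002"},
    ]
})
EXPECTED = ["WD-EXAMPLE0001", "WD-EXAMPLE0002", "WD-EXAMPLE0003"]

print(missing_drives(EXPECTED, SAMPLE_LSBLK))
```

A serial that shows up on the bench but is missing here points at a cable, port, or controller rather than at the drive itself.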

 

Get diagnostics after unraid boots. I'm on the fence about whether or not you should attempt to start the array before obtaining diagnostics, maybe @johnnie.black has an opinion on whether starting the array would be beneficial at this stage of diagnosis / recovery. I'm operating under the "first do no harm" principle, and don't want anything to write to the drives yet until we know more about their health and the overall health of the server.

Share this post


Link to post

SMART for all disks looks fine, and they all passed the short SMART test, so there shouldn't be any catastrophic damage. I would just start the array, and maybe try to copy any irreplaceable data before doing a read check or parity sync.

Share this post


Link to post
6 hours ago, johnnie.black said:

[...] before doing a read check or parity sync.

If it was my server, I would start with a read-only parity scan just to test-read all drives and see what happens.

Share this post


Link to post


Copyright © 2005-2018 Lime Technology, Inc.
unRAID® is a registered trademark of Lime Technology, Inc.