In a tough spot - bad drive, now super.dat failed


On Friday morning, I was alerted by email to a bad drive in my unRAID tower. I wasn't home for a few days, so no big deal.

 

This evening we finally got home, and I managed to trip a breaker while cooking dinner. unRAID switched over to UPS power, but apparently my battery has issues, as it died after about 2 minutes.

 

I hit the breaker, got power up and running again and started up unRAID.

 

To my horror, the GUI showed zero assigned devices on my array, aside from cache.

 

Syslog indicates the super.dat file couldn't be read:

Mar 13 18:46:28 Tower kernel: md: unRAID driver 2.5.3 installed

Mar 13 18:46:28 Tower kernel: md: could not read superblock from /boot/config/super.dat

Mar 13 18:46:28 Tower kernel: md: initializing superblock
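
For anyone checking their own log for the same symptom, here is a minimal Python sketch that scans syslog text for the superblock read failure. The function name and the regular expression are illustrative assumptions; only the message wording is taken from the log lines above.

```python
import re

# Matches the kernel message quoted above and captures the file path it reports.
SUPERBLOCK_ERROR = re.compile(r"md: could not read superblock from (\S+)")


def find_superblock_errors(syslog_text):
    """Return (line_number, path) for each superblock read failure in the text."""
    hits = []
    for number, line in enumerate(syslog_text.splitlines(), start=1):
        match = SUPERBLOCK_ERROR.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "Mar 13 18:46:28 Tower kernel: md: unRAID driver 2.5.3 installed",
        "Mar 13 18:46:28 Tower kernel: md: could not read superblock from /boot/config/super.dat",
        "Mar 13 18:46:28 Tower kernel: md: initializing superblock",
    ])
    for number, path in find_superblock_errors(sample):
        print(f"line {number}: could not read {path}")
```

A hit only confirms that the driver failed to read the file at boot; it says nothing about why.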

 

I see all 9 drives in the drive selection area, but I know that one is bad. Fortunately I know which one as I took a screen shot during the initial failure.

 

I've attached a zipped copy of the syslog, and here are the screenshots I have of the failed drive and the current status:

 

iFCY3i2.png

 

adDs36i.png

 

 

What do I do now? Can I run chkdsk or fsck on the flash key to fix the super.dat problem or has it already been overwritten?

 

Can I set all drives and get the array going enough so I can rebuild the failed drive without losing data?

 

Thanks for any guidance! I've not been hit with a double fail in unRAID before :(

syslog.txt.zip

Link to comment

Update: I got a chance to shut it down, pull the USB drive, and run chkdsk on it from my Win7 box. No problems found. It booted back up to the same problem as before.

 

So at this point, with my super.dat emptied, I can think of a few possibilities:

 

• Set all devices in the slots they should be in, including parity, using a "trust my parity" type procedure. Once that's booted up, shut it down, swap the bad drive with a new one (which I have) and start the rebuild. This assumes parity isn't bogus if the super.dat is emptied.

 

• Pull the bad drive, clone it/ddrescue the data to a temporary drive - pulling any damaged files from backups. Put a new drive in the server, clear parity and rebuild without the new drive. Then add the new drive and expand the filesystem on to it. Copy the rescued data to the array.

 

Any thoughts about these?

Link to comment

Are you sure disk7 is really bad? Next time attach the full diagnostics; they contain a lot more info, including SMART reports for all disks.

 

Assuming disk7 is bad and you have a spare you can do this:

 

-reassign all disks, assign new disk in slot7, double check parity and disk7 are in the correct slots

-very important: before starting the array, tick the “parity is already valid” checkbox

-start array

-if you're using a new/empty disk for slot7 it will appear as unmountable, it's ok

-stop array

-unassign disk7 (select “no device”)

-start array

-stop array

-reassign disk7

-start array to begin rebuild
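
The "double check" in the first step can be made mechanical. Below is a minimal Python sketch, assuming you have a record of which drive serial was in which slot (for example, typed in from a screenshot). The function name, the slot names used as dictionary keys, and the serials are all hypothetical; nothing here talks to unRAID itself.

```python
def check_assignments(recorded, planned, replaced=()):
    """Compare planned slot assignments with a known-good record.

    recorded, planned: dicts mapping slot names ("parity", "disk1", ...) to drive serials.
    replaced: slots that are expected to differ because they get a new drive.
    Returns a list of problem descriptions; an empty list means everything matches.
    """
    problems = []
    for slot in sorted(set(recorded) | set(planned)):
        if slot in replaced:
            continue  # a new drive goes here, so a mismatch is expected
        if recorded.get(slot) != planned.get(slot):
            problems.append(
                f"{slot}: recorded {recorded.get(slot)}, planned {planned.get(slot)}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical serials, for illustration only.
    recorded = {"parity": "SERIAL-P", "disk1": "SERIAL-A", "disk7": "SERIAL-OLD"}
    planned = {"parity": "SERIAL-A", "disk1": "SERIAL-P", "disk7": "SERIAL-NEW"}
    for problem in check_assignments(recorded, planned, replaced={"disk7"}):
        print("MISMATCH", problem)
```

In this example parity and disk1 are swapped, so both slots are reported, while disk7 is skipped because it is the slot being rebuilt.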

 

Link to comment

Thanks for the guidance! I did exactly that, and the rebuild went OK.

 

I think I'll add backing up the flash key to my regimen as well now, just in case something similar happens in the future.

 

Just be careful if you decide to do that. I've seen a couple of users suffer data loss from mistakenly using an older super.dat file that misidentifies the current parity disk. I think in those cases the users created the backup, replaced their parity disk with a bigger one, and added the old parity disk to the array. When they rolled back to the older super.dat, it assigned the old parity disk (which was now part of the array) as parity and caused severe data loss. I'm not saying don't do this, but there are risks that you should be aware of.

Link to comment

I'm seconding what is said above. My way of dealing with that is to take a screenshot of my disks and keep it with my USB backup in a dated folder, named something like 2.12.2016-8Discs or whatever makes sense to you.

 

When I swap out disks or change anything big, I always take a screenshot of my drives as a habit and keep it on my Windows machine. Just be careful.
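
The dated-folder habit can be scripted. This is a minimal Python sketch, not an official unRAID tool: the function name is made up, the folder naming uses an ISO date (my choice, so the folders sort chronologically) instead of the 2.12.2016 style, and the destination path in the usage note is hypothetical. The only path taken from this thread is /boot/config, where the syslog says super.dat lives.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_flash_config(source, backup_root, disk_count, today=None):
    """Copy the flash config directory into a dated folder and return its path."""
    today = today or date.today()
    # e.g. 2016-02-12-8Discs; the disk count makes an outdated layout easier to spot.
    target = Path(backup_root) / f"{today.isoformat()}-{disk_count}Discs"
    # copytree raises FileExistsError if the folder already exists,
    # so an earlier backup from the same day is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Usage would look something like `backup_flash_config("/boot/config", "/mnt/user/backups/flash", 8)`. Drop the screenshot of the drive assignments into the same folder. The date in the folder name is what helps you avoid restoring a super.dat that predates a parity swap.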

Link to comment
