[SOLVED] Failed disk, some questions


Two months ago I had a drive fail, and unRAID marked it as DISK_DSBL. I was able to add it back to the array, I think by unassigning, rebooting, and assigning it again (don't quote me on this, I don't remember exactly). Everything worked fine for two months after that.

 

The same disk failed the other night. I ordered another HDD and I decided to preclear it before I replaced the drive*. I powered down, unplugged the failed disk, inspected it, put it back in. I added the new drive to an empty slot so that I can preclear it.

 

When I powered the server back on, disk2 (the failed drive) was showing a blue dot. I reassigned the same disk to the same slot and it said it would try to rebuild (I had no other option to bring the array back up other than to let it do the rebuild).

 

Now it's showing a ton of parity errors, see below. This is a NOCORRECT btw.

 

Parity-Check in progress. Cancel will stop the Parity-Check.

(not selected) Correct any Parity-Check errors by writing the Parity disk with corrected parity.

Total size: 2 TB

Current position: 896.76 GB (45%)

Estimated speed: 24.83 MB/sec

Estimated finish: 741 minutes

Sync errors detected: 177131

 

When I first powered on I was going to leave the failed disk as failed while preclearing the other drive (with lots of praying!), but I saw that I might be able to revive the failed disk, if only for a moment, so I wouldn't have any drives failed or NP in case another drive were to die in the interim. It's probably going slow because I'm also preclearing the new HDD at the same time. For these errors, whenever it's done, do I let it correct (or, if it doesn't give me the option, re-run the parity check and set it to correct)?

*Also, I read in another thread that when replacing a drive I didn't need to preclear but it is recommended. That's why I'm doing it. It's almost done but should I stop it and just replace the drive and be done with it?

 

The disk failed twice already, I think it's done and I don't mind tossing it. Or maybe I'll run it through 3 preclear cycles and see how that goes, but I'm hesitant to add it back to the array being that it has already failed twice.

Link to comment

Two months ago I had a drive fail, unraid set it as DISK_DSBL. I was able to add it back to the array, ...

 

Did you check the logs to ascertain the cause of the failure? Did you check the SMART report and address any problems identified? If not, I think that you were living rather dangerously!

 

...When I powered the server back on, I was showing disk2 (the failed drive) as a blue dot. I reassigned the same disk to the same slot and it said it would try to rebuild (i had no other option to bring the array back up other than to let it do the rebuild).

 

Actually, you should have had another option (which would leave the array without parity protection, but might well be safer than continuing to use a known failing drive) - the array will start and run perfectly well, albeit slowly, with one drive missing - the contents of the missing drive will be simulated from the contents of parity and all the other data drives.
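For anyone wondering how a missing drive can be "simulated", here is a minimal sketch, assuming single parity is a plain XOR across the same byte position of every data drive. The disk contents and the helper name `xor_blocks` are made up for illustration; this is a toy model, not unRAID's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical toy array: three tiny "data disks" and the parity built from them.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"
disk3 = b"\xff\x00\xff\x00"
parity = xor_blocks([disk1, disk2, disk3])

# disk2 is missing: its contents are recomputed from parity plus the
# surviving data disks, which is why every remaining drive must read correctly.
emulated_disk2 = xor_blocks([parity, disk1, disk3])
print(emulated_disk2 == disk2)
```

Every read of the missing drive needs a read from all the other drives, which is one reason the array runs slowly in this state.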

 

Now it's showing a ton of parity errors, see below. This is a NOCORRECT btw.

 

Just as well it's non-correcting!

 

....I saw that I might be able to revive the failed disk, if only for a moment so I wouldn't have any drives failed or NP incase another drive were to die in the interim.

 

You would be very ill-advised to rebuild one data drive when another drive is already known to have errors!  I'm sure that you can do the preclear with the array stopped, so no other drives ought to be active, which would give them a high degree of protection.

 

If you are really desperate to have the array up and running, I think, in your situation, I would risk replacing the failed drive with the new one, without waiting for the preclear.  A new drive has a better chance of working than a known faulty drive!  The rebuild will write to every sector on the new drive, so checking the syslog and SMART reports after it completes would give some degree of confidence.

 

The disk failed twice already, I think it's done and I don't mind tossing it. Or maybe I'll run it through 3 preclear cycles and see how that goes, but I'm hesitant to add it back to the array being that it has already failed twice.

 

Definitely throw it away ... unless you value your data at less than the price of a new drive!

Link to comment

PeterB, thank you for those suggestions. The preclear is very close to completion, so I might as well wait at this point. (Yes, I suppose I do live very dangerously!)

 

If I'm understanding unRAID correctly, doesn't it use the parity drive to repair/rebuild the drive? I'm afraid that if the parity has errors, those errors may still be present when I replace the drive. If that's the case, do I let it rebuild? Do I let it correct the errors after replacing the drive? Do I make it correct the errors BEFORE replacing the drive?

Link to comment

PeterB thank you for those suggestions. The preclear is very close to completion, so I might as well wait at this point. (Yes, i suppose i do live very dangerously!)

 

Okay, fair enough.

 

I'm afraid if the parity has errors, when I replace the drive, those errors may still be present? If that's the case, do I let it rebuild? Do I let it correct the errors? Do I make it correct the errors BEFORE replacing the drive?

Whether you allow the correction of parity errors depends on whether you trust the contents of all of your data drives to be good.  If the parity errors are the result of a bad data drive, then running a correcting parity check will destroy your parity and, therefore, your chances of getting a good restore onto the new drive.  If you have no reason to be suspicious of the content of the parity drive (a reason would be, for example, a sudden powerdown while data was being written to the array), then, in your situation of having a known failing data drive, I would trust the content of the parity drive and NOT run a correcting parity check.
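To make the difference concrete, here is a rough sketch under the same toy assumption (parity is a byte-wise XOR of the data drives). The disk contents and the names `xor_blocks` and `count_sync_errors` are hypothetical, chosen only to illustrate the reasoning above.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def count_sync_errors(parity, data_disks):
    """Non-correcting check: count positions where stored parity disagrees."""
    expected = xor_blocks(data_disks)
    return sum(1 for stored, calc in zip(parity, expected) if stored != calc)

# Parity was computed while both (hypothetical) data disks were healthy.
good_disk1 = b"\x01\x02\x03\x04"
good_disk2 = b"\x10\x20\x30\x40"
parity = xor_blocks([good_disk1, good_disk2])

# disk2 now returns wrong data at two positions.
failing_disk2 = b"\x10\x00\x30\x00"
sync_errors = count_sync_errors(parity, [good_disk1, failing_disk2])

# Option A: leave parity alone and rebuild disk2 onto a replacement drive.
rebuilt_from_old_parity = xor_blocks([parity, good_disk1])

# Option B: a correcting check first overwrites parity to match the bad disk,
# so a later rebuild can only reproduce the bad contents.
corrected_parity = xor_blocks([good_disk1, failing_disk2])
rebuilt_from_corrected = xor_blocks([corrected_parity, good_disk1])

print(sync_errors)
print(rebuilt_from_old_parity == good_disk2)
print(rebuilt_from_corrected == failing_disk2)
```

In this toy case the untouched parity still recovers the good contents of disk2, while the "corrected" parity has lost them for good.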

 

From the fact that you asked the question, I'm guessing that you don't fully understand how parity works and what it means when the system reports that there are parity errors.

Link to comment
If i'm understanding unraid correctly, doesn't it use the parity drive to repair/rebuild the drive?

It uses all the drives, not just the parity drive. You can only rebuild one drive at a time, so if you have a second drive failing, the rebuild will not work correctly. If you run a correcting parity check, it reads all the drives and recalculates parity. If one of the drives is failing, you will be writing bad data to the parity drive.
Link to comment

You are indeed living dangerously, reusing a drive after it's already failed TWICE!!

 

You should absolutely replace the drive with a new one, and let the system rebuild it.

 

Hopefully your existing parity is good => when was the last time you ran a parity check with zero sync errors?

 

Link to comment

If you had a good parity check with zero sync errors on your last monthly check, then you should be able to simply let the new drive rebuild and all should be well.

 

What probably happened is that you let the system rebuild the failed disk -- but the rebuild wasn't successful (i.e. the drive has truly failed) ... so when it checks parity it finds a LOT of "errors" where the data wasn't successfully restored.    So if you replace the disk now with a good NEW disk it should rebuild successfully -- and you should then get a good parity check.

 

After the new drive has been rebuilt, run a parity check -- it SHOULD be perfect.

 

I'd toss the failed drive  :)

 

Link to comment
