New to Unraid - Red Balled Hard Drive



I am completely new to unraid so please take it easy on me!  I recently got my unraid server up and running.  The server was accidentally restarted a week or so back without properly stopping the array.  I finally got around to running a parity check a day or two ago.  At first, the parity check was cruising along, but it was going to take the entire evening to complete so I headed to bed.

 

The next morning I checked to see how the parity check was doing and the read speeds were EXTREMELY slow.  I decided to check the log and there were many read errors on various sectors of the one hard drive.  I decided to let the parity check continue to run but it eventually stopped and put the troubled disk into a faulty state (red balled).  With that said, I am still able to browse to the unraid server and see the faulty disk share.  I am able to browse the disk and see all the files.  All the files I tested were able to open fine.

 

I did some digging around on here and came across a post about taking the array offline and running both the short and long SMART tests.  I have done so and attached the log files.  I am not sure whether the drive has actually failed or not.  If I were able to swap out the drive, would I be able to rebuild it even though the last parity check did not complete?  I am not sure what my next step should be.  Any and all advice is greatly appreciated.

 

Chris

 

 

smart-short.txt

smart-long.txt

Link to comment

These reports don't look good. Did you preclear the drive before trying to use it?

 

You say you can read the files on the disk. If the drive is red-balled, unRAID is not reading from it; it is instead emulating the drive by reading all the other drives plus parity to reconstruct the data, the same thing it would do to get the data required to rebuild it.
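To illustrate the emulation described above, here is a minimal sketch, assuming single parity computed as a bitwise XOR across the data disks. The disk contents and the helper name `xor_blocks` are made up for the example; this is not unRAID's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte "sectors" at the same offset on three data disks.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# Parity is the XOR of every data disk at that offset.
parity = xor_blocks([disk1, disk2, disk3])

# disk2 is disabled: its contents are recomputed from the
# surviving disks plus parity, without touching disk2 itself.
emulated_disk2 = xor_blocks([disk1, disk3, parity])
assert emulated_disk2 == disk2
```

Note that the reconstruction needs a good read from every other disk, which is why the health of the remaining drives matters while one disk is being emulated.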

 

Post a complete syslog and a screenshot so we can get a better idea what you are dealing with.

 

Do you have backups of this data?

 

Link to comment

Screenshot and Sys Log attached.  All the drives were added to unraid and formatted using unraid.  I am not sure if this process includes preclearing them or not.  What exactly does preclear do?

syslog.zip

screenshot.png.4a7ae86e7a666e1ed1e074778efc72dd.png

Link to comment

From that syslog, it looks like the drive is up but probably encountered a write error that disabled it.  Since it is disabled, unRAID is emulating the drive, so you are probably best off replacing it as soon as you can.  After you rebuild onto a new disk, you could run a preclear on the old one to see if it checks out, but with the SMART attributes on that drive, I wouldn't use it for any important data even if it passed...

Link to comment

Preclear is here.

 

So you have 9 untested drives in your system. I have probably tested about a dozen drives with preclear since I began using unRAID nearly 4 years ago, and I've been lucky enough to say that all of them passed. But not everyone is so lucky.

 

unRAID is not currently using that drive, and you shouldn't try to get it to use it until the pending sectors are cleared.

 

Do you have backups of the files on that disk? That will determine the next thing you should do.

 

Link to comment

I don't believe that I precleared the drives.  Everything on the disk is simply stuff that I have ripped from my collection.  I could rip the files again but I would prefer not to do so if I don't have to. 

 

I am assuming that my next course of action is to go out and get a replacement drive.  I should then preclear the new drive, swap it in for the failing drive, and perform a rebuild, correct?

Link to comment

I don't believe that I precleared the drives...

You would know if you had precleared them. A preclear on my system typically takes about 10 hours per terabyte.
I am assuming that my next course of action is to go out and get a replacement drive.  I should then preclear the new drive, swap it in for the failing drive, and perform a rebuild, correct?

Yes. Be sure to ask questions as you go so we can help you through the rebuild process.
Link to comment

As I stated earlier, this drive had about 2 TB worth of data on it.  I have more than 2 TB worth of free space spread out over the other drives that are in the array.  Is it possible to do a "rebuild" and simply have the parity drive rebuild the array by writing the data that was on the failed drive across the other drives still in the array, since there is plenty of free space available?

Link to comment

Parity does not work like that!  All it is capable of doing is providing a sector-by-sector reconstruction of a failed drive and has no understanding of what data is on that drive.  If you want to move data to other drives then I am afraid that it is a manual process.

Link to comment

There are positives and negatives to doing the backup first.  It would probably take as much time to migrate the data to other drives as to replace the failing drive and rebuild.  You would be protecting the data in case another failure occurs before the drive is replaced, but you would be doing a lot of extra I/O by moving the data, rebuilding the drive, and then potentially moving the data back.

 

I would think your best path would be to verify the integrity of the other drives via SMART reports to make sure they aren't having any issues, and then replace/rebuild the failed drive.  Then you can preclear the suspect drive to see how it behaves...

Link to comment

Would it not be best to replace / rebuild the failed drive first and then perform the integrity check of the other drives?  My thought process is that the integrity check will put more strain on the working drives.  If one of those happens to fail then there is no way to rebuild the first failed drive.

Link to comment

I meant checking the integrity by just looking at the SMART reports on all the drives to make sure they don't appear to be failing.

 

Sorry to be such a pain in the a**....by "just checking the smart reports" are you referring to running the "smartctl -t short" command against the drive?  The main dashboard shows a green thumbs-up for all my drives (including the red balled one)  for SMART so I am assuming that you want me looking elsewhere to see the SMART reports.

Link to comment

No - just the basic SMART attributes, without running a test!  Reading them simply queries the drive without doing any I/O.  If you use the -A option (or -a for the full report) in place of the -t option on the smartctl command you should be shown this.

 

You can get the same information in the GUI by clicking on a disk in the Main tab; selecting the Health tab on the disk details; and then selecting the Disk Attributes option.

Link to comment

No worries...  You can either click on the little thumbs icon and then check the disk attributes, or run "smartctl --all /dev/sdX" on a command line.  Check for things like "Reallocated_Sector_Ct", "Current_Pending_Sector", or any of the error counts.  If their raw values all look to be 0 you should be good to go forward.
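For anyone who wants to script that check, here is a rough sketch that scans the attribute table printed by smartctl and reports watched attributes with a non-zero raw value. The sample text is made up for illustration, and the function name `nonzero_watched_attributes` and the `WATCHED` list are my own assumptions, not part of smartctl or unRAID. It also assumes the usual ten-column layout with the raw value in the last column.

```python
# Attributes named in the post above; extend the tuple for other error counts.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def nonzero_watched_attributes(smartctl_output, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is > 0."""
    problems = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            problems[name] = int(raw)
    return problems

# Hypothetical sample in the layout printed by "smartctl -A /dev/sdX".
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

print(nonzero_watched_attributes(SAMPLE))
```

On a real system the text could come from something like `subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout`, with `/dev/sdX` replaced by the actual device. An empty result only means the watched counters are zero; it is not a guarantee that the drive is healthy.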

Link to comment

Looks like disk2 has some pending sectors, and disk4 has an end-to-end error and a reported uncorrectable error.  But it is hard to tell when those happened. I have a few disks that had errors years ago; I just keep an eye on them and they haven't had an issue since.

 

You might want to run a SMART test (short and long) on those 2 just to make sure they appear OK.

Link to comment

Disk 6 is the one that is red balled and in a failed state.

Yes - I am not surprised.

 

Pending sectors happen when the drive is not sure a sector has been read successfully.  It does not necessarily mean that the sector cannot be read, only that its contents are suspect, although in practice they are often reported as a read failure. This is important in unRAID because if another drive fails you need all other drives (and parity) to be read successfully for the rebuild to be perfect.  A pending sector is cleared if the next write to that sector works OK; if the write fails, it should be converted to a reallocated sector from the 'spare' area on the disk.  This is one of the reasons you see frequent suggestions that a suspect disk should be put through a preclear cycle: if the pending sector value goes back to zero and the reallocated sectors value is stable, then the drive is probably OK for use.

 

If either of these values gets large, or the number of reallocated sectors keeps going up then a drive is probably heading towards failure and should not be used with unRAID.
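The rule of thumb above can be written down as a small check. This is only a sketch under my own assumptions: the function name `probably_ok` is hypothetical, "stable" is taken to mean the reallocated count did not change between the last two readings, and no threshold for "large" is included because none is given here.

```python
def probably_ok(snapshots):
    """Apply the rule of thumb to SMART readings taken after successive preclear cycles.

    snapshots: list of (pending, reallocated) raw counts, oldest first.
    Returns True only if the latest pending count is zero and the
    reallocated count did not change between the last two readings.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two readings to judge stability")
    last_pending, last_reallocated = snapshots[-1]
    _, previous_reallocated = snapshots[-2]
    return last_pending == 0 and last_reallocated == previous_reallocated

# Hypothetical history: 8 pending sectors become 8 reallocated ones on the
# first cycle, and nothing changes on the second cycle.
history = [(8, 0), (0, 8), (0, 8)]
print(probably_ok(history))
```

With only the first two readings the check returns False, because the reallocated count has just grown; it takes a further cycle with no change before the count can be called stable.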

Link to comment

So I went out and bought a 4 TB Seagate drive and began preclearing it last night.  It is currently on Step 2 and seems to be stuck at 62%.  I also noticed that the hard drive light on the caddy within my rack mount case is not flashing.  I have linked to the syslog using my Dropbox account.

 

I thought that maybe the disk spun down, so I went into the admin panel and selected to spin up all disks.  As soon as I did that, the unraid console started outputting /dev/sdk: no file or directory.  FYI, I launched the preclear from the unraid console.

 

Any help would be appreciated.

 

https://www.dropbox.com/s/435lfqcz379sd1v/syslog.zip?dl=0

Link to comment
