
Disk disabled / failed?


steve1977


I just upgraded and shortly thereafter one disk was disabled and now shows "red". The disk is relatively new. It is of course possible that this is just a hardware failure / faulty disk, but I have seen several topics on this forum about wrongly declared disk failures with the more recent Unraid releases.

 

Any chance someone can look at my diagnostic files and see whether anything indicates the disk is really broken? Thanks in advance!!!

tower-diagnostics-20150911-2312.zip


One more related question: I am now copying the content from the "faulty disk" to another disk in the array. Is this a smart thing to do, or am I asking for trouble? If I understand Unraid correctly, I am technically not copying from the faulty disk (though it looks that way to me), but from the parity disk (which emulates the faulty disk) to the array. If my understanding is correct, this should be ok.

 

Thoughts?


unRAID emulates the disk by reading ALL the other disks including parity and from that it is able to calculate the data for the missing disk. If you think about it, it is obvious that the parity disk cannot possibly contain all the data for the failed disk. How would it know which disk was going to fail? See the wiki for a better understanding of how parity actually works. It's pretty simple, and if you get that then a lot of things about unRAID will make sense.

 

So, you are not getting the data from parity, you are actually making unRAID read all of the drives at once. This is what it does when it rebuilds a disk. And that is the usual way to deal with your situation, rebuilding the disk. In fact, even if you manage to copy all the data from the emulated disk onto other disks in the array, you are still going to have to rebuild that disk, or else set a New Config without it and rebuild parity.
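To make the parity idea concrete, here is a minimal sketch (plain Python with example byte strings, nothing Unraid-specific) of how single-parity reconstruction works: parity is the byte-wise XOR of all the data disks, so an emulated disk is recovered by XORing parity together with every surviving data disk, which is why every drive has to be read.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Example: three data disks and the parity computed from them.
disk1 = b"\x01\x02\x03"
disk2 = b"\x10\x20\x30"
disk3 = b"\x0a\x0b\x0c"
parity = xor_blocks([disk1, disk2, disk3])

# If disk2 "fails", its contents are emulated by XORing parity with
# ALL the surviving data disks -- not by reading parity alone.
emulated_disk2 = xor_blocks([parity, disk1, disk3])
assert emulated_disk2 == disk2
```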


Got it, so copying from the "faulty" disk is doable and not worse than rebuilding it? Or is there anything destructive about it, since parity is changing due to the copy activity?

 

Also, any indication from my diagnostic files whether the disk is really faulty or "just" some other issue (as mentioned in some of the other threads)?


Disk is fine.  The SAS card or SAS card driver is not, but I don't know what is wrong.  This is rather coincidental, as this is the second time today I've dealt with this same problem!  However yours is on v6.1.0 and the other is on v5.0.5, hard to see a connection, apart from the mpt2sas driver and the card it's managing.  His thread is here, and you should read through it, especially my analysis of what actually happened.  Working from the syslog only at first, I had the wrong idea, but once I saw his screen pic, I had to come up with a different explanation.  Yours appears very similar, and if you look at your Disk 7, you should see that it too has changed drive symbols.  It started as sdd, but now is sds, attached (according to the syslog!) to the 9th SATA port (sd 1:0:8:0) on the card!

 

It didn't lose the drive quite the same way as Marcus.  His appeared very innocent, just a 'synchronization', but almost immediately it said 'removing handle' (which appears to be the way the SAS error handler indicates the drive being dropped).  Yours took much longer before that occurred, but it did occur, and then later it was re-discovered and hooked up to the 9th port and assigned sds as drive symbol.  I have never seen this kind of behavior before, so I have to classify it for now as a bug in the card or mpt2sas driver.


Thanks, we are indeed using the same card (M1015 flashed into IT mode). Do you suspect that the card is faulty and requires replacement? Or the cable to the drive?

 

In the Unraid UI, it still appears to be mounted as "sdd", so not sure about your reference to "sds".

 

I had some issues with the same drive two weeks ago. It showed "I/O errors" when accessed through the VM and then "disappeared". I didn't really change anything, but this was no longer an issue the next day.

 

Let me do the following. Copy all files to a new drive within the array. Then rebuild the whole array and upgrade to 6.1.2. Then resend diagnostic files.

 

Does this make sense?


Got it, so copying from the "faulty" disk is doable and not worse than rebuilding it?

Well, I would say that the more time you spend not rebuilding the disk the more at risk you are of another failure. Until the disk is rebuilt, your array doesn't have parity protection.
Or is there anything destructive about it, since parity is changing due to the copy activity?

Parity is changed by the copy activity, but that is not destructive. If it didn't keep updating parity, that would be destructive, because parity would be invalid and it would be impossible to rebuild the disk.
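On the parity-update point, here is a minimal sketch (generic single-parity arithmetic, not Unraid's actual code) of the read-modify-write rule: each write XORs the old data out of parity and the new data in, so a disk that is being emulated remains reconstructable afterwards.

```python
def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
    # Read-modify-write: XOR the old data out of parity, XOR the new data in.
    return old_parity ^ old_data ^ new_data

# Two data "disks" (single example bytes) and their parity.
d1, d2 = 0x3C, 0x55
parity = d1 ^ d2

# Write new data to d1 while d2 is failed/emulated. Because parity is
# updated in step with the write, d2 can still be reconstructed afterwards,
# i.e. the ongoing parity updates are not destructive.
new_d1 = 0x5A
parity = update_parity(parity, d1, new_d1)
d1 = new_d1
assert parity ^ d1 == d2
```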

 

Copying all the data from an emulated disk to other disks in the array is not the usual course of action, rebuilding the failed disk is. I have seen people backup the data from an emulated disk to another system because they have some reason to not rebuild the disk, but continuing to write to the array with a failed disk is usually not recommended.

 


Thanks for your quick reply. You mentioned in your earlier post that the disk is actually not faulty, but is just wrongly seen that way due to a card or driver bug. So I would not need to change / rebuild the disk. I could just do a "new config", but this requires quite a lot of faith that the disk is really 100% ok (as "new config" would wipe the parity). So I was thinking that copying the data and only thereafter doing the "new config" would be "safer". And I cannot rebuild onto the existing disk anyway, can I?


Thanks, we are indeed using the same card (M1015 flashed into IT mode). Do you suspect that the card is faulty and requires replacement? Or the cable to the drive?

Nothing there tells me who's at fault, so could be the firmware on the card (might check for an update) or could be the mpt2sas driver module, or another lower level driver for the card.  It's not the cable, as it's not a communication problem.

 

In the Unraid UI, it still appears to be mounted as "sdd", so not sure about your reference to "sds".

Have you stopped the array yet?  Try that, and see what it says in the drop down for Disk 7.

 

Let me do the following. Copy all files to a new drive within the array. Then rebuild the whole array and upgrade to 6.1.2. Then resend diagnostic files.

You can do that, I suppose; it gives you another backup of the files on Disk 7, but the important thing is to rebuild Disk 7 in place.  Normal procedure would be to unassign Disk 7, start and stop the array, then re-assign Disk 7 and start the array, which will start the rebuild of Disk 7.


Ok, I still cannot see any reference to "sds" (in the pull-down, even after restarting). Also updated to 6.1.2. Attached new diagnostic files.

 

What is the mpt2sas driver module? HW or SW?

 

The copy process is taking forever (probably days), so maybe I should just go ahead and rebuild from parity. "Unassign Disk 7, start and stop the array, then re-assign Disk 7" also works with the same disk, right? Even if the rebuild fails, I could still rebuild to another disk, couldn't I? How long would you expect the rebuild to take? Shall I shut down the VM during this process?

tower-diagnostics-20150912-1100.zip

  • 4 weeks later...

Got a new drive and then the same thing happened. Yet again disk 7 (but a new disk).

 

The disk worked for a day or even a few days. Then it turned "red" and now I again need to rebuild.

 

Not impossible, but now very unlikely to be another faulty disk. Maybe the driver? Maybe my data card?

 

I'm OK with replacing the data card, but I thought the M1015 was already the best-supported one. Any advice?

  • 2 weeks later...

Any thoughts? Now going through the same problem for the third time. The sequence of events is always the same: I create an array, all works well for 3 days or so, then one disk goes bad (shows "red") and the content of this disk is emulated.

 

I changed the "faulty" disk and recreated the array. Unfortunately, the issue is always the same: working for 3 days and then one disk is marked faulty.

 

I am sending a diagnostic of the 3rd time later today, but I am sure that it is the same issue as the 2nd time (see my previous post).

 

This issue is really annoying as it basically prevents me from properly using Unraid...

 

Thanks in advance for any help or ideas you may have!!!


I mentioned before that everything works well for a few days after setting up the array and then one disk fails (while the disk is actually functional).

 

I now know why it takes a few days. It actually works until the first "parity check" kicks in.

 

How often is the parity check required, and any idea why the check leads to one disk turning "red"?


I promise not to hijack this thread, but I think I may be experiencing the same, or a related, issue. I have been running v5b11 smoothly since it was bleeding edge. Did a clean install to 6.1 and my disk 3 failed. I bought a new drive, precleared it for 3 cycles, and popped it in my disk 3 slot. The server rebuilt the drive as expected and then all of my other drives failed simultaneously... I rebooted everything and was able to get all the drives green for a while, but I cannot access my shares. I will probably create my own thread, but if you guys want any logs or hardware info from me for cross-referencing, just let me know!


You are experiencing an issue which seems to you to be similar to an issue another user has started a thread about. If that issue is not clearly related to a possible defect in unRAID itself, but might instead be something related to your particular hardware or configuration, then you should definitely start your own thread. It can only confuse things if we start trying to get more information from you in this other user's thread.