
Failing Disk?



I logged on to my unRAID server to find that disk 10 has been disabled. It's a WD Red 6TB, and it has a red X beside it. It says "contents emulated".

 

Can I recover the disk? If not, how do I identify the disk and replace it? I have another WD Red 6TB drive.

 

I've installed the Fix Common Problems plugin, but it just takes me back to tab

 

I'm new to unRAID. My system has run faultlessly for months; the last reboot was 100 days ago.

 

Thanks in advance.

 

bgeorge104

tower-diagnostics-20160625-2115.zip

Link to comment

A belated response, I know, but your case was a little different, which probably caused some to skip it.

 

I've posted here about your case, because it has a particular problem in it, one you aren't aware of yet.  From your syslog, it does not appear you have rebooted, which is a good thing!  Please go immediately to the Main page and grab a screen copy or take notes of ALL of your drive assignments.  Or just use the list I've pasted in at the bottom of this post.

 

Assuming you read the other post: on June 23 you had a glitch between the SAS card and Disk 10, causing the disk to be dropped from the system.  The drive appears to be fine, with a good SMART report, so it appears to be the fault of the SAS card, but I have no other evidence either way.  The glitch appears to have been just a random thing, and your system is otherwise fine.  You will need to rebuild Disk 10 in place.  See "What do I do if I get a red X next to a hard disk?"

 

The part you don't know is that something has caused your super.dat file to be cleared, and it contains all of your drive assignments.  I'm rather sure that once you reboot, your assignments will all be gone, which means you will have to re-enter them from notes, a screen copy, or the list below.  This is a new unRAID issue, only recently discovered and rather rare, and hopefully it will be fixed in the near future.  I don't know what caused it.

 

You have been experiencing numerous power outages, and your UPS has saved you each time, but it looks like your routers and/or switches are not on the UPS.  I recommend finding a way to get them plugged into a UPS too.

 

This is your assignment list.  Look for the disk number, then the drive model and serial.  For example, the parity drive is disk0, with serial ending in 2JS.  The 2 Kingstons are in Cache drive order.

Mar 20 14:27:34 Tower kernel: md: import disk0: [8,144] (sdj) WDC_WD60EFRX-68MYMN1_WD-WX21D15262JS size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk1: [8,208] (sdn) WDC_WD60EFRX-68MYMN1_WD-WX31D15P7LET size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk2: [8,16] (sdb) WDC_WD60EFRX-68MYMN1_WD-WX21D1526F7J size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk3: [8,80] (sdf) WDC_WD60EFRX-68MYMN1_WD-WX21D65NV26R size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk4: [8,128] (sdi) WDC_WD60EFRX-68MYMN1_WD-WX21D1526V5D size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk5: [8,192] (sdm) WDC_WD60EFRX-68MYMN1_WD-WX31D15P7DK6 size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk6: [65,16] (sdr) WDC_WD60EFRX-68MYMN1_WD-WX31D65A2266 size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk7: [8,64] (sde) WDC_WD60EFRX-68MYMN1_WD-WX21D7453TNV size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk8: [8,112] (sdh) WDC_WD60EFRX-68MYMN1_WD-WX31D743Y3SD size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk9: [65,0] (sdq) WDC_WD60EFRX-68MYMN1_WD-WX21D1526YH6 size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk10: [8,176] (sdl) WDC_WD60EFRX-68MYMN1_WD-WX21D74534E1 size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk11: [8,240] (sdp) WDC_WD60EFRX-68MYMN1_WD-WX11D4430666 size: 5860522532
Mar 20 14:27:34 Tower kernel: md: import disk12: [8,48] (sdd) WDC_WD30EFRX-68AX9N0_WD-WCC1T1278521 size: 2930266532
Mar 20 14:27:34 Tower kernel: md: import disk13: [8,96] (sdg) WDC_WD30EFRX-68AX9N0_WD-WCC1T1275230 size: 2930266532
Mar 20 14:27:34 Tower kernel: md: import disk14: [8,160] (sdk) WDC_WD30EZRX-00MMMB0_WD-WCAWZ2551212 size: 2930266532
Mar 20 14:27:34 Tower kernel: md: import disk15: [8,32] (sdc) WDC_WD30EZRX-00MMMB0_WD-WMAWZ0212771 size: 2930266532
Mar 20 14:27:34 Tower kernel: md: import disk16: [8,224] (sdo) WDC_WD40EFRX-68WT0N0_WD-WCC4E1664282 size: 3907018532

Mar 20 14:27:34 Tower emhttp: import 23 cache device: sds
Mar 20 14:27:34 Tower emhttp: import 24 cache device: sdt

Mar 20 14:27:34 Tower emhttp: KINGSTON_SS200S330G_50026B7257050D88 (sds) 29313144
Mar 20 14:27:34 Tower emhttp: KINGSTON_SS200S330G_50026B7257087E82 (sdt) 29313144
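
If you would rather pull this list out of the syslog yourself, something like the following might do it. It is only a rough sketch: it assumes the "md: import diskN" lines keep exactly the format shown above, and the names parse_assignments and IMPORT_RE are made up for this example. Note that the sdX letters can change between boots, so the model/serial string is the part to match on.

```python
import re

# Matches unRAID syslog lines of the form seen in this thread, e.g.
#   kernel: md: import disk10: [8,176] (sdl) WDC_..._WD-WX21D74534E1 size: 5860522532
# (assumption: other unRAID versions may format these lines differently)
IMPORT_RE = re.compile(
    r"md: import disk(?P<slot>\d+): \[\d+,\d+\] \((?P<dev>\w+)\) "
    r"(?P<ident>\S+) size: (?P<size>\d+)"
)

def parse_assignments(lines):
    """Return {slot: (device, identifier)} for every 'md: import diskN' line."""
    assignments = {}
    for line in lines:
        match = IMPORT_RE.search(line)
        if match:
            assignments[int(match["slot"])] = (match["dev"], match["ident"])
    return assignments

# Two lines copied from the list above
sample = [
    "Mar 20 14:27:34 Tower kernel: md: import disk0: [8,144] (sdj) WDC_WD60EFRX-68MYMN1_WD-WX21D15262JS size: 5860522532",
    "Mar 20 14:27:34 Tower kernel: md: import disk10: [8,176] (sdl) WDC_WD60EFRX-68MYMN1_WD-WX21D74534E1 size: 5860522532",
]
for slot, (dev, ident) in sorted(parse_assignments(sample).items()):
    print(f"disk{slot}: {ident} ({dev})")
```

To run it on a real syslog, pass the open file instead of the sample list, e.g. parse_assignments(open("syslog.txt")), where the file name is whatever the syslog inside your diagnostics zip is called.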

Link to comment

Thanks for the reply,

 

I'm quite new to unraid,

 

How do I re-enter the drive assignments?

 

Will I find the hdd serial numbers listed under the main tab on the physical drives?

 

I think the power outages were during some work that was being done on our house. I can put the switch on the UPS, but not the BT Hub 5, as it's two floors up from the server, switch & UPS.

 

Is there a way of preclearing a new disk with the array running?

 

Thanks in advance

 

bgeorge104

Link to comment
I can put the switch on the UPS, but not the BT Hub 5, as it's two floors up from the server, switch & UPS.
I'd recommend spending a small amount on a minimal UPS specifically for it.

http://amzn.com/B001985SWW

Or ideally something designed for low loads and longer runtimes.

http://amzn.com/B00NTQYUA8

Apologies if you aren't in North America, I'm sure there are local alternatives that fulfill the objective.

Link to comment

How do I re-enter the drive assignments?

 

Will I find the hdd serial numbers listed under the main tab on the physical drives?

When you next boot, it will be obvious.  Each drive slot will have a dropdown with all of the available drives listed, each with model and serial.

 

Is there a way of preclearing a new disk with the array running?

Yes, look for the Preclear plugin. You may also need the Unassigned Devices plugin.

 

I have read the two links you posted, the red x post seems to make sense to me.

Unfortunately, your case is going to be different, and the instructions are going to have to be customized for you.  Your system was writing files to Disk 10 when it was lost, and appears to have written more a day later, so all of those writes are on the emulated Disk 10, not the physical one.  I can't think of a way to do a New Config (to regain your assignments), *and* keep Disk 10 emulated so it can be rebuilt.  I'll keep thinking, and hopefully someone else here will have a brainstorm!  Otherwise, you'll only keep the contents of the physical Disk 10, and then run a correcting parity check, which will find and correct a lot of parity errors, but lose everything written to the emulated Disk 10.
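
For anyone wondering why the new writes exist only on the emulated disk, here is a toy illustration. It assumes single parity computed as a plain XOR across the data disks, and shrinks each "disk" to two made-up bytes; the disk names, values and the xor_parity helper are all invented for the example and have nothing to do with the real array contents.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three tiny hypothetical "disks"; the last one stands in for Disk 10
disk_a = bytes([0x11, 0x22])
disk_b = bytes([0x0F, 0xF0])
disk10_physical = bytes([0xAA, 0x55])
parity = xor_parity([disk_a, disk_b, disk10_physical])

# Disk 10 drops out. A later write to it cannot reach the physical drive,
# so only parity is updated to reflect the new contents.
disk10_new = bytes([0xAB, 0x55])
parity = xor_parity([disk_a, disk_b, disk10_new])

# Emulated contents = parity XOR all surviving disks
emulated = xor_parity([parity, disk_a, disk_b])
print(emulated == disk10_new)       # True: the emulated disk holds the new write
print(emulated == disk10_physical)  # False: the physical disk is stale
```

In this model, a correcting parity check run against the stale physical disk would recompute parity from what is physically there, and the new write would be gone. That is why the rebuild has to come from the emulated disk.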

Link to comment

Unfortunately, your case is going to be different, and the instructions are going to have to be customized for you.  Your system was writing files to Disk 10 when it was lost, and appears to have written more a day later, so all of those writes are on the emulated Disk 10, not the physical one.  I can't think of a way to do a New Config (to regain your assignments), *and* keep Disk 10 emulated so it can be rebuilt.  I'll keep thinking, and hopefully someone else here will have a brainstorm!  Otherwise, you'll only keep the contents of the physical Disk 10, and then run a correcting parity check, which will find and correct a lot of parity errors, but lose everything written to the emulated Disk 10.

I think (dangerous, I know) that "mdcmd set invalidslot 10" after assigning the rest of the drives may do it, but I would confirm with Tom @ limetech on the exact order of operations, or possibly johnnie.black would be kind enough to do a trial run with his test array to see if it works with the specific version of unraid that the OP is running.
Link to comment

Unless there's a problem writing to the flash drive, stopping the array will recreate the super.dat.

 

If it doesn't I'll post a procedure that should work to recover the current emulated disk.

 

I think (dangerous, I know) that "mdcmd set invalidslot 10" after assigning the rest of the drives may do it, but I would confirm with Tom @ limetech on the exact order of operations, or possibly johnnie.black would be kind enough to do a trial run with his test array to see if it works with the specific version of unraid that the OP is running.

 

I'm not really familiar with that command. I did try it on my test server with v6.1.8, after a new config, both before and after starting the array, and it appears to do nothing. Do you know what the procedure should be?

Link to comment

If you have disk shares enabled, browse that disk, e.g., \\tower\disk10

 

If they are disabled you have to stop the array first, but if there's an issue writing super.dat the array won't be accessible.

 

Maybe there's a way of listing files by date from the CLI and checking which disk they are on, but I don't know how.
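
One possible way to list files by date, offered as a sketch only. It assumes a Python 3 interpreter is available (either on the server or on another machine that can see the disk share), that the emulated disk is mounted at /mnt/disk10, and that 23 June 2016 is the right cutoff. The function name files_modified_since is made up for this example. It only reads file metadata and does not write anything.

```python
import os
import time

def files_modified_since(root, since_epoch):
    """Yield (mtime, path) for regular files under root modified at or after since_epoch."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= since_epoch:
                yield mtime, path

if __name__ == "__main__":
    # Assumptions: mount point /mnt/disk10, cutoff at local midnight on 23 June 2016
    cutoff = time.mktime((2016, 6, 23, 0, 0, 0, 0, 0, -1))
    for mtime, path in sorted(files_modified_since("/mnt/disk10", cutoff)):
        print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), path)
```

If the script is run from another computer, point the root argument at wherever the disk10 share is mounted there instead of /mnt/disk10.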

Link to comment

The array has to be stopped, but there's a chance it won't be accessible after that. The recovery procedure should work and would recover all the new data on the emulated disk, but it would be safer if you could copy those files off first.

 

Wait a while; maybe someone else has an idea how to identify the new files on disk10.

Link to comment
I think (dangerous, I know) that "mdcmd set invalidslot 10" after assigning the rest of the drives may do it, but I would confirm with Tom @ limetech on the exact order of operations, or possibly johnnie.black would be kind enough to do a trial run with his test array to see if it works with the specific version of unraid that the OP is running.
I'm not really familiar with that command. I did try it on my test server with v6.1.8, after a new config, both before and after starting the array, and it appears to do nothing. Do you know what the procedure should be?
https://lime-technology.com/forum/index.php?topic=43765.msg418355#msg418355
Link to comment

Dunno if the array is stable enough, but the unbalance plugin will do what is needed to move all files from disk10 to the rest of the array if there is enough space. At the very least the dry run option will show the contents of disk10.

 

Midnight commander is also an option, but requires extensive command line knowledge to get data to another computer.

Link to comment

I think (dangerous, I know) that "mdcmd set invalidslot 10" after assigning the rest of the drives may do it, but I would confirm with Tom @ limetech on the exact order of operations, or possibly johnnie.black would be kind enough to do a trial run with his test array to see if it works with the specific version of unraid that the OP is running.
I'm not really familiar with that command. I did try it on my test server with v6.1.8, after a new config, both before and after starting the array, and it appears to do nothing. Do you know what the procedure should be?
https://lime-technology.com/forum/index.php?topic=43765.msg418355#msg418355

 

This isn't working on v6.1.8, unless I'm doing something wrong:

 

-new config

-assign all disks

-mdcmd set invalidslot 1

-start array: with trust parity checked, all disks are green; without it checked, it starts a parity sync

 

 

 

If it comes to that, there is a way that I and others on the forum have used before with success:

 

-new config

-reassign all disks

-very important, before starting array check “parity is already valid”

-start array

-stop array

-unassign disk10 (select “no device”)

-start array, confirm the emulated disk 10 mounts OK and all data is there

-stop array

-reassign disk10

-start array to begin rebuild

 

Link to comment

I think (dangerous, I know) that "mdcmd set invalidslot 10" after assigning the rest of the drives may do it, but I would confirm with Tom @ limetech

 

I'd forgotten about that one, that was a good idea!  We really should find out from Tom whether any vestige of that feature remains. I just read jonathanm's link about it!

Link to comment

If it comes to that, there is a way that I and others on the forum have used before with success:

 

-new config

-reassign all disks

-very important, before starting array check “parity is already valid”

-start array

-stop array

-unassign disk10 (select “no device”)

-start array, confirm the emulated disk 10 mounts OK and all data is there

-stop array

-reassign disk10

-start array to begin rebuild

 

Brilliant!  Just brilliant!

Link to comment


If it comes to that, there is a way that I and others on the forum have used before with success:

 

-new config

-reassign all disks

-very important, before starting array check “parity is already valid”

-start array

-stop array

-unassign disk10 (select “no device”)

-start array, confirm the emulated disk 10 mounts OK and all data is there

-stop array

-reassign disk10

-start array to begin rebuild

 

Do I start this by stopping the array?

 

 

Link to comment
