[SOLVED] SOS! Data Disk disabled while upgrading parity drive!!


I was upgrading my parity drive from 2TB to 3TB when the shares suddenly went down, so I stopped the parity rebuild and rebooted the server. When the server came back up, disk 1 showed as unformatted. I am very, very scared! The only option I am offered is to format disk1 (which is FULL of data), and the parity disk is orange, obviously because the parity rebuild did not complete.

 

What next?!? Syslog attached.

 

EDIT: Just in case it helps (which I hope and think it should), I still have the original 2TB parity drive, and it is intact. Since the drive swap I have not made any changes to the array either, except that I used my connected XBMC client to watch a movie, and it downloaded a subtitle file to one of the folders on one of the drives. I can't say which one.

syslog-2013-03-02.txt

Link to comment

Are you me? Because it sounds like we both have the same exact issue:

http://lime-technology.com/forum/index.php?topic=26265.0

 

Ironically enough, I was also going from 2TB parity to 3TB.

 

Wow, yeah! Now that you mention it, I went and read your thread, and it does seem like I'm you!

Ironically, it was my wifey too who initiated the movie-watching scenario, in spite of my protests that we should let the server do its thing for a while!

See my response in the other thread.

Checking it as I type, though I'm hoping and praying I can restore the red-balled disk using the 2TB parity drive I just pulled out.  :-X

Link to comment


Before you do anything, you should test the drive with smartctl as I mentioned in the other thread.

At least then you will have confidence in the drive and its inner workings. If all is well, you can start to look for loose cables or power issues.

Link to comment

Looks like you ran a SMART short test. You should run another one, saving the output to a separate file, and then compare the two.

 

Here's how I did it:

 

Save the initial smart report, then start the short test:

smartctl -a /dev/sdb > /boot/smart_sdb_1.txt
smartctl -t short /dev/sdb

 

The test will indicate that it's been started and give an estimate of time. It took about a minute on mine.

 

After the time has passed, gather the results into a second text file and then compare:

smartctl -a /dev/sdb > /boot/smart_sdb_2.txt
diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt

 

The "diff" command shows you the differences in the files. See if anything appears abnormal.
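The clock line changes on every run, so it always shows up in the diff even when nothing is wrong. Here is a rough sketch of a wrapper that drops that line before comparing. The name smart_diff is made up for illustration (it is not part of smartmontools), and it assumes the reports contain a "Local Time is:" line like the ones posted in this thread.

```shell
# Compare two saved SMART reports, ignoring the "Local Time is:" line,
# which differs on every run and says nothing about drive health.
# smart_diff is a hypothetical helper name, not a smartmontools command.
smart_diff() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    grep -v '^Local Time is:' "$1" > "$a"
    grep -v '^Local Time is:' "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Example, using the file names from this thread:
# smart_diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt
```

It returns 0 when only the timestamp differs, so any output at all is an attribute that actually changed.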

 

Next up is starting the long test:

smartctl -t long /dev/sdb

 

On my system, it estimated 4.25 hours, but ended up taking about 4.5 hours.

 

You can check if it's done by running this:

smartctl -a /dev/sdb

Look for the "Self-test execution status" line. If it says the test is still in progress, wait longer and run the above command again to check.
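If you would rather not keep re-running the command by hand, the wait can be scripted. This is only a sketch: it assumes smartctl reports a running test with the phrase "Self-test routine in progress" (the exact wording may vary by smartmontools version and drive), and both function names and the 5-minute interval are my own choices.

```shell
# Succeeds if the SMART report on stdin says a self-test is still running.
# Assumption: smartctl prints "Self-test routine in progress" during a test.
selftest_running() {
    grep -q 'Self-test routine in progress'
}

# Poll a drive until the self-test is no longer reported as running.
# wait_for_selftest is a hypothetical helper; the default interval
# of 300 seconds is arbitrary.
wait_for_selftest() {
    dev=$1
    interval=${2:-300}
    while smartctl -a "$dev" | selftest_running; do
        sleep "$interval"
    done
}

# Example:
# wait_for_selftest /dev/sdb && smartctl -a /dev/sdb > /boot/smart_sdb_3.txt
```

Polling every few minutes is plenty for a long test that runs for hours.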

 

Once it's complete, run this to gather the results into a third file and then diff it against the second one:

smartctl -a /dev/sdb > /boot/smart_sdb_3.txt
diff /boot/smart_sdb_2.txt /boot/smart_sdb_3.txt

 

In my case, neither the short nor the long test showed anything out of the ordinary, so I did a reiserfs check, which also went well. I'm going to copy the drive to another drive using ddrescue and work from there today.

Link to comment

Thanks for the very precise instructions on the test. I would have never figured out that diff business on my own.

 

Please find attached the 2 logs, and below is the information from "diff". To my amateur eye, there is nothing unusual in it.

 

root@NAS:~# smartctl -a /dev/sdb > /boot/smart_sdb_2.txt
root@NAS:~# diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt
12c12
< Local Time is:    Sun Mar  3 14:28:51 2013 IST
---
> Local Time is:    Sun Mar  3 14:30:42 2013 IST
65c65
< 193 Load_Cycle_Count        0x0032  179  179  000    Old_age  Always      -      63589
---
> 193 Load_Cycle_Count        0x0032  179  179  000    Old_age  Always      -      63595
root@NAS:~#

smart_sdb_1.txt

Link to comment

I noticed that the second report is 0 KB, so I ran the test again. This time the results look a bit different.

 

root@NAS:~# smartctl -a /dev/sdb > /boot/smart_sdb_2.txt
root@NAS:~# diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt
12c12
< Local Time is:    Sun Mar  3 14:28:51 2013 IST
---
> Local Time is:    Sun Mar  3 14:36:02 2013 IST
65,66c65,66
< 193 Load_Cycle_Count        0x0032  179  179  000    Old_age  Always      -      63589
< 194 Temperature_Celsius    0x0022  104  095  000    Old_age  Always      -      48
---
> 193 Load_Cycle_Count        0x0032  179  179  000    Old_age  Always      -      63611
> 194 Temperature_Celsius    0x0022  105  095  000    Old_age  Always      -      47
root@NAS:~#

 

smart_sdb_1.txt

smart_sdb_2.txt

Link to comment

Just noticed another peculiar thing.

 

As I said, I started watching a movie just prior to this happening.

 

I'm pretty sure my movies are on disk 1 and disk 2 of the 3 disks in my array, since I added files to that share early on and haven't added any since getting the Plus upgrade and adding the 3rd drive. So I opened up the contents of disk 3 and noticed a folder called Movies. Inside Movies I can see "Chaos Theory", the exact movie I had started up that day. The movie folder contains only the subtitle, which I had downloaded using XBMC's built-in subtitle add-on.

At first I thought this had happened because the actual movie folder with the video file was on disk 1, which could not accept the write, so another folder was created on disk 3. But that's not the case: the actual movie is on disk 2 and I can still access it. Why then would another folder be created? My split level is 2 for the Movies share, and the directory structure is Share (Movies) > Folder (Movie Name) > Files (video file, subtitles, artwork, etc.).

Link to comment

With a split level of 2, the movie folder can be replicated on different disks if the path is movies/moviename. When you added the subtitle file, which logically lives inside this moviename folder, the disk it went to depended on your free-space allocation method. If you want only the top-level 'Movies' folder to be replicated across disks, with each moviename folder under it constrained to one disk, you would need a split level of 1.
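To see for yourself where a folder has been replicated, you can look at the individual disk mounts instead of the user share. A small sketch, assuming the data disks are mounted at /mnt/disk1, /mnt/disk2 and so on, as on a stock unRAID install. The function name disks_holding is made up for this example.

```shell
# Print every data disk path that contains the given share sub-folder.
# Assumption: data disks are mounted under /mnt as disk1, disk2, ...;
# pass a different root as the second argument if yours differ.
# disks_holding is a hypothetical helper name.
disks_holding() {
    rel=$1
    root=${2:-/mnt}
    for d in "$root"/disk*; do
        [ -d "$d/$rel" ] && echo "$d/$rel"
    done
    return 0
}

# Example:
# disks_holding "Movies/Chaos Theory"
```

With split level 2 you may see the same movie folder listed on more than one disk. With split level 1 each movie folder should appear only once.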

Link to comment

With a split level of 2 the movie folder can be replicated on different disks if the path is movies/moviename. 

 

The agony of losing a disk is causing these "duh" moments. Sorry, I don't know what I was thinking.

I know what you mean (I am prone to 'duh' moments myself at times).  I hope that realising what you are seeing is expected behaviour will at least reduce the stress :)

Link to comment

The disk has an unreadable sector and needs to be rebuilt. Put the original parity drive back in place. I can't test this right now so someone will need to validate the following procedure:

 

0. Record the desired disk config, i.e., parity drive = HDD serial xxx, disk1 = HDD serial yyy, etc.

1. Choose New Config under Utils.

2. The disks may need to be assigned to their correct positions.

3. There should be a check box to indicate that parity is already correct.

4. Start the array.

 

You should now have a working array that includes the disk with a pending sector. Stop the array. Assign the new disk as parity, and assign the old parity drive in place of the drive with the unreadable/pending sector. Start the array; after parity is copied and the data disk is rebuilt, the array should be good. Then run pre-clear on the disk with the pending sector to correct the issue.
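For the first step of the procedure above (recording which drive sits in which slot), one way to capture the model and serial of each drive is to list the by-id links. This is a sketch that assumes the system populates /dev/disk/by-id, as udev-based Linux systems normally do. list_disk_ids is a made-up helper name, and you still have to note the slot assignments from the unRAID Main page yourself.

```shell
# Print "by-id name -> device" for each whole disk, skipping partitions.
# Assumption: /dev/disk/by-id exists and its link names embed the drive
# model and serial number. list_disk_ids is a hypothetical helper name.
list_disk_ids() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        case "$link" in *-part[0-9]*) continue ;; esac
        echo "$(basename "$link") -> $(basename "$(readlink "$link")")"
    done
    return 0
}

# Example: save the mapping to the flash drive before changing assignments.
# list_disk_ids > /boot/disk_ids.txt
```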

Link to comment

Could someone please help validate the above procedure so that I can get cracking on it ASAP? I am having sleepless nights with my array broken  :'(

 

Edit: In the procedure you suggest, I believe I will basically be forcing unRAID to treat my original parity disk as the correct parity disk and to reconstruct all data on the failed disk based on that assumption. So I assume the failed disk will be rebuilt from parity as if it were a brand new one. Can I therefore replace the failed 2TB disk with a healthy, brand new 3TB one so that I don't encounter any errors during the rebuild?

Also, will this process be affected by the fact that there might be some new files on the healthy disks? As I mentioned, subtitles have been downloaded to one of the disks, and my XBMC machine keeps scraping my data non-stop to add information to its library (though this most likely gets stored locally on the XBMC machine and not on unRAID).

Link to comment

The numbered steps restore the array to the previous condition. I'm just not certain that the "parity is correct" check box will appear because it is a new feature. I'm in the process of moving so I can't do a test until later this week or next. The second part describes the parity-swap-disabled procedure where the parity disk is upgraded and the old parity drive is assigned as a data disk in one step. Once the array is started, unRAID will copy from the old parity to the new one and then rebuild the data disk automatically.

Link to comment

How much data will I lose? Just the data that was written during or after the failure, or more? Nothing significant has been written since the failure (as I mentioned, just a couple of subtitle files), and I don't mind losing that as long as everything written before it is safe.

 

This has been the longest period on the forum that I have gone without any conclusive help. I really wish someone would take notice of this serious problem.

Link to comment

Just to make sure I have understood correctly, I am listing the steps I am going to take. Please correct me if my understanding of the procedure is wrong in any way.

 

1. Replace the currently invalid parity disk (3TB) with the original parity disk (2TB). Upon booting unRAID in this state, I am faced with a "Missing" message under the parity slot, and obviously a red ball next to the disabled disk1.

 

2. Next I should go under "Utils" and apply "New Config".

 

3. Then back to Main page and assign the old parity as parity and other drives in the respective slots and start the array.

 

Am I good to go till here?

 

What exactly will happen when I do this? Will unRAID reconstruct the data on the failed disk? Based on the very definition of "New Config", that does not seem to be the case. And how will it read or write to a red-balled disk anyway? I'm sorry for the hesitance, but I'm trying to understand the rationale behind the prescribed steps.

I also didn't understand where I would get the "Parity is valid" message in all of this.

 

After starting the array, what are the steps after that?

Link to comment

You can have data corruption at every location where a bit was changed on any of the other drives. So, you can't determine how much or what files were affected.

 

The new config lets you "start over" with a new array setup. Assign all the disks how they should be after doing the new config. At that point, a check box or some other indicator is supposed to appear that lets you confirm that the parity drive is already valid.

 

 

Link to comment

I have not added any more data to the other 2 healthy drives. By following this procedure, do I stand any chance of losing data that resides on those 2 disks?

I have roughly worked out what data I could have lost on the failed disk, and it seems I have a backup of all of it on other unRAID machines. But I absolutely cannot afford to lose what I have on the healthy disks.

Link to comment

No, this has no effect on the other disks. You will have data corruption issues on the failed/replaced disk if you wrote data to either of the healthy disks. In this case, the parity was not updated to reflect the new data written.

 

But if you absolutely cannot lose that data, you should have another backup somewhere else.
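To make the stale-parity point concrete: single parity is the XOR of the same position on every data disk, so a rebuild only returns the right data if parity matches what is currently on the surviving disks. The byte values below are arbitrary and only there to show the arithmetic.

```shell
# One byte from the same position on each of three data disks
# (arbitrary illustrative values, not taken from any real array).
d1=165; d2=60; d3=15

# Single parity is the XOR of that position across all data disks.
parity=$(( d1 ^ d2 ^ d3 ))

# Rebuilding disk1 from parity plus the surviving disks recovers the byte.
rebuilt=$(( parity ^ d2 ^ d3 ))
echo "original=$d1 rebuilt=$rebuilt"

# Now disk2 is written to, but the parity being used is the old one.
d2_new=61
rebuilt_stale=$(( parity ^ d2_new ^ d3 ))
echo "original=$d1 rebuilt_with_stale_parity=$rebuilt_stale"
```

The second rebuild comes out wrong at exactly the position where disk2 changed, and nowhere else. That is why a few small writes damage only the matching spots on the rebuilt disk, not the whole thing.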

Link to comment

0. Record the desired disk config, i.e., parity drive = HDD serial xxx, disk1 = HDD serial yyy, etc.

1. Choose New Config under Utils.

2. The disks may need to be assigned to their correct positions.

3. There should be a check box to indicate that parity is already correct.

4. Start the array.

 

I am starting this procedure now.

 

At point no. 3 - What if I do not get the check box to indicate that parity is correct? How do I proceed then?

 

Update 1: Did as instructed. The "Parity is valid" box did come up. Checked the box and started the array, now parity check is in progress.

 

Update 2: About 2% into the parity check, 8000-odd sync errors show as corrected. I should let this continue, right?

Link to comment

Archived

This topic is now archived and is closed to further replies.
