extremeaudio Posted March 1, 2013
I was upgrading my parity drive from 2TB to 3TB when the shares suddenly went down, so I stopped the parity rebuild and rebooted the server. When the server came back up, disk 1 showed as unformatted. I am very, very scared! The only option I have is to format disk1 (which is FULL of data), and the parity disk is orange, obviously because the parity rebuild did not complete. What next?!? Syslog attached.
EDIT: Just in case it helps (which I hope and think it should), I still have the original 2TB parity drive, and it is intact. Since the drive swap I have not made any changes to the array, except that I used my connected XBMC client to watch a movie, and it downloaded a subtitle file to one of the folders on one of the drives. I can't say which one.
Attachment: syslog-2013-03-02.txt
ClunkClunk Posted March 1, 2013
Are you me? Because it sounds like we both have the exact same issue: http://lime-technology.com/forum/index.php?topic=26265.0 Funnily enough, I was also going from 2TB parity to 3TB.
WeeboTech Posted March 1, 2013
In reply to ClunkClunk: See my response in the other thread.
extremeaudio Posted March 1, 2013 Author
In reply to ClunkClunk: Wow, yeah! Now that you mention it, I went and read your thread, and it does seem like I'm you! Funnily enough, it was my wife too who started the movie, in spite of my protests that we should let the server do its thing for a while!
In reply to WeeboTech: Checking it as I type. I'm hoping and praying I can restore the red-balled disk using the 2TB parity drive I just pulled out.
WeeboTech Posted March 1, 2013
In reply to extremeaudio: Before you do anything, you should test the drive with smartctl as I mentioned in the other thread. At least then you will have confidence in the drive and its inner workings. If all is well, you can start to look for loose cables or power issues.
extremeaudio Posted March 2, 2013 Author
I ran smartctl on the red-balled disk, i.e. sdb. The report is attached. How do I proceed further?
Attachment: smart.txt
ClunkClunk Posted March 2, 2013
Looks like you ran a SMART short test. You should run another one, save the report to a separate file, and then compare the two. Here's how I did it.
Save the initial SMART report, then start the short test:
    smartctl -a /dev/sdb > /boot/smart_sdb_1.txt
    smartctl -t short /dev/sdb
The test will indicate that it has been started and give a time estimate. It took about a minute on mine. After the time has passed, gather the results into a second text file and then compare:
    smartctl -a /dev/sdb > /boot/smart_sdb_2.txt
    diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt
The "diff" command shows you the differences between the files. See if anything appears abnormal.
Next up is starting the long test:
    smartctl -t long /dev/sdb
On my system, it estimated 4.25 hours but ended up taking about 4.5 hours. You can check whether it is done by running this:
    smartctl -a /dev/sdb
Look for the line about CURRENT_TEST_STATUS. If it says the test is still running, wait longer and run the above command again to check. Once it is complete, run this to gather the results into a third file, then diff it against the second one:
    smartctl -a /dev/sdb > /boot/smart_sdb_3.txt
    diff /boot/smart_sdb_2.txt /boot/smart_sdb_3.txt
In my situation, neither the short nor the long test showed anything out of the ordinary, so I did a reiserfs check, and that went well. I'm going to copy the drive to another drive using ddrescue and work from there today.
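For anyone who would rather not re-run the status check by hand during the long test, a rough polling sketch follows. It assumes the drive's `smartctl -a` report contains the phrase "Self-test routine in progress" while a test is running, which is what ATA drives typically print under "Self-test execution status"; check your own report before relying on it. The function names and the 10-minute interval are made up for illustration.

```shell
# Sketch only: helper names and the polling interval are illustrative.

# Succeeds (exit 0) if the SMART report on stdin says a self-test is running.
# Assumption: the report contains the phrase "Self-test routine in progress".
selftest_running() {
    grep -q "Self-test routine in progress"
}

# Poll a device until its self-test finishes, then save a fresh report.
# Usage: wait_for_selftest /dev/sdb /boot/smart_sdb_3.txt
wait_for_selftest() {
    while smartctl -a "$1" | selftest_running; do
        sleep 600   # check again in 10 minutes
    done
    smartctl -a "$1" > "$2"
}
```

Once the function returns, the saved report can be compared against the earlier one with diff as described above.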
extremeaudio Posted March 3, 2013 Author
Thanks for the very precise instructions on the test. I would never have figured out that diff business on my own. Please find the 2 logs attached; the output from "diff" is below. To my amateur eye, there is nothing unusual in it.
    root@NAS:~# smartctl -a /dev/sdb > /boot/smart_sdb_2.txt
    root@NAS:~# diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt
    12c12
    < Local Time is: Sun Mar 3 14:28:51 2013 IST
    ---
    > Local Time is: Sun Mar 3 14:30:42 2013 IST
    65c65
    < 193 Load_Cycle_Count 0x0032 179 179 000 Old_age Always - 63589
    ---
    > 193 Load_Cycle_Count 0x0032 179 179 000 Old_age Always - 63595
    root@NAS:~#
Attachment: smart_sdb_1.txt
extremeaudio Posted March 3, 2013 Author
I noticed that the second report is 0 KB, so I ran the test again. This time the results look a bit different.
    root@NAS:~# smartctl -a /dev/sdb > /boot/smart_sdb_2.txt
    root@NAS:~# diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt
    12c12
    < Local Time is: Sun Mar 3 14:28:51 2013 IST
    ---
    > Local Time is: Sun Mar 3 14:36:02 2013 IST
    65,66c65,66
    < 193 Load_Cycle_Count 0x0032 179 179 000 Old_age Always - 63589
    < 194 Temperature_Celsius 0x0022 104 095 000 Old_age Always - 48
    ---
    > 193 Load_Cycle_Count 0x0032 179 179 000 Old_age Always - 63611
    > 194 Temperature_Celsius 0x0022 105 095 000 Old_age Always - 47
    root@NAS:~#
Attachments: smart_sdb_1.txt, smart_sdb_2.txt
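Two things in the comparison above can be automated: the timestamp line always differs, and an empty report file makes the diff meaningless. The sketch below is one possible wrapper, not part of smartctl or unRAID; the function name `smart_diff` is hypothetical, and it assumes the timestamp line begins with "Local Time is:" as in the output shown here.

```shell
# Sketch only: smart_diff is a hypothetical helper, not a standard tool.
# Compares two saved SMART reports while ignoring the "Local Time is:" line,
# and refuses to compare if either file is missing or empty.
smart_diff() {
    for f in "$1" "$2"; do
        if [ ! -s "$f" ]; then
            echo "report $f is missing or empty; re-run smartctl -a" >&2
            return 2
        fi
    done
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    grep -v "Local Time is:" "$1" > "$tmp1"
    grep -v "Local Time is:" "$2" > "$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?          # 0 = no differences, 1 = differences found
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Called as `smart_diff /boot/smart_sdb_1.txt /boot/smart_sdb_2.txt`, it would print only the attribute lines that changed, such as Load_Cycle_Count and Temperature_Celsius above.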
extremeaudio Posted March 3, 2013 Author
Just noticed another peculiar thing. As I said, I started watching a movie just before this happened. I'm pretty sure that my movies are on disk 1 and disk 2 of the 3 disks in my array, since I added files to that share early on and haven't added any since getting the Plus upgrade and adding the 3rd drive. So I opened up the contents of disk 3 and noticed a folder called Movies. Inside Movies, I can see "Chaos Theory". This is the exact same movie I had started that day. Inside the movie folder is only the subtitle, which I had downloaded using XBMC's built-in subtitle downloading add-on.
I thought this might have happened because the actual movie folder with the video file was on disk 1, which might not have accepted a write, so another folder was created on disk 3. But that's not the case. The actual movie is on disk 2 and I can still access it. Why then would another folder be created? My split level is 2 for the Movies share, and the directory structure is Share (Movies) > Folder (Movie Name) > File (Video File), Subtitles, artwork etc.
itimpi Posted March 3, 2013
With a split level of 2, the movie folder can be replicated on different disks if the path is movies/moviename. When you add a subtitle file that is logically inside this moviename folder, the disk it goes on depends on your free space allocation method. If you want only the top-level 'Movies' folder to be replicated across disks, with each moviename folder under it constrained to one disk, you would need a split level of 1.
extremeaudio Posted March 3, 2013 Author
In reply to itimpi ("With a split level of 2, the movie folder can be replicated on different disks if the path is movies/moviename."): The agony of losing a disk is causing these "duh" moments. Sorry, I don't know what I was thinking.
itimpi Posted March 3, 2013
In reply to extremeaudio: I know what you mean (I am prone to 'duh' moments myself at times). I hope that realising what you are seeing is expected behaviour will at least reduce the stress.
extremeaudio Posted March 3, 2013 Author
The long SMART test has also completed. The report is attached. What next, please?
Attachment: smart_long_03_03_13.txt
dgaschk Posted March 3, 2013
The disk has an unreadable sector and needs to be rebuilt. Put the original parity drive back in place. I can't test this right now, so someone will need to validate the following procedure.
First, record the desired disk config, i.e., parity drive = HDD serial xxx, disk1 = HDD serial yyy, etc. Then:
1. Choose New Config under Utils.
2. The disks may need to be assigned to their correct positions.
3. There should be a check box to indicate that parity is already correct.
4. Start the array.
You should now have a working array that includes the disk with a pending sector. Stop the array. Assign the new disk as parity, and assign the old parity drive to replace the drive with the unreadable/pending sector. Start the array; after parity is copied and the data disk is rebuilt, the array should be good. Run pre-clear on the disk with the pending sector to correct the issue.
extremeaudio Posted March 4, 2013 Author
Could someone please help validate the above procedure so that I can get cracking on it ASAP? I am having sleepless nights with my array broken :'(
Edit: In the procedure you suggest, I believe I will basically be forcing unRAID to treat my original parity disk as the correct parity disk and to reconstruct all data on the failed disk based on that assumption. So I assume that the data on the failed disk will be rebuilt from parity as if it were a brand new disk. Can I therefore replace the failed 2TB disk with a healthy, brand new 3TB one, so that I don't encounter any errors during the rebuild?
Also, will this process be affected by the fact that there might be some new files on the healthy disks? As I mentioned, subtitles have been downloaded to one of the disks, and my XBMC machine keeps scraping my data non-stop to add information to the library (though this most likely gets stored locally on the XBMC machine and not on unRAID).
dgaschk Posted March 4, 2013
The numbered steps restore the array to its previous condition. I'm just not certain that the "parity is correct" check box will appear, because it is a new feature. I'm in the process of moving, so I can't do a test until later this week or next. The second part describes the parity-swap-disabled procedure, where the parity disk is upgraded and the old parity drive is assigned as a data disk in one step. Once the array is started, unRAID will copy from the old parity to the new one and then rebuild the data disk automatically.
extremeaudio Posted March 5, 2013 Author
Still awaiting a go-ahead on this! Someone please help!
lionelhutz Posted March 5, 2013
You will lose data if another disk was written to after the parity swap. I have not tried the "parity is valid" feature either, so I can't comment. Try at least up to that step and see if the box appears.
extremeaudio Posted March 6, 2013 Author
How much data will I lose? Just the additional data that was written during or after the failure, or more? Nothing significant has been written after the failure (as I mentioned, just a couple of subtitle files), and I don't mind losing that as long as everything prior to that is safe. This is the longest I have gone on the forum without any conclusive help. I really wish someone would take notice of this serious problem.
extremeaudio Posted March 6, 2013 Author
Just to make sure that I have understood correctly, I am listing the steps I am going to take. Please correct me if my understanding of the procedure is wrong in any way.
1. Replace the currently invalid parity disk (3TB) with the original parity disk (2TB). Upon booting unRAID in this state, I am faced with a "Missing" message under the parity slot, and obviously a red ball next to the disabled disk1.
2. Next, go to "Utils" and apply "New Config".
3. Then go back to the Main page, assign the old parity drive as parity and the other drives to their respective slots, and start the array.
Am I good to go up to here? What exactly will happen when I do this? Will unRAID reconstruct the data on the failed disk? Based on the very definition of "New Config", that does not seem to be the case. And how will it read from or write to a red-balled disk anyway? I'm sorry for the hesitance, but I'm trying to understand the rationale behind the prescribed steps. I also didn't understand where in all of this I would get the "Parity is valid" message. What are the steps after starting the array?
lionelhutz Posted March 6, 2013
You can have data corruption at every location where a bit was changed on any of the other drives, so you can't determine how much data or which files were affected. The new config lets you "start over" with a new array setup. After doing the new config, assign all the disks to the slots they should be in. At that point, a check box or some other indicator is supposed to appear that lets you confirm that the parity drive is already valid.
extremeaudio Posted March 6, 2013 Author
I have not added any more data to the other 2 healthy drives. By following this procedure, do I stand any chance of losing data that resides on those 2 disks? I have roughly worked out what data I would potentially have lost on the failed disk, and it seems that I have a backup of all of it on some other unRAID machines. But I absolutely cannot afford to lose what I have on the healthy disks.
lionelhutz Posted March 6, 2013
No, this has no effect on the other disks. You will have data corruption issues on the failed/replaced disk if you wrote data to either of the healthy disks, because in that case the parity was not updated to reflect the newly written data. But if you absolutely cannot lose that data, you should have another backup somewhere else.
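The point about parity not reflecting later writes can be illustrated with a toy calculation. This is only a sketch: it assumes single parity is the bitwise XOR of the data disks at each position, models each "disk" as one made-up byte value, and uses a hypothetical helper name `reconstruct`.

```shell
# Sketch only: three one-byte "disks" with made-up values.
# Assumption: parity is the XOR of all data disks at the same position.

# Rebuild the missing disk1 byte from parity and the two surviving disks.
# Usage: reconstruct PARITY DISK2 DISK3
reconstruct() {
    echo $(( $1 ^ $2 ^ $3 ))
}

disk1=165; disk2=60; disk3=15
parity=$(( disk1 ^ disk2 ^ disk3 ))          # parity computed before the swap

reconstruct "$parity" "$disk2" "$disk3"      # prints 165, the original disk1 value

# A file is later written to disk3, but the old parity drive is not updated:
disk3_new=14
reconstruct "$parity" "$disk2" "$disk3_new"  # prints 164, which no longer matches disk1
```

Only the positions that changed on a healthy disk rebuild incorrectly; everywhere else the old parity still reconstructs the failed disk exactly.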
extremeaudio Posted March 7, 2013 Author
In reply to dgaschk's procedure: I am starting this procedure now. At step 3 (the check box indicating that parity is already correct): what if I do not get the check box? How do I proceed then?
Update 1: Did as instructed. The "Parity is valid" box did come up. I checked the box and started the array, and a parity check is now in progress.
Update 2: About 2% into the parity check, 8000-odd sync errors show as corrected. I should let this continue, right?
Archived
This topic is now archived and is closed to further replies.