BRiT Posted September 4, 2014

The drive in question is showing signs of unacceptable behavior from wear and tear, so it needs to be replaced. It has not had any write errors, but it has had some questionable read errors. The current data drive and current parity drive are each 2TB in size. The new parity drive will be a nice Hitachi 4TB 7200rpm drive. This scenario seems to fit perfectly for performing the Parity Swap Disable procedure.

So my questions are: does the drive need to be marked as disabled? If so, is there a way to set a drive as disabled under either the unRAID 5 or unRAID 6 series? That is, aside from shutting down the server and disconnecting the drive in question, if that does indeed mark a disk as disabled and not just missing. I feel like I've seen this done somewhere by feeding commands into the md driver, but can't quite seem to find it now. Thanks for any help or pointers to where to find the answers.

The closest bit I'm finding on this is:

You can "fail a disk" by stopping the array, un-assigning it from its slot, and starting the array with it un-assigned, then stopping the array once more. Starting the array with it un-assigned will mark it as "failed".

More information is being found in this thread: http://lime-technology.com/forum/index.php?topic=29529.0
Fireball3 Posted September 4, 2014

Yes, that is the most recent discussion on "parity swap disable" I'm aware of. The key is to cause a drive/slot to show up red-balled; then the option to perform the drive swap will show up. Assign old parity --> new data, and new drive --> new parity. As I understand it, you still have the old data drive, which is still working, as a backup. Basically nothing to lose if it goes wrong.
megalodon Posted September 4, 2014

But it's worth taking note of what Gary said in the last post. I had an option to do this two months ago but was too worried that something might go wrong. Let me know if you do this and whether it works, please, BRiT.
Fireball3 Posted September 4, 2014

That is why I said: as I understand it, you still have the old data drive, which is still working, as a backup. Basically nothing to lose if it goes wrong. If there were a real red-balled drive I would also be more cautious.
BRiT Posted September 6, 2014

Welp... Decided to give this a go on unRAID 5.0.5. I resorted to downing the server and physically disconnecting the read-error 2TB data drive. I brought up the server to notice the array was not started and the drive appeared as missing. I started the array and the 2TB data drive appeared as disabled. I stopped the array. I assigned the new 4TB drive as parity. I assigned the old 2TB parity drive to the disabled 2TB data drive's slot. I noticed the [Copy] button was available, checked the checkbox, and clicked on [Copy]. The array screen updated with the 4TB parity drive and 2TB data drive showing up as blue-balled. Array status showed "Copying, 0% complete...". After a few minutes I clicked the [Refresh] button and Array status showed "Copying, 2% complete...".

Now it's a matter of waiting things out and seeing how it turns out. If this produces a functional system, I will then perform a parity check to ensure everything is fine. After that, I will upgrade back to unRAID 6.0 beta 8 and add in a second 4TB drive as a brand new data drive.

I would have preferred replacing the 2TB drive directly with a 4TB drive, but there were too many mitigating circumstances. Mostly, I didn't trust the 2TB data drive to be stable enough to generate correct reads if I rebuilt parity on the 4TB drive and then reconstructed the 2TB drive onto the replacement 4TB data drive. I will update this thread after the procedure is complete with the final verdict.

The lesson for me is to not trust that drives are fine just because there are no write errors. I now consider the following features absolute requirements in a real NAS system:

- Scheduled automated SMART tests.
- Notification of SMART test failures.
- Notification of all errors, READ errors in addition to WRITE errors.
- Ability to easily reconstruct a drive onto another drive.
- Ability to manually mark a drive as disabled, as it avoids the need for physical access to the server.
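For what it's worth, the first two items on that list can be covered outside unRAID today by smartmontools' smartd daemon. A minimal /etc/smartd.conf sketch; the device paths, schedule, and mail address here are assumptions to adapt to your system:

```
# Monitor all SMART attributes (-a), enable automatic offline data collection
# (-o on) and attribute autosave (-S on), run a long self-test every Saturday
# at 03:00 (-s L/../../6/03), and mail on failures or new errors (-m).
# /dev/sda and /dev/sdb are example device paths.
/dev/sda -a -o on -S on -s L/../../6/03 -m admin@example.com
/dev/sdb -a -o on -S on -s L/../../6/03 -m admin@example.com
```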
BRiT Posted September 6, 2014

Process is still going. Seems to be about 3.5 minutes per percent.

Sep 6 18:34:16 REAVER emhttp: copy: 54% complete
Sep 6 18:37:38 REAVER emhttp: copy: 55% complete
Sep 6 18:40:59 REAVER emhttp: copy: 56% complete
Sep 6 18:44:21 REAVER emhttp: copy: 57% complete
Sep 6 18:47:48 REAVER emhttp: copy: 58% complete
Sep 6 18:51:18 REAVER emhttp: copy: 59% complete
Sep 6 18:54:46 REAVER emhttp: copy: 60% complete
Sep 6 18:58:15 REAVER emhttp: copy: 61% complete
Sep 6 19:01:45 REAVER emhttp: copy: 62% complete
Sep 6 19:05:20 REAVER emhttp: copy: 63% complete
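Those syslog lines are enough to estimate the remaining time yourself. A quick sketch of the arithmetic, with the timestamps from the 54% and 63% lines hard-coded:

```shell
# Seconds per percent from two "copy: N% complete" syslog lines,
# then minutes remaining until 100%.
awk 'BEGIN {
  t1 = 18*3600 + 34*60 + 16             # 18:34:16, at 54% complete
  t2 = 19*3600 +  5*60 + 20             # 19:05:20, at 63% complete
  per_pct = (t2 - t1) / (63 - 54)       # seconds per percent
  printf "%.1f min/percent, ~%.0f min to go\n", per_pct/60, per_pct*(100-63)/60
}'
```

Which agrees with the observed ~3.5 minutes per percent and predicts a bit over two hours left at this point.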
BRiT Posted September 7, 2014

The system finished copying the parity information from the old 2TB parity drive to the new 4TB parity drive. It then showed an Array Status of "Stopped. Upgrading disk/swapping parity." The parity disk is green-balled and the replacement 2TB data drive is orange-balled. The next step was to check the "Yes I want to do this" box next to the [Start] button, which states: "Start will expand the file system of the data disk (if possible); and then bring the array on-line and start Data-Rebuild."

After some time (30 seconds or so), the web console refreshed and showed Array Status as "Started. Data-Rebuild in progress." The progress indicator shows a total size of 2TB, 18.29 GB (1%) completed, at an estimated speed of 99.28 MB/sec and a finish in 333 minutes.
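The 333-minute estimate is just remaining data over current speed. A one-liner to reproduce it, using decimal units (as drive vendors and the web UI do):

```shell
# Rebuild ETA: 2 TB total, 18.29 GB already done, at 99.28 MB/sec.
awk 'BEGIN {
  remaining_mb = 2000000 - 18290        # MB left to rebuild
  printf "~%.0f minutes\n", remaining_mb / 99.28 / 60
}'
```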
itimpi Posted September 7, 2014

Just a point to note: if you want to replace the 2TB data drive with a 4TB one, I would suggest doing this before going back to v6, or waiting for v6 beta 9. v6 beta 8 has (temporarily) disabled expanding the file system to use the full drive when replacing a drive with a larger one.
BRiT Posted September 7, 2014

The replacement data drive rebuild has completed. The Array Status showed that parity has not been checked. I unchecked the box "Correct any Parity-Check errors by writing the Parity disk with corrected parity." The non-correcting parity check is now in progress. The check status shows 4TB total with 0 sync errors, current position 15.42 GB, at an estimated 105.11 MB/s and a finish in 632 minutes.
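Note that the parity check runs over the full 4TB parity drive, not just the 2TB of data, which is why the estimate roughly doubled. The 632-minute figure reproduces from remaining data over current speed:

```shell
# Parity-check ETA: 4 TB to verify, 15.42 GB already done, at 105.11 MB/sec.
awk 'BEGIN {
  remaining_mb = 4000000 - 15420        # MB left to check
  printf "~%.0f minutes\n", remaining_mb / 105.11 / 60
}'
```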
megalodon Posted September 7, 2014

Thanks for all the updates. Handy to know it seems to have worked fine for you.
BRiT Posted September 8, 2014

It does appear to have completely worked. The parity check finished; last checked on Sun Sep 7 21:41:36 2014 EDT, finding 0 errors. The first disk rebuild ran only over the size of the actual data drive (2TB), while the final parity check ran over the entire array (4TB).

Sep 6 21:51:34 REAVER kernel: mdcmd (41): check CORRECT
Sep 6 21:51:34 REAVER kernel: md: recovery thread woken up ...
Sep 6 21:51:34 REAVER kernel: md: recovery thread rebuilding disk1 ...
Sep 6 21:51:34 REAVER kernel: md: using 6688k window, over a total of 1953514552 blocks.
Sep 7 05:14:34 REAVER kernel: md: sync done. time=26579sec
Sep 7 05:14:34 REAVER kernel: md: recovery thread sync completion status: 0
<<...snip...snip...>>
Sep 7 09:55:55 REAVER kernel: mdcmd (46): check NOCORRECT
Sep 7 09:55:55 REAVER kernel: md: recovery thread woken up ...
Sep 7 09:55:55 REAVER kernel: md: recovery thread checking parity...
Sep 7 09:55:55 REAVER kernel: md: using 6688k window, over a total of 3907018532 blocks.
Sep 7 19:58:01 REAVER kernel: mdcmd (47): spindown 2
Sep 7 19:58:01 REAVER kernel: mdcmd (48): spindown 3
Sep 7 19:58:02 REAVER kernel: mdcmd (49): spindown 4
Sep 7 19:58:03 REAVER kernel: mdcmd (50): spindown 5
Sep 7 21:41:36 REAVER kernel: md: sync done. time=42340sec
Sep 7 21:41:36 REAVER kernel: md: recovery thread sync completion status: 0
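The block counts and sync times in that log also give the average throughput of each phase. Assuming the md driver's blocks are 1 KiB (the standard Linux md convention; treat this as an assumption for unRAID's md):

```shell
# Average throughput from the md log: blocks / time / 1024 = MB/s (1 KiB blocks).
awk 'BEGIN {
  printf "rebuild: ~%.0f MB/s, parity check: ~%.0f MB/s\n",
    1953514552 / 26579 / 1024,          # rebuild: blocks over 26579 sec
    3907018532 / 42340 / 1024           # check:   blocks over 42340 sec
}'
```

The check averaged faster than the rebuild, consistent with the rebuild being bottlenecked on writes to the rebuilt disk.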
RobJ Posted September 22, 2015

I have created an updated wiki page for the Parity Swap procedure -> The Parity Swap procedure

I would really appreciate review and corrections, especially from BRiT if he has time. It's wordy, with no pictures (afraid that's not my strong point), but I believe it has extra hand-holding, for both new users and all of us who rarely run it.

I've called it the 'Parity Swap' procedure, not the 'Swap Disable' procedure, which is what it's more often called. I hope that's not a problem, and I can change it, but I think 'Parity Swap' is clearer and easier to understand.

It's not well tested. I just used it successfully on my own v6.1 system, but not with a failed drive, so there may be behavioral quirks with other versions and situations. PLEASE let us know!