etk29321 Posted March 18, 2014

I just replaced one of my data disks that failed with a new 2TB drive. After putting in the new disk, the array shows the new drive with the amber light but also says 'New parity disk installed'. This is not behavior I've seen before when replacing a drive. I don't want to do a new parity sync since I need to use that parity to rebuild the failed drive! Has anyone come across this before? How do I get it to understand I have not replaced the parity drive and need to rebuild the failed drive from parity? (Disk 10 was replaced, not disk 0)

Mar 18 17:07:01 Tower emhttp: Device inventory:
Mar 18 17:07:01 Tower emhttp: WDC_WD20EARS-00MVWB0_WD-WMAZ20227263 (sdb) 1953514584
Mar 18 17:07:01 Tower emhttp: ST3750330AS_5QK0263X (sdc) 732574584
Mar 18 17:07:01 Tower emhttp: ST3750330AS_5QK00LE3 (sdd) 732574584
Mar 18 17:07:01 Tower emhttp: WDC_WD20EURS-63S48Y0_WD-WCAZAK714963 (sde) 1953514584
Mar 18 17:07:01 Tower emhttp: WDC_WD20EARS-00MVWB0_WD-WCAZA5801616 (sdf) 1953514584
Mar 18 17:07:01 Tower emhttp: WDC_WD5000AAKS-00YGA0_WD-WCAS82518954 (sdg) 488386584
Mar 18 17:07:01 Tower emhttp: WDC_WD20EARS-00MVWB0_WD-WMAZ20274922 (sdh) 1953514584
Mar 18 17:07:01 Tower emhttp: WDC_WD20EARS-60MVWB0_WD-WCAZA8355188 (sdi) 1953514584
Mar 18 17:07:01 Tower emhttp: ST3750330AS_5QK025CX (sdj) 732574584
Mar 18 17:07:01 Tower emhttp: SAMSUNG_HD154UI_S1Y6J1LS722609 (sdk) 1465138584
Mar 18 17:07:01 Tower emhttp: ST2000DM001-1CH164_Z340TGCX (sdl) 1953514584
Mar 18 17:07:01 Tower emhttp: SAMSUNG_HD154UI_S1Y6J1LS722618 (sdm) 1465138584
Mar 18 17:07:01 Tower emhttp: WDC_WD20EARX-22PASB0_WD-WMAZA6632512 (sdn) 1953514584
Mar 18 17:07:01 Tower emhttp: WDC_WD20EARS-00MVWB0_WD-WMAZ20231864 (sdo) 1953514584
Mar 18 17:07:01 Tower emhttp: MAXTOR_STM3500630AS_6QG0C5VJ (sdp) 488386584
Mar 18 17:07:01 Tower kernel: mdcmd (1): import 0 8,16 1953514552 WDC_WD20EARS-00MVWB0_WD-WMAZ20227263
Mar 18 17:07:01 Tower kernel: md: import disk0: [8,16] (sdb) WDC_WD20EARS-00MVWB0_WD-WMAZ20227263 size: 1953514552
Mar 18 17:07:01 Tower kernel: md: disk0 replaced
Mar 18 17:07:01 Tower kernel: mdcmd (2): import 1 8,32 732574552 ST3750330AS_5QK0263X
Mar 18 17:07:01 Tower kernel: md: import disk1: [8,32] (sdc) ST3750330AS_5QK0263X size: 732574552
Mar 18 17:07:01 Tower kernel: mdcmd (3): import 2 8,48 732574552 ST3750330AS_5QK00LE3
Mar 18 17:07:01 Tower kernel: md: import disk2: [8,48] (sdd) ST3750330AS_5QK00LE3 size: 732574552
Mar 18 17:07:01 Tower kernel: mdcmd (4): import 3 8,64 1953514552 WDC_WD20EURS-63S48Y0_WD-WCAZAK714963
Mar 18 17:07:01 Tower kernel: md: import disk3: [8,64] (sde) WDC_WD20EURS-63S48Y0_WD-WCAZAK714963 size: 1953514552
Mar 18 17:07:01 Tower kernel: mdcmd (5): import 4 8,80 1953514552 WDC_WD20EARS-00MVWB0_WD-WCAZA5801616
Mar 18 17:07:01 Tower kernel: md: import disk4: [8,80] (sdf) WDC_WD20EARS-00MVWB0_WD-WCAZA5801616 size: 1953514552
Mar 18 17:07:01 Tower kernel: mdcmd (6): import 5 8,96 488386552 WDC_WD5000AAKS-00YGA0_WD-WCAS82518954
Mar 18 17:07:01 Tower kernel: md: import disk5: [8,96] (sdg) WDC_WD5000AAKS-00YGA0_WD-WCAS82518954 size: 488386552
Mar 18 17:07:01 Tower kernel: mdcmd (7): import 6 8,112 1953514552 WDC_WD20EARS-00MVWB0_WD-WMAZ20274922
Mar 18 17:07:01 Tower kernel: md: import disk6: [8,112] (sdh) WDC_WD20EARS-00MVWB0_WD-WMAZ20274922 size: 1953514552
Mar 18 17:07:01 Tower kernel: mdcmd (8): import 7 8,128 1953514552 WDC_WD20EARS-60MVWB0_WD-WCAZA8355188
Mar 18 17:07:01 Tower kernel: md: import disk7: [8,128] (sdi) WDC_WD20EARS-60MVWB0_WD-WCAZA8355188 size: 1953514552
Mar 18 17:07:02 Tower emhttp: shcmd (17): /usr/local/sbin/emhttp_event driver_loaded
Mar 18 17:07:02 Tower kernel: mdcmd (9): import 8 8,144 732574552 ST3750330AS_5QK025CX
Mar 18 17:07:02 Tower kernel: md: import disk8: [8,144] (sdj) ST3750330AS_5QK025CX size: 732574552
Mar 18 17:07:02 Tower kernel: mdcmd (10): import 9 8,160 1465138552 SAMSUNG_HD154UI_S1Y6J1LS722609
Mar 18 17:07:02 Tower kernel: md: import disk9: [8,160] (sdk) SAMSUNG_HD154UI_S1Y6J1LS722609 size: 1465138552
Mar 18 17:07:02 Tower kernel: mdcmd (11): import 10 8,176 1953514552 ST2000DM001-1CH164_Z340TGCX
Mar 18 17:07:02 Tower kernel: md: import disk10: [8,176] (sdl) ST2000DM001-1CH164_Z340TGCX size: 1953514552
Mar 18 17:07:02 Tower kernel: mdcmd (12): import 11 8,192 1465138552 SAMSUNG_HD154UI_S1Y6J1LS722618
Mar 18 17:07:02 Tower kernel: md: import disk11: [8,192] (sdm) SAMSUNG_HD154UI_S1Y6J1LS722618 size: 1465138552
Mar 18 17:07:02 Tower kernel: mdcmd (13): import 12 8,208 1953514552 WDC_WD20EARX-22PASB0_WD-WMAZA6632512
Mar 18 17:07:02 Tower kernel: md: import disk12: [8,208] (sdn) WDC_WD20EARX-22PASB0_WD-WMAZA6632512 size: 1953514552
Mar 18 17:07:02 Tower kernel: mdcmd (14): import 13 8,224 1953514552 WDC_WD20EARS-00MVWB0_WD-WMAZ20231864
Mar 18 17:07:02 Tower kernel: md: import disk13: [8,224] (sdo) WDC_WD20EARS-00MVWB0_WD-WMAZ20231864 size: 1953514552
DaleWilliams Posted March 18, 2014

Post a full syslog. The problem is probably earlier in the boot sequence than what you posted. http://lime-technology.com/wiki/index.php/Troubleshooting#Capturing_your_syslog
etk29321 Posted March 19, 2014 Author

syslog attached: syslog.txt
SSD Posted March 19, 2014

I would recommend reaching out to Tom / Limetech. You are in a very sensitive state. If you've had a drive fail and now your system is not recognizing parity, taking the wrong step could easily mean an inability to recover. In the old days we had an undocumented feature (set invalidslot 99) to recover from these situations, but it is no longer supported based on the latest info I have. Tom is your best hope of a successful outcome.
DaleWilliams Posted March 19, 2014

> In the old days we had an undocumented feature (set invalidslot 99)

So that's where the magik was hidden!
Frank1940 Posted March 19, 2014

Another possibility is that your present parity disk has an HPA (Host Protected Area) on it and hence is smaller than the new disk. See this thread for further discussion: http://lime-technology.com/forum/index.php?topic=10866.0
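The HPA theory can be checked against the log in the first post, since parity has to be at least as large as every data disk. Below is a minimal Python sketch of that check. It assumes the `md: import diskN` lines look exactly like the ones posted in this thread and that disk0 is parity; the function names `disk_sizes` and `disks_larger_than_parity` are made up for illustration and are not part of unRAID.

```python
import re

# Assumed line shape, copied from the syslog excerpt in this thread, e.g.
# "md: import disk10: [8,176] (sdl) ST2000DM001-1CH164_Z340TGCX size: 1953514552"
IMPORT_RE = re.compile(r"md: import disk(\d+): \[\d+,\d+\] \((\w+)\) (\S+) size: (\d+)")


def disk_sizes(syslog_text):
    """Return {slot number: size as reported by md} for every import line."""
    return {int(m.group(1)): int(m.group(4)) for m in IMPORT_RE.finditer(syslog_text)}


def disks_larger_than_parity(sizes, parity_slot=0):
    """List the data slots whose reported size exceeds the parity slot's size."""
    parity = sizes[parity_slot]
    return sorted(slot for slot, size in sizes.items()
                  if slot != parity_slot and size > parity)


# Two lines taken from the log above: parity (disk0) and the new disk10.
sample = """\
Mar 18 17:07:01 Tower kernel: md: import disk0: [8,16] (sdb) WDC_WD20EARS-00MVWB0_WD-WMAZ20227263 size: 1953514552
Mar 18 17:07:02 Tower kernel: md: import disk10: [8,176] (sdl) ST2000DM001-1CH164_Z340TGCX size: 1953514552
"""
sizes = disk_sizes(sample)
print(disks_larger_than_parity(sizes))  # → []
```

In the posted log, disk0 and disk10 report the same size, so by this check the new disk is not larger than parity.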
etk29321 Posted March 19, 2014 Author

I'm in discussion with Tom right now. He says my super.dat file appears corrupt and is sending me a procedure to fix.

> The strange line in your system log is this:
>
> Mar 18 17:07:01 Tower kernel: md: converting superblock version 0 to version 2
>
> This implies that a file on your flash called 'super.dat' is corrupted. You need to restore the array configuration but since you have a disabled disk you must be very careful to follow an exact procedure because if parity gets overwritten you will not be able to recover data of the disabled disk.
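For anyone who lands here with similar symptoms, a full syslog can be searched for the two lines that gave this case away. The Python sketch below is only an illustration: the patterns are taken from the lines quoted in this thread, other unRAID versions may word them differently, and the function name `scan` is hypothetical.

```python
import re

# Patterns based on the two lines quoted in this thread (an assumption about
# the exact wording used by the md driver in this unRAID release).
REPLACED_RE = re.compile(r"md: disk(\d+) replaced")
SUPERBLOCK_RE = re.compile(r"md: converting superblock version (\d+) to version (\d+)")


def scan(syslog_text):
    """Return (slots md considers replaced, superblock version conversions seen)."""
    replaced = [int(m.group(1)) for m in REPLACED_RE.finditer(syslog_text)]
    conversions = [(int(m.group(1)), int(m.group(2)))
                   for m in SUPERBLOCK_RE.finditer(syslog_text)]
    return replaced, conversions


sample = """\
Mar 18 17:07:01 Tower kernel: md: converting superblock version 0 to version 2
Mar 18 17:07:01 Tower kernel: md: disk0 replaced
"""
print(scan(sample))  # → ([0], [(0, 2)])
```

A "replaced" entry for slot 0 when only a data disk was swapped, together with a superblock conversion message, matches the situation described here.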
SSD Posted March 19, 2014

> I'm in discussion with Tom right now. He says my super.dat file appears corrupt and is sending me a procedure to fix.
>
> The strange line in your system log is this:
>
> Mar 18 17:07:01 Tower kernel: md: converting superblock version 0 to version 2
>
> This implies that a file on your flash called 'super.dat' is corrupted. You need to restore the array configuration but since you have a disabled disk you must be very careful to follow an exact procedure because if parity gets overwritten you will not be able to recover data of the disabled disk.

Excellent! You're in good hands.
etk29321 Posted March 27, 2014 Author

Tom sorted me out and I was able to rebuild my failed drive. I did have some read errors on rebuild, but that appears to be due to my parity drive getting ready to fail too. Thankfully it hung in there just long enough to do the job. Here's what he had me do:

> 1. Execute Utilities/New Config.
> 2. On Main, assign all your drives, being very careful to assign Parity and your new disk (disk10) correctly.
> 3. From a console or a telnet session type this command:
>
>    mdcmd set invalidslot 10
>
>    (the 10 corresponds to disk10)
> 4. Click 'Start' on the webGui.
>
> What should happen now is array gets started with disk10 reconstruct in process.
>
> IMPORTANT: between steps 3 and 4 do NOT refresh your browser or navigate to any other pages in the webGui - just click the Start button which is already being displayed there (if you navigate to a different page or even refresh the browser after step 3 it will "cancel" the effect of that 'mdcmd' and result in your parity disk getting written - not good).
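Why the warning about parity getting written matters can be shown with a toy example. The Python sketch below assumes single parity is a plain byte-wise XOR across the data disks; the disk contents are made up, and the helper `xor_blocks` is hypothetical, not anything unRAID ships.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data disks of four bytes each (made-up contents).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk3 = b"\xaa\xbb\xcc\xdd"  # the disk that is about to fail
parity = xor_blocks([disk1, disk2, disk3])

# Rebuild: XOR the old parity with the surviving disks to get disk3 back.
rebuilt = xor_blocks([parity, disk1, disk2])
print(rebuilt == disk3)  # → True

# If parity is instead re-synced against the blank replacement disk,
# the old contents of disk3 are no longer recoverable from it.
blank = bytes(4)
new_parity = xor_blocks([disk1, disk2, blank])
after_resync = xor_blocks([new_parity, disk1, disk2])
print(after_resync == blank)  # → True
```

The second half is the failure mode the procedure guards against: once parity is recomputed from the empty replacement, a rebuild can only return the empty disk.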
SSD Posted March 27, 2014

Great news! Last I remembered the "set invalidslot" command didn't work. But that was early in the 5.0 betas. Great to hear it is working now!!!