Sync errors after swap disable

Android · July 15, 2013

My array was 5 2TB drives with 1 assigned as parity.

Disk 3 failed last week, and I had already been thinking about upgrading to 3TB drives. Looking around the wiki, I found that it was possible with a swap disable so I bought one and installed it on Friday night.

Since the 3TB would become a parity drive I knew it wasn't necessary to preclear it, but I did it anyway. This took about 31 hours. I then assigned the 3TB to the parity slot and the old parity drive to the disk 3 slot and clicked copy. This took about 9 hours. Then I started the array and disk 3 began to rebuild. This took about 7 hours.

I then ran a non-correcting parity check overnight, which took about 11 hours and found over 200,000,000 sync errors. I then ran a correcting check. This has been running for about 12 hours now and has slowed to a crawl.

Total Size 2,930,266,532 KB

Current 2,147,588,284 (73.3%)

Speed 10,671 KB/sec

Finish 1218 minutes

Sync Errors 48,518,590 (corrected)

Is this normal? I've never had a single sync error before and run monthly parity checks. I've had several power cuts as well and even the checks that run when it reboots have no errors. I'm starting to think there could be a problem with the new drive?

Syslog: http://pastebin.com/wcmeS9Zv

Android · July 15, 2013

I have just noticed that RC16c is the latest; I'm using RC15a. I will stop the check, update, and then run it again. It seems to have slowed down only once it got past the 2TB mark, which is the size of my biggest data drive. Should parity be all 0s after that?

madburg · July 15, 2013

"My array was 5 2TB drives with 1 assigned as parity."

What assignment DID you have these drives in before you started this process? E.g. Disk1, Disk3, Disk5, Disk7 (Disk2,4,6,8-24 unassigned, no parity). (It's understood that one of the 5 was allocated for parity.)

P.S. I have a feeling you removed parity and disk #3 at the same time, but I don't want to jump to conclusions.

dgaschk · July 15, 2013

Post SMART reports.

Android · July 15, 2013

I had disks 1-4 assigned, plus parity.

Disk 3 had a failure so after the preclear, I assigned the old parity to disk 3 and the new 3TB to parity.

SMART report for new parity (3TB): http://pastebin.com/AzQNbMUT

SMART report for old parity, now disk 3: http://pastebin.com/FxKbxa1f

SMART report for disk 1: http://pastebin.com/6kBKcdbV

SMART report for disk 2: http://pastebin.com/X6srEMaM

SMART report for disk 4: http://pastebin.com/mbFmvv0M

Android · July 15, 2013

I meant to add: I've now updated to RC16c, disabled plugins, and started a new parity check. It is currently running at 88MB/s, but I think it started like this before, so I will have to see what it is doing in the morning.

madburg · July 15, 2013

Sorry, I hope I got this right: you had a failed disk, and you REMOVED your parity drive and assigned it in place of the failed data drive?

(hang on to your failed drive, and don't do anything to it)

Android · July 16, 2013

Disk 3 failed and so I removed it. The new drive was bigger than my parity drive, so I had to use the new one as parity and the old parity drive for disk 3. Unraid copied the parity data to the new drive and then rebuilt disk 3.

The new parity check has done the same as before. It has reached about 75% and slowed to 10MB/s.

Since it is past 2TB, all my other disks have spun down.

madburg · July 16, 2013

That's the part I don't understand.

Are you stating you removed the failed drive (disk 3) AND the parity drive (basically) at the same time, added the (new, never been used) 3TB drive as parity, and moved your old parity drive and assigned it as disk 3?

garycase · July 16, 2013

Yes, that's what he's saying ==> it's the NORMAL "swap/disable" process that UnRAID supports for exactly the situation he encountered. READ the documentation !! :)

FYI, the documentation for replacing a failed disk -- including the "swap/disable" option -- is here:

http://lime-technology.com/wiki/index.php/UnRAID_Manual#Replace_a_failed_disk

madburg · July 16, 2013

===>That's what you're saying<=== I am posting to the individual, not you ==> HELLO <==

Is this what retired old folk with Ph.D.s and MVPs with 50 years' experience do? And let's not forget you just added 'MOD' to your old belt recently.

garycase · July 16, 2013

You clearly didn't understand the swap/disable process ... so I was simply letting you know about it, so you can learn and help others when you encounter the same issue in the future.

And the OP already SAID he was doing this ... in his very first post he said "... I found that it was possible with a swap disable so I bought one and installed it on Friday night."

madburg · July 16, 2013

I know about swap/disable. I have never done it myself and I personally question it, and until I understand how it works (not just click, do this, click, do that) I am hesitant about it, but that's just me.

I want to be sure (for me or anyone else reading who might not be sure either) of the exact steps the individual took. So I am asking him/her, not you, once again.

Right now, what we see is that with a failed drive, a swap/disable was utilized (not on a healthy system but on one with a failed drive) and a parity check generated tons of errors. So do we ask what steps he took exactly (to be sure), or do we blame swap/disable immediately?

People make statements and do things wrong (not stating he did anything wrong) or they misunderstand what is to be done. It happens, it's human.

Android · July 16, 2013

garycase is correct. I used the normal swap disable procedure. As far as I can tell, it worked without any problems, and I believe the parity was valid before the failure. A parity check was completed about 5 days prior to the failure: we had a power cut last week, so Unraid performed one when I rebooted. It had 0 sync errors. Several days later I started getting problems with the drive and so replaced it. It wasn't a total failure; it just started getting unreadable sectors and having to rebuild them from parity when I read the data.

I think the sync errors and the extremely slow parity check didn't start until the check got past 2TB (the size of my largest data disk). The final 1TB of the parity drive has no corresponding data drives; all 4 were spun down when I checked this morning and the parity check was carrying on. I'm wondering if this has something to do with that? It has been running for about 24 hours now and still has 4 hours remaining, at over 150,000,000 sync errors.

Total Size 2,930,266,532 KB

Current 2,768,955,704 (94.5%)

Speed 10,219 KB/sec

Finish 262 minutes

Sync Errors 152,082,163 (corrected)

It is only this correcting check that is slow. The non-correcting check that I ran prior to this completed in a normal time (11 hours).
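For anyone checking the figures above, the "Finish" estimate looks consistent with simply dividing the remaining size by the current speed. The sketch below assumes that is how the number is derived (the thread does not confirm it); `minutes_remaining` is a hypothetical helper name, and the inputs are the values from the status output in this post.

```python
def minutes_remaining(total_kb: int, current_kb: int, speed_kb_per_sec: int) -> float:
    """Estimate time left, assuming the current speed holds for the rest of the check."""
    remaining_kb = total_kb - current_kb
    return remaining_kb / speed_kb_per_sec / 60


# Values copied from the status output above (Total Size, Current, Speed).
eta = minutes_remaining(2_930_266_532, 2_768_955_704, 10_219)
print(f"about {eta:.0f} minutes remaining")
```

The result lands within a minute or so of the reported 262 minutes, which suggests the estimate is based purely on the current speed and would drop sharply if the check sped up again.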

garycase · July 16, 2013

Very interesting. Yours is not the only issue that's been reported recently where the space AFTER the largest data drive is resulting in very slow parity check speeds -- and a LOT of sync errors in that space (indicating the parity drive isn't cleared at that point ... but also resulting in very slow writes).

I'm going to be sure LimeTech is aware of these issues, as they MAY indicate some issue with the newest kernel, which was changed a couple versions ago.

Let the check finish -- then run another check and see if it completes in your "normal" time (11 hrs).
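To illustrate why the space after the largest data drive should hold zeros, here is a minimal sketch. It assumes single parity is a bytewise XOR across the data disks and that a disk shorter than the parity drive counts as zeros past its end. The tiny byte strings stand in for drives; `compute_parity` and `count_sync_errors` are hypothetical names, not UnRAID code.

```python
def compute_parity(disks: list[bytes], parity_size: int) -> bytes:
    """XOR all data disks together; a shorter disk contributes nothing past its end."""
    parity = bytearray(parity_size)  # starts as all zeros
    for disk in disks:
        for i, value in enumerate(disk):
            parity[i] ^= value
    return bytes(parity)


def count_sync_errors(disks: list[bytes], parity_on_disk: bytes) -> int:
    """Count positions where the stored parity differs from the expected parity."""
    expected = compute_parity(disks, len(parity_on_disk))
    return sum(1 for e, p in zip(expected, parity_on_disk) if e != p)


# Three small "data disks"; the parity "drive" is larger than any of them.
disks = [bytes([0x0F, 0xF0]), bytes([0xAA, 0x55]), bytes([0x01])]
parity = compute_parity(disks, 4)

# Simulate a parity drive whose extra space was never zeroed.
stale_parity = parity[:2] + bytes([0x12, 0x34])
print(count_sync_errors(disks, parity), count_sync_errors(disks, stale_parity))
```

In this model every non-zero byte left in the extra space shows up as a sync error, which matches the pattern reported here: errors only past the size of the largest data drive, with all data drives spun down.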

Android · July 16, 2013

Glad I'm not the only one. At least I know the drive is probably good.

I'll run another parity check once this one finishes and post back the results.

Android · July 19, 2013

The second parity check completed in about 10 hours with no sync errors. It seems the issue was that the parity on the extra 1TB was wrong in the first place, and correcting it was just very slow.
