
Sync errors after swap disable


My array was 5 2TB drives with 1 assigned as parity.

 

Disk 3 failed last week. I had been thinking about upgrading to 3TB drives, and looking around the wiki I found that this was possible with a swap-disable, so I bought a 3TB drive and installed it on Friday night.

 

Since the 3TB would become the parity drive, I knew it wasn't necessary to preclear it, but I did anyway; this took about 31 hours. I then set the 3TB to the parity slot and the old parity drive to the disk 3 slot for the swap, and clicked copy. This took about 9 hours. Then I started the array and disk 3 began to rebuild, which took about 7 hours.
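
(As I understand the copy step, the old parity contents are copied block for block onto the new, larger drive, and everything past the old parity size is expected to stay zero -- which the preclear should have guaranteed. A rough Python sketch of the idea; the paths, sizes, and function name are made up for illustration, and unRAID of course works at the raw-device level:)

```python
# Conceptual sketch of the swap-disable copy step -- NOT unRAID's
# actual implementation; paths, sizes, and names are illustrative.

BLOCK = 1024 * 1024  # copy in 1 MiB chunks

def copy_parity(old_parity_path, new_parity_path, old_size):
    """Copy old parity onto the new, larger parity drive.

    Everything past old_size on the new drive is left untouched;
    it should already be zero (e.g. from a preclear), since no
    data disk extends that far and parity there must be zero.
    """
    with open(old_parity_path, "rb") as src, \
         open(new_parity_path, "r+b") as dst:
        copied = 0
        while copied < old_size:
            chunk = src.read(min(BLOCK, old_size - copied))
            if not chunk:  # old drive ended early
                break
            dst.write(chunk)
            copied += len(chunk)
```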

 

I then ran a non-correcting parity check overnight, which took about 11 hours and found over 200,000,000 sync errors. I then started a correcting check. It has been running for about 12 hours now and has slowed to a crawl.

 

Total Size 2,930,266,532 KB

Current 2,147,588,284 (73.3%)

Speed 10,671 KB/sec

Finish 1218 minutes

Sync Errors 48,518,590 (corrected)
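
(For what it's worth, the finish estimate is consistent with the remaining size divided by the speed -- a quick sanity check in Python:)

```python
# Sanity check: finish time should be (total - current) / speed.
# Figures taken from the status readout above.
total_kb   = 2_930_266_532
current_kb = 2_147_588_284
speed_kbps = 10_671

eta_min = (total_kb - current_kb) / speed_kbps / 60
print(round(eta_min))  # ~1222 minutes, close to the reported 1218
```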

 

Is this normal? I've never had a single sync error before, and I run monthly parity checks. I've had several power cuts as well, and even the checks that run on reboot have had no errors. I'm starting to think there could be a problem with the new drive.

 

Syslog: http://pastebin.com/wcmeS9Zv

Link to comment

I have just noticed that RC16c is the latest; I'm using RC15a. I will stop the check, update, and then run it again. It seems to have only slowed down once it got past the 2TB mark, which is the size of my biggest data drive. Should parity be all 0s after that point?
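
(My understanding is that single parity is just the XOR of all data disks at each position, so past the end of the largest data disk every contribution is zero and parity should be zero too. A toy illustration, not unRAID's actual code:)

```python
# Toy illustration: single parity is the XOR of all data disks at
# each position, so positions past the largest data disk stay zero.
data_disks = [b"\x12\x34", b"\xab", b"\x0f\x0f\x0f"]  # differing sizes
parity = bytearray(4)  # parity "drive" larger than any data disk

for disk in data_disks:
    for i, byte in enumerate(disk):
        parity[i] ^= byte

print(parity.hex())  # 'b63b0f00' -- the final position stays 0x00
```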

Link to comment

My array was 5 2TB drives with 1 assigned as parity.

 

What assignments DID you have for these drives before you started this process? e.g. Disk1, Disk3, Disk5, Disk7 (Disk2, 4, 6, 8-24 unassigned, no parity). (It's understood one of the 5 was allocated for parity.)

 

P.S. I have a feeling you removed parity and disk 3 at the same time, but I don't want to jump to conclusions.

Link to comment

I have disks 1-4 assigned, plus parity.

 

Disk 3 had a failure, so after the preclear I assigned the old parity drive to disk 3 and the new 3TB to parity.

 

SMART report for new parity (3TB): http://pastebin.com/AzQNbMUT

SMART report for old parity, now disk 3: http://pastebin.com/FxKbxa1f

SMART report for disk 1: http://pastebin.com/6kBKcdbV

SMART report for disk 2: http://pastebin.com/X6srEMaM

SMART report for disk 4: http://pastebin.com/mbFmvv0M

 

Link to comment

Disk 3 failed, so I removed it. The new drive was bigger than my parity drive, so I had to use the new one as parity and the old parity drive for disk 3. Unraid copied the parity data to the new drive and then rebuilt disk 3.

 

 

The new parity check has done the same as before. It has reached about 75% and slowed to 10 MB/s.

Since it is past 2TB, all my other disks have spun down.

 

 

Link to comment

That's the part I don't understand.

 

Are you stating you removed the failed drive (disk 3) AND the parity drive (basically) at the same time, added the (new, never-been-used) 3TB drive as parity, and moved your old parity drive, assigning it as disk 3?

 

Yes, that's what he's saying ==> it's the NORMAL "swap/disable" process that UnRAID supports for exactly the situation he encountered.  READ the documentation !!  :) :)

 

FYI, the documentation for replacing a failed disk -- including the "swap/disable" option -- is here:

http://lime-technology.com/wiki/index.php/UnRAID_Manual#Replace_a_failed_disk

 

Link to comment

Yes, that's what he's saying ==> it's the NORMAL "swap/disable" process that UnRAID supports for exactly the situation he encountered.  READ the documentation !!  :) :)

 

FYI, the documentation for replacing a failed disk -- including the "swap/disable" option -- is here:

http://lime-technology.com/wiki/index.php/UnRAID_Manual#Replace_a_failed_disk

===> That's what YOU'RE saying <=== I am posting to the individual, not you ==> HELLO <==

Is this what retired old folks with PhDs and MVPs with 50 years of experience do? And let's not forget you just added 'MOD' to your belt recently.

Link to comment

You clearly didn't understand the swap/disable process ... so I was simply letting you know about it, so you can learn and help others when you encounter the same issue in the future.

 

And the OP already SAID he was doing this ... in his very first post he said "...  I found that it was possible with a swap disable so I bought one and installed it on Friday night."

 

 

Link to comment

I know about swap/disable, but I have never done it myself and personally question it. Until I understand how it works (not just "click this, do this; click that, do that") I am hesitant about it, but that's just me.

 

I want to be sure (for me, or anyone else reading who might not be sure either) of the exact steps the individual took. So I am asking him/her, not you, once again.

 

Right now, what we see is that with a drive failed, a swap/disable was utilized (not on a healthy system but on one with a failed drive) and a parity check generated tons of errors. So do we ask exactly what steps he took (to be sure), or do we blame swap/disable immediately?

 

People make statements and do things wrong (not stating he did anything wrong), or they misunderstand what is to be done. It happens; it's human.

Link to comment

garycase is correct. I used the normal swap/disable procedure. As far as I can tell, it worked without any problems, and I believe the parity was valid before the failure: a parity check was completed about 5 days prior to it, and when we had a power cut last week, Unraid performed another check on reboot, which had 0 sync errors. Several days later I started getting problems with the drive and so replaced it. It wasn't a total failure; it just started getting unreadable sectors, and the data had to be rebuilt from parity whenever I read it.
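
(That on-the-fly rebuild is just the parity relation solved for the missing disk -- with single XOR parity, disk 3 = parity XOR disk 1 XOR disk 2 XOR disk 4. A toy example in Python, not unRAID's actual code:)

```python
# Toy example: with single XOR parity, a missing disk is recovered
# as parity XOR (all surviving disks).
d1, d2, d4 = 0x12, 0x34, 0x56
d3_original = 0x78
parity = d1 ^ d2 ^ d4 ^ d3_original  # parity as written before the failure

d3_rebuilt = parity ^ d1 ^ d2 ^ d4
assert d3_rebuilt == d3_original
print(hex(d3_rebuilt))  # 0x78 -- disk 3's byte, recovered
```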

 

The sync errors and extremely slow parity check didn't start until the check got past 2TB (the size of my largest data disk). The final 1TB of the parity drive has no corresponding data drives: all 4 were spun down when I checked this morning while the parity check carried on. I'm wondering if this has something to do with it? It has been running for about 24 hours now and still has 4 hours remaining, with over 150,000,000 sync errors.

 

Total Size 2,930,266,532 KB

Current 2,768,955,704 (94.5%)

Speed 10,219 KB/sec

Finish 262 minutes

Sync Errors 152,082,163 (corrected)

 

It is only this correcting check that is slow. The non-correcting check that I ran prior to this completed in a normal time (11 hours).

Link to comment

Very interesting.  Yours is not the only issue that's been reported recently where the space AFTER the largest data drive is resulting in very slow parity check speeds -- and a LOT of sync errors in that space (indicating the parity drive isn't cleared at that point ... but also resulting in very slow writes).
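
If you want to spot-check that region directly, something along these lines would do it. This is only a sketch: it must run as root, and /dev/sdX is a placeholder, NOT your real parity device name:

```python
# Hypothetical spot-check: sample the parity drive past the 2TB mark
# (beyond the largest data disk) and report whether it's all zeros.
# Requires root; /dev/sdX is a placeholder -- substitute your device.
OFFSET = 2 * 1000**4   # 2 TB in bytes, just past the largest data disk
SAMPLE = 1024 * 1024   # read a 1 MiB sample

with open("/dev/sdX", "rb") as dev:
    dev.seek(OFFSET)
    chunk = dev.read(SAMPLE)

print("all zeros" if not any(chunk) else "non-zero bytes found")
```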

 

I'm going to be sure LimeTech is aware of these issues, as they MAY indicate some issue with the newest kernel, which was changed a couple versions ago.

 

Let the check finish -- then run another check and see if it completes in your "normal" time (11 hrs).

 

Link to comment
