go69cars

Parity rebuild incredibly slow


My problem started with my USB stick crapping out about a week ago. I got that fixed using a 2-week-old backup I had made. When the server first came back online it attempted to run a parity check, which ran at about 700 KB/s. I tried rebooting a couple of times and stopping and starting the array, and nothing made it faster.

 

Since then I’ve tried different things. I tried to take one of the parity drives offline and use just one for the initial parity check. In the process of doing that it reported that disk 8 was unmountable and needed formatting. I attempted to format it a few times, but each time it would format for some time and stop. Each time I came back to the computer it would show that disk 8 was unmountable and ask me to format it.

 

I decided to remove disk 8 from the array temporarily. With the 2nd parity disk and disk 8 removed, the array was running well and wasn’t wanting to do a parity check or rebuild. I added the 2nd parity disk back online and it did a parity rebuild on that disk, which took the usual 30 hours (8 TB WD Red). While that was running I also did a preclear on disk 8. This ran at 150 MB/s through the whole process.

 

Now I’ve tried to add disk 8 back into the array. The parity rebuild is incredibly slow, running at the same 600-700 KB/s as it did initially. When it first starts, the speed briefly shoots up to 20-30 MB/s and then goes back down to 0. After that it will shoot up to 1.5-2 MB/s and then drop back down to 0 again. It’s like something is holding it back from going any faster.
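For scale, here is a rough back-of-the-envelope estimate of what those rates mean for an 8 TB disk. It is only a sketch: it assumes a constant transfer rate and decimal units (8 TB = 8e12 bytes), and the helper name `rebuild_hours` is made up for illustration.

```python
def rebuild_hours(capacity_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to pass over the whole disk once at a constant rate."""
    return capacity_bytes / bytes_per_second / 3600


DISK_BYTES = 8e12  # 8 TB in decimal units (assumption: marketed capacity)

# Rates taken from the post: the stalled rebuild and the preclear.
for label, rate in [("700 KB/s", 700e3), ("150 MB/s", 150e6)]:
    hours = rebuild_hours(DISK_BYTES, rate)
    print(f"{label}: {hours:,.1f} hours ({hours / 24:,.1f} days)")
```

At the stalled rate a full pass would take months rather than hours, which is why the speed itself points to a fault rather than a merely slow rebuild.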

 

I ran a SMART check on the disks and can’t find a bad one. I put the server into maintenance mode and checked the file system on each disk. I couldn’t find any errors, but to be honest I don’t know what I am looking for there. I tried shutting off all my dockers and that made no difference.

 

My system is running unRAID 6.4. It has a new i5 8400 on an ASRock motherboard. It has a mix of 8 TB and 4 TB WD Red disks. Disk 8 and the parity disks are all 8 TB. The disks are connected through 4-to-1 breakout cables to 2 LSI RAID controllers. There are 13 hard drives total. It has 3 cache drives (240, 120, 120).

 

I'm really lost as to what could be causing this issue. I’d like some guidance on what to do next.

 

I'm on my phone now. I’ll post the logs when I get home.

 

23 minutes ago, go69cars said:

I’d like some guidance on what to do next

 

You ought to have asked earlier. Post your diagnostics zip.


I hope you didn't have anything important on disk8 since you formatted it. Some people have made the mistake of formatting a disk in the array and then expected parity to get their data back. It won't.


Here is the diagnostics zip file.

 

Fortunately, I have my irreplaceable files backed up for something just like this.

 

I'm a bit confused what the array is writing to Disk 8 then if it's not trying to rebuild it.    

tower-diagnostics-20180414-2130.zip

5 hours ago, go69cars said:

I'm a bit confused what the array is writing to Disk 8 then if it's not trying to rebuild it.


If the system saw you format disk8 then the system will try to restore a formatted disk8 - a disk with zero files since the format command overwrites the index of what is stored on the disk. Format is - by definition - a lossy operation intended to be performed before you start storing files on the disk.
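A toy model may make this concrete. The sketch below assumes plain byte-wise XOR parity, which is how single parity works; the second parity disk uses a different calculation that is not modeled here. The disk contents and the helper names `xor_parity` and `rebuild` are invented for illustration.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equally sized disks (single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild(missing_index, disks, parity):
    """Reconstruct one disk from the parity and all the other disks."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_parity(others + [parity])


disk1 = b"movies.."
disk8 = b"photos!!"
parity = xor_parity([disk1, disk8])

# A format is just another write to the array: the new (empty) file system
# goes onto disk8 and parity is updated to match it.
disk8 = b"\x00" * 8  # stand-in for an empty file system
parity = xor_parity([disk1, disk8])

# Rebuilding disk8 now reproduces the formatted disk, not the old files.
rebuilt = rebuild(1, [disk1, disk8], parity)
print(rebuilt)
```

Parity only ever describes the current contents of the array, so once the format has been written there is nothing older left to recover.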


Constant timeout errors on two disks (disk8 and parity2):

 

Apr 14 04:41:12 Tower kernel: sd 8:0:1:0: task abort: SUCCESS scmd(ffff880351682148)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: attempting task abort! scmd(ffff8800590f9148)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: [sdk] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 00 9b bf 08 00 00 04 00 00 00
Apr 14 04:41:44 Tower kernel: scsi target8:0:1: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Apr 14 04:41:44 Tower kernel: scsi target8:0:1: enclosure_logical_id(0x500605b0060ad510), slot(0)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: task abort: SUCCESS scmd(ffff8800590f9148)
Apr 14 04:41:45 Tower kernel: mpt2sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 14 04:41:45 Tower kernel: mpt2sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 14 04:41:57 Tower kernel: mpt2sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: attempting task abort! scmd(ffff88037f3f5d48)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: [sdn] tag#7 CDB: opcode=0x8a 8a 00 00 00 00 00 00 9c b3 10 00 00 04 00 00 00
Apr 14 04:42:28 Tower kernel: scsi target8:0:4: handle(0x0010), sas_address(0x4433221103000000), phy(3)
Apr 14 04:42:28 Tower kernel: scsi target8:0:4: enclosure_logical_id(0x500605b0060ad510), slot(3)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: task abort: SUCCESS scmd(ffff88037f3f5d48)
Apr 14 04:42:59 Tower kernel: sd 8:0:1:0: attempting task abort! scmd(ffff880059193548)
Apr 14 04:42:59 Tower kernel: sd 8:0:1:0: [sdk] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 00 9d 77 18 00 00 04 00 00 00

Check the cables, both power and SATA, or the enclosure.
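To see how often each disk is affected, a syslog like the one above can be summarized with a short script. This is a minimal sketch: the regular expressions are written against the line format shown in this excerpt only, and `decode_cdb16` and `summarize` are hypothetical helper names. It relies on the standard 16-byte SCSI command layout, where opcode 0x88 is READ(16) and 0x8a is WRITE(16), bytes 2-9 hold the starting block address, and bytes 10-13 hold the transfer length in blocks.

```python
import re
from collections import Counter

ABORT_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): attempting task abort!")
CDB_RE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): \[(\w+)\] tag#\d+ CDB: "
    r"opcode=0x[0-9a-f]{2} ((?:[0-9a-f]{2} ?)+)"
)
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}


def decode_cdb16(hex_bytes):
    """Return (command name, starting LBA, block count) for a 16-byte CDB."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte CDB")
    name = OPCODES.get(raw[0], f"opcode 0x{raw[0]:02x}")
    lba = int.from_bytes(raw[2:10], "big")
    blocks = int.from_bytes(raw[10:14], "big")
    return name, lba, blocks


def summarize(lines):
    """Count task aborts per device and decode the commands that timed out."""
    aborts = Counter()
    names = {}      # SCSI address -> device name, learned from CDB lines
    commands = []
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[m.group(1)] += 1
            continue
        m = CDB_RE.search(line)
        if m:
            address, device, cdb = m.groups()
            names[address] = device
            commands.append((device, *decode_cdb16(cdb)))
    by_device = Counter({names.get(a, a): n for a, n in aborts.items()})
    return by_device, commands


# Four lines copied from the log excerpt above.
SAMPLE = """\
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: attempting task abort! scmd(ffff8800590f9148)
Apr 14 04:41:44 Tower kernel: sd 8:0:1:0: [sdk] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 00 9b bf 08 00 00 04 00 00 00
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: attempting task abort! scmd(ffff88037f3f5d48)
Apr 14 04:42:28 Tower kernel: sd 8:0:4:0: [sdn] tag#7 CDB: opcode=0x8a 8a 00 00 00 00 00 00 9c b3 10 00 00 04 00 00 00
"""

aborts, commands = summarize(SAMPLE.splitlines())
print(aborts)
for device, name, lba, blocks in commands:
    print(f"{device}: {name} at LBA {lba}, {blocks} blocks")
```

On a real system the same function could be fed the full syslog from the diagnostics zip. Mapping `sdk` and `sdn` back to array slots still has to come from the diagnostics, since the kernel log only shows device names.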

 

 




Copyright © 2005-2018 Lime Technology, Inc.
unRAID® is a registered trademark of Lime Technology, Inc.