NGMK

Parity Check Finish 1 Errors


I built this unRAID server about 2 months ago. It's been working fine for the most part. Since the whole house is backed up by a set of Tesla Powerwalls, I do not have the server on a UPS. The other day Tesla was conducting some repairs and shut the whole system down without allowing me time to properly shut down the server.

 

When power was restored and the server came back online, it automatically started a parity check. 22 hours later the check completed with a warning that 1 error was found. I always have the server set to auto-correct errors on scheduled parity checks (once a month), but I don't know if this setting was enforced, so I decided to manually start a parity check and this time made sure that the fix parity errors setting was checked.

 

Just now the second parity check finished and gave me this notification:  What should I do next?  

unRAID Parity check: 12-07-2018 17:10
Notice [POSEIDON] - Parity check finished (1 errors)
Duration: 22 hours, 45 minutes, 18 seconds. Average speed: 146.5 MB/s
 
 
 
 

2 hours ago, jonathanm said:

Another non-correcting check.

unRAID Parity check: 12-07-2018 20:19
Notice [POSEIDON] - Parity check started
Size: 12.0 TB
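
As a quick sanity check on the numbers in these notifications, the reported average speed can be recomputed from the size and the duration of the previous run. This is only a rough sketch: it assumes "12.0 TB" means 12.0 × 10^12 bytes and "MB/s" means 10^6 bytes per second, and the variable names are made up for illustration.

```python
# Figures taken from the two notifications in this thread.
size_bytes = 12.0e12                       # "Size: 12.0 TB" (assumed decimal TB)
duration_s = 22 * 3600 + 45 * 60 + 18      # "22 hours, 45 minutes, 18 seconds"

# Average throughput over the whole check, in decimal megabytes per second.
avg_mb_per_s = size_bytes / duration_s / 1e6

print(f"{avg_mb_per_s:.1f} MB/s")
```

Under those assumptions the result rounds to the reported 146.5 MB/s, so the check ran across the full 12 TB parity drive at a normal speed.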


The automatic parity check after an unclean shutdown is always non-correcting. If one starts and finds one or more errors, you might as well cancel it and start a correcting check.

11 hours ago, NGMK said:

I always have the server to auto correct errors on scheduled parity checks (once a month)

 

Be careful about using correcting parity except when adding new disks or after a power loss. It's required to correct the parity after a power loss because you normally have a couple of blocks with wrong parity, caused by the data disks not being properly unmounted.


But for an already running system that is expected to have correct parity, auto-repair may destroy valid parity because a data disk goofed and silently read out wrong data. When the parity computation detects a difference, unRAID doesn't know why there is a difference. It isn't possible to know which of the disks has read out a value that doesn't agree with the content of all the other drives. That's the danger with silent errors on RAID systems. When a disk fails and stops being able to read out data, it's easy for the system to figure out that the data from all the other disks can be used to recompute the data from the problem disk. But with a silent error, any disk may be at fault.

 

So a parity error for a fully working system means that you want to be able to sit down and analyze everything carefully to see if the error is repeatable or if it was a single transfer error. With an automatic parity repair, you don't get this chance because the system will then always assume all the data disks are correct and that it's the parity drive that should be rewritten.
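
To make the point above concrete, here is a minimal sketch assuming plain single XOR parity over one byte per disk. The disk contents, the `FLIP` bit and the helper names are made up for illustration; this is not unRAID's actual code.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR of the byte at the same offset on every disk passed in."""
    return reduce(xor, blocks, 0)


# Three hypothetical data disks, one byte each at the same offset.
data = [0b10110010, 0b01101100, 0b11100001]
p = parity(data)  # what the parity disk stores for this offset

# Known failure: disk 1 is gone, so XOR of the survivors and parity rebuilds it.
rebuilt_disk1 = parity([data[0], data[2], p])

# Silent error: one bit is flipped on exactly one disk, and nobody reports it.
FLIP = 0b00000100


def syndrome_after_flip(index):
    """XOR across all disks (data + parity) after corrupting disk `index`."""
    disks = data + [p]
    disks[index] ^= FLIP
    return parity(disks)  # 0 would mean "parity agrees"


# Same nonzero result whichever disk is corrupted, so the check cannot tell
# which disk is at fault.
syndromes = [syndrome_after_flip(i) for i in range(len(data) + 1)]

# A correcting check trusts the data disks. If disk 1 only misread once,
# parity is rewritten from the bad read ...
corrected_p = parity([data[0], data[1] ^ FLIP, data[2]])
# ... and a later rebuild of disk 0 from the (actually fine) disks is wrong.
rebuilt_disk0 = parity([data[1], data[2], corrected_p])
```

Here `rebuilt_disk1` matches the original byte because the missing disk was known, while every entry in `syndromes` is the same value, and `rebuilt_disk0` ends up differing from `data[0]` by the flipped bit. That last case is the "destroyed a valid parity" scenario.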

5 minutes ago, pwm said:

Be careful about using correcting parity except when adding new disks or after a power loss

Good point, I forgot to mention that: scheduled parity checks should always be non-correcting.

5 hours ago, pwm said:

 

Be careful about using correcting parity except when adding new disks or after a power loss. It's required to correct the parity after a power loss because you normally have a couple of blocks with wrong parity, caused by the data disks not being properly unmounted.


But for an already running system that is expected to have correct parity, auto-repair may destroy valid parity because a data disk goofed and silently read out wrong data. When the parity computation detects a difference, unRAID doesn't know why there is a difference. It isn't possible to know which of the disks has read out a value that doesn't agree with the content of all the other drives. That's the danger with silent errors on RAID systems. When a disk fails and stops being able to read out data, it's easy for the system to figure out that the data from all the other disks can be used to recompute the data from the problem disk. But with a silent error, any disk may be at fault.

  

So a parity error for a fully working system means that you want to be able to sit down and analyze everything carefully to see if the error is repeatable or if it was a single transfer error. With an automatic parity repair, you don't get this chance because the system will then always assume all the data disks are correct and that it's the parity drive that should be rewritten.

Thanks for all this information. unRAID should at least give a warning about the danger of running error-correcting parity checks, especially during the scheduled ones.

 

My array disk composition is as follows: the parity drive is a 12TB Ironwolf Pro, and all the data disks are WD Red 8TB (all shucked from WD easystore Best Buy drives). So far the current position of the check is 8.2TB with no errors reported yet. Does this mean that any errors encountered from this point on are solely on the parity disk? What is the advantage of continuing with the parity check from beyond the initial 8TB position?

4 hours ago, NGMK said:

What is the advantage of continuing with the parity check from beyond the initial 8TB position?

 

You should have a routine where the entire surface of every disk is read end-to-end regularly, because it's only when the disk tries to read the individual sectors that it can detect problems with the surface or with locking on to the servo information that describes the location of the tracks and sectors.

 

Most data loss in traditional RAID systems (except user errors like file overwrites or accidental deletes) is caused by people not having scheduled testing of the drives. So as long as they don't get a read error when trying to view a film or opening a document, they don't know the state of the drives. So when they finally get a read error, they may already have multiple disks with errors - and not enough parity data to recover.
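
A rough way to picture this, assuming a simplified model where each disk is just a set of unreadable stripe numbers and a stripe is lost once more disks are unreadable there than you have parity drives. The function name and the numbers are hypothetical.

```python
def unrecoverable_stripes(bad_stripes_per_disk, parity_disks=1):
    """Return the stripes where more disks are unreadable than parity can cover."""
    counts = {}
    for bad in bad_stripes_per_disk:
        for stripe in bad:
            counts[stripe] = counts.get(stripe, 0) + 1
    return sorted(s for s, n in counts.items() if n > parity_disks)


STRIPES = 10  # tiny hypothetical array, stripes numbered 0..9

# With regular full reads: disk A's bad sector at stripe 7 is found while
# every other disk is still healthy, so it can be rebuilt from the rest.
found_early = unrecoverable_stripes([{7}, set(), set()])

# Without regular reads: the bad sector on disk A sits unnoticed until
# disk B dies completely. Stripe 7 is now unreadable on two disks.
found_late = unrecoverable_stripes([{7}, set(range(STRIPES)), set()])
```

In this model `found_early` is empty and `found_late` contains stripe 7, which is the "multiple disks with errors and not enough parity data" situation described above.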




Copyright © 2005-2018 Lime Technology, Inc.
unRAID® is a registered trademark of Lime Technology, Inc.