Bit rot in unRAID



http://lime-technology.com/forum/index.php?topic=20612.msg182945#msg182945

 

From this message onwards in the linked thread, bit rot is discussed...

 

 

I just want to make a post where we can bring it out into the open and discuss it. It's something I have never really fully understood, and the implications are quite severe.

 

I know that a lack of disk access over time can cause it, besides all the other factors that I have read about here...

 

http://en.wikipedia.org/wiki/Bit_rot

 

Some people use MD5 checksums and other exotic means to try to guard against it.

Is there anything that can be done easily? The aforementioned solutions seem really impractical to me, as they take vast amounts of time.

 

This is not a comparison by any means, because unRAID is unique... but I would like to ask how ZFS claims to avoid this problem.

 

Thanks for any input.

Link to comment

I don't really think it's an issue. It happens on CDs/DVDs/etc. too, and I have CDs from the day they came out that still work like new. If I were running a server where one corrupted byte could cause my company to be sued or go bankrupt, then I'd worry about it, but in 99.99% of cases one corrupted byte won't even be noticeable. I've been using electronics and massive storage servers for about 20 years and I've never experienced bit rot.

 

 

Link to comment

Moderator:

Please consider merging the referenced (above, in OP) section of the other (very generic) thread into this (specific, and aptly titled) thread.

 

Thanks.

 

(This is a very worthwhile topic that should not be diluted, or worse, go unnoticed.)

 

Link to comment

This is not a comparison by any means, because unRAID is unique... but I would like to ask how ZFS claims to avoid this problem.

 

ZFS does 'end to end' checksumming of the data and metadata blocks it writes to disk.

 

If, on read back, it finds that the checksum of the block it's just read disagrees with the previously stored checksum, it can attempt to fix it by either reconstructing that block using a raidz parity rebuild for that individual block (presuming this problem is occurring in a raidz pool!) or by going to another copy of the block if you have replication enabled. There may also be something it can do based on its copy-on-write methodology - I don't know how long it keeps 'old' copies of data around once it's written a new version, or if it even tracks this internally.
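To make the idea concrete, here is a minimal sketch of "verify the checksum on read, and fall back to a redundant copy if it fails" - not ZFS code, just an illustration in Python with made-up names like StoredBlock and read_block:

import hashlib
from dataclasses import dataclass

@dataclass
class StoredBlock:
    data: bytes
    checksum: str  # hex digest recorded when the block was written

def write_block(data: bytes) -> StoredBlock:
    # Record a checksum of the data at write time (ZFS keeps this in the block pointer).
    return StoredBlock(data, hashlib.sha256(data).hexdigest())

def read_block(primary: StoredBlock, replicas) -> bytes:
    # Verify the copy we just read against the checksum stored when it was written.
    if hashlib.sha256(primary.data).hexdigest() == primary.checksum:
        return primary.data
    # Checksum mismatch: try a redundant copy (a mirror or raidz rebuild in real ZFS).
    for copy in replicas:
        if hashlib.sha256(copy.data).hexdigest() == copy.checksum:
            return copy.data  # a real implementation would also rewrite the bad primary
    raise IOError("checksum mismatch and no good copy available")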

 

I'm not convinced this protection is infallible - it still needs to be able to reconstruct the block, and I would presume that in the (unlikely?) event where that particular block of data has problems on multiple disks, it won't be able to reconstruct it. And if you're only running ZFS on a single disk, or on collections of single disks with no sort of replication enabled or parity-based recovery possible, all it can do is warn you that a checksum has failed.

 

So I don't think 'having ZFS' as a filesystem inherently protects you from this. You still need to be careful and appreciate that there may be edge cases.

 

This is all just my understanding though, I could be very wrong.

 

I'd be more worried, personally, about bad hardware causing data corruption than about bit rot on the disks over time. ZFS may not protect you from this: if the data is corrupted before it's written to the filesystem, then the checksum will still be correct - just for the corrupted data.

 

In short I'd be kitting out with ECC ram and enterprise kit as a priority before I relied on ZFS to save me. Though I appreciate ZFS could (if you were planning on using it anyway) be a quick and easy 'rude not to' layer of protection. In my own experience I've never (noticeably) had any problems with this sort of corruption so I don't bother with any of it and just have a decent backup methodology in place including verification and versioning of data. But as drive densities increase and the overall amount of data I store increases I may change my approach but likely only as a future result of being bitten hard by the problem.

 

I'd be very interested in any case studies or papers where people have prodded at ZFS' recovery from bitrot.

Link to comment

I am in a business where a single bit error can be very destructive... a single bit error can cause million-dollar court judgments to be reversed, or even mean someone goes to jail... or not.  That is why MD5 and SHA-1 hashes are done on all evidence files.

 

I have processed hundreds of terabytes of data... some stored for a decade or more... and never had a hash value change.  I have never seen, or heard of from a reliable source, any "bit rot" from a hard drive (optical media, yes).  I've heard of corruption of a file at the file system level due to known hazards (and a diff of the two files shows it was corruption, and not bit rot), but never "bit rot" in the sense of a random bit flipping, and when you consider modern drives' ECC, you realize that bit rot is not likely to manifest itself even if it happened.
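For anyone wanting to try something similar at home, here is a minimal sketch of that kind of hashing workflow in Python (the manifest name and the share path in the comment are just placeholders):

import hashlib
import os

def hash_file(path, algo="md5", chunk_size=1 << 20):
    # Stream the file through the hash so large files don't need to fit in RAM.
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root, manifest_path="checksums.md5"):
    # Walk the tree and record "digest  relative/path" lines, md5sum-style.
    with open(manifest_path, "w") as out:
        for dirpath, _dirs, files in os.walk(root):
            for name in sorted(files):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(hash_file(full) + "  " + rel + "\n")

# build_manifest("/mnt/user/Movies")   # hypothetical share path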

 

Worrying about bit rot instead of MUCH more likely issues is like walking around with a hard hat in case some random piece of fascia falls off a building, but then crossing against the light on a busy street.

Link to comment

I am in a business where a single bit error can be very destructive... a single bit error can cause million-dollar court judgments to be reversed, or even mean someone goes to jail... or not.  That is why MD5 and SHA-1 hashes are done on all evidence files.

 

I have processed hundreds of terabytes of data... some stored for a decade or more... and never had a hash value change.  I have never seen, or heard of from a reliable source, any "bit rot" from a hard drive (optical media, yes).  I've heard of corruption of a file at the file system level due to known hazards (and a diff of the two files shows it was corruption, and not bit rot), but never "bit rot" in the sense of a random bit flipping, and when you consider modern drives' ECC, you realize that bit rot is not likely to manifest itself even if it happened.

 

Worrying about bit rot instead of MUCH more likely issues is like walking around with a hard hat in case some random piece of fascia falls off a building, but then crossing against the light on a busy street.

With all due respect, I tend to agree... but... over the years I can remember at least two or three instances of a specific disk drive returning no error, but a different checksum, when the same set of blocks was read.  They caused intermittent, seemingly random parity errors.

 

Now, there are roughly 10,000 members of the lime-tech forum... If you figure three disks per member, that is representative of at least 30,000 disks.  Two or three incidents across 30,000 disks says to me that roughly one out of 10,000 disks, over its lifetime, might exhibit the behavior we are concerned about.  (Not every unRAID owner is a member, and most probably have more than 3 disks, but it is as good a way as any to estimate the population of disks we are talking about.)

 

I personally think on rare occasion bits pass the ECC code when read from the disk, and pass the CRC checks when transmitted to the disk controller, but flip state while in the cache ram of the disk drive.  That is not something that is easy to detect.  If you are just playing a movie, or music, you'll likely never notice a single bit error.

 

If it were not for the periodic parity checks, where EVERY bit of EVERY disk is read, the odds of these inconsistent reads from the drive's cache RAM being detected would be slim.  In fact, for most inconsistent bit errors read from cache RAM on the disk electronics, I would suspect power supply issues or failing filter capacitors in the disk electronics (noise induced on the power supply to the disks by other disks).  It is possible for a manufacturer to add a parity bit there (use ECC RAM?), but I've not read of any who do so on the drives themselves.  I suppose that if one did, it might be a small and selective market.  If you then consider that most PCs only have one disk, and so are less likely to experience noise from other disks simultaneously seeking, you'll understand why most disks would never exhibit symptoms, even if marginal electronics are present.

 

Nothing except MD5 or SHA-1 checksums (or something similar) will verify that the file read back is identical to the file that was written.
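Following on from the manifest sketch earlier in the thread, a verification pass might look something like this (again Python, again just an illustration; it assumes the "digest  relative/path" format written above):

import hashlib
import os

def file_md5(path, chunk_size=1 << 20):
    # Stream the file through MD5 so large files don't need to fit in RAM.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(root, manifest_path="checksums.md5"):
    # Re-hash each file listed in the manifest and report anything that no longer matches.
    mismatches = []
    with open(manifest_path) as f:
        for line in f:
            expected, rel = line.rstrip("\n").split("  ", 1)
            if file_md5(os.path.join(root, rel)) != expected:
                mismatches.append(rel)
    return mismatches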

 

Joe L.

 

Link to comment

Moderator:

Please consider merging the referenced (above, in OP) section of the other (very generic) thread into this (specific, and aptly titled) thread.

 

I apologise if my previous thread was not "aptly titled", although the conversation did go off topic :P Feel free to move the bit rot discussion to this thread; it's where it belongs.

 

I do agree that this is something that, however unlikely, should be addressed, though perhaps not directly as "bit rot" - more as "invalid bits" or "bad checksums." Regardless of what causes the data to change unexpectedly, unRAID should be able to detect this change one way or another. I know this is what the parity drive does to some extent, but there are plenty of cases on the forum where relying on parity can get you into trouble (not intentionally starting the corrective vs. non-corrective parity check discussion, but...)

 

Has Tom ever mentioned a filesystem change down the track? Considering Reiser is unlikely to be developed or well supported in the long term, should we really be weighing up the pros & cons of a new FS?

Link to comment

There was talk of unRAID becoming filesystem agnostic by opening things up to let the user select whatever filesystem they want to use. I don't know if that was strictly for the cache drive, but I imagine there are some practical restrictions for drives in the array. The likely restriction is that the filesystem must support growing with the disk in the case where someone rebuilds a failed disk onto a larger replacement disk.

Link to comment

Has Tom ever mentioned a filesystem change down the track? Considering Reiser is unlikely to be developed or well supported in the long term, should we really be weighing up the pros & cons of a new FS?

He talked about why he chose reiserfs... and stated reasons why others were not as easy to use.

 

The functionality required is:

ability to be re-sized in place.

stable

 

The desired functionality is:

no need to specify number of "inodes" up front (number of directory entries)

reasonable performance with large files.

not wasteful of space.

journaling, to prevent data loss if power loss, etc.

 

As far as reiserfs vs. another...  I've not had experience attempting to recover other file systems when a user accidentally clobbered them, but it is amazing how much can be recovered by reiserfsck.  For that reason alone, I see no reason to replace it.  It just needs to store the files.  When unRAID was initially developed, there was no read-write NTFS driver; the NTFS driver was read-only.  Today, I would guess NTFS would have been an alternative choice, as there is an ntfs-3g driver that would probably work as well.

 

As far as bit rot...  if there is a read error on a disk, unRAID (in combination with SMART firmware on the disks) is somewhat self-repairing.  The un-readable sector is re-constructed from the other disks, then sent to the process reading it, and ALSO re-written to the disk where the read failed.  The SMART firmware can then re-allocate the sector if needed.  This does not solve the issue with bits flipping in RAM, but does handle the far more common mechanical failures of disk platters.
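As a rough illustration of the arithmetic behind that reconstruction (a toy sketch in Python, not unRAID's driver code): with single parity, the parity sector is the XOR of the same sector on every data disk, so a sector from a known-failed disk is just the XOR of the parity and everything else.

def xor_bytes(blocks):
    # Byte-wise XOR of equal-length byte strings.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Parity is the XOR of the corresponding sector on every data disk.
data_sectors = [b"\x10\x20\x30", b"\x01\x02\x03", b"\xaa\xbb\xcc"]
parity = xor_bytes(data_sectors)

# If disk 1 returns a read error, rebuild its sector from the parity
# plus the same sector on every *other* data disk.
rebuilt = xor_bytes([parity] + [s for i, s in enumerate(data_sectors) if i != 1])
assert rebuilt == data_sectors[1]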

Link to comment

I suppose bad sectors could be considered bit rot as it's defined.

 

I have seen hardware issues posted here that caused bad data, but that wasn't really bitrot, or bad data caused by the HDD platter deteriorating over time. The bad data was caused by bad electronics not processing or transferring the bits correctly.

 

All the parity error issues posted seem to be caused by things like the previous paragraph, or by things like the upgrade bug, or by hard-powering the server off. People with well-built, stable servers don't seem to have any issues with an unexplained parity error or two just randomly popping up every so often.

 

The other linked thread is wrong about RAID 5. RAID 5 can reconstruct from a bit error, since the data is stored in stripes, not disk by disk, and the stripe can be reconstructed. unRAID, with its disk-by-disk protection, could not reconstruct the error.

 

 

Link to comment

When I wrote "This is a very worthwhile topic ..."

I didn't mean that it was a common problem, or that everybody had a good chance of being bitten. (But it is grossly misunderstood by the vast majority.)

 

... That is why MD5 and SHA-1 hashes are done on all evidence files.

 

I have processed hundreds of terabytes of data... some stored for a decade or more... and never had a hash value change.  I have never seen, or heard of from a reliable source, any "bit rot" from a hard drive (optical media, yes).  I've heard of corruption of a file at the file system level due to known hazards (and a diff of the two files shows it was corruption, and not bit rot), but never "bit rot" in the sense of a random bit flipping, and when you consider modern drives' ECC, you realize that bit rot is not likely to manifest itself even if it happened.

You and I are in a very small minority. (I've been maintaining MD5s on all my media files [~12TB] for the last 5 years.) I haven't had any bit rot either. But I have caught errors following disk-to-disk and network copying of files. For the vast majority of users, without a means of verifying the integrity of the destination copy, those errors would typically go undetected; if/when such an error does eventually get detected, it gets "blamed" on the disk drive it was read from, and incorrectly categorized as "bit rot".

 

But ... every (current era) disk drive has bit rot! It is unavoidable, and it was planned for in the design of the drive and its firmware. In almost every occurrence, the firmware/ECC detects AND corrects the "bit rot" and it never manifests in the real world. In those rare, but (wince!) dreaded cases where firmware/ECC detects but CAN NOT correct the error (serious rot), the drive issues the UCE (UnCorrectableError). Before issuing the UCE, the drive will make several (10+) attempts to get a correctable read. And, the kernel driver will make a couple of re-tries on the UCE it does get.

 

The super-elusive (and, maybe, mythologically apocryphal) case is where the "bit rot" is such, that, when ECC is applied to it, it (appears to) correct it, but actually produces data different from the original.  [in NerdSpeak: a "Carl Sagan" "hash collision".] Probably as likely as three albinos, on separate continents, each winning their nation's lottery on the same day.

 

To [begin to] appreciate the complexities in modern disk drives (magnetic recording; not SSD), consider that there are  several hundred data tracks packed (onto each surface) within the width of a single human hair!!  (... my head hurts :) ...) That's about 200-300K tracks per surface on today's 3.5" drive. [Historical note: 40 years ago, there were about 400 tracks per surface on a 14" drive! [and only 10 sectors per track] (my first Unix driver :))]

 

 

 

Link to comment

[Historical note: 40 years ago, there were about 400 tracks per surface on a 14" drive! [and only 10 sectors per track] (my first Unix driver :))]

And you could sprinkle magnetic developer powder on the disk and read the tracks and bit patterns with a magnifying glass... 

http://en.wikipedia.org/wiki/Magnetic_developer

 

Those days are long gone.  Today you need far better tools.  (If your eyes are still good enough  :o)

Link to comment

unRAID has exactly the same capabilities as RAID 5.

RAID 5 has no ability to repair that unRAID does not have.

 

Well, I should have read up on RAID 5 again. RAID 5 doesn't use the parity data when reading back a disk, so it wouldn't automatically recover from bit rot. However, if it did use the parity on reads, it could recover from bit rot. I wonder if any of the better controllers could be set to check parity on reads?

 

Link to comment

unRAID has exactly the same capabilities as RAID 5.

RAID 5 has no ability to repair that unRAID does not have.

 

Well, I should have read up on RAID 5 again. RAID 5 doesn't use the parity data when reading back a disk, so it wouldn't automatically recover from bit rot. However, if it did use the parity on reads, it could recover from bit rot. I wonder if any of the better controllers could be set to check parity on reads?

It does not matter.  With only a single parity calculation it is impossible to determine which disk is in error.  In its simplest form, let's say you have a RAID array with 5 disks.  You read from all 5 plus parity and discover that parity is wrong for one byte.  Which disk has the error... or is it more than one?  In fact, if TWO disks have their bits flipped, parity is then correct, but the data is wrong.  And what if one was "1" and the other "0"... which was which?

 

No single parity disk RAID can deal with correcting a bit error.  All they can do is detect a single bit error, and then only if ALL the disks are read  for every byte accessed.
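A tiny worked example of that point (Python, toy numbers): a single parity value tells you something changed, but not where, and two matching flips cancel out entirely.

# The same byte offset on three data disks, plus the parity computed from them.
d = [0b10110010, 0b01100101, 0b11110000]
parity = d[0] ^ d[1] ^ d[2]

# Flip one bit on disk 0: the overall XOR is no longer zero, so the error is detected.
bad0 = [d[0] ^ 0b00000100, d[1], d[2]]
assert (bad0[0] ^ bad0[1] ^ bad0[2] ^ parity) != 0

# Flipping the *same* bit on disk 1 instead produces an identical mismatch,
# so the parity check alone cannot say which disk is wrong.
bad1 = [d[0], d[1] ^ 0b00000100, d[2]]
assert (bad0[0] ^ bad0[1] ^ bad0[2]) == (bad1[0] ^ bad1[1] ^ bad1[2])

# Flip that bit on two disks and the errors cancel: parity looks fine, data is wrong.
bad2 = [d[0] ^ 0b00000100, d[1] ^ 0b00000100, d[2]]
assert (bad2[0] ^ bad2[1] ^ bad2[2] ^ parity) == 0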

 

Now, you can add additional bits to gain the ability to both detect AND correct the bad bit when you read at the word level.  The first "computer" I worked on had that capability.

 

It had a 40-bit wide data word, with 7 additional Hamming and parity bits (the memory bus was 47 bits wide).  The extra bits gave it the ability to detect and correct any single bit error at the word level, on the fly, and to detect, but (unfortunately) not correct, any double bit error. (It could not know how to set the bits, since multiple different bit combinations would satisfy the Hamming and parity checks.)
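For the curious, here is a sketch of that scheme - a generic extended Hamming "SEC-DED" code in Python. For a 40-bit word it works out to 6 check bits plus one overall parity bit, i.e. the 47-bit bus described above.

def secded_encode(data_bits):
    # Choose r check bits so that 2**r >= data bits + r + 1.
    r = 0
    while (1 << r) < len(data_bits) + r + 1:
        r += 1
    n = len(data_bits) + r
    code = [0] * (n + 1)                  # index 0 holds the overall parity bit
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):               # not a power of two: a data position
            code[pos] = next(it)
    for i in range(r):                    # each check bit covers positions with that bit set
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2               # overall parity over the whole word
    return code

def secded_decode(code):
    # Returns (data bits, status): corrects any single-bit error, flags double-bit errors.
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    code = code[:]
    if syndrome and overall:
        code[syndrome] ^= 1               # single-bit error, position given by the syndrome
        status = "corrected single-bit error at position %d" % syndrome
    elif syndrome:
        status = "double-bit error detected (uncorrectable)"
    elif overall:
        code[0] ^= 1                      # the overall parity bit itself flipped
        status = "corrected overall parity bit"
    else:
        status = "no error"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status

word = [1, 0, 1, 1, 0] * 8                # a 40-bit data word
code = secded_encode(word)                # 47 bits: 40 data + 6 Hamming + 1 overall parity
code[13] ^= 1                             # simulate one flipped bit
decoded, status = secded_decode(code)
assert decoded == word and status.startswith("corrected")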

 

Joe L.

Link to comment
