Let's talk about Raid-6



Agree ... I think an automatic failover would be relatively easy to implement, but may in fact result in some drive replacements that weren't actually necessary.    On the other hand, I suspect that many of those "failures that aren't really failures" are due to movement of the server; changes where cables have been plugged/unplugged/bumped; etc. ... so in normal operations a failure is indeed likely a failure -- and as noted, it would have a very nice WAF  :)

 

In general I am not a big fan of hot spares and auto rebuilds. The causes of red balls are often not related to a failed drive, and a rebuild might do more harm than good. Heat issues could also cause a real failure, and the last thing you'd want to do is start a rebuild when your HVAC went out on a hot summer day. In general I would prefer that a human decide how to recover rather than leaving it to the computer.

 

Dual parity would make hot spares less valuable, because even with one disk down you are still protected against a second failure.

Link to comment

Certainly agree that with dual parity the need to initiate an immediate rebuild is reduced.    It's still important to do it as soon as possible -- but with notifications the user will know about it, so having a rebuild start automatically isn't as critical.    Remember, however, that the primary reason for dual parity is to be able to sustain a failure during a rebuild -- NOT to eliminate the need to do a rebuild just because you've only had one failure  :)

 

Re: your other question => not sure how the Linux driver is implemented, so I don't know how easy it is to do; but ANY of the techniques that enable dual fault tolerance let you computationally isolate the exact location of a fault.  I'll leave it to Tom to answer just how easy that is with the Linux implementation.
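For anyone curious how that isolation actually works, here is the textbook version (a sketch only, assuming the standard P+Q construction over GF(2^8) with generator $g$ that the Linux RAID-6 code also uses; whether the driver exposes it is a separate question). Recompute both syndromes from what is actually sitting on the disks:

$$ S_P = P \oplus \bigoplus_i D_i, \qquad S_Q = Q \oplus \bigoplus_i g^i D_i $$

If exactly one drive holds bad data, then $S_P \neq 0$ with $S_Q = 0$ means P itself is bad; $S_P = 0$ with $S_Q \neq 0$ means Q is bad; and if both are nonzero, the corrupt data drive is the one at index $z$ satisfying $g^z = S_Q / S_P$, i.e. $z = \log_g(S_Q \cdot S_P^{-1})$.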

 

Link to comment

I spent the weekend studying the math behind RAID-6 - learning Galois field algebra: now that's an exciting Saturday night!  It turns out there's no reason all data devices have to be spun up after all to recompute Q, and in fact read/modify/write of the target data disk, P, and Q disks should indeed be possible (even desirable in some cases).
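To make that concrete, here is a small stand-alone sketch (not the md-driver code, just the identity it relies on), using GF(2^8) with the polynomial 0x11d and generator {02}, the same field the kernel's RAID-6 code uses. It checks that updating one data chunk only needs the old data, old P, and old Q: P' = P ^ (D_old ^ D_new) and Q' = Q ^ g^i * (D_old ^ D_new).

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply two bytes in GF(2^8) using the RAID-6 polynomial 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0)); /* reduce mod 0x11d */
        b >>= 1;
    }
    return p;
}

/* g^i for the Q coefficient of data disk i (generator g = {02}). */
static uint8_t gf_pow2(unsigned i)
{
    uint8_t r = 1;
    while (i--)
        r = gf_mul(r, 2);
    return r;
}

int main(void)
{
    enum { NDISKS = 4 };                       /* hypothetical 4-data-disk array */
    uint8_t d[NDISKS] = { 0x11, 0x22, 0x33, 0x44 };

    /* Full (reconstruct-write) computation of P and Q. */
    uint8_t p = 0, q = 0;
    for (unsigned i = 0; i < NDISKS; i++) {
        p ^= d[i];
        q ^= gf_mul(gf_pow2(i), d[i]);
    }

    /* Small write to disk 2: read/modify/write touching only D, P, Q. */
    unsigned idx   = 2;
    uint8_t  d_new = 0xAB;
    uint8_t  delta = d[idx] ^ d_new;                    /* D_old xor D_new      */
    uint8_t  p_rmw = p ^ delta;                         /* P' = P xor delta     */
    uint8_t  q_rmw = q ^ gf_mul(gf_pow2(idx), delta);   /* Q' = Q xor g^i*delta */

    /* Cross-check against recomputing everything with the new data. */
    d[idx] = d_new;
    uint8_t p_full = 0, q_full = 0;
    for (unsigned i = 0; i < NDISKS; i++) {
        p_full ^= d[i];
        q_full ^= gf_mul(gf_pow2(i), d[i]);
    }
    printf("P: rmw=%02x full=%02x   Q: rmw=%02x full=%02x\n",
           p_rmw, p_full, q_rmw, q_full);
    return 0;
}
```

The RMW and full-recompute values come out identical, which is the whole point: only three drives need to be awake for a small write.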

 

In looking at the Linux md layer again, yes indeed they do force reconstruct-write.  Time to start googling... It turns out there have been efforts in the past to introduce RMW for the RAID-6 "small write" case, even patch sets which didn't get merged.  Then, lo and behold, what got added in the 4.1 kernel release?  That's right, RMW operation for RAID-6!
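To put rough numbers on why that matters for an array that likes to keep disks spun down (a back-of-the-envelope count of chunk-sized I/Os for a single-chunk write on an array with n data disks plus P and Q; exact behavior depends on the driver's stripe handling):

reconstruct-write: read the other (n - 1) data chunks, then write data + P + Q  =>  (n - 1) + 3 I/Os, with every drive spun up

read/modify/write: read old data, old P, old Q, then write the new three  =>  3 + 3 = 6 I/Os, with only three drives involved

So RMW wins outright once there are more than about four data disks, and, more importantly here, it leaves the rest of the array asleep.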

 

Bottom line is that we should be able to incorporate this into the unRAID flavor of the md driver.  It won't be a quick-and-dirty change, however, since a lot of other code needs to be P+Q aware (such as the user interface).

Very good news!

Link to comment

I spent the weekend studying the math behind RAID-6 - learning Galois field algebra: now that's an exciting Saturday night!  It turns out there's no reason all data devices have to be spun up after all to recompute Q, and in fact read/modify/write of the target data disk, P, and Q disks should indeed be possible (even desirable in some cases).

 

In looking at the Linux md layer again, yes indeed they do force reconstruct-write.  Time to start googling... It turns out there have been efforts in the past to introduce RMW for the RAID-6 "small write" case, even patch sets which didn't get merged.  Then, lo and behold, what got added in the 4.1 kernel release?  That's right, RMW operation for RAID-6!

 

Bottom line is that we should be able to incorporate this into the unRAID flavor of the md driver.  It won't be a quick-and-dirty change, however, since a lot of other code needs to be P+Q aware (such as the user interface).

 

 

You're my Hero !!!

Thanks!!!

Link to comment

Ok Tom => You have us all drooling now ... so what kind of timeline should we anticipate for this ??  :) :)

 

No dates please ... just an indication of how much effort this entails & whether this is relatively near-term (a 6.0 enhancement)  or a longer term project (v7).  i.e. this year, next year, or "whenever"  8)

 

Link to comment

It seems that unRaid "droolers" are in for a long haul.  Just when I thought I could replenish my drool, here you go again.....

 

I did say "No dates please" ==> don't want to get expectations too high.    But it would be nice to know in general whether this is a this year, next year, or even longer likelihood.

 

Link to comment
Also triple-redundancy, P+Q+ I guess R?

 

I was just about to ask the question as to how many parity disks could be configured with this technology.

 

Edit to add:  Also, how many 'parity' drives would be needed to identify which drive has failed?  One concern has always been to determine whether a parity error is due to data drive corruption or parity drive corruption - 'standard' memory ECC uses 8+3, doesn't it?  Do more recent methods reduce the ratio of parity/data bits?

Link to comment

Also triple-redundancy, P+Q+ I guess R?

 

I was just about to ask the question as to how many parity disks could be configured with this technology.

 

Edit to add:  Also, how many 'parity' drives would be needed to identify which drive has failed?  One concern has always been to determine whether a parity error is due to data drive corruption or parity drive corruption - 'standard' memory ECC uses 8+3, doesn't it?  Do more recent methods reduce the ratio of parity/data bits?

 

As long as the number of data + check disks is < 256 you can have as many as you like, but the math gets pretty hairy past P+Q.
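A quick sketch of where a bound like that comes from (assuming the Reed-Solomon framing over GF(2^8) used in the paper linked below; the exact limit depends on the construction): each device effectively needs its own nonzero field element as a coefficient, and

$$ |\mathrm{GF}(2^8) \setminus \{0\}| = 2^8 - 1 = 255 $$

so there are only 255 distinct values to hand out before two devices would collide. Most of the extra hair past P+Q is in choosing coefficients so that every failure combination stays recoverable, plus the general GF(2^8) multiplies and small matrix inversions needed during recovery.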

 

Here's another paper:

http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf

 

A good bedtime read...

Link to comment

Reed-Solomon codes can indeed be pretty computationally intense.  That's why RAID controllers that use them have hardware encoding chips that offload all of the RAID computations from the CPU.    You can do dual-fault tolerance computations using nothing more than exclusive-ORs, which, as you know, require very little CPU overhead.
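A minimal sketch of the "nothing but exclusive-ORs" point (assuming the usual RAID-6 construction over GF(2^8), polynomial 0x11d, generator {02}): evaluated Horner-style, the Q syndrome never needs a general GF multiply, only "times {02}", which is one shift plus one conditional XOR per byte.

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply by {02} in GF(2^8) (polynomial 0x11d): a shift and a conditional XOR. */
static uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* Q syndrome for one byte offset across n data disks, by Horner's rule:
 *   Q = d[0] ^ {02}*d[1] ^ {02}^2*d[2] ^ ...                              */
static uint8_t q_syndrome(const uint8_t *d, unsigned n)
{
    uint8_t q = d[n - 1];
    for (int i = (int)n - 2; i >= 0; i--)
        q = gf_mul2(q) ^ d[i];
    return q;
}

int main(void)
{
    const uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };
    printf("Q = %02x\n", q_syndrome(d, 4));   /* P would just be the plain XOR */
    return 0;
}
```

The kernel's lib/raid6 does essentially this across whole chunks at a time with SIMD, so in practice the P+Q cost is dominated by memory bandwidth rather than arithmetic.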

 

I'd think dual failure tolerance is plenty => all we need is protection against a 2nd failure during a drive rebuild.    We're not trying to build high-availability systems with 4 or 5 9's  :)

Link to comment

I doubt the P+Q parity is going to be computation bound with a respectable CPU. It seems to be optimized to use XORs which are fast and lightweight. It's adding the 3rd that I expect would create a sharp performance drop.

 

I am hopeful that Tom will be more gentle with the Linux md driver mods so as not to destroy its ability to create normal Linux RAID arrays. For example, it would be nice to be able to define a RAID1 pair (without btrfs) or even a RAID5 for fast writes on a small striped array. Right now it takes a hardware RAID card to mix RAID and unRAID.

Link to comment

I doubt the P+Q parity is going to be computation bound with a respectable CPU. It seems to be optimized to use XORs which are fast and lightweight. It's adding the 3rd that I expect would create a sharp performance drop.

 

I am hopeful that Tom will be more gentle with the Linux md driver mods so as not to destroy its ability to create normal Linux RAID arrays. For example, it would be nice to be able to define a RAID1 pair (without btrfs) or even a RAID5 for fast writes on a small striped array. Right now it takes a hardware RAID card to mix RAID and unRAID.

 

That's already been killed long ago, and quite brutally. Currently the unRAID md code completely replaces the Linux md driver, so it's one or the other but not both.  I suspect he'd have to rename the unRAID driver from md to something else to have any hope of restoring Linux md functionality. This is one reason why btrfs was brought to the table, since it implements RAID1/RAID0 itself rather than going through the Linux md layer.

Link to comment

I doubt the P+Q parity is going to be computation bound with a respectable CPU. It seems to be optimized to use XORs which are fast and lightweight. It's adding the 3rd that I expect would create a sharp performance drop.

 

I am hopeful that Tom will be more gentle with the Linux md driver mods so as not to destroy its ability to create normal Linux RAID arrays. For example, it would be nice to be able to define a RAID1 pair (without btrfs) or even a RAID5 for fast writes on a small striped array. Right now it takes a hardware RAID card to mix RAID and unRAID.

 

That's already been killed long ago, and quite brutally. Currently the unRAID md code completely replaces the Linux md driver, so it's one or the other but not both.  I suspect he'd have to rename the unRAID driver from md to something else to have any hope of restoring Linux md functionality. This is one reason why btrfs was brought to the table, since it implements RAID1/RAID0 itself rather than going through the Linux md layer.

 

You may well be right, but Tom has had close to 10 years to consider his approach, and if anyone could come up with a gentler way, it would be him. But if, as you believe, maintaining the default functionality is difficult, I offer no objection to butchering it again. :)

Link to comment

Reed-Solomon codes can indeed be pretty computationally intense. 

I'm just waiting for the 'Yeah but'

 

 

I doubt the P+Q parity is going to be computation bound with a respectable CPU.

 

I should have been more explicit => I was referring to the math involved for additional levels of fault tolerance beyond 2 failures.    P + Q can be achieved with nothing more than XORs => not at all computationally intense.

 

Link to comment
