Ultra Low Power 24-Bay Server - Thoughts on Build?


Pauven


Interesting thread, but didn't seem to really conclude anything (except perhaps that SimpleFeatures can cause slow parity checks when used on certain releases of UnRAID).

 

I suppose one thing it DID do is confirm that others are having similar issues (so at least you have company).

 

One interesting point:  The new LimeTech server uses an AOC-SASLP-MV8 card, and all 8 of its ports are in use if the server is fully populated with 14 drives.    I suspect this means that Tom will soon be at least somewhat focused on this slowdown issue !!    Doesn't mean it will be SOLVED .. but at least it should get some focused attention  :)

 

Link to comment

One interesting point:  The new LimeTech server uses an AOC-SASLP-MV8 card, and all 8 of its ports are in use if the server is fully populated with 14 drives.    I suspect this means that Tom will soon be at least somewhat focused on this slowdown issue !!    Doesn't mean it will be SOLVED .. but at least it should get some focused attention  :)

 

The AOC-SASLP-MV8 card is based upon the older Marvell 6480 host controller.  Luckily it uses the same mvsas driver, but it's not exactly an apples-to-apples situation.
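If anyone wants to double-check which driver their card actually loads, a quick lspci query from the console will show it (generic Linux, nothing unRAID-specific; both the SASLP's 6480 and the 88SE9485 chips on the 2760A should report mvsas):

# List storage controllers with the kernel driver bound to each one;
# look for "Kernel driver in use: mvsas" under the Marvell entries.
lspci -nnk | grep -iA3 'marvell\|sas\|sata'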

 

Since the mvsas driver is used, I'm hopeful that Tom might focus some attention on it.

 

Of course, the server specs have this little disclaimer at the bottom: 

Specifications are subject to change. Lime Technology reserves the right to upgrade and/or substitute components according to availability.

 

Tom could simply give up and try a different controller.

 

I agree with your assessment of the other thread, though many people did point out that they had issues even without Simple Features, so it should be obvious to Tom that the core issue seems to be linked to the Linux kernel, and possibly the mvsas driver, in each unRAID build.

Link to comment

You might be interested in the response I got in this thread r.e. parity check times:

http://lime-technology.com/forum/index.php?topic=27664.msg243965#msg243965

 

Two things of note:  (a) Populating the AOC-SAS2LP-MV8 had the same impact you've seen on significantly degraded parity check speeds; and (b) the parity check speeds jump significantly once the check gets past the 2TB boundary (i.e. all of the 2TB drives are no longer "in play")

 

This could be because the 2TB drives are lower density and thus much lower speed;  or it could be that once it's past that point there are far fewer drives actually active on the interface.    In any event, it begs the question of what happens to your performance as you pass the 1.5TB boundary in a check (when all your 1.5TB drives are no longer in play); and then again as you cross the 2TB boundary, when you'll then have only 3TB Reds "in play" ???

 

Link to comment

mvsas driver version 0.8.16 has been around since at least 2011 Sep 28.

 

seems that there are a few issues that occurred at the 0.8.16 driver rev...  this is possibly somewhat relevant info... ?

 

http://comments.gmane.org/gmane.linux.scsi/79430

 

(same information is replicated on multiple mirrors...)

 

(also other reported issues, seemingly widespread at the same rev - needing the driver early-loading fix...)

 

possibly see also :  https://bugzilla.kernel.org/show_bug.cgi?id=51881

 

then there was this: https://patchwork.kernel.org/patch/1984091/

 

and this: https://patchwork.kernel.org/patch/1989901/

 

and from Marvell... not sure if most current...

https://git.ipxe.org/mirror/scst/.git/blob_plain?f=mvsas_tgt/README

 

Link to comment

Two things of note:  (a) Populating the AOC-SAS2LP-MV8 had the same impact you've seen on significantly degraded parity check speeds; and (b) the parity check speeds jump significantly once the check gets past the 2TB boundary (i.e. all of the 2TB drives are no longer "in play")

 

garycase thank you for the info.  I theorize that parity speeds improve beyond 1.5TB and 2TB for three reasons:

    A) Smaller drives on the slow inner cylinders drop out of the mix (well documented phenomenon)

    B) The # of drives a controller sees is reduced, reducing overall workload

    C) The total volume of data on the system bus is reduced as drives drop out of the mix

 

Part of my theory is that, for any given SATA/SAS controller, it is capable of max work Z.  That maximum amount of work can be divided between actual throughput X, and overhead Y, but X+Y can never be greater than Z.  In different kernel and driver revisions, some code changes are causing additional overhead, taking away max throughput.  Some of these code changes have been to address critical bugs (well noted in the mvsas drivers), and may very well be required for stable operation.  It's also possible that the overhead code is inefficient, stealing processing cycles from throughput to implement the overhead code.

 

This thread gives an excellent way to test for controller throughput saturation outside of unRAID:  http://lime-technology.com/forum/index.php?topic=25676.0

 

That thread gets really interesting on the third page.  They are using the dskt.sh script to call hdparm and test both hd maximum speed individually (with the X parameter), and controller max speed by testing all hd on that controller at the same time (without the X parameter).

 

The test script is written by UhClem and available here:  http://lime-technology.com/forum/index.php?topic=17851.0
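For anyone who doesn't want to grab the script, the basic idea can be approximated with plain hdparm -t: read each drive by itself, then read them all at once, and compare the per-drive numbers.  This is just a rough sketch of the same approach (not UhClem's actual script -- the drive list below is from my box, so adjust it to yours):

#!/bin/bash
# Crude stand-in for dskt.sh: measure each drive's sequential read speed
# individually, then with all drives on the controller reading at once.
DRIVES="sdc sdd sde sdf sdg sdh sdi sdj"   # drives on the controller under test

echo "== one at a time =="
for d in $DRIVES; do
    echo "$d: $(hdparm -t /dev/$d | grep -o '[0-9.]* MB/sec')"
done

echo "== all at once =="
for d in $DRIVES; do
    echo "$d: $(hdparm -t /dev/$d | grep -o '[0-9.]* MB/sec')" &
done
wait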

 

I am going to perform these tests now to see if there is a trend.

 

Link to comment

I theorize that parity speeds improve beyond 1.5TB and 2TB for three reasons:

    A) Smaller drives on the slow inner cylinders drop out of the mix (well documented phenomenon)

    B) The # of drives a controller sees is reduced, reducing overall workload

    C) The total volume of data on the system bus is reduced as drives drop out of the mix

 

(A) and (B) are certainly factors.  I don't think (C) makes much difference ... computationally, the parity check is not much of a demand on the CPU.    My low-power Atom system computed parity when I first set it up (with only 3 drives) in almost exactly the same time it now takes fully populated (6 drives) ... unfortunately I didn't record the exact times, but both were right ~ 8 hrs.    Might have added 5 minutes or so, but definitely not enough that I'd consider the computational load a significant factor.

 

As I noted earlier, in addition to the clear impact from hitting the slower areas of the zoned sectors as you close in on the end of each different size disk, there's also the simple fact that older, smaller disks have notably lower areal density.    No 1TB/platter drives there -- more likely 500GB platters.  So even the fastest outer cylinders are still significant bottlenecks relative to the 3 and 4TB units with 1TB platters.

 

Link to comment

Below are my results from using the dskt.sh script mentioned in my previous post.

 

My original intention was to simply test a single Marvell 88SE9485 SAS/SATA controller on the 2760A, and I thought I would get the most informative results by putting all of the same drive model on a single controller.  Since I have 8 3TB Red data drives, I moved them all to the first controller.  I also moved the remaining 7 data drives (Samsung 1.5TB & 2TB) onto the second controller chip, leaving only the parity drive on the third controller chip.
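In case anyone wants to replicate the shuffling, here's a quick way to see which controller chip each disk hangs off of -- the resolved sysfs path for each drive includes the PCI address of its controller (plain Linux, nothing unRAID-specific):

# Print each disk with its full sysfs path; the PCI address in the path
# identifies the controller chip the drive is attached to, so drives that
# share an address are on the same chip.
for d in /sys/block/sd*; do
    printf '%s -> %s\n' "$(basename "$d")" "$(readlink -f "$d")"
done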

 

I then ran dskt.sh both with and without the X parameter, getting individual drive max speed and concurrent drive max speed.  Both tests were run 3 times, and the totals below are the average of the 3 runs.  I tested all 15 drives concurrently, as my test showed I didn't need to focus in on the 8 Red 3TB drives on the single controller chip.

 

Somewhat surprisingly, a performance issue didn't show up.  As you can see, the tested concurrent drive throughput is about 99% of max throughput, and the 1% delta is well within the margin of error.  This leads me to believe I am getting max throughput.

 

Confused by the results, I started up a parity check with the new configuration, and again I was surprised to see speeds in the 60 MB/s range.  I had expected to see speeds drop to 40 MB/s, like my previous tests with all data drives spread across the first two controllers.

 

Apparently each Marvell 88SE9485 works better with all matching drives.

 

Since my parity check times are still over 2 hours longer than my previous build (using the exact same drives) I'm going to do one more test.  I will move the parity drive to the motherboard's onboard SATA port, and move the Samsung 2TB drives to the 3rd controller.  That way, each controller only has one drive type/size.  Then I'll test again.

 

DRIVE NAME, CONTROLLER AND TYPE     INDIVIDUAL DRIVE SPEED     CONCURRENT DRIVES SPEED
sdc - Ctrl 1 - 3TB Red              142 MB/s                   141 MB/s
sdd - Ctrl 1 - 3TB Red              151 MB/s                   138 MB/s
sde - Ctrl 1 - 3TB Red              146 MB/s                   146 MB/s
sdf - Ctrl 1 - 3TB Red              138 MB/s                   134 MB/s
sdg - Ctrl 1 - 3TB Red              140 MB/s                   138 MB/s
sdh - Ctrl 1 - 3TB Red              140 MB/s                   139 MB/s
sdi - Ctrl 1 - 3TB Red              145 MB/s                   144 MB/s
sdj - Ctrl 1 - 3TB Red              143 MB/s                   142 MB/s
sdk - Ctrl 2 - Smsng 1.5TB          107 MB/s                   107 MB/s
sdl - Ctrl 2 - Smsng 1.5TB          106 MB/s                   105 MB/s
sdm - Ctrl 2 - Smsng 1.5TB          104 MB/s                   101 MB/s
sdn - Ctrl 2 - Smsng 1.5TB          102 MB/s                   103 MB/s
sdo - Ctrl 2 - Smsng 2TB            102 MB/s                   101 MB/s
sdp - Ctrl 2 - Smsng 2TB            107 MB/s                   106 MB/s
sdq - Ctrl 2 - Smsng 2TB            114 MB/s                   114 MB/s
TOTAL                               1887 MB/s                  1859 MB/s

 

Link to comment

I don't think (C) makes much difference ... computationally, the parity check is not much of a demand on the CPU.

 

(C) was in reference to the PCIe bandwidth and/or system bus bandwidth between the PCIe bus and the CPU, not the CPU horsepower itself.

 

I mentioned that because in my previous server I had an older AMD AM2 CPU that only supported HyperTransport (HT) 1.0, and parity checks on all the drives at the same time appeared to hit a throughput brick wall.  I was actually thinking of swapping the processor for an AM2+/AM3 model that supports HT 3.0, which should have removed that bandwidth bottleneck.

 

I agree 100% that the CPU is not being maxed out by parity checks.

Link to comment

Those speeds are actually looking quite good now !!

 

Now you need to run a parity check to completion ... and check the speed as it nears/crosses the 1.5TB point and again at the 2TB point.  I'm particularly interested in what you see after it crosses 2TB and the only remaining drives "in play" are your 3TB WD Reds.  I EXPECT you'll see well over 100MB/s, as that's still not far enough in to have much inner-cylinder slowdown (but is well past the fastest outer cylinders).    If that's the behavior you see, I'd say you're ready to go !!  :)

Link to comment

mvsas driver version 0.8.16 has been around since at least 2011 Sep 28.

 

seems that there are a few issues that occurred at the 0.8.16 driver rev...  this is possibly somewhat relevant info... ?

 

http://comments.gmane.org/gmane.linux.scsi/79430

 

(same information is replicated on multiple mirrors...)

 

(also other reported issues, seemingly widespread at the same rev - needing the driver early-loading fix...)

 

possibly see also :  https://bugzilla.kernel.org/show_bug.cgi?id=51881

 

then there was this: https://patchwork.kernel.org/patch/1984091/

 

and this: https://patchwork.kernel.org/patch/1989901/

 

and from Marvell... not sure if most current...

https://git.ipxe.org/mirror/scst/.git/blob_plain?f=mvsas_tgt/README

 

Thanks for the links electron286!

 

Those first 4 links talk about mvsas changes between kernels 3.5 and 3.7.  I don't think any unRAID beta/RC has ventured beyond 3.4, so I'm not sure that info is relevant (but perhaps a little scary about going to newer kernel releases!).

 

I'm a bit confused, though, as I only see mvsas 0.8.16 mentioned in this timeframe - are the mvsas version numbers not changing with the changes?

 

That last link, from Marvell, doesn't have any date or solid release info, but since it mentions 2.6.5, I think it is quite old info.

 

 

Link to comment

From what I gather... I may be way off though...  The mvsas driver code itself has not changed in quite a while...  BUT with kernel changes (as with kernels for different variants of Linux), libraries and other dependencies often do change, so even though the mvsas driver version number may remain the same, it gets new builds with the different kernels.

 

Prior to 0.8.16, it looks like there were a lot of code changes; I found at least 5 different -x revs for 0.8.15 (0.8.15-d as an example).

 

So, from what I can tell, 0.8.16 may still have all the bugs identified to date, BUT they may mostly (if not fully) be caused by changes and updates in the various linked libraries and dependencies... which will vary with the build and flavor of the kernel in use.

 

I am also a bit perplexed, as some of the links seem to specifically CHANGE the mvsas code, and yet do not seem to update the version to a different rev!  So then how do you know which mvsas code you are running as compared to what has been altered?

 

  Sometimes things like that seem to get a bit out of hand with open source...  Changes for the sake of attempting fixes... committing changes, etc...
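For what it's worth, you can at least read back whatever version string the loaded module advertises (plus the srcversion checksum, if the kernel was built with it -- that one does change whenever the module source changes), though that still won't tell you which patches went in:

# Version / source-checksum info for the mvsas module on the running system
modinfo mvsas | grep -iE '^(filename|version|srcversion|vermagic)'

# The driver also announces itself in the kernel log when it loads
dmesg | grep -i mvsas | head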

 

Link to comment

For the first time I really feel every AOC-SAS2LP-MV8 owner's frustration with unRAID and Tom.

 

Even a day ago I was willing to go to bat for Tom, saying there's no way he can be held responsible for the quality of the drivers being integrated into the various Linux kernels.  I thought he was just along for the ride like the rest of us.

 

But after a series of tests, I believe I can now conclusively say that 'something' is wrong with unRAID.  The parity check performance issues I am experiencing are not a driver issue.

 

First I used UhClem's handy script to evaluate controller performance, and while I was expecting to see some type of bottleneck, the 2760A performed flawlessly.  It delivers 100% of single drive performance even when all drives are accessed simultaneously.  Kudos to HighPoint, Marvell, and the various developers of the mvsas driver.

 

I just ran a rebuild to upgrade a disk, and the rebuild screamed along at max throughput, only limited by the speed of my slowest drive.  In my dskt.sh tests, the slowest drive was about 102 MB/s, and sure enough the rebuild started at about 102 MB/s, and at 15% was still close to 100 MB/s.  Kudos to Tom and unRAID, rebuilds are performing perfectly.

 

But then there are parity checks.  In theory, parity checks should be pretty much the same speed as a rebuild.  The only difference is that one drive is written to rather than read from.

 

In my previous build, using the Adaptec 1430SA controller cards, a full rebuild and a full parity check both completed within less than a minute of each other.  That's right, for an 11+ hour job, both completed with less than one minute of variance.

 

But the controller cards using the Marvell 88SE9485 SAS/SATA controller are dramatically slower on a parity check.

 

How much slower?  The rebuild I just completed took 10h45m (~77 MB/s), while the parity check took 14h17m (~56 MB/s).  27% slower processing speed, resulting in a 33% longer running time.  On the Adaptec 1430SA controller cards, both rebuild and parity took 11h17m (~72 MB/s).
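(Those percentages are just the arithmetic on the rounded numbers above, nothing more:)

# 77 vs 56 MB/s, and 10h45m vs 14h17m, as quoted above
awk 'BEGIN {
    printf "speed drop : %.0f%%\n", 100 * (1 - 56/77)                        # ~27% slower
    printf "extra time : %.0f%%\n", 100 * ((14 + 17/60) / (10 + 45/60) - 1)  # ~33% longer
}'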

 

The variance doesn't make sense.  It is the same hard drives, the same motherboard, the same memory, the same cpu, the same cables, the same controller card, even the same drivers, and the jobs being executed are very similar.  But the parity check is significantly slower than a parity rebuild.

 

At this point, the only culprit left is the unRAID parity check code.

 

And this is why I'm now joining the camp of frustrated Marvell 88SExxxx series controller card owners.  For a very long time now, years if I'm not mistaken, users have been telling Tom there is a problem.  More unRAID users are running Marvell-powered add-in cards than cards from any other vendor.  Even unRAID's new shipping servers utilize a Marvell 88SExxxx series hd controller chip, though perhaps not a model affected by this issue.

 

Tom Harms, aka Tom#2, in your daily/weekly/monthly status reports with Tom#1, please help illustrate that this is a long running code issue in unRAID that needs to be addressed.  Please feel free to reach out to me for troubleshooting assistance.  I'm willing to help in any way possible.  And I'm not the only one.

 

And Tom#2, in case you are unsure why this matters, this is more than a convenience factor.  Parity checks are the most common reason that all drives are spun up for long periods of time, generating high heat and accelerating wear and tear.  When a parity check runs slow, these drives are spun up longer, significantly increasing wear and tear, reducing lifespan, and wasting electricity, ultimately costing users more money.

 

As long as parity checks to secure our data are a necessary evil of the great product that is unRAID, we need efficient code that doesn't contribute towards premature hd failures.

Link to comment

Excellent post ==> and clearly you are absolutely correct.  A parity check should NOT take any longer than a rebuild ... virtually everything about the two operations is identical except that the disk being rebuilt is written to instead of read from.

 

Your results clearly exonerate the Linux drivers as the source of the issue ... and also eliminate the controller chips as potential bottlenecks.

 

Hopefully putting a focus on this will encourage Tom to look into just why it's working this way; and will result in a fix !!

 

Link to comment

Since my parity check times are still over 2 hours longer than my previous build (using the exact same drives) I'm going to do one more test.  I will move the parity drive to the motherboard's onboard SATA port, and move the Samsung 2TB drives to the 3rd controller.  That way, each controller only has one drive type/size.  Then I'll test again.

 

As a quick follow-up, the above test resulted in the exact same processing speed. 

 

I think my previous adjustment, moving the 3TB Red drives onto the same controller, had an impact because of the large speed differential between the Reds (~140MB/s) and my Samsung F2/F3's (~105MB/s).  Since the Samsung drives are about the same speed, regardless of size, splitting them onto separate controllers had no benefit.

Link to comment

Tom has directed me to test both "md_sync_window" and "Force NCQ disabled" to see if parity check speeds improve on the HighPoint 2760A.

 

I increased md_sync_window from a default of 384 all the way to 1280, in increments of 128.  Note, I am not sure if md_sync_window needs to be in increments of 128, but that's what I used in my testing.

 

All of my tests were performed to a Parity Check (NoCorrect) position of 1.0% only.  I would start the Parity Check, then monitor on the unMENU>MyMain>Detail status screen, manually refreshing to see the counter increment from 0.9% to 1.0%, then immediately cancel the Parity Check.  I would pull the start and end times from the log.  I found this method to be accurate and repeatable (and a whole lot quicker than waiting 10-14 hours for a full Parity Check).
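For anyone who wants to repeat this, the sweep could be scripted along these lines.  Fair warning: this is only a sketch of what I did by hand, not a tested tool.  It assumes unRAID's mdcmd accepts "set md_sync_window", "check NOCORRECT" and "nocheck", and that the resync position/size counters appear in /proc/mdstat under those names -- double-check on your own system first.

#!/bin/bash
# Sketch only: for each md_sync_window value, start a non-correcting parity
# check, wait until it reaches ~1% of the array, cancel it, and report the
# elapsed time.  The /proc/mdstat field names are assumptions -- verify them.
for win in 384 512 640 768 896 1024 1280; do
    mdcmd set md_sync_window $win
    start=$(date +%s)
    mdcmd check NOCORRECT

    while :; do
        pos=$(awk -F= '/^mdResyncPos/  {print $2}' /proc/mdstat)
        size=$(awk -F= '/^mdResyncSize/ {print $2}' /proc/mdstat)
        [ -n "$size" ] && [ "$size" -gt 0 ] && [ $((pos * 100 / size)) -ge 1 ] && break
        sleep 5
    done

    mdcmd nocheck
    echo "md_sync_window=$win -> $(( $(date +%s) - start )) seconds to 1%"
done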

 

md_sync_window     Time in Seconds (No NCQ)     Time in Seconds (W/ NCQ)
 384               439                          395
 512               296                          299
 640               275                          280
 768               270                          274
 896               269                          269
1024               270                          270
1280               271                          275

 

I found the best results with a md_sync_window value of 896, regardless of whether NCQ was enabled or disabled.  At this value, the Parity Check speed appeared to be equivalent to a Rebuild, possibly even a bit faster.

 

Memory utilization did increase less than 1% from increasing the md_sync_window value.  I typically use a little over 10% of my 4GB of RAM, and I noticed almost 11% used with a md_sync_window value of 1280.  According to Tom, each increment uses 4K of memory per drive, so going from 384 to 1280 on my 16 drive server would only use an additional 56 MB.

 

In general, NCQ didn't seem to have an impact except at the default md_sync_window value of 384, where having NCQ on seemed to help.  Since it doesn't seem to help when md_sync_window is properly tuned, I plan to leave it off, since I have (possibly unfounded) fears about NCQ interfering with a parity check or rebuild when I'm simultaneously writing to the array, which does happen from time to time.

 

I will now kick off a full Parity Check with md_sync_window set to 896 and NCQ disabled.  Once that is complete, I will upgrade another 1.5TB drive to a WD Red 3TB drive.  Hopefully the server will produce equivalent run times for both jobs.

Link to comment

Wow!  If that improvement holds over the whole range of the disk (no reason to think it won't) it should make a VERY nice improvement in parity check times.    And if it's consistent with your next rebuild, I'd say you have this puppy "licked" !!  8)

 

Now I just need to see how it compares to the results with the Adaptec 72405 and we'll know what the "best" 24 drive single-slot controller is for UnRAID.    I suspect it's the RocketRAID, since the Adaptec costs appreciably more, and I can't imagine the 72405 will get any better performance than what it seems you're going to get with the proper "tuning" !!

 

Link to comment

WOW!  I will have to say this is getting pretty exciting!  Very cool to see what effects occur with changing md_sync_window!  So does anyone know, or have a feeling about, whether the optimum value changes with the size of the array, or is it more about the controller and bus speed timing?  Possibly all of these affect the optimum?

 

  Is this possibly something we should all look at if trying to TUNE for peak performance?

 

What system changes are likely to require a different value for optimum performance?

 

Are there any precautions needed or things to look for when changing the values?

 

  Are there any other questions that should be asked?...

 

 

Link to comment

Thanks for this. 896 made both my SAS2LP systems roughly 2x faster over defaults. However, unlike you, 1024 made my systems even faster (another 15% or so) over 896! This has been bothering me for almost a year! Still not quite as fast as Beta11 and earlier, but 8-10 hours is better than 18+ hours.

Link to comment

Thanks for this. 896 made both my SAS2LP systems roughly 2x faster for parity checks. However, unlike you, 1024 made my systems even faster (another 15% or so) over 896! This has been bothering me for almost a year! Still not quite as fast as Beta11 and earlier, but 8-10 hours is better than 18+ hours.

 

 

I wonder if the reason you get better results at 1024 could be that you have one chipset per card while the RocketRaid has multiple chipsets per card.

Link to comment

Thanks for this. 896 made both my SAS2LP systems roughly 2x faster for parity checks. However, unlike you, 1024 made my systems even faster (another 15% or so) over 896! This has been bothering me for almost a year! Still not quite as fast as Beta11 and earlier, but 8-10 hours is better than 18+ hours.

 

 

I wonder if the reason you get better results at 1024 could be that you have one chipset per card while the RocketRaid has multiple chipsets per card.

 

I'm not sure. Upon further testing with 1280, it seems to be about 5% better than 1024. It's really close, but I think I'll just leave my systems at 1280. I'd suggest people having this issue try 896, 1024, and 1280 and choose the one they feel is the fastest.

 

EDIT: After letting it go even further on 1280, one server is sitting at 117.6MB/s and the other at 86.2MB/s, bringing it back up to Beta11 levels! So definitely try different values. ;D

Link to comment

Wow!  If that improvement holds over the whole range of the disk (no reason to think it won't) it should make a VERY nice improvement in parity check times.    And if it's consistent with your next rebuild, I'd say you have this puppy "licked" !!  8)

 

Now I just need to see how it compares to the results with the Adaptec 72405 and we'll know what the "best" 24 drive single-slot controller is for UnRAID.    I suspect it's the RocketRAID, since the Adaptec costs appreciably more, and I can't imagine the 72405 will get any better performance than what it seems you're going to get with the proper "tuning" !!

 

 

The upside of the Adaptec card would be, though, that it's 8x PCIe 3.0. There are a large number of us out there now running a Supermicro X9SCM and an Ivy Bridge Xeon, which support PCIe 3.0. We couldn't use the RocketRaid card because all of our boards' slots are 8x. It would be nice to dump the multiple cards and/or expanders and run everything off a single 8x card that would be able to supply 333MB/s to each disk. It wouldn't be an advantage cost-wise, but it would be in simplicity.

Link to comment

The upside of the Adaptec card would be, though, that it's 8x PCIe 3.0. There are a large number of us out there now running a Supermicro X9SCM and an Ivy Bridge Xeon, which support PCIe 3.0. We couldn't use the RocketRaid card because all of our boards' slots are 8x. It would be nice to dump the multiple cards and/or expanders and run everything off a single 8x card that would be able to supply 333MB/s to each disk. It wouldn't be an advantage cost-wise, but it would be in simplicity.

 

Yes, that's indeed an advantage.  You could, of course, use the 2760a ==> you'd simply be limited to "only" 166MB/s to each disk  :)    That may or may not be a real bottleneck in actual use => the only time it would seem that's the case is on parity checks or rebuilds for the outer cylinders of 1TB/platter drives.    Of course as drive density increases even further, it would become much more of a notable bottleneck !!
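For reference, the rough math behind those per-disk figures (back-of-the-envelope only -- it assumes an x8 link in both cases, since the boards in question only have 8x slots, 24 drives sharing it, and counts nothing but the line-encoding overhead):

# PCIe 3.0 x8: 8 GT/s with 128b/130b encoding ~= 985 MB/s per lane
# PCIe 2.0 x8: 5 GT/s with 8b/10b encoding    =  500 MB/s per lane
awk 'BEGIN {
    printf "PCIe 3.0 x8 / 24 drives: ~%.0f MB/s per drive\n", 985 * 8 / 24
    printf "PCIe 2.0 x8 / 24 drives: ~%.0f MB/s per drive\n", 500 * 8 / 24
}'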

 

Link to comment

The upside of the Adaptec card would be, though, that it's 8x PCIe 3.0. There are a large number of us out there now running a Supermicro X9SCM and an Ivy Bridge Xeon, which support PCIe 3.0. We couldn't use the RocketRaid card because all of our boards' slots are 8x. It would be nice to dump the multiple cards and/or expanders and run everything off a single 8x card that would be able to supply 333MB/s to each disk. It wouldn't be an advantage cost-wise, but it would be in simplicity.

 

Yes, that's indeed an advantage.  You could, of course, use the 2760a ==> you'd simply be limited to "only" 166MB/s to each disk  :)    That may or may not be a real bottleneck in actual use => the only time it would seem that's the case is on parity checks or rebuilds for the outer cylinders of 1TB/platter drives.    Of course as drive density increases even further, it would become much more of a notable bottleneck !!

 

 

Actually you can't use that 16x card. The 8x slots on the board have a closed end, and there are no notches on the RocketRaid to allow it to fit in an 8x slot with a closed end. So if you want that many ports on one card, the Adaptec is your only option.

Link to comment
