unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



You can edit /flash/config/disk.cfg and change all three parameters.  I don't know why all three can't be changed in the webgui.
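
For reference, the relevant lines in disk.cfg look something like this (the values shown are purely illustrative, not recommendations):

    # /flash/config/disk.cfg -- example md_* entries only (the file holds other settings too)
    md_num_stripes="1280"
    md_write_limit="768"
    md_sync_window="384"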

 

Don't give up so easily.  The script helped me improve my parity checks a lot.  I went from about 80MB/sec to about 115MB/sec.  Try to change all three in the disk.cfg file and test again.

 

Just curious what levels you ended up with for the 3 parameters vs. what you started with.  That's a VERY nice improvement in speed !!

 

Link to comment
  • 4 weeks later...
  • 3 weeks later...

Some questions arise:

 

I ran the script a while back in fullauto mode and updated the settings accordingly.

 

I want to re-run the script.

 

Does the script use any of the current settings?

 

Should I restore the default settings as after a fresh install of unraid?

 

Should the script be run on as clean a server as possible, without any dockers or plugins?

 

 

Link to comment

I didn't see the comments about the script failing with 6.1+, and got the

 

/root/mdcmd: No such file or directory

 

failures so I ^C'd out of the script.  Does anyone know if I need to restore any configuration values due to prematurely exiting out of the script?

 

Thx

 

Nothing gets changed until the very end when it asks you to change values, and then confirm. You are safe.

 

For what it's worth, I re-downloaded the txt file in the first post, and using Notepad I did a search-and-replace of /root/mdcmd to the new path, saved it to my flash drive, and then ran the script.
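
For anyone who prefers doing that edit from the unRAID console instead of Notepad, it amounts to something like the one-liner below.  NEW_MDCMD_PATH is just a placeholder here (the actual location is given in the post linked below), and the script path assumes the file sits in the root of the flash drive:

    # Illustrative only: swap the script's hard-coded mdcmd path for the new one.
    sed -i 's|/root/mdcmd|NEW_MDCMD_PATH|g' /boot/unraid-tunables-tester.sh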

 

...and what is the "new path" ?  :o

Can the author modify this Script for v6.1.4 please?

Thanks  ;)

Link to comment

...and what is the "new path" ?  :o

Can the author modify this Script for v6.1.4 please?

 

http://lime-technology.com/forum/index.php?topic=29009.msg402678#msg402678

Link to comment

This is going to give me the following:

  • md_num_stripes
  • md_write_limit
  • md_sync_window

However, unRAID 6.1.4 has the following values:

  • Tunable (poll_attributes)
  • Tunable (md_num_stripes)
  • Tunable (md_sync_window)

Should I ignore the md_write_limit output?

 

md_write_limit was removed from v6.  I don't know if it makes a difference in speed, but I change it directly in /flash/config/disk.cfg.

Link to comment
  • 2 weeks later...

Hey everyone,

 

Kizer noticed I hadn't been around in a while, so he sent me a PM and asked me to check in.

 

I just finished reading through several months of posts, and have a few thoughts to share.

 

First, sorry, I've been really busy with work and life, and have completely neglected this script.  It has two main issues right now: 1) a file path changed under v6 (maybe that was 6.1), which breaks the script, and 2) there are changes in v6 that don't align with the testing this script performs.

 

As some have pointed out, you can fix the file path change either by creating a symbolic link or by a simple text replacement in the script.  I would update it myself, but releasing a new version might suggest to some a level of compatibility with 6.x, which is not the case.
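
A minimal sketch of the symbolic-link approach, which avoids editing the script at all (mdcmd's new location isn't assumed here; it's looked up on the running system):

    # Recreate the old path the script expects, pointing at wherever mdcmd now lives.
    MDCMD_NOW=$(which mdcmd)          # find mdcmd's current location on this install
    ln -s "$MDCMD_NOW" /root/mdcmd    # the script's hard-coded /root/mdcmd resolves again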

 

The script (with the path fix) may work for you on 6.x and even produce good results, but it definitely has not been updated for 6.x.

 

Some additional thoughts:

These tunables are all about optimizing your HD controller(s).  While the number and size of installed drives has an impact on parity check speeds, I do not believe these tunables compensate for that.  Rather, each unique configuration of HD controllers on each system may require different amounts of memory to work efficiently.  Tom and LimeTech have chosen default values that are both frugal and often pretty fast.  Certain controllers, like the one on my system (see my sig), work horribly with the default settings (regardless of drives attached), and I experience a massive performance boost by throwing more memory at the problem (increasing the values).  In other words, the default values might be choking performance by not allocating enough memory for your particular HD controller(s).

 

Each test starts from scratch, and does not look at your current settings other than to inform you how much additional memory the Unthrottled setting would use compared to the current settings.  I agree, it would be a good idea to print the current settings as part of the test results.
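
In the meantime, if you want to note your current values before re-running the tests (so you can put them back later), something like this works from the console, using the disk.cfg path mentioned earlier in the thread:

    # List the current md_* tunables saved on the flash drive.
    grep -i '^md_' /flash/config/disk.cfg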

 

There are no guarantees that the reported settings will ultimately perform as tested.  The test simply sets the values, starts a parity check, lets it run for a bit to see how fast it is, stops it, then repeats with the next set of values.  It's certainly possible that the process of running multiple partial parity checks might falsely influence the reported speeds - I believe this is evidenced by a couple users who noted that a reboot after setting the new values resulted in horrible performance.  These users did the right thing and went back to default values.
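
For anyone curious, each test cycle boils down to roughly the following.  This is only a sketch of the idea, not the actual script code, and the values and timing are made up for illustration:

    # One test iteration: apply a candidate value, sample a partial parity check, stop it.
    for window in 384 512 640 768 896; do      # hypothetical md_sync_window candidates
        mdcmd set md_sync_window "$window"     # apply the value being tested
        mdcmd check NOCORRECT                  # start a non-correcting parity check
        sleep 300                              # let it run long enough to get a stable rate
        # sample the check's progress here (e.g. the mdResync* fields unRAID exposes) to compute MB/s
        mdcmd nocheck                          # stop the check before trying the next value
    done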

 

Not every system has a problem with the default values.  That's why I created this script: to try to easily figure out what values might work on any system.  Sometimes default is fastest, and sometimes changing the values has no impact; congratulate yourself on having a robust system.  I wish my system worked well with the default settings, but I have to increase the memory to get decent performance.

 

The Best Bang for the Buck is basically a "fast enough" setting with low memory utilization.  Yes, you might be able to go faster, but you have to throw increasing amounts of memory at it for very little return.  I recommend the Best Bang result in almost all cases.

 

As some have pointed out, the script doesn't always get the fastest result at the end of the final pass.  Part of this is because of inconsistency between multiple runs at the exact same setting.  The first pass might find test #7 was fastest, but when it zeroes in on that range in the second pass, the server may decide to behave differently and give slower results.  While I could try to make the script smarter, and ever more complex, I recommend that instead of expecting the script to do everything for you, you look at the results yourself and make a judgment call.  That's the main reason I print all the results of every test: the logic to find the Fastest and Best Bang settings is not foolproof.

 

Also, there are manual settings, so if you feel a range of values is worth examining in closer detail, run it in manual mode and target a specific range.

 

If you're ever not sure what settings to use, go with the smallest values that give acceptable results.

 

I've done zero testing on 6.x with this script (sorry, but it's true).  I also don't understand the impact of the changes to the tunables settings in 6.x.

 

Because this script can test some extreme memory settings, I always recommend not doing anything that might cause data loss while testing.  Sure, run your VMs and whatnot to load up the system per normal, but don't read from or write to the array while testing, because that may both throw off the results and expose you to data loss if the memory settings somehow run the server out of memory and cause issues.  If you have VMs that are reading/writing and you can't somehow stop that activity, then yes, do shut down those VMs before testing.

Oh, and not to forget Kizer, who called me in:  Sorry, I don't know why you got a zero for the results, especially on a 5.0 system.  There must be something about your system that is breaking the script.  As long as your parity checks complete in a normal amount of time, this is purely a script issue.  If your parity check is really slow, you could try increasing these values yourself, manually in the GUI, and run a parity check to test.  You can try using the same values the script tests, but since you would be doing it manually, I would compare the default values to maybe something like tests #7, 14 and 20.  If you don't see any worthwhile increase in speed, just stick with your defaults, and if you see a worthwhile increase, perhaps run some more manual tests around the good values.

 

Lastly, I'll apologize again, as I'm still busy with work and life, and don't foresee being able to work on this script in the near future.

 

Paul

Link to comment
  • 8 months later...

Since I always found this script to be of immense use, here is the modified script that will work under 6.2 and under 6.1.9.

 

The only difference for 6.2 compatibility is that md_write_limit has been removed from 6.2, so the adjustments the script makes to it as it runs have been removed.  (Note that it will still display the md_write_limit values in the summary; it was setting md_write_limit that would crash the script on 6.2, even after changing the location of mdcmd within the original script.)

 

And you should really read Pauven's comments a couple of posts up (http://lime-technology.com/forum/index.php?topic=29009.msg424206#msg424206).  This script is not the end-all-be-all, but it can help.  It did for me back in the day, but now my system works fast enough using LT's default values.

unraid-tunables-tester.zip

Link to comment

Hey Squid,

 

So funny that you posted this today (much appreciated by the way).  I've spent all day working on the next version, compatible with 6.2.  Today is the first time I've touched it in nearly 3 years.

 

Your fix for md_write_limit is necessary.  Not sure about changing the location of the mdcmd.  I was going to make that change today, but found that on 6.2 it works in the original location.  Perhaps LimeTech did something to reverse what they changed in 6.0/6.1.

 

I've found a few things in my testing with 6.2.0-RC4 today - last time I tested was on unRAID 5.0.  The old values that worked so well on 5.0 don't seem to apply to 6.2.  Significantly lower md_sync_window values are now working better, for my server at least.  Of course, a lot has changed since 5.0 (32 to 64 bit, newer drivers, newer kernel, newer unRAID code, and on my server, 4GB to 16GB and now Dual Parity), any of which could be a factor in my server now responding better to different values than on unRAID 5.0.

 

Also, in testing and trouble-shooting bugs, I accidentally ran a detailed Pass 2 in the wrong region - it should have run md_sync_window values in the 640-768 region, and instead it ran from 384 - 768.  This was interesting because there was a hidden gem at 560, where it ran 1.5MB/s faster than any other setting, and at a much lower value than the 768 value I was targeting from Pass 1 (though the values around 768 more consistently produced fast results).  Had my script run correctly, I would have never tested 560, and never found this hidden value with so much potential. 

 

560 might just be an anomaly, I plan to do some more testing to see if it consistently produces good results, but for now it has me rethinking the search routine.  I'm actually hoping further testing proves it was just a fluke, otherwise, I'm not sure what I will do to try and find these hidden nuggets of goodness.

 

There's lots of other improvements in my new version, but lots of testing to do before it is ready to share.

 

I'm also thinking it would be nice to put a GUI on it and make it a real plug-in, but there's a steep learning curve for me to figure out how to do that.

 

-Paul

Link to comment

Hey Squid,

So funny that you posted this today (much appreciated by the way). [...]

It just popped into my head while I was at work, and I spent an entire 2 minutes determining which lines wouldn't let it run under 6.2 and commenting them out (in my local copies I had long ago made the change for mdcmd's moved directory).

 

You're the best expert on the underlying algorithms as far as the script is concerned, and with my changes I make no guarantees at all that anything even works.  All I know is that if I happen to set the md_sync value to an obscene number (so low as to guarantee that I'm going to get a low rate), the script does indeed work.  Beyond that, under 6.2 I've had zero problems and am really happy with the rates that the default values return (the script basically returns a flat curve across the different test points), and my 6.1.9 server has always returned a basically flat curve.

 

I basically did this to help out Frank1940 in another thread, as it couldn't hurt to modify the script for 6.2 functionality (and, as I already stated, I was bored at work).

 

Should you decide in the future to reconfigure the script from being interactive to instead taking command line switches for everything, then I can help you out with a plugin front end for it.

 

Link to comment

... then I can help you out with a plugin front end for it.

 

That would be awesome, I appreciate the offer.  Don't be surprised if I take you up on it.

 

By the way, I've been meaning to ask:  Has anyone created a plugin that tests these tunables?  I didn't see one available, but thought I might have missed it if someone already created one. 

 

I'm still finding that I get a nice bell curve on my server, not a flat line like you get on yours.  Only the peak of the curve has shifted to lower values.

 

Considering how little attention this thread has seen in the past couple years, I'm thinking LimeTech's done something right to make this less of an issue.

 

-Paul

Link to comment

By the way, I've been meaning to ask:  Has anyone created a plugin that tests these tunables?  I didn't see one available, but thought I might have missed it if someone already created one. 

Nope

I'm still finding that I get a nice bell curve on my server, not a flat line like you get on yours.  Only the peak of the curve has shifted to lower values.

More or less a flat line.  The script actually seemed to return lower values than what an actual parity check would do at the start, but I didn't really take the time to ensure the server was set up for an optimal parity check (I didn't stop anything else running on the server at the time of my quick test).  My secondary server, however, had a flat line running 5.x / 6.x, probably due to the BR10i controller, and its results under my hack pretty much exactly match my parity check rate (and it is nothing but a NAS).

 

Considering how little attention this thread has seen in the past couple years, I'm thinking LimeTech's done something right to make this less of an issue.

I believe it is less of an issue than it was (that, combined with the fact that the script didn't work without modifying the paths under 6.1 and wouldn't work at all under 6.2 without removing two lines), but the tunables are still in the OS, and many people out here simply want the best performance possible out of their hardware.

 

I do still intermittently see some users (johnny.black?) advising others to adjust their tunables, but those suggestions have always seemed to me to be off-the-cuff, based upon his own experiences, and may not be suitable for all users' hardware.  My own system, when on 6.0, showed a definite bell curve with this script: higher values helped to a point, and after the peak was reached, the rates dropped precipitously.

 

Only time will tell if there is still a need for adjusting tunables if people (hopefully) begin posting their results again.

 

 

 

Link to comment

The old values that worked so well on 5.0 don't seem to apply to 6.2.  Significantly lower md_sync_window values are now working better, for my server at least. 

 

Great to see you working on the script again.

 

A lot has changed since 5.0, the biggest difference being the addition of a new tunable: md_sync_thresh, you can read Tom's explanation here.

 

This tunable was added to address the parity check performance issues some users were having with the SAS2LP.  IIRC, your controller uses the same chipset.  Basically, the SASLP and SAS2LP work faster if this value is approximately half the md_sync_window value, while other controllers, like most LSI-based ones, work faster with an md_sync_thresh just a little lower than md_sync_window.
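
Expressed as commands, those two rules of thumb look roughly like this (values are illustrative only):

    # SASLP / SAS2LP: md_sync_thresh at about half of md_sync_window
    mdcmd set md_sync_window 768
    mdcmd set md_sync_thresh 384
    # Most LSI-based controllers: md_sync_thresh just a little below md_sync_window
    # mdcmd set md_sync_thresh 760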

 

Hope you can include testing for this new tunable in your plugin.

Link to comment

A lot has changed since 5.0, the biggest difference being the addition of a new tunable: md_sync_thresh, you can read Tom's explanation here.

 

That helps catch me up, thanks!  I was going to ask about it, and nr_requests.  I've got some ideas for preliminary testing of md_sync_thresh, so I'll add that into the mix and see how it goes.

 

I was also going to ask if Tom had ever revealed how md_num_stripes vs md_sync_window works now that md_write_limit is gone.  The script I'm currently testing I never released to the public, and it has Write testing.  I had found that md_write_limit had a large impact on write speeds, and now that it is gone, I don't know what to think. 

 

I think I had a pretty solid understanding of md_num_stripes in 5.0, but now I'm lost.  Any rules of thumb for setting num_stripes vs. sync_window?

 

I also see a new tunable named md_write_method, with selectable values of read/modify/write, or reconstruct write.  What in the world is that?

 

-Paul

Link to comment

Oh, another thought/question:  Now that we're on 64-bit unRaid, do I still need to worry about the "Best Bang for the Buck", or just focus on the "Unthrottled" fastest possible values?

 

I had come up with some new metrics to try and derive the thriftiest values, but if memory is no longer a concern, perhaps that's just silly.

 

-Paul

Link to comment

nr_requests was what solved the AOC-SAS2LP problem.  That was because certain hard drives running a certain version of ATA interface firmware would slow down parity checks.  Start here http://lime-technology.com/forum/index.php?topic=42629.msg417261.msg#417261 and 

http://lime-technology.com/forum/index.php?topic=42629.msg417447.msg#417447

 

md_write_method isn't really a tunable but rather a setting.  If it's set to reconstruct write (I'm on Tapatalk, so not sure what the actual setting is called), then all drives get spun up for a write, and instead of having to read the parity disk to see what the existing parity value is, calculate what the new parity value would be, and then write it, all the disks (except for the one being written to and parity) are read concurrently, and the parity and data disks are written to simultaneously.  Writes are much faster, depending upon the width of the array and the number of disks currently spinning.  Side note: you would think that setting it to auto turns it off and on depending upon whether all the drives are spinning or not, but auto means the same as disabled.
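
For reference, the setting can also be flipped from the console with mdcmd.  The numeric mapping below is my assumption of how the GUI choices translate, so verify against the Disk Settings help on your release before relying on it:

    # Assumed mapping: 0 = read/modify/write (default), 1 = reconstruct write ("turbo write").
    mdcmd set md_write_method 1    # spin up all disks and reconstruct parity on writes
    mdcmd set md_write_method 0    # back to the default read/modify/write behaviour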

 


 

Link to comment

That helps catch me up, thanks!  I was going to ask about it, and nr_requests. [...]

Memory-wise, it's not a big deal anymore to even worry about the extra that Unthrottled takes over Best Bang.

 

Although I had found in my 6.0 days that, on one of my old servers, Unthrottled really brought it to the forefront if the cabling wasn't up to snuff, while the slower Best Bang didn't.

 


 

 

Link to comment

Memory-wise, it's not a big deal anymore to even worry about the extra that Unthrottled takes over Best Bang.

 

+1

 

I was also going to ask if Tom had ever revealed how md_num_stripes vs md_sync_window works now that md_write_limit is gone.  The script I'm currently testing I never released to the public, and it has Write testing.  I had found that md_write_limit had a large impact on write speeds, and now that it is gone, I don't know what to think. 

 

On v6.1 you could change md_write_limit manually in disk.cfg, and I did several tests at the time and never noticed any difference in write speed.  Maybe the setting was needless and that's why it was removed.  AFAIK Tom never explained it, nor how stripes vs. window works.

 

I think I had a pretty solid understanding of md_num_stripes in 5.0, but now I'm lost.  Any rules of thumb for setting num_stripes vs. sync_window?

 

I find that I get the best results with md_num_stripes set to approximately twice the md_sync_window value.
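
As a concrete example of that rule of thumb (numbers purely illustrative):

    # md_num_stripes at roughly twice md_sync_window
    mdcmd set md_sync_window 768
    mdcmd set md_num_stripes 1536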

 

 

nr_requests is not an unRAID tunable; it's a Linux setting.  It was a workaround found to fix the SAS2LP issue before the md_sync_thresh tunable was added.  It's possible that changing it affects write performance, though I believe nobody ever noticed any issue.  It can still be useful, e.g., if a server has both a SAS2LP and an LSI controller, setting nr_requests to 8 and md_sync_thresh close to md_sync_window gives the best performance.

 

Link to comment

Okay, I have my first challenge, and can use a little help.

 

I can set the md_sync_thresh value using mdcmd, but this doesn't work for nr_requests.

 

In a thread I found that you can set nr_requests individually per drive using the following:

  • echo 8 > /sys/block/sdX/queue/nr_requests

 

Question 1, is there a way to set nr_requests globally for the array, like Tunable(nr_requests) setting on the "Disk Settings" panel, but from the command line?

 

Question 2, if I have to set each drive individually using the above format, what is an efficient routine?  I basically need a list of sdX array drives to cycle through, while skipping any cache or other drives not in the array.  Strike that question: I already solved this problem 3 years ago; I have a routine that finds all the disks in the array, and I can easily cycle through them.  Still, I would prefer a global method.
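
For anyone without such a routine, a minimal sketch of the per-drive loop might look like this.  It assumes the assigned array members show up as rdevName.N entries in the mdcmd status output, so check that on your version before using it:

    # Apply nr_requests to every assigned array device reported by the md driver.
    for dev in $(mdcmd status | sed -n 's/^rdevName\.[0-9]*=//p'); do
        [ -n "$dev" ] && echo 8 > "/sys/block/$dev/queue/nr_requests"
    done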

 

Thanks,

Paul

Link to comment

The thing about nr_requests is that it is per drive, and different drives (and what controllers they are attached to) respond differently to it (as I understand it - could be completely off base here)

 

E.g.: I have the SAS2LP.  For the longest time, I never believed anyone when they stated that it was returning slow parity checks, as I never experienced the issue.  As it turned out, by a fluke, the drives I had that were affected (IIRC, using ATA-8 for the interface version) were not attached to that controller, so I never noticed the problem.  Had they been attached to it, I would have.  As it stands, the nr_requests fix didn't have any effect on my system, because I had never been affected by the problem in the first place due to how the potentially affected drives were connected.

 

By extension of this, there are still reports of certain Samsung drives when involved in an array causing slow parity checks, but when doing normal reads/writes run full speed.

 

I guess the problem with nr_requests now is that if you can get better results by setting the value individually per drive, unRAID is going to clobber those values on a reboot, as it only offers a global setting applied to all drives.  You could probably get around this by having a secondary script set individual nr_requests when the array starts, however.

 

Probably would be wise to PM eschultz on nr_requests, since he figured that one out in the first place.

Link to comment
