Ultimate Unraid Server - The Sequel


GaryMaster


I plan to test the unRAID system using a DOE (Design of Experiments).  This is many times faster than performing an OFAT experiment (One Factor at a Time) as you suggest.

 

By all means, faster results that are useless is a much better approach than slower results that are useful.

 

How long does it take to do:

 

swapoff -a
dd if=/dev/zero of=/mnt/disk1/test.dd bs=1M count=1000
dd if=/dev/zero of=/mnt/disk2/test.dd bs=1M count=1000
dd if=/dev/zero of=/mnt/cache/test.dd bs=1M count=1000
dd if=/dev/zero of=/tmp/test.dd bs=1M count=1000

 

The last one assumes you have more than 1GB of physical RAM.  You don't even have to do the math... dd does it for you.
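
If you want to verify that assumption first, the standard tools will show how much RAM the box actually has; this is just a quick sanity check, not part of the benchmark:

# confirm there is comfortably more than 1GB of RAM before writing 1GB to /tmp
free -m
grep MemTotal /proc/meminfo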

 

And of course, any serious optimization effort should start with a profiler setup to actually SEE where the bottleneck is.  You could get a clue from mpstat (hint:  it's waiting for disk I/O).
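
For example, something along these lines (mpstat comes with the sysstat package; the disk path is just an example) lets you watch %iowait climb while a write is in flight:

# kick off a 1GB write in the background, then sample CPU stats every 3 seconds
dd if=/dev/zero of=/mnt/disk1/test.dd bs=1M count=1000 &
mpstat 3 10        # 10 samples at 3-second intervals -- watch the %iowait column
wait
rm -f /mnt/disk1/test.dd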

 

Many experienced people here, including at least 2 of us with PhDs in engineering, have been working on performance tuning in unRAID for years.  The forums are full of such threads.  Your approach reminds me of the apocryphal young lieutenant, fresh out of college, who takes command of his first platoon and ignores the advice of the 20-year career sergeants, confident of the superiority of his college-educated approach.

 

BTW, ICH7 controllers had some well-known SATA bandwidth issues.  The -M version only did SATA-150, and the non-M version was specified for SATA-300, but in the field some early batches didn't deliver it.  In particular, some PCIe add-in SATA cards will easily outperform the ICH7.

Link to comment

I plan to test the unRAID system using a DOE (Design of Experiments).  This is many times faster than performing an OFAT experiment (One Factor at a Time) as you suggest.

 

By all means, faster results that are useless is a much better approach than slower results that are useful.

 

How long does it take to do:

 

swapoff -a
dd if=/dev/zero of=/mnt/disk1/test.dd bs=1M count=1000
dd if=/dev/zero of=/mnt/disk2/test.dd bs=1M count=1000
dd if=/dev/zero of=/mnt/cache/test.dd bs=1M count=1000
dd if=/dev/zero of=/tmp/test.dd bs=1M count=1000

 

The last one assumes you have more than 1GB of physical RAM.  You don't even have to do the math... dd does it for you.

 

Many experienced people here, including at least 2 of us with PhDs in engineering, have been working on performance tuning in unRAID for years.  The forums are full of such threads.  Your approach reminds me of the apocryphal young lieutenant, fresh out of college, who takes command of his first platoon and ignores the advice of the 20-year career sergeants, confident of the superiority of his college-educated approach.

 

bubbaQ:

 

I could do without the lectures and personal attacks.  I told you I would test more deeply as I get time, and I will.  When I do, I will take a structured approach.  Why do you berate my approach when you apparently have no training in statistical experimentation?  Your suggestions for measurements can be used in my structured test - the only difference is that I can run 4 combinations of that test and learn the same thing you would from 16 different tests changing one input variable at a time.

 

If you want to start beating me with credentials, I ALSO hold a PhD in Electrical Engineering, am a registered Professional Engineer with the state and have been working in the industry for 16 years.  I've been around the block a few times myself.

 

I don't question that you and others here know a great deal about the low level function of this software.  I value that knowledge and intend to use it in detailed testing when all of the hardware is available. 

Link to comment

 

A couple of betas ago, many of us did a very basic dd test locally on the machine.

This taxed the machine directly without the network layer.

It showed the baseline for what your maximum write speed could ever be.

 

It proved to be very enlightening when limetech adjusted the md driver parameters for better write speed.

 

Please see thread:  http://lime-technology.com/forum/index.php?topic=4625.0

 

It would be helpful if you did this on your test systems to provide a best-case baseline.

 

#!/bin/bash
# Write ~4GB of zeros to the given path and report elapsed time and throughput.

# Use the first argument as the test file; otherwise write test.<PID> in the current dir.
if [ ! -z "${1}" ]
   then TMPFILE="${1}"
   else TMPFILE="test.$$"
fi

# Clean up the test file even if the script is interrupted.
trap 'rm -f "${TMPFILE}"' EXIT HUP INT QUIT TERM

echo "`date` writing to: ${TMPFILE}"
dd if=/dev/zero of="${TMPFILE}" count=4000000 bs=1024
echo "`date` Done."
ls -l --si "${TMPFILE}"

rm -f "${TMPFILE}"

 

Run the script with the full path of the test file to write as the first argument.
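
For example, assuming the script above is saved as write_test.sh (the name and target paths are just placeholders):

chmod +x write_test.sh
./write_test.sh /mnt/disk1/test.bin     # array data disk
./write_test.sh /mnt/cache/test.bin     # cache disk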

 

WeeboTech:

 

That's a very interesting thread.  I was previously unaware of the variables exposed to the user for changing queue depth to take advantage of more system memory.  It sounds like the current software is still optimized for the user only having 512MB.  After reading through the thread, it didn't look like anyone arrived at optimal numbers for various memory configurations.  Hmmm... this is becoming much more complicated and the number of inputs is growing exponentially.  I may end up doing two tests - one for the hardware with "stock" settings and another on the fastest system to see how much farther it can go with the software tweaked to take full advantage of the system resources.

Link to comment
If you want to start beating me with credentials, I ALSO hold a PhD in Electrical Engineering, am a registered Professional Engineer with the state and have been working in the industry for 16 years.  I've been around the block a few times myself.

 

I assumed as much... as soon as you mentioned 6 sigma.... I never hear buzzwords like that except from other bookworms.

 

Unless you work in one of those wimpy, politically correct, touchy-feely corporate environments, 16 years as a PE should have given you a much tougher skin.  Technical forums (like technical meetings) get a little rough and tumble.

 

I don't question that you and others here know a great deal about the low level function of this software.  I value that knowledge and intend to use it in detailed testing when all of the hardware is available.

 

That's good.  But BRiT is right -- you can quickly reach a point of diminishing returns.  We don't mind if people rearrange deck chairs on the Titanic with benchmarks that don't reveal very useful information... we just don't want to spend other people's attention on experiments that others here already know the answer to.  This isn't a peer-reviewed journal where duplicating and verifying prior results is what users are looking for.  We've tweaked the tunable parameters... the scheduler... read-ahead... O/S buffers.  Tried large buffers in large-memory (PAE) systems.
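
For anyone who wants to poke at the same class of knobs themselves, the generic Linux ones live under /sys; this is only a sketch of where they are (device names are examples, and good values depend entirely on your hardware), not a recommendation of particular settings:

# show which I/O scheduler the disk is using (the bracketed one is active)
cat /sys/block/sda/queue/scheduler

# current read-ahead, in 512-byte sectors
blockdev --getra /dev/sda

# example: raise read-ahead and the request queue depth for one drive
blockdev --setra 2048 /dev/sda
echo 512 > /sys/block/sda/queue/nr_requests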

 

You'd get much more interest from folks here by addressing real-world questions, such as comparing a 16MB cache VRaptor, or a 7200 RPM Hitachi or Seagate, versus 64MB cache WD EARS drives in the exact same system, on the same controller.

 

The VRaptors are a red herring... not too many people are likely to limit their unRAID disk choices to 300GB per disk, and not at the VRaptor prices.

 

You have 2 VRaptors and 2 Seagates.  Why not swap them in the same system?  Compare the 2 VRaptors to the 2 Seagates with EVERYTHING else the same?  If performance is the same, then all the wailing and gnashing of teeth over spending a 5x premium per GB for storage using the VRaptors goes away.  That's immediately useful information to a chunk of people.

 

Also, by testing LAN I/O independently (writing to a RAMdisk on unRAID) and testing the disk I/O internally, you get important info.  Let's say LAN I/O is 60MB/sec and disk I/O to a parity-protected disk is also 60MB/sec... yet when combining the two you only get 20MB/sec over the wire.  That's a very different piece of VERY important information, which could lead to examination of scheduling, chipset, interrupts, and bus access/congestion.
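
A rough sketch of how the RAMdisk end of that test can be set up on the server (the size and share name are arbitrary, and the smb.conf fragment is only an illustration of a minimal share, not unRAID's stock configuration):

# create a 1GB RAM-backed mount point
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk

# minimal Samba share pointing at it (add to smb.conf, then restart/reload Samba)
[ramdisk]
   path = /mnt/ramdisk
   writeable = yes
   guest ok = yes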

 

I'm whittling down choices on caching SATA controllers, and when I get one, I'll bench several drives with it under unRAID even though that means recompiling the kernel.  Suppose that gives me a 2x or 3x improvement in unRAID write speed, as enabling async I/O in Samba and tuning the scheduler did.  That would weigh heavily in favor of getting caching controllers and SAS support into stock unRAID, and point to an easy $350 upgrade for users, who could both keep all their existing drives and continue to buy storage in the sweet spot of $/GB and low-power green drives.  That will also give impetus to efforts to improve network I/O.  As it is now, disk bottlenecks are much more relevant than LAN bottlenecks.  Get parity-protected disk I/O up to 80MB/sec (internally) and then optimization efforts will turn to LAN I/O.

 

If, OTOH, attempts at improving disk I/O never produce something north of 60MB/sec for parity-protected writes, then improving LAN throughput will be of little use w/o a cache drive.

 

Link to comment

Fair enough, bubbaQ - I think we are finally on the same page.  I just didn't want to let this thread go down a rathole again.

 

I don't have a matched pair of Seagates available (the matched pair of 1.5TB drives I do have are in use in my current system and I don't want to risk that data by transplanting the drives from system to system).

 

And just to be perfectly clear - the VRaptors were included only to show the best that could be expected from rotating media for a given testbed.  I have no intention of using them in my array.  Perhaps with SSDs coming on strong, WD may start selling 10,000 RPM large-capacity drives at mainstream prices.  You never know.

 

Your ideas about disk cache vs RPM have given me something to think about, and I was planning to put a pair of 7200 RPM, 32MB cache drives into both test beds to show the impact of the on-drive cache.  I just don't have a matched pair, which would be ideal.

 

In fact, I would like to have:  

 

(2) 7200 RPM 16 MB drives

(2) 7200 RPM 32 MB drives

(2) 10,000 RPM 16 MB drives

 

I have the last pair, and I can assemble the 2nd pair (but only by mixing two drives from different vendors), but I only have one 16MB 7200 RPM drive in my collection.  And two drives from different suppliers, even with the same cache and spindle speed configuration, are going to differ in performance.

 

With all 3 combinations from the same supplier, we may quickly see the impact of rotational speed vs cache and the interaction between the two.

 

 

 

Link to comment

You don't need matched pairs.  I know this goes against the grain but there are reasons:

 

First, in any RAID array, you always want different drives from different batches, preferably several months or more different in age.... otherwise you increase the risk of multiple drives failing in close temporal space.

 

Second, you can benchmark, then swap data and cache drives around, to see if it makes a difference.  If it does, then you look at firmware changes.

 

Third, if there are any significant changes in firmware, they will generally be known and discussed somewhere (if one sucks, the blogosphere will know it... if one is better than the old, the manufacturer will trumpet it).
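
An easy way to keep track of that while swapping drives around is to log each drive's firmware revision as you go; the FwRev field is the same one shown in the drive dumps later in this thread (device names below are examples):

# record model and firmware revision for each drive before/after a swap
for d in /dev/sda /dev/sdb /dev/sdc; do
    echo "== $d =="
    hdparm -i $d | grep -i fwrev
done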

 

I'm also interested in a pure rotational analysis... a 7200 RPM 32MB cache drive paired with a 5400 RPM 32MB cache drive... to see what the performance difference is with the 7200 as parity and the 5400 as data, and the reverse.

 

Don't forget there is also at least one Seagate 64MB cache 7200 rpm drive, and any test should include the WD EARS 64MB cache 5400 rpm drives, since that looks like it is going to be a very popular drive, and soon the sweet spot for $/GB.

 

Finally, once you find the fastest pair, you need to rerun those with the drives on different SATA ports on the mobo.  Both odd... both even... mix odd/even adjacent, and hi/lo (1 and 6 for example).  On some mobos, that will make a difference.

 

It sounds like you are benchmarking with production data.  I prefer to do serious benchmarking and swap things around with NEW drives, before they have data on them so I can reformat, move, etc., w/o worry.

 

Link to comment

 

It sounds like you are benchmarking with production data.  I prefer to do serious benchmarking and swap things around with NEW drives, before they have data on them so I can reformat, move, etc., w/o worry.

 

 

Only my baseline system has production data in place.  All of my other tests (and future tests) will be from disks which are freshly formatted or new.  I can give the mismatched 7200 RPM, 32MB drives a shot and see how they fare.  I can do a quick check at storagereview.com to see if the drives are comparable in structure and performance.

 

My first tests show that both systems exhibit a similar read performance "pattern" regardless of the disks.  I did a graph of the output and there are step function changes in read performance between 32MB files and 64MB files and a similar, less dramatic step change between 512MB and 1GB file sizes.  Write performance is certainly not linear, but doesn't have the abrupt steps seen in the read rates.
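
For anyone who wants to reproduce that kind of file-size sweep with nothing but dd, a loop like this works (paths and sizes are just examples; the page cache is dropped before each read so the disk, not RAM, is being measured):

# write-then-read dd sweep across file sizes (in MB)
for size in 32 64 128 256 512 1024; do
    f=/mnt/disk1/test_${size}M.dd
    dd if=/dev/zero of=$f bs=1M count=$size 2>&1 | grep copied      # write
    echo 3 > /proc/sys/vm/drop_caches
    dd if=$f of=/dev/null bs=1M 2>&1 | grep copied                  # read back
    rm -f $f
done
# note: small writes may complete into the page cache; add conv=fdatasync to the
# write to force the data to disk before dd reports its rate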

 

You're tempting me on the EARS drives, but my current setups seem to be showing a trend that higher drive cache trumps rotation speed only for small file sizes in write performance.  Read performance is not as clear.

Link to comment

This is the motherboard I'm going to buy when it is released.  

 

I finally had a chance to check out the motherboard you had been looking at.  This is an interesting choice.  During my work on this project I had also been thinking that the new Pineview Atom chips may be a great fit (even lower power, greater processing performance, passively cooled, and on-package graphics).  I wonder why Supermicro is bundling that with Matrox graphics (there's a name I haven't seen in the headlines for a while).  Seems it would defeat some of the great low-power performance the new chip has to offer - but they may be gearing this for the HTPC market if the Matrox does full hardware decode of Blu-ray.

 

It does have a nice number of onboard SATA headers and a mini-ITX form factor - all good stuff.  Nice find!

Link to comment

Gary, bubbaQ has some valid points in the past couple of posts regarding tests I would find interesting.

i.e. drive cache/rotational variances in the same system first (then in your other system).

 

Still would like to see your DD benchmarks posted too.

 

my current setups seem to be showing a trend that higher drive cache trumps rotation speed only for small file sizes in write performance.  Read performance is not as clear.

 

This may be so.  It all depends on what you are doing on the server.  Dumping a huge rip is going to show one behavior.  Running a lot of torrents is going to show another.

 

Link to comment

If this will help, here are some pure LAN measurements with my setup:

 

-- unRAID: 1GHz ULV Intel Mobile, no L2 cache, 1GB RAM.

-- Desktop PC: 2.4GHz Core-2 Duo, 2GB RAM, winXP.

-- Gigabit LAN over a crappy $30 switch.

 

Transfers are between a RAM disk on the XP box and the rootfs on unRAID.

No hard disks are involved.  File transferred is about 0.6GB.

 

Samba:

-- XP -> unRAID: ~51MB/s.

-- unRAID -> XP: ~39MB/s.

 

FTP:

-- XP -> unRAID: ~53MB/s.

-- unRAID -> XP: ~88MB/s.

 

Strangely, when using Samba, writing to my unRAID is faster than reading from it.

When using FTP, reading from my unRAID is faster than writing to it.

 

------------------------

 

Now, some pure disk measurements. No LAN involved.

 

-- Writing to a disk outside the protected array:

dd if=/dev/zero  bs=1M count=1000  of=/mnt2/diskX/trash/test1

1048576000 bytes (1.0 GB) copied, 22.3 s, 47.0 MB/s

 

-- Reading from a disk outside the protected array:

echo 3 > /proc/sys/vm/drop_caches

dd of=/dev/null  bs=1M  if=/mnt2/diskX/trash/test1

1048576000 bytes (1.0 GB) copied, 12.84 s, 81.7 MB/s

 

-- Writing to a disk in the protected array:

dd if=/dev/zero  bs=1M count=1000  of=/mnt/disk2/trash/test2

1048576000 bytes (1.0 GB) copied, 42.68 s, 24.6 MB/s

 

-- Reading from a disk in the protected array:

echo 3 > /proc/sys/vm/drop_caches

dd of=/dev/null  bs=1M  if=/mnt/disk2/trash/test2

1048576000 bytes (1.0 GB) copied, 12.75 s, 82.2 MB/s

 

Notice my disk reads exceed my Samba pure LAN read speed.

 

It should also be said that the above speeds vary greatly depending on whether we are writing to an empty disk or to a disk that's almost full.
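
One quick, read-only way to see that effect on a single drive is to compare the start of the disk (outer tracks) with the end (inner tracks); /dev/sdX below is a placeholder for a real device, and nothing is written to it:

DEV=/dev/sdX                                 # substitute the real device
SECTORS=$(blockdev --getsz $DEV)             # size in 512-byte sectors
END=$(( SECTORS / 2048 - 1000 ))             # offset of the last ~1GB, in 1MB blocks

echo 3 > /proc/sys/vm/drop_caches
dd if=$DEV of=/dev/null bs=1M count=1000                 # outer tracks
echo 3 > /proc/sys/vm/drop_caches
dd if=$DEV of=/dev/null bs=1M count=1000 skip=$END       # inner tracks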

 

 

Link to comment
Transfers are between a RAM disk on the XP box and the rootfs on unRAID.

No hard disks are involved.  File transferred is about 0.6GB.

 

Samba:

-- XP -> unRAID: ~51MB/s.

-- unRAID -> XP: ~39MB/s.

 

FTP:

-- XP -> unRAID: ~53MB/s.

-- unRAID -> XP: ~88MB/s.

 

Strangely, when using Samba, writing to my unRAID is faster than reading from it.

When using FTP, reading from my unRAID is faster than writing to it.

 

That's not unexpected.  FTP and NFS reads are always better than Samba.  Try it Linux to Linux instead of XP to/from Linux and you will (likely) get even better results.

 

 

Link to comment

Transfers are between a RAM disk on the XP box and the rootfs on unRAID.

No hard disks are involved.  File transferred is about 0.6GB.

 

Samba:

-- XP -> unRAID: ~51MB/s.

-- unRAID -> XP: ~39MB/s.

 

FTP:

-- XP -> unRAID: ~53MB/s.

-- unRAID -> XP: ~88MB/s.

 

Strangely, when using Samba, writing to my unRAID is faster than reading from it.

When using FTP, reading from my unRAID is faster than writing to it.

 

FTP and NFS reads are always better than Samba.

 

 

That was expected. 

 

What wasn't expected was that Samba had slower reads than writes, while FTP was the other way around.

(In both the Samba and FTP cases, it was an XP client connecting to the unRAID server.)

 

Link to comment
I did a graph of the output and there are step function changes in read performance between 32MB files and 64MB files and a similar, less dramatic step change between 512MB and 1GB file sizes.

 

You should be realizing that how you intend to use a server can make a big difference in how you tune it... just like designing a camshaft for a car -- you optimize it for highway speeds, city traffic, or a compromise between the two.

 

I mostly write very large files and indexes to unRAID.  Someone who writes many small files will optimize it differently.  Someone using HandBrake or GK to transcode video will want more CPU.  Someone running VMs will want more RAM.  Someone running MySQL and other applications may want more RAM, CPU, and swap space.

Link to comment

Here is some data from mpstat:

 

Copying a 0.6GB file to the cache disk (outside the array) from an XP RAMdisk to unRAID w/crossover cable over Samba.

 

Network throughput was 19MB/sec.

 

22:45:57     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
22:46:00     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:46:03     all    0.83    0.00    7.00   39.33    1.67    1.83    0.00    0.00   49.33
22:46:06     all    0.33    0.00    4.32   79.73    1.16    1.33    0.00    0.00   13.12
22:46:09     all    0.50    0.00    3.50   75.67    1.83    2.50    0.00    0.00   16.00
22:46:12     all    0.17    0.00    3.67   77.00    1.00    1.00    0.00    0.00   17.17
22:46:15     all    1.16    0.00    7.64   68.60    2.33    2.49    0.00    0.00   17.77
22:46:18     all    0.83    0.00    4.33   73.83    1.50    1.00    0.00    0.00   18.50
22:46:21     all    0.00    0.00    4.17   69.50    2.00    1.67    0.00    0.00   22.67
22:46:24     all    0.17    0.00    4.17   74.50    2.00    1.33    0.00    0.00   17.83
22:46:27     all    0.17    0.00    4.00   67.00    1.17    1.50    0.00    0.00   26.17
22:46:30     all    0.50    0.00    6.50   58.50    3.17    3.83    0.00    0.00   27.50
22:46:33     all    0.33    0.00    3.17   69.83    1.00    0.33    0.00    0.00   25.33
22:46:36     all    0.00    0.00    1.33   30.50    0.17    0.00    0.00    0.00   68.00
22:46:39     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:46:42     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

 

Now copying the same file to a RAMdisk on unRAID with the same setup:

Network throughput was 80MB/sec.

 

22:52:08     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
22:52:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:52:14     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:52:17     all    2.33    0.00    5.15    0.00    2.99    7.64    0.00    0.00   81.89
22:52:20     all    2.33    0.00    7.14    0.00    4.32    5.98    0.00    0.00   80.23
22:52:23     all    2.66    0.00    5.81    0.00    4.98    6.81    0.00    0.00   79.73
22:52:27     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:52:30     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:52:33     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:52:36     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:52:39     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

 

This is a huge red flashing arrow pointing to I/O wait.  Is the CPU waiting for the disk, the IRQ, or the bus bandwidth?

 

So I tested the RAMdisk while running mpstat:

 

root@Tower:/# dd if=/dev/zero of=/test.dd bs=1M count=600
600+0 records in
600+0 records out
629145600 bytes (629 MB) copied, 0.832671 s, 756 MB/s


22:57:46     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
22:57:47     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:57:48     all    2.48    0.00   18.32    0.00    0.00    0.00    0.00    0.00   79.21
22:57:49     all    1.98    0.00   26.73    0.00    0.00    0.99    0.00    0.00   70.30
22:57:50     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:57:51     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

 

Then tested the cache disk:

 

root@Tower:/# dd if=/dev/zero of=/mnt/cache/test.dd bs=1M count=600
600+0 records in
600+0 records out
629145600 bytes (629 MB) copied, 32.5752 s, 19.3 MB/s

22:59:09     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
22:59:12     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:59:15     all    0.17    0.00    7.31   62.13    0.00    0.00    0.00    0.00   30.40
22:59:18     all    0.00    0.00    5.81   92.69    0.00    0.00    0.00    0.00    1.50
22:59:21     all    0.00    0.00    2.83   94.19    0.14    0.00    0.00    0.00    2.83
22:59:24     all    0.00    0.00    2.61   80.52    0.00    0.00    0.00    0.00   16.87
22:59:27     all    0.00    0.00    4.32   67.94    0.00    0.00    0.00    0.00   27.74
22:59:30     all    0.00    0.00    3.49   76.41    0.17    0.00    0.00    0.00   19.93
22:59:33     all    0.00    0.00    3.82   60.80    0.17    0.00    0.00    0.00   35.22
22:59:36     all    0.00    0.00    3.49   70.43    0.50    0.00    0.00    0.00   25.58
22:59:39     all    0.00    0.00    2.82   64.45    0.50    0.00    0.00    0.00   32.23
22:59:42     all    0.00    0.00    3.99   82.23    0.17    0.00    0.00    0.00   13.62
22:59:45     all    0.00    0.00    3.65   73.09    0.66    0.00    0.00    0.00   22.59
22:59:48     all    0.00    0.00    1.00   33.39    0.00    0.00    0.00    0.00   65.61
22:59:51     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:59:54     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
22:59:57     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

 

So what is the conclusion?  Writing to this disk, in this system, is crap, and not because of the LAN.

 

Here are the specs on the cache drive:

 

 Model=WDC WD600JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WCA8F3616341
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=117229295
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5

 

Yup... an old piece of crap, PATA 60GB WD with 8M cache.  Now I've got a good baseline to compare this to a parity-protected drive, and to start trying different drives.  I also know, based on NIC performance,  that a cache drive faster than 80MB/sec is wasted... so no need to wonder if an SSD or VRaptor for cache would be worth the money.  I have a WD3200AAKS (SATA/300, 16M cache, 7200 rpm) I'll try in it tomorrow.

 

But an SSD or VRaptor cache would be useful for testing, since they are significantly faster than what the NIC alone can do in a perfect scenario: you can see whether the entire disk subsystem (disk, controller, bus, and NIC working together) will deliver close to the max of the NIC alone.  If not, then something in that list doesn't play nice together (which is much tougher to tune or troubleshoot).  If it does, then I'm golden.

 

The point I am trying to make is that testing the entire system, without testing individual components to establish what could be possible if everything played together perfectly, doesn't give you a picture that is nearly as useful.

Link to comment

Now the exact same setup, with the WD Blue 7200/16MB, and a 6GB file:

(Note, this is an unRAID system, but w/o a parity disk)

 

Internal write, outside the array:

root@Tower:/mnt/disk3# dd if=/dev/zero of=test.dd bs=1M count=6000
6000+0 records in
6000+0 records out
6291456000 bytes (6.3 GB) copied, 64.7915 s, 97.1 MB/s

08:29:05     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
08:29:10     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
08:29:15     all    0.60    0.00   23.55   24.95    0.00    1.40    0.00    0.00   49.50
08:29:20     all    0.70    0.00   25.15   19.36    0.20    1.50    0.00    0.00   53.09
08:29:25     all    1.20    0.00   26.15   28.64    0.00    2.00    0.00    0.00   42.02
08:29:30     all    0.20    0.00   24.25   32.34    0.30    1.70    0.00    0.00   41.22
08:29:35     all    0.60    0.00   29.24   27.74    0.10    2.59    0.00    0.00   39.72
08:29:40     all    0.60    0.00   28.24   30.44    0.00    1.50    0.00    0.00   39.22
08:29:45     all    0.40    0.00   24.35   23.35    0.10    1.70    0.00    0.00   50.10
08:29:50     all    0.40    0.00   25.65   23.05    0.00    1.50    0.00    0.00   49.40
08:29:55     all    0.40    0.00   26.82   22.73    0.00    1.40    0.00    0.00   48.65
08:30:00     all    0.60    0.00   24.88   24.68    0.00    2.80    0.00    0.00   47.05
08:30:05     all    0.90    0.00   29.74   20.96    0.10    1.50    0.00    0.00   46.81
08:30:10     all    0.40    0.00   25.55   24.75    0.20    1.80    0.00    0.00   47.31
08:30:15     all    0.70    0.00   27.05   22.65    0.30    2.10    0.00    0.00   47.21
08:30:20     all    0.00    0.00    0.80    1.10    0.10    0.30    0.00    0.00   97.70
08:30:25     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
08:30:30     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

 

LAN write:

LAN I/O was 60MB/sec:

 

08:32:01     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
08:32:06     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
08:32:11     all    0.00    0.00    4.09    7.88    2.50    1.00    0.00    0.00   84.53
08:32:16     all    0.40    0.00    8.18   12.77    4.79    2.00    0.00    0.00   71.86
08:32:21     all    0.50    0.00    7.58   13.67    5.59    2.20    0.00    0.00   70.46
08:32:26     all    0.40    0.00    7.19   14.47    5.99    3.39    0.00    0.00   68.56
08:32:31     all    0.30    0.00    7.09   15.57    4.99    2.20    0.00    0.00   69.86
08:32:36     all    0.40    0.00    8.28   14.27    4.29    2.99    0.00    0.00   69.76
08:32:41     all    0.10    0.00    8.68   14.47    4.39    2.69    0.00    0.00   69.66
08:32:46     all    0.60    0.00    8.08   15.17    4.69    1.70    0.00    0.00   69.76
08:32:51     all    0.60    0.00    6.49   14.17    4.89    2.20    0.00    0.00   71.66
08:32:56     all    0.60    0.00    7.78   13.97    6.69    2.40    0.00    0.00   68.56
08:33:01     all    0.50    0.00    7.88   15.67    5.19    1.60    0.00    0.00   69.16
08:33:06     all    0.60    0.00    7.19   12.77    5.49    2.40    0.00    0.00   71.56
08:33:11     all    0.50    0.00    8.18   11.88    6.19    2.10    0.00    0.00   71.16
08:33:16     all    0.00    0.00    7.88   13.27    6.19    1.70    0.00    0.00   70.96
08:33:21     all    0.90    0.00    8.58   12.67    5.89    2.89    0.00    0.00   69.06
08:33:26     all    0.20    0.00    8.68   14.27    5.59    2.20    0.00    0.00   69.06
08:33:31     all    0.50    0.00    7.98   11.58    5.69    2.30    0.00    0.00   71.96
08:33:36     all    0.60    0.00    6.89   14.07    5.29    2.40    0.00    0.00   70.76
08:33:41     all    0.30    0.00    7.88   14.87    6.29    2.00    0.00    0.00   68.66
08:33:46     all    1.10    0.00    8.68   13.07    4.79    2.30    0.00    0.00   70.06
08:33:51     all    0.50    0.00    8.38   13.77    4.29    2.99    0.00    0.00   70.06
08:33:56     all    0.40    0.00    9.28   13.47    4.59    2.20    0.00    0.00   70.06
08:34:01     all    0.10    0.00    7.09   15.07    3.39    1.90    0.00    0.00   72.46
08:34:06     all    0.50    0.00    7.98   10.78    3.59    2.20    0.00    0.00   74.95
08:34:11     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
08:34:16     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

 

Here are the drive specs:

Model=WDC WD3200AAKS-00L9A0                   , FwRev=01.03E01, SerialNo=     WD-WMAV20487664
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625140335
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=no WriteCache=enabled
Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 

So this drive is faster (97MB/sec) than the top LAN transfer capability (80MB/sec), yet when writing to it over the LAN (via Samba), throughput is only 60MB/sec.  But in this run, the time is not wasted waiting for I/O -- I/O wait is lower than in the internal copy!  The culprit appears to be time spent servicing interrupts -- hw and sw.  Writing internally also had more CPU time spent in the kernel.  So servicing IRQs appears to be throttling the CPU's availability to do needed kernel work.

 

Anyone who follows Linux kernel development knows that the trifecta of IRQs, the BKL (Big Kernel Lock), and scheduling has been a sticky wicket for a long time, with a lot of work in the 2.6 kernel devoted to it.

 

The mobo is a Gigabyte G31M-ES2L, ICH7, using the RTL8111C, which sits on a PCIe x1 lane to the ICH7.  But there is a PCIe x16 slot, which bypasses the ICH7 and talks directly to the G31.  Also note that the internal write test to the new drive generated very little time in hw IRQs, so signs point strongly to the hw IRQs from the NIC.  Using the same NIC and Samba, but bypassing the disk and writing to a RAMdisk on unRAID, gave 80MB/sec of throughput and similarly higher times in hw IRQs.

 

Bottom line: I'd like to test this setup with 1) a NIC that offloads work and minimizes interrupts and/or 2) a PCIe x16 SATA card.

 

This test should also be enlightening to folks that are anxious for Ethernet teaming -- it won't help much (at least not on this mobo setup) because the problem appears not to be LAN bandwidth, but servicing NIC IRQs while writing to a disk on the ICH7.  More CPU and RAM won't help either.  A caching controller won't likely help either -- however, a PCIe x16 SATA controller may help, because it would bypass the ICH7 for disk I/O.

Link to comment

Bottom line: I'd like to test this setup with 1) a NIC that offloads work and minimizes interrupts and/or 2) a PCIe x16 SATA card.

 

This test should also be enlightening to folks that are anxious for Ethernet teaming -- it won't help much (at least not on this mobo setup) because the problem appears not to be LAN bandwidth, but servicing NIC IRQs while writing to a disk on the ICH7.  More CPU and RAM won't help either.  A caching controller won't likely help either -- however, a PCIe x16 SATA controller may help, because it would bypass the ICH7 for disk I/O.

 

Would jumbo frames help?

Would there be fewer IRQs to service if more data were pumped into the card's buffer?

I know you've mentioned before that jumbo frames would not help much.

It would still be interesting to see the hard numbers from a test (if you had the time).
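
(For what it's worth, enabling jumbo frames on the Linux side is just an MTU change, assuming the NIC, switch, and client all support 9000-byte frames; eth0 is an example interface name:)

ifconfig eth0 mtu 9000          # or: ip link set dev eth0 mtu 9000
ifconfig eth0 | grep -i mtu     # verify the new MTU took effect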

Link to comment

Jumbo frames help the most when you have high latency.... pounding packets from Miami to LA.  They also help reduce CPU utilization in routers.  I'd have to do some research into large MTU effects on IRQs in a desktop environment.  My gut feeling is that it won't help, as IRQs from the nic are principally coming from block transfers and thus proportional to total bytes, and not to number of packets.  You could find something that gives you actual IRQ counts from the kernel, and compare that to the packet count.
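
A crude way to make that comparison, assuming the NIC shows up as eth0 and its IRQ line is identifiable in /proc/interrupts:

# snapshot IRQ and packet counters, run the transfer, then diff the counts
grep eth0 /proc/interrupts > irq_before
cat /sys/class/net/eth0/statistics/rx_packets > pkt_before

# ... run the copy over the LAN here ...

grep eth0 /proc/interrupts > irq_after
cat /sys/class/net/eth0/statistics/rx_packets > pkt_after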

 

In any event, I can't test them with this setup.

Link to comment
I wonder why Supermicro is bundling that with Matrox graphics (there's a name I haven't seen in the headlines for a while).  Seems it would defeat some of the great low-power performance the new chip has to offer - but they may be gearing this for the HTPC market if the Matrox does full hardware decode of Blu-ray.

I'm not sure about the Matrox choice, but the H and L models use the onboard Atom GPU.  However, I should be able to disable the video in the BIOS and boot headless once I've gotten everything the way I want it, so I don't think the extra power requirements will be an issue in the long run.  I wanted the HF board because of the extra KVM management it offers.  I just ordered one, so I might be able to test it out this weekend.

Link to comment

bubbaQ:

 

I was looking at this motherboard for a system I am building for another purpose.  Do you think these new high-speed Marvell controllers would help alleviate the I/O bottleneck on the faster drives?  Worth including in some I/O testing?  There are some really good low-power CPU options on these H55 boards also.

 

http://www.newegg.com/Product/Product.aspx?Item=N82E16813128412&cm_re=usb_3.0_motherboard-_-13-128-412-_-Product

 

Link to comment
