[SOLVED] Slow Parity Writes?


I think I'm seeing slow write performance on my parity-protected array.  It has 9 data drives plus parity (no cache disk).  I usually get 8-12MB/s writing to the array over the network.  According to ethtool, I'm connected at 1000Mb/s, full duplex.  I didn't think anything of the slow speeds until I read more on the forums here.  Performance has been roughly the same on version 4.7 and now on 5.0 b12a.

 

The CPU is an AMD Sempron LE-1200, with 1GB of DDR2 memory and a Gigabyte GA-MA78G-DS3H motherboard (HPA disabled).  I have a Monoprice 2-port PCIe x1 (SIL3132) card and a Rosewill RC-218 4-port SATA II PCIe x4 (Marvell 88SX7042) card.  The power supply is an Antec Signature 850W.

 

I'm more concerned right now with the write test below.  The numbers seem slow, but I'm not sure.  If the numbers were high here and low on copies across the network, I could isolate it to a network issue.  But writing to the array seems to be slow internally, within the server itself.

 

The first column is the average (in MB/s) of two dd if=/dev/zero of=/mnt/disk#/test.dd count=8192000 test runs.
The second column is the hdparm -t result (in MB/s).

Parity drive is a 2TB Seagate Barracuda LP (5900rpm).

Disk           dd (MB/s)   hdparm -t (MB/s)   Drive
Disk 1 (sdj)   16.9        80                 500GB Western Digital Caviar Blue (7200rpm)
Disk 2 (sdh)   15.2        123                1TB Seagate Barracuda 7200.12 (7200rpm)
Disk 3 (sdg)   17.0        108                640GB Western Digital Caviar Blue (7200rpm)
Disk 4 (sdk)   12.5        113                1TB Samsung F1 (7200rpm)
Disk 5 (sdb)   19.5        75                 500GB Hitachi P7K500 (7200rpm)
Disk 6 (sdc)   20.2        83                 500GB Hitachi P7K500 (7200rpm)
Disk 7 (sdi)   16.1        108                1TB Samsung F1 (7200rpm)
Disk 8 (sde)   17.6        108                1TB Western Digital Caviar Green (5400rpm)
Disk 9 (sdd)   13.3        120                2TB Western Digital Caviar Green (5400rpm)
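
For anyone repeating the dd test described above, it may help to see how much data it actually writes. With no bs= option, dd uses 512-byte blocks, so count=8192000 comes to roughly 4.2GB written in over eight million small writes. The sketch below is only arithmetic; the helper names are made up for illustration, and whether the small block size affects the measured speed on a given system is an assumption that would need testing.

```python
# dd uses 512-byte blocks unless bs= is given on the command line.
DD_DEFAULT_BLOCK_SIZE = 512


def dd_total_bytes(count, block_size=DD_DEFAULT_BLOCK_SIZE):
    """Total bytes dd writes for a given block count and block size."""
    return count * block_size


def count_for_same_size(total_bytes, block_size):
    """Block count that writes the same amount of data with a larger block size."""
    return total_bytes // block_size


total = dd_total_bytes(8192000)
print(total)                                     # bytes written per test run
print(count_for_same_size(total, 1024 * 1024))   # blocks needed with bs=1M
```

Running the same-sized test with a larger block size (for example bs=1M with the matching count) is one way to check whether per-write overhead is part of the result.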

syslog-2011-09-12.txt

Link to comment

That PSU has 4 12V rails. See this for PSU info: http://lime-technology.com/forum/index.php?topic=12219.0

 

My mistake.  I realized that the power supply for the server is actually a Thermaltake Toughpower 1000W Cable Management.

 

One more thing I realized is that the parity drive isn't 4K-aligned.  Could that be what is causing the performance issue?  If so, I have a brand new 2TB Seagate LP drive (64MB cache) that I'll preclear with 4K alignment and use to replace the existing parity drive.

Link to comment

The Thermaltake also has four 12V rails. The maximum amperage is 36A.

 

What is nice about that power supply is that each modular connection is labelled with the voltage rail it belongs to.  I've distributed the load across the rails pretty evenly, which was easy since half the drives are on direct SATA power connections and the other half are in hotswap racks that take molex.

Link to comment

4K alignment is ONLY an issue with "advanced format" drives such as the 2TB WD EARS.  Your parity drive is a Seagate, so alignment will not make any difference.

What will make a difference is replacing the parity drive with one that has a faster rotational speed.  Since you have a large number of 7200 RPM drives, a 7200 RPM parity drive will make writing to those drives about a third faster.  When writing to the array, the slowest-spinning drive involved dictates the overall write speed.
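
A minimal sketch of why the parity drive is involved in every write, assuming single XOR parity that is updated by read-modify-write (the usual description of this kind of parity scheme; the function names are illustrative, not taken from unRAID). Each write to a data disk means reading the old data and old parity, then writing the new data and new parity, so both drives have to come back around to the same sectors, and the slower of the two sets the pace.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_blocks):
    """Parity computed from scratch across every data disk."""
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = xor_bytes(parity, block)
    return parity


def updated_parity(old_parity, old_data, new_data):
    """Read-modify-write update: only the target disk and parity are touched."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Three toy "disks" holding two bytes each.
disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_parity(disks)

# Overwrite the block on the second disk and update parity incrementally.
new_block = b"\xaa\x55"
parity_after = updated_parity(parity, disks[1], new_block)
disks[1] = new_block

# The incremental update matches a full recomputation.
print(parity_after == full_parity(disks))
```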

Link to comment

So are the speeds I'm getting typical?  It seems like others with all Green/LP drives are able to get faster speeds than me.  I know that the Green/LP drives in my array are holding me back, but I'd assume that I'd still be able to hit 20MB/s on transfers.  Most of the internal writes using dd are around 15MB/s.

Link to comment

Which controller are these on?

Line 1102: Sep  8 15:34:51 fileserver kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Line 1114: Sep  8 15:55:20 fileserver kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Line 1134: Sep  8 16:20:14 fileserver kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Line 1339: Sep 10 02:12:00 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen

Line 1448: Sep 10 18:58:50 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen

Line 1691: Sep 12 12:25:29 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen

Line 1704: Sep 12 12:27:07 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen

Line 1717: Sep 12 12:27:27 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen

 

Are these normal for your system/card?

http://lime-technology.com/wiki/index.php?title=The_Analysis_of_Drive_Issues#Drive_interface_issue_.232
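
To see at a glance which ports are affected, exception lines like the ones quoted above can be tallied per ATA port. This is a small sketch, assuming the syslog format shown in this thread; the regular expression and function name are illustrative, and the sample lines are copied from the excerpts above.

```python
import re
from collections import Counter

# Matches e.g. "kernel: ata7.00: exception Emask ..." and captures "ata7".
ATA_EXCEPTION = re.compile(r"kernel: (ata\d+)\.\d+: exception")


def count_exceptions(lines):
    """Count libata exception lines per ATA port."""
    counts = Counter()
    for line in lines:
        match = ATA_EXCEPTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Sep  8 15:34:51 fileserver kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Sep 10 02:12:00 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen",
    "Sep 12 12:25:29 fileserver kernel: ata6.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen",
]

for port, count in sorted(count_exceptions(SAMPLE).items()):
    print(port, count)
```

To run it against a whole syslog, pass the open file object instead of SAMPLE.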

Link to comment

How can I check what drives those correspond to?  I will check in an hour or two when I'm at home.
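
One possible way to answer the question above, sketched under an assumption: on kernels where the resolved sysfs path of a disk contains the port name (something like .../ata7/host6/...), the ataN label from the syslog can be matched to an sdX device by following the links in /sys/block. Older kernels may not include ataN in that path, in which case this returns nothing and the mapping has to be worked out from the boot messages instead. The function names and the sample path in the comment are hypothetical.

```python
import glob
import os
import re

# Looks for a path component such as "/ata7/" in a resolved sysfs path.
ATA_IN_PATH = re.compile(r"/(ata\d+)/")


def ata_port_from_sysfs_path(path):
    """Return 'ataN' if the path contains one, else None.

    Hypothetical example input:
    /sys/devices/pci0000:00/0000:00:11.0/ata7/host6/target6:0:0/6:0:0:0/block/sdb
    """
    match = ATA_IN_PATH.search(path)
    return match.group(1) if match else None


def map_ata_ports(sys_block="/sys/block"):
    """Map each ataN port to the sdX devices found behind it."""
    mapping = {}
    for link in sorted(glob.glob(os.path.join(sys_block, "sd*"))):
        port = ata_port_from_sysfs_path(os.path.realpath(link))
        if port:
            mapping.setdefault(port, []).append(os.path.basename(link))
    return mapping


for port, disks in sorted(map_ata_ports().items()):
    print(port, disks)
```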

Link to comment

OK, so I checked the system now that I'm home from work.  Both devices are on the onboard SATA ports.  The ata7 device had a slightly loose SATA power cable.  It was plugged in, but when I pushed on it, it moved maybe a millimetre and clicked into place.

 

I had also forgotten that there was a Promise PCI SATA150 card installed (no drives on it), so I removed it.  I also moved the ata7 device from the onboard SATA ports to the remaining port on the SIL3132 PCIe card.  I figured I might as well, since I was getting SB600/700 softreset errors from the onboard ports.

 

One final change.  I checked the motherboard manual and found that, because of the specific PCIe slot I was using for the Monoprice SATA controller, the PCIe x4 slot holding the Rosewill controller card was running at x1 speed.  I moved the Monoprice card over to the PCIe x16 slot, so the Rosewill controller can now use the full x4 link (rather than being limited to x1).

 

Attached are a new syslog and a screenshot of my BIOS settings.

syslog-2011-09-121.txt

Link to comment

Don't have a chance to look right now, but looks like you found a few things that should improve your writes.

 

Did find this on the SB600

[ 2.056412] ata5: applying SB600 PMP SRST workaround and retrying

 

The above two are expected. It's a bug in SB600 controller being worked around.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-09/msg05203.html

 

I just got a new drive today as I'm running out of space.  It's preclearing now, so I'll redo tests tomorrow or the day after when it's part of the array (replacing one of the 500GB drives).

Link to comment

So the new drive I was going to add turned out to have lots (1000+) of bad sectors during preclear.  It's on hold while I get it RMA'd.  I did run new tests on my file server.

 

The numbers in brackets are the new results.  The numbers are generally the same.  I'll have to check when I go home, but the drives that write faster (1, 2, 6, 7) seem to be on the Rosewill PCIe x4 card.  Could the motherboard's SATA controller be saturated?  I have 5 drives on it, including parity.  I have a spare Promise 4-port SATA300 PCI controller that I could dedicate to just the parity drive.  Is that worth trying?

 

The first column is the average (in MB/s) of two dd if=/dev/zero of=/mnt/disk#/test.dd count=8192000 test runs.
The second column is the hdparm -t result (in MB/s).

Parity drive is a 2TB Seagate Barracuda LP (5900rpm).

Disk           dd (MB/s)     hdparm -t (MB/s)   Drive
Disk 1 (sdj)   16.9 (15.6)   80 (81)            500GB Western Digital Caviar Blue (7200rpm)
Disk 2 (sdh)   15.2 (17.5)   123 (92)           1TB Seagate Barracuda 7200.12 (7200rpm)
Disk 3 (sdg)   17.0 (19.6)   108 (108)          640GB Western Digital Caviar Blue (7200rpm)
Disk 4 (sdk)   12.5 (13.8)   113 (115)          1TB Samsung F1 (7200rpm)
Disk 5 (sdb)   19.5 (14.4)   75 (73)            500GB Hitachi P7K500 (7200rpm)
Disk 6 (sdc)   20.2 (18.9)   83 (84)            500GB Hitachi P7K500 (7200rpm)
Disk 7 (sdi)   16.1 (18.4)   108 (105)          1TB Samsung F1 (7200rpm)
Disk 8 (sde)   17.6 (13.0)   108 (108)          1TB Western Digital Caviar Green (5400rpm)
Disk 9 (sdd)   13.3 (13.9)   120 (122)          2TB Western Digital Caviar Green (5400rpm)

Link to comment

So I checked my memory usage and top processes when running dd (which would be the same as writing anything to the array right?), and I'm concerned with the results.

 

top - 08:39:48 up 3 days, 13:30,  2 users,  load average: 1.89, 1.70, 1.04
Tasks:  96 total,   2 running,  94 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us, 18.3%sy,  0.0%ni, 50.1%id, 26.4%wa,  0.0%hi,  2.1%si,  0.0%st
Mem:    901436k total,   892476k used,     8960k free,    48316k buffers
Swap:        0k total,        0k used,        0k free,   722876k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
11058 root      20   0  2304  692  520 R 75.3  0.1   2:42.12 dd              

 

It looks like CPU usage is high for the dd process, and my load average is quite high too, right?  I'm using a single-core Sempron LE-1200.  Could that be holding things back?
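
For reading the summary line in the top output above, a small sketch that splits it into named fields. It assumes the older "N.N%xx" layout shown in this post (newer versions of top format the line differently); the function name is illustrative. In this layout, sy is time spent in the kernel, wa is time spent waiting on I/O, and id is idle time.

```python
import re


def parse_top_cpu_line(line):
    """Turn 'Cpu(s):  3.2%us, 18.3%sy, ...' into {'us': 3.2, 'sy': 18.3, ...}."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w\w)", line)}


cpu = parse_top_cpu_line(
    "Cpu(s):  3.2%us, 18.3%sy,  0.0%ni, 50.1%id, 26.4%wa,  0.0%hi,  2.1%si,  0.0%st"
)
print("kernel:", cpu["sy"], "iowait:", cpu["wa"], "idle:", cpu["id"])
```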

Link to comment

So I think I figured out what the problem is.  I swapped the Sempron LE-1200 processor (2.1GHz single-core) for a Phenom II 7750 (2.7GHz dual-core), and speeds have increased.  The old CPU was maxing out whenever I wrote to the array.

 

The first column is the average (in MB/s) of two dd if=/dev/zero of=/mnt/disk#/test.dd count=8192000 test runs.
The second column is the hdparm -t result (in MB/s).

Parity drive is a 2TB Seagate Barracuda LP (5900rpm).

The plain numbers are from the original system, the ones in round brackets are from after my initial attempts to fix the issue (an earlier post),
and the ones in square brackets are the current numbers.

Disk           dd (MB/s)            hdparm -t (MB/s)   Drive
Disk 1 (sdj)   16.9 (15.6) [25.5]   80 (81) [80]       500GB Western Digital Caviar Blue (7200rpm)
Disk 2 (sdb)   15.2 (17.5) [27.7]   123 (92) [121]     1TB Seagate Barracuda 7200.12 (7200rpm)
Disk 3 (sdh)   17.0 (19.6) [28.8]   108 (108) [108]    640GB Western Digital Caviar Blue (7200rpm)
Disk 4 (sdk)   12.5 (13.8) [22.7]   113 (115) [114]    1TB Samsung F1 (7200rpm)
Disk 5 (sdc)   19.5 (14.4) [26.4]   75 (73) [75]       500GB Hitachi P7K500 (7200rpm)
Disk 6 (sdf)   20.2 (18.9) [16.5]   83 (84) [81]       500GB Hitachi P7K500 (7200rpm) ----> now a Seagate LP 2TB
Disk 7 (sdi)   16.1 (18.4) [26.7]   108 (105) [110]    1TB Samsung F1 (7200rpm)
Disk 8 (sde)   17.6 (13.0) [24.5]   108 (108) [122]    1TB Western Digital Caviar Green (5400rpm)
Disk 9 (sdd)   13.3 (13.9) [22.0]   120 (122) [121]    2TB Western Digital Caviar Green (5400rpm)

 

The new Seagate 2TB is on the Promise PCI controller.  I moved it to that controller for the preclear, but I forgot to move it back to the Monoprice 2-port PCI-e controller.  I'm assuming it'll be fast when I hook it back up.

Link to comment
