unRAID Server release 4.4.2 available



I am having a performance problem with the 4.4.x releases on a P5B VM DO.

 

It takes close to 6 minutes to write a 1 GB file to the array in AHCI mode!  (Read performance is not affected.)

 

If I change the motherboard SATA ports to IDE mode, I am able to copy the exact same file in 1:30.

 

This does not happen in v4.3.3.  It takes about 1:52 to copy that file.

 

I thought that AHCI should be faster than IDE, not slower.  And in 4.4 it is WAY slower.  Something must be wrong with the 4.4 driver.

 

Tom, are you seeing this type of behavior when testing with your P5B VM DO systems?

 

(See this thread for more details and syslogs)

Link to comment

This was posted in the other thread, but I thought it should be here too.

 

I started searching the net and I'm not sure if I found the complete problem but I think NCQ is the issue.  I don't know if this is a chipset problem, a motherboard not being able to support NCQ or a driver issue but when I turned off NCQ I saw a huge performance increase. First make sure you're in AHCI mode and then disable NCQ with the following.   

 

To turn off NCQ, do the following for each disk:

 

echo 1 > /sys/block/sdX/device/queue_depth 

 

You can then use

cat /sys/block/sdX/device/queue_depth

to see whether it was set.  A value greater than 1 means NCQ is enabled; a value of 1 means it is disabled.
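
Checking every disk by hand gets tedious, so here is a minimal sketch that applies the same "greater than 1 means enabled" rule to all sdX devices at once. The function name ncq_status and its optional directory argument are made up for illustration; the only thing taken from the post is the queue_depth path.

```shell
# Report the NCQ state of every sdX disk by reading its queue_depth.
# The optional argument (a sysfs root) is a hypothetical hook so the
# function can be tried on a scratch directory; default is /sys/block.
ncq_status() (
    root="${1:-/sys/block}"
    for f in "$root"/sd*/device/queue_depth; do
        [ -r "$f" ] || continue      # glob matched nothing
        disk="${f#"$root"/}"         # drop the leading directory
        disk="${disk%%/*}"           # keep only the sdX part
        depth="$(cat "$f")"
        if [ "$depth" -gt 1 ]; then
            echo "$disk queue_depth=$depth NCQ enabled"
        else
            echo "$disk queue_depth=$depth NCQ disabled"
        fi
    done
)
```

Running ncq_status with no argument reads the live values; it does not change anything.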

 

Then do a parity check.  I just started one: I had been seeing speeds of 55MB/sec, and it is now at 80MB/sec.  I haven't tried to write to the array yet, but the results should be better too!  I believe that in IDE mode the ata_piix driver does not have NCQ support, so NCQ is disabled by default, resulting in better performance.

 

OK, while the parity check was running I enabled NCQ on one of the data drives, and my parity speed dropped back to 55MB/sec.

 

If you want to test this as well, you can enable NCQ with the following command:

echo 31 > /sys/block/sdX/device/queue_depth

 

I guess we need to edit the go script to disable NCQ every time we boot our systems.
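
A rough sketch of what such a go script addition could look like. The function name set_queue_depth and its optional directory argument are invented for this example, and it assumes every sdX device should get the same value; it has not been tested on unRAID itself.

```shell
# Write one queue_depth value to every sdX disk; 1 turns NCQ off.
# The optional second argument (a sysfs root) is a hypothetical hook
# for trying the function on a scratch directory.
set_queue_depth() (
    depth="$1"
    root="${2:-/sys/block}"
    for f in "$root"/sd*/device/queue_depth; do
        [ -w "$f" ] || continue            # glob matched nothing, or not writable
        if ! echo "$depth" > "$f" 2>/dev/null; then
            echo "could not set $f" >&2    # some drivers refuse the write
        fi
    done
)

# Line one might add to the go script:
# set_queue_depth 1
```

The loop also touches any non-array sdX device (the flash drive, for instance), which should be harmless since it only ever lowers the depth to 1.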

 

Erik

 

EDIT:  I thought I'd add I only have 6 SATA drives in my ARRAY all on the onboard ICH10 controller.  Parity speeds might differ if you have more drives.

Link to comment

unRAID is reporting 8GB free space, but I cannot copy a ~4.5GB DVD image to it; I get a "no space" message. Is that expected?

 

Thanks!

 

Has anyone experienced similar behaviour? :(

Is all the free space in a single drive? Or split between drives?  Is it showing 8GB free in the user-share? or on a disk-share?

 

No Unix/Linux system can split a single file between multiple file-systems.  Since unRAID has a file-system per drive, this can't be changed.

 

Does 1 of your physical disks have more than 4.5 GB of free space on it?

 

If yes, and one of your drives has more than 4.5 GB of space and the user-share you have exported on the LAN is using a smaller drive, then that is different.

 

That would depend on whether the program creating the file creates it at its full size and then fills it, or creates the new file at zero bytes and then fills it.  In the first case, unRAID will know the size needed and choose (hopefully) the drive with enough space.  In the second case, it would have no idea of the total space needed once the file is completely written, so it might choose what it thinks is the best drive, even though the file will not fit when the program finishes writing it.

 

For any more guidance from those who might know more you will need to give more specifics on how your user-shares are configured and the free space on all the physical drives involved.
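
Since a file has to fit on a single drive, one quick way to gather the per-drive free space is a loop over the disk mount points. This is only a sketch: the function name disks_with_room is invented, and the /mnt/disk* path in the usage line assumes the usual unRAID mount layout.

```shell
# Print each directory whose filesystem has at least NEED_KB free.
# $1 = space needed in KB, remaining arguments = mount points to check.
disks_with_room() (
    need_kb="$1"; shift
    for d in "$@"; do
        [ -d "$d" ] || continue
        # -P keeps each filesystem on one line; column 4 is available KB
        avail_kb="$(df -Pk "$d" | awk 'NR==2 {print $4}')"
        [ "$avail_kb" -ge "$need_kb" ] && echo "$d ${avail_kb}KB free"
    done
    return 0
)

# A 4.5 GB image needs 4.5 * 1024 * 1024 = 4718592 KB:
# disks_with_room 4718592 /mnt/disk*
```

If nothing is printed, no single drive can hold the file, whatever the total free space adds up to.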

 

Joe L.

Link to comment

Thanks for picking this up, Joe L.

 

Unfortunately I don't use user shares. I am accessing all my disks by disk shares.

 

Yes, one of my physical disks is reporting 8GB free disk space.

 

I start copying a 4.5GB ISO image with Total Commander (under Vista), and at 71-74% it stops with an error message saying there is no free space. Interestingly, the percentage varies between 71% and 74% from one attempt to the next.

Link to comment

I am also seeing write performance problems. I'm glad I found this thread because I was wondering what was happening. I am running on a GIGABYTE GA-MA74GM-S2 board. It is also set up to run in AHCI mode (according to the syslog, it looks like it is using the atiixp module).

 

With NCQ turned on my 8112603136 byte test movie.iso takes 13m22s = 9.65MB/s.

 

With NCQ turned off the same file takes 9m41s = 13.32MB/s!
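
For anyone who wants to compare their own numbers, the two figures above can be reproduced with a small helper. The function name throughput_mib is made up for this sketch; note that the quoted rates only come out right if 1 MB is taken as 1048576 bytes.

```shell
# Throughput in MB/s (1 MB = 1048576 bytes), rounded to two decimals.
# $1 = bytes copied, $2 = minutes, $3 = seconds
throughput_mib() {
    awk -v b="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", b / 1048576 / (m * 60 + s) }'
}

throughput_mib 8112603136 13 22   # NCQ on:  prints 9.65
throughput_mib 8112603136 9 41    # NCQ off: prints 13.32
```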

 

There is definitely something fishy going on with that driver. If anyone finds a kernel patch, I'd be happy to give it a whirl and see if it improves things. I still think 13.32MB/s is a bit slow, but I don't know what others are getting out of this board.

 

Cheers,

Paul

Link to comment
  • 2 weeks later...

I'm seeing a SATA port multiplier problem since I updated from 4.3.3 to 4.4.2 :(

The SATA PM is Sil3726 based, connected to Sil3132 PCI-e card on an ASUS P5B-E Plus. It has 4 drives (750 and 1000GB mixed). This is what I'm seeing in the log, over and over...:

 

Jan 30 13:35:55 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:35:55 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:35:55 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:35:55 archive kernel: ata5: hard resetting link

Jan 30 13:35:56 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:35:56 archive kernel: ata5: EH complete

Jan 30 13:35:58 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:35:58 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:35:58 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:35:58 archive kernel: ata5: hard resetting link

Jan 30 13:35:59 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:35:59 archive kernel: ata5: EH complete

Jan 30 13:36:07 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:36:07 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:36:07 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:36:07 archive kernel: ata5: hard resetting link

Jan 30 13:36:08 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:36:08 archive kernel: ata5: EH complete

Jan 30 13:36:21 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:36:21 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:36:21 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:36:21 archive kernel: ata5: hard resetting link

Jan 30 13:36:22 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:36:22 archive kernel: ata5: EH complete

Jan 30 13:36:26 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:36:26 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:36:26 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:36:26 archive kernel: ata5: hard resetting link

Jan 30 13:36:27 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:36:27 archive kernel: ata5: EH complete

Jan 30 13:36:28 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:36:28 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:36:28 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:36:28 archive kernel: ata5: hard resetting link

Jan 30 13:36:29 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:36:29 archive kernel: ata5: EH complete

Jan 30 13:36:30 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Jan 30 13:36:30 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000

Jan 30 13:36:30 archive kernel: ata5: SError: { 10B8B Dispar }

Jan 30 13:36:30 archive kernel: ata5: hard resetting link

Jan 30 13:36:31 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jan 30 13:36:31 archive kernel: ata5: EH complete

 

Oh, how the performance suffers ;(

From the unRAID menu, I don't see any read/write errors though, and the system is working 'ok'.

Link to comment

It looks pretty bad, though worse than it really is.  These do not look like serious errors, so nothing would be reported, and if you hadn't looked at the syslog, you probably would not even know there was anything wrong, except for the terrible performance while it deals with the repeating exceptions.

 

In this syslog excerpt, there is only one drive causing trouble.  Why not determine which one it is and reconnect it off the Port Multiplier?  The MyMain plugin sy link should help you figure out which drive it is.
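
To confirm from a full syslog that only one port is involved, the exception lines can be tallied per ataN port. A sketch only: the function name count_ata_exceptions is invented, and it assumes the messages look like the ones posted above.

```shell
# Count "ataN: exception" lines per port in a syslog file.
# $1 = path to the syslog to scan
count_ata_exceptions() {
    awk '/ata[0-9.]+: exception/ {
             # find the "ataN:" field on the line and tally it
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9.]+:$/) { count[$i]++; break }
         }
         END { for (p in count) print p, count[p] }' "$1" | sort
}
```

This narrows things down to a port, not a drive; matching the port to a physical disk still means looking at the device-detection lines for that port earlier in the syslog.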

Link to comment

There have been a few negative reports about v4.4.2, but all I can say is that it has been great for me.  I have now been running for several weeks with v4.4.2, and before that tested briefly v4.4 final, v4.4.1, and v4.5-beta1, all of which use the 2.6.27.7 kernel, and I have seen absolutely NO drive errors at all, and I have carefully examined the syslogs of every test.  Previously, I had been consistently getting the frozen/timeout exceptions (as I have been seeing infrequently in many others' syslogs), but they are now gone in mine, with the new kernel.

 

Furthermore, I have tested and I have now removed the noapic boot option AND the swncq=0 boot option from my syslinux.cfg, without any errors or issues.  My nForce board required the noapic option as of the v4.1 releases, the v4.3 releases, and the v4.4 beta releases.  The swncq=0 option was required for the v4.4 betas.  Subject to others' confirmation, I can now say that nForce boards no longer require any extra boot options for unRAID, and I plan to update the Wiki soon to that effect.

 

As to the AHCI/NCQ issues related above, I have done some limited research, and found hints that it may be related to certain WD drives.  That is, certain WD drives on detecting NCQ, will disable read-ahead, which obviously will severely impact sequential read performance.  It basically negates the drive cache.

 

I can't help thinking this may be what is causing the slowdowns reported above.  We need more testing, to determine if it really has anything to do with AHCI, and if not, is it just a specific WD drive problem.  For those with the slowdowns, the workaround suggested above seems like a very good idea, especially if it can be confirmed that it is not *directly* related to AHCI.  If it proves to be just a WD / read-ahead / NCQ issue, then I would suggest users putting some pressure on WD to provide a firmware release without that behavior.  We clearly need more research and testing first.

Link to comment

I ran some tests this morning to try to prove or disprove RobJ's hypothesis.

 

Setup:

1.  Parity is a Seagate 7200.11 1 TB drive with AD14 firmware.  It is connected to the motherboard (ICH8) as "/dev/sdl".

2.  The data disk I used is a WD GP EACS drive.  It is connected to an Adaptec 1430SA controller as "/dev/sdc".

3.  All tests were copying a 1 GB file from my local workstation to a Samba share.

4.  Before each test I deleted the file on unRAID, and copied the exact same file from the exact same place in the exact same way.  I used a stopwatch to time.

 

Help me interpret this ...

 

With 4.4.2

 

time  description

------  -----------------------------------------

2:00 - All queue_depth set to 1

5:03 - Parity set to 31, all others 1

2:05 - All disks set back to 1

2:07 - Data set to 31, all others 1

 

With 4.3.3

3:16 - Default: first 8 disks set to 1 (includes data disk), rest set to 31 (repeated twice in a row after fresh boot)

1:53 - All disks set to 1

1:48 - Parity set to 31, all others 1

(unable to change queue_depth on data disk with 4.3.3)

 

It is striking that the slowest time on 4.4.2 (5:03) used the same queue_depth settings as the fastest time on 4.3.3 (1:48).

 

There were no entries in the syslog during the time I was doing these tests.

Link to comment
