unRAID Server release 4.4.2 available

NAS · January 16, 2009

Are you sure you don't have something like a telnet session open on those disks? That causes this symptom, and it is a well-known, very annoying and potentially the most serious unRAID bug.

btlupin · January 16, 2009

One has to wonder why this hasn't been fixed yet. It isn't all telnet sessions, either: MC (Midnight Commander) in a telnet session will definitely cause it, while a session where you just run cat syslog won't.

olympia · January 16, 2009

unRAID reports 8GB of free space, but I cannot copy a ~4.5GB DVD image to it; I get a "no space" error. Is that expected?

Thanks!

bubbaQ · January 16, 2009

If you can't stop the array from the unRAID management interface, run this command from telnet to kill all processes that have open files on the array:

fuser -mvk /dev/md*

then you can stop the array normally.
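A more cautious variant of this is to look before killing: `fuser -mv` only lists the processes using the array (for example a telnet session running mc), and adding `-k` sends them SIGKILL. The sketch below wraps both steps. `release_array` and the `FUSER` override are hypothetical names introduced only for illustration, and it assumes the array devices are `/dev/md*` as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: list, or list-and-kill, processes with open files on
# the array. FUSER is an illustrative override hook; it defaults to psmisc fuser.
FUSER=${FUSER:-fuser}

release_array() {
    mode=${1-}
    if [ "$#" -gt 0 ]; then shift; fi
    # With no explicit devices, fall back to the array devices from the post.
    [ "$#" -gt 0 ] || set -- /dev/md*
    case $mode in
        list) "$FUSER" -mv "$@" ;;   # report users of each mounted device
        kill) "$FUSER" -mvk "$@" ;;  # same report, then SIGKILL each process
        *)
            echo "usage: release_array list|kill [device...]" >&2
            return 2
            ;;
    esac
}
```

Run `release_array list` first, and only if nothing listed is something you need, run `release_array kill` and then stop the array normally.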

SSD · January 17, 2009

I am having a performance problem with the 4.4.x releases on a P5B VM DO.

It takes close to 6 minutes to write a 1 GB file to the array in AHCI mode! (Read performance is not affected.)

If I change the motherboard SATA ports to IDE mode, I am able to copy the exact same file in 1:30.

This does not happen in v4.3.3, where it takes about 1:52 to copy that file.

I thought that AHCI should be faster than IDE, not slower. And in 4.4 it is WAY slower. Something must be wrong with the 4.4 driver.

Tom, are you seeing this type of behavior when testing with your P5B VM DO systems?

(See this thread for more details and syslogs)

erikatcuse · January 17, 2009

This was posted in the other thread, but I thought it should be here too.

I started searching the net, and while I'm not sure I found the complete problem, I think NCQ is the issue. I don't know whether this is a chipset problem, a motherboard that can't support NCQ, or a driver issue, but when I turned off NCQ I saw a huge performance increase. First make sure you're in AHCI mode, then disable NCQ as follows.

To turn off NCQ, do the following for each disk:

echo 1 > /sys/block/sdX/device/queue_depth

You can then use

cat /sys/block/sdX/device/queue_depth

to see whether it was set. If you get a number greater than 1, NCQ is enabled; if it's 1, it's disabled.
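Checking each disk by hand gets tedious with many drives. A minimal sketch that loops over every `sd*` device and applies the rule above (1 means NCQ is off, anything larger means it is on). `ncq_status` and the `SYSFS_ROOT` override are hypothetical names; the override only exists so the loop can be tried against a copy of the directory tree.

```shell
# Hypothetical helper: print each disk's queue_depth and the NCQ state it implies.
# SYSFS_ROOT is an illustrative override; on the server it stays /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

ncq_status() {
    for f in "$SYSFS_ROOT"/block/sd*/device/queue_depth; do
        [ -r "$f" ] || continue               # glob did not match, or unreadable
        disk=${f#"$SYSFS_ROOT"/block/}        # drop the leading path
        disk=${disk%%/*}                      # keep just "sdX"
        depth=$(cat "$f")
        if [ "$depth" -gt 1 ]; then state=enabled; else state=disabled; fi
        echo "$disk $depth NCQ $state"
    done
}
```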

Then do a parity check. I just started one: I had been seeing speeds of 55MB/sec, and it's now at 80MB/sec. I haven't tried writing to the array yet, but the results should be better there too! I believe that when you change to IDE mode, the ata_piix driver has no NCQ support, so NCQ is effectively disabled, resulting in better performance.

OK: while the parity check was running, I enabled NCQ on one of the data drives and my parity speed dropped back to 55MB/sec.

If you want to test this as well, you can enable NCQ with the following command:

echo 31 > /sys/block/sdX/device/queue_depth

I guess we need to edit the go script to disable NCQ every time we boot our systems.
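For the go script, one way to apply the setting to every disk at boot is a loop like the following sketch. It writes the same value to each `sd*` disk's `queue_depth` (1 to disable NCQ, 31 to re-enable it, as in Erik's commands). `set_queue_depth` and `SYSFS_ROOT` are hypothetical names, and it assumes every array disk appears under `/sys/block/sd*`.

```shell
# Hypothetical helper: write one queue_depth value to every sd disk.
# 1 disables NCQ, 31 re-enables it. SYSFS_ROOT is an illustrative override.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

set_queue_depth() {
    depth=${1:-1}
    for f in "$SYSFS_ROOT"/block/sd*/device/queue_depth; do
        [ -w "$f" ] || continue   # skip a non-matching glob or read-only entry
        echo "$depth" > "$f"      # a refused write prints an error; loop goes on
    done
}
```

In the go script this would be called as `set_queue_depth 1` after the function definition.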

Erik

EDIT: I should add that I only have 6 SATA drives in my array, all on the onboard ICH10 controller. Parity speeds might differ if you have more drives.

olympia · January 17, 2009

Did anyone experience similar behaviour?

Joe L. · January 17, 2009

Is all the free space on a single drive, or split between drives? Is it showing 8GB free on the user share or on a disk share?

No Unix/Linux system can split a single file across multiple file systems. Since unRAID has one file system per drive, this can't be changed.

Does one of your physical disks have more than 4.5 GB of free space on it?

If yes (one of your drives has more than 4.5 GB free), but the user share you have exported on the LAN is using a smaller drive, then that is different.

That would depend on whether the program creating the file creates it at its full size and then fills it, or creates it at zero bytes and then grows it. In the first case, unRAID will know the size needed and (hopefully) choose a drive with enough space. In the second case, it has no idea how much space the finished file will need, so it might choose what it thinks is the best drive, even though the file will not fit once the program finishes writing it.

For more guidance from those who might know more, you will need to give specifics on how your user shares are configured and the free space on all the physical drives involved.

Joe L.
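Joe's first question (does any single disk have room for the whole file?) can be answered from `df`. A minimal sketch, assuming the data disks are mounted at `/mnt/diskN`; `disks_with_room` is a hypothetical helper that reads POSIX `df -Pk` output, where column 4 is the available space in 1K blocks and column 6 is the mount point.

```shell
# Hypothetical helper: print array disks with at least NEED_KB kilobytes free.
# Expects `df -Pk` output on stdin: $4 = available 1K blocks, $6 = mount point.
disks_with_room() {
    awk -v need="$1" '
        NR > 1 && $6 ~ /^\/mnt\/disk[0-9]+$/ && $4 + 0 >= need + 0 {
            print $6, $4
        }'
}
```

For a ~4.5GB image this could be run as `df -Pk /mnt/disk* | disks_with_room 4500000`. If nothing is printed, no single disk can hold the file, whatever total free space is reported.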

olympia · January 17, 2009

Thanks for picking this up, Joe L.

Unfortunately I don't use user shares; I access all my disks through disk shares.

Yes, one of my physical disks reports 8GB of free space.

I start copying a 4.5GB ISO image with Total Commander (under Vista), and at 71-74% it stops with an error message saying there is no free space. Interestingly, the percentage varies between 71% and 74% if I try several times.

pjneder · January 20, 2009

I am also seeing write performance problems. I'm glad I found this thread, because I was wondering what was happening. I'm running on a GIGABYTE GA-MA74GM-S2 board. It is also set up to run in AHCI mode (according to the syslog, it looks like it is using the atiixp module).

With NCQ turned on my 8112603136 byte test movie.iso takes 13m22s = 9.65MB/s.

With NCQ turned off the same file takes 9m41s = 13.32MB/s!
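For anyone comparing their own numbers: these figures work out if MB is taken as 1,048,576 bytes (MiB). In decimal megabytes the same transfers come to about 10.12 MB/s and 13.96 MB/s. A small sketch that reproduces the calculation from a byte count and an elapsed time; `throughput_mib` is a hypothetical helper.

```shell
# Hypothetical helper: throughput from a byte count and an MM:SS elapsed time.
# Divides by 1048576, so the result is in MiB/s (binary megabytes).
throughput_mib() {
    awk -v b="$1" -v t="$2" 'BEGIN {
        split(t, p, ":")              # p[1] = minutes, p[2] = seconds
        secs = p[1] * 60 + p[2]
        printf "%.2f\n", b / secs / 1048576
    }'
}
```

`throughput_mib 8112603136 13:22` prints 9.65 and `throughput_mib 8112603136 9:41` prints 13.32, matching the two results above.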

There is definitely something fishy going on with that driver. If anyone finds a kernel patch, I'd be happy to give it a whirl and see if it improves things. I still think 13.32MB/s is a bit slow, but I don't know what others are getting out of this board.

Cheers,

Paul

ysss · January 30, 2009

I'm seeing SATA port multiplier problems since I updated from 4.3.3 to 4.4.2.

The SATA PM is Sil3726-based, connected to a Sil3132 PCI-e card on an ASUS P5B-E Plus. It has 4 drives (750GB and 1000GB mixed). This is what I'm seeing in the log, over and over:

Jan 30 13:35:55 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:35:55 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:35:55 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:35:55 archive kernel: ata5: hard resetting link
Jan 30 13:35:56 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:35:56 archive kernel: ata5: EH complete

Jan 30 13:35:58 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:35:58 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:35:58 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:35:58 archive kernel: ata5: hard resetting link
Jan 30 13:35:59 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:35:59 archive kernel: ata5: EH complete

Jan 30 13:36:07 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:36:07 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:36:07 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:36:07 archive kernel: ata5: hard resetting link
Jan 30 13:36:08 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:36:08 archive kernel: ata5: EH complete

Jan 30 13:36:21 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:36:21 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:36:21 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:36:21 archive kernel: ata5: hard resetting link
Jan 30 13:36:22 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:36:22 archive kernel: ata5: EH complete

Jan 30 13:36:26 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:36:26 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:36:26 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:36:26 archive kernel: ata5: hard resetting link
Jan 30 13:36:27 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:36:27 archive kernel: ata5: EH complete

Jan 30 13:36:28 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:36:28 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:36:28 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:36:28 archive kernel: ata5: hard resetting link
Jan 30 13:36:29 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:36:29 archive kernel: ata5: EH complete

Jan 30 13:36:30 archive kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Jan 30 13:36:30 archive kernel: ata5: edma_err_cause=00000020 pp_flags=00000002, SError=00180000
Jan 30 13:36:30 archive kernel: ata5: SError: { 10B8B Dispar }
Jan 30 13:36:30 archive kernel: ata5: hard resetting link
Jan 30 13:36:31 archive kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 30 13:36:31 archive kernel: ata5: EH complete

Oh, how the performance suffers ;(

From the unRAID menu I don't see any read/write errors, though, and the system is working 'OK'.

RobJ · January 30, 2009

It looks pretty bad, but it's worse than it really is. These don't look like serious errors, so nothing would be reported, and if you hadn't looked at the syslog you probably wouldn't even know anything was wrong, except for the terrible performance while the system deals with the repeating exceptions.

In this syslog excerpt only one drive is causing trouble. Why not determine which one it is and connect it directly, off the port multiplier? The MyMain plugin sy link should help you figure out which drive it is.
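As a first step towards finding the problem drive, a sketch that tallies the exception lines per ATA port in a syslog. It only identifies the port (here `ata5`, or a port-multiplier link such as `ata5.01`); mapping that to a physical disk still needs the boot-time detection messages. `ata_exception_counts` is a hypothetical helper, and the default path `/var/log/syslog` is an assumption.

```shell
# Hypothetical helper: count "ataN: exception" lines per port in a syslog.
# Port-multiplier links appear as ataN.MM and are counted separately.
ata_exception_counts() {
    awk '/: exception Emask/ {
             for (i = 1; i < NF; i++)
                 if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/ && $(i + 1) == "exception") {
                     port = substr($i, 1, length($i) - 1)   # drop trailing colon
                     n[port]++
                 }
         }
         END { for (p in n) print p, n[p] }' "${1:-/var/log/syslog}" | sort
}
```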

RobJ · January 31, 2009

There have been a few negative reports about v4.4.2, but all I can say is that it has been great for me. I have now been running v4.4.2 for several weeks, and before that I briefly tested v4.4 final, v4.4.1 and v4.5-beta1, all of which use the 2.6.27.7 kernel. I have seen absolutely NO drive errors at all, and I have carefully examined the syslogs of every test. Previously I had been consistently getting the frozen/timeout exceptions (which I have been seeing infrequently in many others' syslogs), but with the new kernel they are now gone from mine.

Furthermore, I have tested and removed both the noapic boot option AND the swncq=0 boot option from my syslinux.cfg, without any errors or issues. My nForce board required the noapic option in the v4.1 releases, the v4.3 releases and the v4.4 beta releases; the swncq=0 option was required for the v4.4 betas. Subject to others' confirmation, I can now say that nForce boards no longer require any extra boot options for unRAID, and I plan to update the Wiki soon to that effect.

As to the AHCI/NCQ issues related above, I have done some limited research and found hints that they may be related to certain WD drives. That is, certain WD drives, on detecting NCQ, will disable read-ahead, which obviously will severely impact sequential read performance. It basically negates the drive cache.

I can't help thinking this may be what is causing the slowdowns reported above. We need more testing to determine whether it really has anything to do with AHCI and, if not, whether it is just a specific WD drive problem. For those with the slowdowns, the workaround suggested above seems like a very good idea, especially if it can be confirmed that the problem is not *directly* related to AHCI. If it proves to be just a WD / read-ahead / NCQ issue, then I would suggest that users put some pressure on WD to provide a firmware release without that behavior. We clearly need more research and testing first.
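One way to probe the read-ahead part of this hypothesis on a given drive is the `Look-ahead` entry in `hdparm -I` output, where a leading `*` marks an enabled feature. A minimal sketch, assuming an hdparm version that can query the drive through libata; `lookahead_state` is a hypothetical helper. Comparing its result with NCQ on and off (queue_depth 31 vs 1) would show whether the drive at least reports a change; firmware could also behave differently internally without reporting it.

```shell
# Hypothetical helper: report the Look-ahead feature state from `hdparm -I` text.
# In the "Enabled Supported:" list, enabled features are prefixed with "*".
lookahead_state() {
    awk '
        $NF == "Look-ahead" {
            state = ($1 == "*") ? "enabled" : "disabled"
        }
        END { print (state == "" ? "unknown" : state) }'
}
```

Usage: `hdparm -I /dev/sdX | lookahead_state`.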

SSD · February 1, 2009

I ran some tests this morning to try to prove or disprove RobJ's hypothesis.

Setup:

1. Parity is a Seagate 7200.11 1TB drive with AD14 firmware. It is connected to the motherboard (ICH8) as /dev/sdl.

2. The Data Disk I used is a WD GP EACS drive. It is connected to an Adaptec 1430SA controller "/dev/sdc"

3. All tests copied a 1GB file from my local workstation to a Samba share.

4. Before each test I deleted the file on unRAID, then copied the exact same file from the exact same place in the exact same way, timing it with a stopwatch.

Help me interpret this ...

With 4.4.2:

time   description
-----  -----------------------------------------
2:00   All queue_depth set to 1
5:03   Parity set to 31, all others 1
2:05   All disks set back to 1
2:07   Data set to 31, all others 1

With 4.3.3:

time   description
-----  -----------------------------------------
3:16   Default: first 8 disks (including the data disk) set to 1, the rest set to 31 (repeated twice in a row after a fresh boot)
1:53   All disks set to 1
1:48   Parity set to 31, all others 1

(I was unable to change queue_depth on the data disk with 4.3.3.)

It is striking that the slowest time on 4.4.2 (5:03) used the same queue_depth settings as the fastest time on 4.3.3 (1:48).

There were no entries in the syslog during the time I was doing these tests.

ysss · February 1, 2009

@RobJ: Brilliant! Thanks for the concise analysis. I will do that within a few days.

Edit:

Note: All my drives (11 of them) are WD and Seagate of various sizes, and one Samsung. I think the Samsung might be the offender...
