unRAID Server release 4.4 (final) available


limetech

Not working at all. No problems with 4.3.3, but with 4.4 (final) unable to boot. Errors:

 

ATA 1.00 FAILED TO IDENTIFY (IO_ERROR, ERR_MASK=0x4)

SOFT RESET FAILED (DEVICE NOT READY)

FAILED DUE TO HW BUG

 

This repeats over and over for ATA 1.00 and ATA 2.00, so I had to downgrade. The hardware is fairly recent (ASUS M3A-H/HDMI, AMD 780G).

 

It would be helpful if you could start a separate support thread, and post the syslog for v4.4 final, as well as post a v4.3.3 syslog for a baseline.  Without seeing the syslog, it is hard for me to even imagine what the different kernels could be seeing so differently.

 

Well, I just did some searching, and there are scattered reports of a bug in the Linux driver for AHCI support on the SB600/SB700 chipsets.  The soft reset failures and "FAILED DUE TO HW BUG" messages are typical.  It *may* involve only the PMP support on those chipsets.  There is a patch, but you may have to wait for its inclusion in a future kernel release.

Link to comment

 


I would gladly start a new thread, but as already mentioned, there is no support thread for 4.4, and there is no point in creating a new thread here in Announcements. I also do not know how to get a syslog from a system that fails to boot and therefore cannot log in a user or be reached from the network.

 

Dusan

Link to comment

I recently upgraded from 4.3.3 to 4.4 also and get similar errors in the syslog using a SB700 / 780G based motherboard (Foxconn A7GM-S).  I also got a couple of other new errors that were not present in 4.3.3.  However, my system boots through them and actually seems to be less concerned about the error messages than I am.  I've sent a few GBs back and forth and even completed a parity check and no complaints in the syslog.

 

I will post a 4.3.3 and a 4.4 syslog later tonight for your guys' comments. 

 

-------

Syslog from 4.3.3 and 4.4 attached.  Before I saved 4.4 syslog, I made a few more tweaks in the BIOS and managed to eliminate two errors that were unique to 4.4:

1. ACPI Warning (tbutils-0217): Incorrect checksum in table  . . . .

2. Tower kernel: spurious 8259A interrupt: IRQ7

 

However, you will still see the following in 4.4 syslog that are not present in 4.3.3:

3. ata 1: soft reset failed (device not ready)

4. ata 1: failed due to HW bug, retry pmp=0

5. ata 2: soft reset failed (device not ready)

6. ata2: failed due to HW bug, retry pmp=0

 

Again, other than the above, the system boots fine and seems to be operating correctly.  Also, this is the latest BIOS from Foxconn.  Note that in 4.3.3 I had to append the 'noapic' statement to syslinux . . . otherwise everything else remains unchanged b/w the two.

 

The burning questions I have:

1. Do the Linux / Unraid gurus think the above errors in 4.4 are important enough to revert back to 4.3.3?  It doesn't really bother me to do so . . . just that the spin down of my Samsung 1TB parity works better in 4.4.  There is a new "Error attaching device data" in my 4.3.3 syslog that I do not believe I have seen before - likely a result of my BIOS fiddling under 4.4.  I suspect I can make that disappear though.

 

2. Kind of unrelated, but can someone smart please tell me what the following means:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata1.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7

ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)

ata1.00: configured for UDMA/133

 

Am I getting my full SATA 3.0 speeds?  Is UDMA/133 the same as SATA 3.0?  What am I not understanding?  I have tried to figure this out on my own - but honestly I think I've been to the end of the internet and I never ran across a good explanation of how to interpret. 

 

In case you can't tell, I'm Linux-challenged.  Please be gentle.

Link to comment

Thank you for the syslogs, very interesting.  I should start by saying I'm in no way a Linux expert either.  I just use common sense and a lot of experience with computers in general, and some aptitude in pattern recognition and troubleshooting.  (I wish I could make money at it!)

 

soft reset failed (device not ready)

failed due to HW bug, retry pmp=0

It looks like you are subject to the same bug, same set of errors, and you have the chipset involved.  For Tom (and any others who are interested), this is the patch (may have to add an exception for their security certificate).

    ahci: Workaround HW bug for SB600/700 SATA controller PMP support
    
    There is one bug in ATI SATA PMP of SB600 and SB700 old revision, which leads
    to soft reset failure. This patch can fix the bug.
    
    Signed-off-by: Shane Huang <[email protected]>
    Acked-by: Tejun Heo <[email protected]>
    Signed-off-by: Jeff Garzik <[email protected]>

...patch code changes follow...

The patch looks appropriate, legit, and you'll notice it comes from AMD (the maker of those chipsets), and is signed off by some big names.  I can't speak for Tom as to how he wants to handle this.  Perhaps he can add the patch himself, or move to a later release with the patch already applied, or determine the faulty module and revert to the v2.6.24 version of that module.  This patches the ahci module, however neither libata nor ahci changed between v2.6.24 and v2.6.26 (unRAID v4.3.3 and v4.4), so something even lower in the core is involved, and changed in v2.6.26.  I see a *lot* of core changes in the syslog.  Perhaps some are only cosmetic, but they are substantial and involve core modules.  Some are definitely improvements, but with changes come risks.  It is quite common for something new and better to break something older, seemingly unrelated.  It just needs another minor version or 2 to stabilize.  (The fact that it appears to involve PMP support is probably not important, because if the chipset supports Port Multipliers, then it has to handle it, whether or not you are using it.)

 

So unRAID users with SB600 and SB700 based boards may want to hold off on updating to v4.4 or v4.5beta1, and await further developments.

 

2. Kind of unrelated, but can someone please tell me what the following means:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata1.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7

ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)

ata1.00: configured for UDMA/133

 

Those are the normal device and capability identification and setup logging for each SATA drive.  You were set up with the highest modes available.  UDMA/133 and UDMA7 are the DMA modes for data transfer, and I'll leave it to someone else to explain the relation between SATA link speeds and DMA modes.
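On the link speed question, a rough sketch may help. As far as I understand it, "UDMA/133" is a transfer-mode label inherited from parallel ATA that libata still reports for SATA drives, while the real ceiling is the negotiated link speed shown on the "SATA link up" line. The SStatus value is printed in hex, and under the SATA register layout its three nibbles are IPM, SPD and DET. The Python below is only an illustration: the function names are made up for this example, and the 512-byte sector size is an assumption about this particular drive.

```python
import re

# SStatus layout (SATA spec): bits 0-3 DET, bits 4-7 SPD, bits 8-11 IPM.
SPD_GBPS = {1: 1.5, 2: 3.0, 3: 6.0}  # SPD field value -> link speed in Gbps

LINK_RE = re.compile(
    r"SATA link up ([\d.]+) Gbps \(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)

def decode_link_line(line):
    """Parse a libata 'SATA link up' line and split SStatus into its fields."""
    m = LINK_RE.search(line)
    if m is None:
        raise ValueError("not a 'SATA link up' line")
    sstatus = int(m.group(2), 16)  # the kernel prints this register in hex
    spd = (sstatus >> 4) & 0xF
    return {
        "reported_gbps": float(m.group(1)),
        "det": sstatus & 0xF,          # 3 = device present, PHY communication up
        "spd": spd,                    # 2 = second-generation speed
        "ipm": (sstatus >> 8) & 0xF,   # 1 = interface active
        "negotiated_gbps": SPD_GBPS.get(spd),
    }

def capacity_bytes(sectors, sector_size=512):
    """Drive capacity in bytes; sector_size=512 is an assumption for this drive."""
    return sectors * sector_size

info = decode_link_line("ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)")
print(info)
print(capacity_bytes(1953525168))
```

Read this way, SStatus 123 says the link negotiated the 3.0 Gbps rate, and the sector count works out to about 1.0 TB. Because SATA at this speed uses 8b/10b encoding, 3.0 Gbps on the wire is roughly 300 MB/s of payload at best, so the "133" in UDMA/133 should not be read as the actual limit.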

 

To Dusan:  I probably should have suggested a forum to use.  And I did not realize you had not been able to boot far enough at the console, to be able to copy the syslog.  I automatically (and wrongly) assumed those error messages came from your syslog, because I tend to think in terms of the syslog first.  I'm sorry.

 

What I would suggest to users is use the unRAID Server 4.3 forum for now, until Tom has a chance to catch up.  If the support need is not version specific, then use the Hardware or Software forums.  I have suggested that he add an unRAID Server 4.X forum for support issues related to v4.4 and higher, and add a Hard Disks forum just for hard drive problems.

Link to comment


Thanks for your input. I will wait for the next stable unRAID version and check whether the kernel supplied with it includes the patch. I could use the extra performance boost from 4.4, but as the old version is working (slowly, but working), it is not critical for me.

 

Dusan

Link to comment

I have a SB700 chipset motherboard and just upgraded to 4.4..

 

I'm looking at my syslog and am experiencing the same messages as previous posters:

ata1: softreset failed (device not ready)
ata1: failed due to HW bug, retry pmp=0

 

I also have two Promise SATA PCI cards, and the drives on those do not exhibit this error message, only the ones connected to the motherboard's SATA ports.

 

The software seems to be working fine, however. I did a parity check: 0 errors. I copied some files back to check performance, and on my system at least it's about 15% faster on average writing files. I was having all sorts of problems with adding and deleting sub folders on 4.3.3, and the problem seems to be fixed in 4.4. Additionally, any folder with a sub folder in 4.3.3 used to always get the most current datestamp after a reboot, so if I sorted by modified date they would always pop up first, but now they keep their actual dates.

 

So I'm hesitant in going back to 4.3.3 because all my issues seem to be fixed ...

 

Link to comment

Thanks for the syslog feedback RobJ!

 

I'm not sure what this means:  "I see a *lot* of core changes in the syslog.  Perhaps some are only cosmetic, but they are substantial and involve core modules" . . . but it certainly sounds ominous so I think I will go back to 4.3.3 and wait things out for a while.

 

AGW

Link to comment

Hi all,

 

I've just updated to 4.4 from 4.3 and my parity check speed has increased from 45,000 KB/s to 66,082 KB/s!

 

I'm running a Celeron 440 with 1GB of RAM plugged into the P5B.

Parity drive is a Samsung F1 750GB and the other 5 are a mix of 500GB & 750GB Samsung & Seagate.

 

That's quite a performance increase.

 

 

Thanks for the hard work you folks all put into this!

Link to comment

Hmm, I can't seem to get network access with either 4.4 or 4.5beta1. I was running 4.3.2 with no network problems (except for very slow access times from my Vista machine). I'm using a wired gigabit network.

 

When I downgraded back to 4.3.3 (yes, an upgrade from my starting version), network access was back. Do I need to do something when 4.4 first starts up?

Link to comment


There should be no change with the upgrade, except possibly improved performance.  We need to see syslogs from the working and non-working versions, to determine what is different.  Also see the Networking section of the FAQ for several commands that display the driver chosen for your NIC.  Compare their results from both unRAID versions, v4.3.3 and v4.4.

Link to comment

RE: SB600/700 messages, "failed due to HW bug"...

 

Inspecting the code I believe this is an informational message only.

 

Here is the offending code from drivers/ata/ahci.c (BTW, the "patch" referenced earlier by RobJ is already in the current kernel):

 

static int ahci_sb600_softreset(struct ata_link *link, unsigned int *class,
                                unsigned long deadline)
{
        struct ata_port *ap = link->ap;
        void __iomem *port_mmio = ahci_port_base(ap);
        int pmp = sata_srst_pmp(link);
        int rc;
        u32 irq_sts;

        DPRINTK("ENTER\n");

        rc = ahci_do_softreset(link, class, pmp, deadline,
                               ahci_sb600_check_ready);

        /*
         * Soft reset fails on some ATI chips with IPMS set when PMP
         * is enabled but SATA HDD/ODD is connected to SATA port,
         * do soft reset again to port 0.
         */
        if (rc == -EIO) {
                irq_sts = readl(port_mmio + PORT_IRQ_STAT);
                if (irq_sts & PORT_IRQ_BAD_PMP) {
                        ata_link_printk(link, KERN_WARNING,
                                        "failed due to HW bug, retry pmp=0\n");
                        rc = ahci_do_softreset(link, class, 0, deadline,
                                               ahci_check_ready);
                }
        }

        return rc;
}

 

The first message, "soft reset failed (device not ready)" is generated by the first call to ahci_do_softreset().  The code then looks for a certain type of error (PORT_IRQ_BAD_PMP), outputs the second message, "failed due to HW bug, retry pmp=0", which is informational, and then calls ahci_do_softreset() again with third parameter set to 0.  This subsequent call succeeds because there are no further error messages.
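To make that control flow easier to follow outside the kernel, here is a small Python model of it. It is a sketch, not the driver: `sb600_softreset` and `make_buggy_port` are names invented for this illustration, the port is simulated rather than real hardware, the bit value given to `PORT_IRQ_BAD_PMP` is only illustrative, and the starting value `pmp=15` is an assumption about what the driver passes on the first attempt.

```python
import errno

PORT_IRQ_BAD_PMP = 1 << 23  # illustrative flag value, not read from real hardware

def sb600_softreset(do_softreset, read_irq_status, log, pmp):
    """Try a soft reset with `pmp`; on -EIO with BAD_PMP set, retry with pmp=0."""
    rc = do_softreset(pmp)
    if rc == -errno.EIO and read_irq_status() & PORT_IRQ_BAD_PMP:
        log("failed due to HW bug, retry pmp=0")  # informational, then retry
        rc = do_softreset(0)
    return rc

def make_buggy_port(log):
    """Hypothetical port that rejects any non-zero pmp, like the affected chips."""
    def do_softreset(pmp):
        if pmp != 0:
            log("softreset failed (device not ready)")
            return -errno.EIO
        return 0  # reset succeeds once pmp is 0

    def read_irq_status():
        return PORT_IRQ_BAD_PMP

    return do_softreset, read_irq_status

messages = []
do_reset, read_irq = make_buggy_port(messages.append)
result = sb600_softreset(do_reset, read_irq, messages.append, pmp=15)
print(result, messages)
```

In this model the final return code is 0 even though two alarming lines were logged along the way, which matches the reading that the pair of messages describes a handled retry rather than a failed drive.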

 

Hence it looks to me that this set of messages is "normal" for certain H/W, and it's an unfortunate choice of wording in the message.

 

So - to those with SB600/700 chipsets, other than the messages, are the drives recognized anyway, and system works?

Link to comment
BTW, the "patch" referenced earlier by RobJ is already in the current kernel

 

Then I may have over-dramatized the issue, for which I am sorry.  At the time, 2 instances had occurred, one of which seemed to correlate with a failure to boot, see this post.  Those who have subsequently reported those errors and chipsets have been able to boot and run, without apparent issue.

 

To Dusan:  would it be possible to capture your v4.4 console screen (with boot failure) with a digital camera pic?

 

 

(edit: wish I had looked closer at that code myself!)

Link to comment

I just upgraded my test machine with no problems.

 

FWIW I used the unraid_upgrade.awk script from this post:

 

   http://lime-technology.com/forum/index.php?topic=2577.msg20877#msg20877

 

and it worked properly.

 

Hi bubbaQ,

 

Thank you for the script. I used it to upgrade from 4.3.3 to 4.4. The upgrade went well, but then I had some difficulties after the restart.

unRAID booted up and I could access my disks, but the /boot directory was completely empty. I logged in via telnet and checked the directory, and it was empty there too. Then I initiated a new reboot, but it didn't boot up. As this is a headless rig hidden in a closet, I don't know what the problem was, because after switching it off, attaching a monitor, and starting it, it booted up again. I started a new parity build immediately.

Surprisingly, I now have everything in /boot as before, but I am unable to copy, delete, or edit anything in this directory. If I try to edit the go script and save it, it says this is a read-only file system. If I try to copy something over to the flash drive via SMB, I get access denied.

Lastly, a minor issue: I had to reapply the time zone, as it seems to have been forgotten.

 

Certainly this is not a complaint, and I am not sure whether any of the above has anything to do with the script, but I wanted to give feedback.

 

Could anybody help me on how to resolve the writing issue to the flash drive?

 

Thanks!

 

edit1: Wow, it's even worse: after another reboot it forgot all array information and seems to have reinitialized the array. The drives are configured, but the status is "Stopped. Initial configuration". That's not so great. It seems to me that unRAID cannot write to the flash drive either.

 

edit2: I am still in the above situation (edit1), but I am able to write to the flash disk again. What is going on here?

But if I set the time zone, it is always forgotten at the next reboot. What should I do now? Should I reinitialize the array? I guess I have to.

Link to comment

To Dusan:  would it be possible to capture your v4.4 console screen (with boot failure) with a digital camera pic?

 

Actually, I can do more than that. I have just realized that I was mistaken: I was not patient enough before. After several minutes the reported error messages went away and the machine continued to boot. After this "successful" boot the network is unavailable (due to a DHCP timeout) and, as I understand it, so are the disks, but I was able to run several debug commands and can now supply a syslog as well.

 

Output lsmod working (4.3.3):

Module                  Size  Used by
md_mod                 49992  2 
fuse                   34580  3 
atiixp                  3472  0 [permanent]
ide_core               88236  1 atiixp
atl1                   26252  0 
mii                     4096  1 atl1
ahci                   21508  3 
libata                122552  1 ahci 

 

Output lsmod not working (4.4):

Module                  Size  Used by
md_mod                 51816  0 
atiixp                  3716  0 
ide_core               72324  1 atiixp
ahci                   26248  0 
libata                129492  1 ahci
atl1                   28552  0  
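Comparing the two lsmod listings by eye is error-prone, so here is a minimal Python sketch that reduces each listing to a set of module names and reports the difference. The function name `module_names` is just for this example; the two listings are copied from the output above.

```python
def module_names(lsmod_output):
    """Return the set of module names in `lsmod` text, skipping the header."""
    names = set()
    for line in lsmod_output.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names

LSMOD_433 = """
Module                  Size  Used by
md_mod                 49992  2
fuse                   34580  3
atiixp                  3472  0 [permanent]
ide_core               88236  1 atiixp
atl1                   26252  0
mii                     4096  1 atl1
ahci                   21508  3
libata                122552  1 ahci
"""

LSMOD_44 = """
Module                  Size  Used by
md_mod                 51816  0
atiixp                  3716  0
ide_core               72324  1 atiixp
ahci                   26248  0
libata                129492  1 ahci
atl1                   28552  0
"""

only_in_433 = sorted(module_names(LSMOD_433) - module_names(LSMOD_44))
only_in_44 = sorted(module_names(LSMOD_44) - module_names(LSMOD_433))
print("loaded only under 4.3.3:", only_in_433)
print("loaded only under 4.4:", only_in_44)
```

The comparison shows fuse and mii loaded under 4.3.3 but not under 4.4. What that means is a guess on my part: the "Used by" counts of 0 for md_mod and ahci under 4.4 fit with the array never having started, but the syslog is still needed to say why.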

 

Output ethtool eth0 working (4.3.3):

Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: umbg
Wake-on: d
Link detected: yes 

 

Output ethtool eth0 not working (4.4):

Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x0000003f (63)
Link detected: yes 

 

Output ethtool -i eth0 working (4.3.3):

driver: atl1
version: 2.0.7
firmware-version: N/A
bus-info: 0000:02:00.0 

 

 

Output ethtool -i eth0 not working (4.4):

driver: atl1
version: 2.1.3
firmware-version: N/A
bus-info: 0000:02:00.0 
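The same kind of comparison works for the `ethtool -i` output. The sketch below parses the "key: value" lines into dictionaries and lists only the fields that differ. The helper names are made up for this example, and the input strings are the two outputs posted above.

```python
def parse_kv(text):
    """Parse 'key: value' lines into a dict, splitting on the first colon only."""
    result = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result

def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for fields whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

ETHTOOL_I_433 = """
driver: atl1
version: 2.0.7
firmware-version: N/A
bus-info: 0000:02:00.0
"""

ETHTOOL_I_44 = """
driver: atl1
version: 2.1.3
firmware-version: N/A
bus-info: 0000:02:00.0
"""

diff = changed_fields(parse_kv(ETHTOOL_I_433), parse_kv(ETHTOOL_I_44))
print(diff)
```

The only field that differs is the atl1 driver version (2.0.7 under 4.3.3, 2.1.3 under 4.4), so the same driver is selected on the same PCI device in both cases. That alone does not show the newer driver is at fault.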

Link to comment
Kernel command line: initrd=bzroot rootdelay=10 pci=noacpi nobiospnp noapic nolapic BOOT_IMAGE=bzimage

 

Dusan:  your syslog shows numerous boot options added.  Is there some reason you need them?  You do not want those, unless you absolutely have to have them in order to run successfully.  Otherwise, they are crippling your system somewhat.  If you can remove all or some of them, and get another syslog, would you please post it?

Link to comment


Unfortunately, I need them. It is a workaround to solve a known bug related to APIC error messages in the syslog (and freezes during data transfers). In fact exactly this boot option sequence is from info posted on this very forum. Some chipsets need these to run properly. I may try to lose some of these, but it is a tedious process as the error that they are solving is random and may take several hours to manifest itself.

 

Dusan

Link to comment

Dusan:  It greatly complicates troubleshooting.  With the current pace of kernel development, there have to be numerous changes that have not been fully tested in all environments, especially non-standard ones.  Although they may be necessary, each of those boot options cripples the system in some way, and creates a non-standard system.  By adding them, there is a higher chance of adverse interactions between the various sub-systems, which makes it harder to isolate issues, to determine exactly what is wrong, to determine if there is a hardware problem with your system, or just a bad combination of drivers, modules, and settings, in a possibly untested situation.

 

Yes, it may be hard or time consuming to resolve.  I would use a binary attack, perhaps start by dropping the first 2 options and monitoring the system.
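The "binary attack" can be sketched as a simple bisection. This is an illustration only: `find_culprit` and `boots_cleanly` are hypothetical names, the sketch assumes exactly one option is responsible and that the options do not interact, and each call to `boots_cleanly` stands for a real reboot followed by however many hours of monitoring it takes to trust the result.

```python
def find_culprit(options, boots_cleanly):
    """Bisect `options` to find the single option that makes the boot misbehave.

    boots_cleanly(subset) must return True when booting with only `subset`
    enabled shows no problem. Assumes exactly one culprit, no interactions.
    """
    candidates = list(options)
    if not candidates:
        raise ValueError("no options to test")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if boots_cleanly(half):
            candidates = candidates[len(half):]  # culprit is in the other half
        else:
            candidates = half                    # culprit is in this half
    return candidates[0]

# The extra options from the kernel command line quoted in this thread.
EXTRA_OPTIONS = ["pci=noacpi", "nobiospnp", "noapic", "nolapic"]

# Stand-in for a real boot test: here "noapic" is picked arbitrarily as the
# culprit just to exercise the function. It is not a diagnosis.
def pretend_boot(subset):
    return "noapic" not in subset

print(find_culprit(EXTRA_OPTIONS, pretend_boot))
```

With four options this takes two test boots instead of four, which matters when each test can run for hours. If more than one option is involved, or they interact, the single-culprit assumption breaks and the options have to be tested in combinations instead.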

Link to comment


I chose a more direct approach. First I dropped them all to check whether the problem is related to these boot options. It is: without them the system boots with no error messages and seems to work as it should. As the next step I will run 4.4 without these boot options and wait to see whether the APIC problem returns or whether the new kernel fixes it.

 

I also did basic performance tests and can confirm some read performance increase (about 23MB per second for files of a gigabyte or more), while write performance seems worse than ever (stable throughput for big files is just 3MB per second). It is strange, but it seems to me that with each new version, read performance goes up and write performance goes down.

Link to comment

I think the speeds have something to do with a combo of your board, drives, boot options, etc.  I'm experiencing increased speeds all around.  I'm seeing 15-20MB/S write speeds with 4.4 Final.

 

You can achieve SUSTAINED write speed above 15MB/s? By sustained I mean writing several gigabytes at this speed. I'm asking because for systems like unRAID there is typically an initial burst of speed and then a huge speed decrease as the transfer continues. This behavior is common not only for unRAID, but also for cheap NAS solutions for the home segment. I found an unRAID review in an online computer magazine some time ago, and they came to the same conclusion, with sustained write speed well below ten megabytes per second.
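One way to tell burst speed from sustained speed is to time the write block by block instead of taking a single average. The Python sketch below does that. It is a rough probe under stated assumptions: `measure_write_rates` is a name invented here, the target path must sit on the share being tested, and the `os.fsync` call after each block is my choice to keep the client's own cache from inflating the figures. Caching on the server side or in the network stack can still distort the early blocks.

```python
import os
import time

def measure_write_rates(path, chunk_mb=64, chunks=32):
    """Write `chunks` blocks of `chunk_mb` MiB to `path`; return MiB/s per block."""
    block = b"\0" * (chunk_mb * 1024 * 1024)
    rates = []
    with open(path, "wb") as f:
        for _ in range(chunks):
            start = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this block out before timing stops
            elapsed = time.perf_counter() - start
            rates.append(chunk_mb / max(elapsed, 1e-9))
    return rates
```

Calling `measure_write_rates` with a path on the mounted share and the defaults writes 2 GiB and returns 32 figures. If the first few are high and the rest settle much lower, the lower plateau is the sustained rate worth quoting.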

Link to comment
