Darts Posted December 16, 2008

I think we need a 4.4 support forum.

RobJ Posted December 16, 2008

"Not working at all. No problems with 4.3.3, but with 4.4 (final) unable to boot. Errors:

ATA 1.00 FAILED TO IDENTIFY (IO_ERROR, ERR_MASK=0x4)
SOFT RESET FAILED (DEVICE NOT READY)
FAILED DUE TO HW BUG

and this is repeating over and over for ATA 1.00 and ATA 2.00. Had to downgrade back. Pretty recent hardware (ASUS M3A-H/HDMI, AMD 780G)."

It would be helpful if you could start a separate support thread, and post the syslog for v4.4 final, as well as post a v4.3.3 syslog for a baseline. Without seeing the syslog, it is hard for me to even imagine what the different kernels could be seeing so differently.

Well, I just did some searching, and there are scattered reports of a bug in the Linux driver for AHCI support on the SB600/SB700 chipsets. The soft reset failures and "FAILED DUE TO HW BUG" messages are typical. It *may* only involve the PMP support on those chipsets. There is a patch, but you may have to wait for its inclusion in a future kernel release.

Dusan Posted December 16, 2008

"It would be helpful if you could start a separate support thread, and post the syslog for v4.4 final, as well as post a v4.3.3 syslog for a baseline. Without seeing the syslog, it is hard for me to even imagine what the different kernels could be seeing so differently."

I would gladly start a new thread, but as was already mentioned, there is no support forum for 4.4, and there is no point in creating a new thread here in Announcements. I also do not know how to get a syslog from a system that fails to boot and therefore cannot log in a user or be reached from the network.

Dusan

agw Posted December 16, 2008

I recently upgraded from 4.3.3 to 4.4 as well, and I get similar errors in the syslog on an SB700 / 780G based motherboard (Foxconn A7GM-S). I also got a couple of other new errors that were not present in 4.3.3. However, my system boots through them and actually seems to be less concerned about the error messages than I am. I've sent a few GBs back and forth and even completed a parity check, with no complaints in the syslog. I will post a 4.3.3 and a 4.4 syslog later tonight for your comments.

-------

Syslogs from 4.3.3 and 4.4 attached. Before I saved the 4.4 syslog, I made a few more tweaks in the BIOS and managed to eliminate two errors that were unique to 4.4:

1. ACPI Warning (tbutils-0217): Incorrect checksum in table . . . .
2. Tower kernel: spurious 8259A interrupt: IRQ7

However, you will still see the following in the 4.4 syslog that are not present in 4.3.3:

3. ata 1: soft reset failed (device not ready)
4. ata 1: failed due to HW bug, retry pmp=0
5. ata 2: soft reset failed (device not ready)
6. ata2: failed due to HW bug, retry pmp=0

Again, other than the above, the system boots fine and seems to be operating correctly. Also, this is the latest BIOS from Foxconn. Note that in 4.3.3 I had to append the 'noapic' statement to syslinux . . . otherwise everything else remains unchanged between the two.

The burning questions I have:

1. Do the Linux / Unraid gurus think the above errors in 4.4 are important enough to revert to 4.3.3? It doesn't really bother me to do so . . . it's just that the spin down of my Samsung 1TB parity drive works better in 4.4. There is a new "Error attaching device data" in my 4.3.3 syslog that I do not believe I have seen before, likely a result of my BIOS fiddling under 4.4. I suspect I can make that disappear though.

2. Kind of unrelated, but can someone smart please tell me what the following means:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7
ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133

Am I getting my full SATA 3.0 speeds? Is UDMA/133 the same as SATA 3.0? What am I not understanding? I have tried to figure this out on my own, but honestly I think I've been to the end of the internet and never ran across a good explanation of how to interpret it. In case you can't tell, I'm Linux-challenged. Please be gentle.

RobJ Posted December 17, 2008

Thank you for the syslogs, very interesting. I should start by saying I'm in no way a Linux expert either. I just use common sense and a lot of experience with computers in general, and some aptitude in pattern recognition and troubleshooting. (I wish I could make money at it!)

soft reset failed (device not ready)
failed due to HW bug, retry pmp=0

It looks like you are subject to the same bug, same set of errors, and you have the chipset involved. For Tom (and any others who are interested), this is the patch (may have to add an exception for their security certificate).

ahci: Workaround HW bug for SB600/700 SATA controller PMP support

There is one bug in ATI SATA PMP of SB600 and SB700 old revision, which leads to soft reset failure. This patch can fix the bug.

Signed-off-by: Shane Huang <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>
...patch code changes follow...

The patch looks appropriate and legit, and you'll notice it comes from AMD (the maker of those chipsets) and is signed off by some big names.

I can't speak for Tom as to how he wants to handle this. Perhaps he can add the patch himself, or move to a later release with the patch already applied, or determine the faulty module and revert to the v2.6.24 version of that module. This patches the ahci module; however, neither libata nor ahci changed between v2.6.24 and v2.6.26 (unRAID v4.3.3 and v4.4), so something even lower in the core is involved, and changed in v2.6.26. I see a *lot* of core changes in the syslog. Perhaps some are only cosmetic, but they are substantial and involve core modules. Some are definitely improvements, but with changes come risks. It is quite common for something new and better to break something older, seemingly unrelated. It just needs another minor version or 2 to stabilize.

(The fact that it appears to involve PMP support is probably not important, because if the chipset supports Port Multipliers, then it has to handle it, whether or not you are using it.)

So unRAID users with SB600 and SB700 based boards may want to hold off on updating to v4.4 or v4.5beta1, and await further developments.

"2. Kind of unrelated, but can someone please tell me what the following means:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7
ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133"

Those are the normal device and capability identification and setup logging for each SATA drive. You were set up with the highest modes available. UDMA/133 and UDMA7 are the DMA modes for data transfer, and I'll leave it to someone else to explain the relation between SATA link speeds and DMA modes.

To Dusan: I probably should have suggested a forum to use. And I did not realize you had not been able to boot far enough at the console to be able to copy the syslog. I automatically (and wrongly) assumed those error messages came from your syslog, because I tend to think in terms of the syslog first. I'm sorry.

What I would suggest to users is to use the unRAID Server 4.3 forum for now, until Tom has a chance to catch up. If the support need is not version specific, then use the Hardware or Software forums. I have suggested that he add an unRAID Server 4.X forum for support issues related to v4.4 and higher, and add a Hard Disks forum just for hard drive problems.
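
For anyone who wants to pull the interesting fields out of those four identification lines, here is a minimal sketch. It only parses the text quoted above; the function name `summarize` and the 512-byte sector size are my own assumptions, not anything unRAID provides. As far as I understand it, the negotiated link speed is the figure on the "SATA link up" line, while "UDMA/133" is the transfer mode libata reports for the drive and is not itself a measure of the SATA link rate.

```python
import re

# The four lines quoted in the post above, verbatim.
LOG = """\
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: SAMSUNG HD103UJ, 1AA01113, max UDMA7
ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
"""

SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def summarize(log):
    """Extract link speed, capacity, NCQ depth and transfer mode from libata lines."""
    info = {}
    m = re.search(r"SATA link up ([\d.]+) Gbps", log)
    if m:
        info["link_gbps"] = float(m.group(1))
    m = re.search(r"(\d+) sectors", log)
    if m:
        info["capacity_bytes"] = int(m.group(1)) * SECTOR_BYTES
    m = re.search(r"NCQ \(depth (\d+)/(\d+)\)", log)
    if m:
        info["ncq_depth"] = (int(m.group(1)), int(m.group(2)))
    m = re.search(r"configured for (\S+)", log)
    if m:
        info["xfer_mode"] = m.group(1)
    return info


info = summarize(LOG)
```

With the lines above, the sector count works out to roughly 1 TB, which matches the Samsung HD103UJ being a 1TB drive.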

Dusan Posted December 17, 2008

"To Dusan: I probably should have suggested a forum to use. And I did not realize you had not been able to boot far enough at the console to be able to copy the syslog. I automatically (and wrongly) assumed those error messages came from your syslog, because I tend to think in terms of the syslog first. I'm sorry."

Thanks for your input. I will wait for the next stable unRaid version and check whether the kernel supplied with it has the patch included. I could use the extra performance boost from 4.4, but as the old version is working (slowly, but working), it is not critical for me.

Dusan

leemik Posted December 18, 2008

I have an SB700 chipset motherboard and just upgraded to 4.4. I'm looking at my syslog and am seeing the same messages as previous posters:

ata1: softreset failed (device not ready)
ata1: failed due to HW bug, retry pmp=0

I also have two Promise SATA PCI cards, and the drives on those do not exhibit this error message, only the ones connected to the motherboard's SATA ports.

The software seems to be working fine, however. I did a parity check: 0 errors. I copied some files back to check performance, and on my system at least it's about 15% faster on average writing files.

I was having all sorts of problems with adding and deleting subfolders on 4.3.3, and the problem seems to be fixed in 4.4. Additionally, any folder with a subfolder in 4.3.3 used to always get the most current datestamp after a reboot, so if I sorted by modified date they would always pop up first, but now they keep their actual dates.

So I'm hesitant to go back to 4.3.3, because all my issues seem to be fixed ...

TexasAg Posted December 19, 2008

"Shares don't seem to be working the same from 4.4 beta2 to 4.4 final. Looking into this..."

Any updates? Thanks.

agw Posted December 19, 2008

Thanks for the syslog feedback, RobJ! I'm not sure what this means: "I see a *lot* of core changes in the syslog. Perhaps some are only cosmetic, but they are substantial and involve core modules" . . . but it certainly sounds ominous, so I think I will go back to 4.3.3 and wait things out for a while.

AGW

Copey Posted December 20, 2008

Hi all,

I've just updated to 4.4 from 4.3, and my parity check speed has increased from 45,000 KB/s to 66,082 KB/s! I'm running a Celeron 440 with 1GB of RAM plugged into the P5B. The parity drive is a Samsung F1 750GB and the other 5 are a mix of 500GB & 750GB Samsung & Seagate drives.

That's quite a performance increase. Thanks for the hard work you folks all put into this!

nautarch Posted December 20, 2008

Hmm, I can't seem to get network access with either 4.4 or 4.5beta1. I was running 4.3.2 with no network problems (except for very slow access times from my Vista machine), using a wired gigabit network. When I downgraded back to 4.3.3 (yes, an upgrade from my starting version), network access was back. Do I need to do something when 4.4 first starts up?

RobJ Posted December 20, 2008

"Hmm, I can't seem to get network access with either 4.4 or 4.5beta1. I was running 4.3.2 with no network problems (except for very slow access times from my Vista machine), using a wired gigabit network. When I downgraded back to 4.3.3 (yes, an upgrade from my starting version), network access was back. Do I need to do something when 4.4 first starts up?"

There should be no change with the upgrade, except possibly improved performance. We need to see syslogs from the working and non-working versions to determine what is different. Also see the Networking section of the FAQ for several commands that display the driver chosen for your NIC. Compare their results from both unRAID versions, v4.3.3 and v4.4.

limetech (Author) Posted December 20, 2008

RE: SB600/700 messages, "failed due to HW bug"...

Inspecting the code, I believe this is an informational message only. Here is the offending code from drivers/ata/ahci.c (BTW, the "patch" referenced earlier by RobJ is already in the current kernel):

    static int ahci_sb600_softreset(struct ata_link *link, unsigned int *class,
                                    unsigned long deadline)
    {
        struct ata_port *ap = link->ap;
        void __iomem *port_mmio = ahci_port_base(ap);
        int pmp = sata_srst_pmp(link);
        int rc;
        u32 irq_sts;

        DPRINTK("ENTER\n");

        rc = ahci_do_softreset(link, class, pmp, deadline,
                               ahci_sb600_check_ready);

        /*
         * Soft reset fails on some ATI chips with IPMS set when PMP
         * is enabled but SATA HDD/ODD is connected to SATA port,
         * do soft reset again to port 0.
         */
        if (rc == -EIO) {
            irq_sts = readl(port_mmio + PORT_IRQ_STAT);
            if (irq_sts & PORT_IRQ_BAD_PMP) {
                ata_link_printk(link, KERN_WARNING,
                                "failed due to HW bug, retry pmp=0\n");
                rc = ahci_do_softreset(link, class, 0, deadline,
                                       ahci_check_ready);
            }
        }

        return rc;
    }

The first message, "soft reset failed (device not ready)", is generated by the first call to ahci_do_softreset(). The code then looks for a certain type of error (PORT_IRQ_BAD_PMP), outputs the second message, "failed due to HW bug, retry pmp=0", which is informational, and then calls ahci_do_softreset() again with the third parameter set to 0. This subsequent call evidently succeeds, because there are no further error messages. Hence it looks to me that this set of messages is "normal" for certain H/W, and it's an unfortunate choice of wording in the message.

So, to those with SB600/700 chipsets: other than the messages, are the drives recognized anyway, and does the system work?
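
The retry flow described in that post can be modelled in a few lines. This is only a toy simulation to show why both messages appear even on a healthy boot; it is not the kernel code. `FakePort`, the pmp value 15, and the bit chosen for `PORT_IRQ_BAD_PMP` are illustrative assumptions.

```python
import errno

PORT_IRQ_BAD_PMP = 1 << 23  # illustrative bit position, treat as a placeholder


class FakePort:
    """Hypothetical stand-in for an affected SB600/700 port: reset fails unless pmp == 0."""

    def __init__(self):
        self.log = []
        self.irq_status = 0

    def do_softreset(self, pmp):
        if pmp != 0:
            self.log.append("softreset failed (device not ready)")
            self.irq_status |= PORT_IRQ_BAD_PMP
            return -errno.EIO
        return 0


def sb600_softreset(port, pmp):
    """Mirror of the described flow: try once, and on the specific error retry with pmp=0."""
    rc = port.do_softreset(pmp)
    if rc == -errno.EIO and port.irq_status & PORT_IRQ_BAD_PMP:
        port.log.append("failed due to HW bug, retry pmp=0")
        rc = port.do_softreset(0)
    return rc


port = FakePort()
result = sb600_softreset(port, pmp=15)  # 15 is an assumed non-zero pmp value
```

In this model the final return code is 0 (success), yet the log still contains both messages, which is the pattern reported by the posters whose systems boot and run normally.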

RobJ Posted December 20, 2008

"BTW, the "patch" referenced earlier by RobJ is already in the current kernel"

Then I may have over-dramatized the issue, for which I am sorry. At the time, 2 instances had occurred, one of which seemed to correlate with a failure to boot (see this post). Those who have subsequently reported those errors and chipsets have been able to boot and run without apparent issue.

To Dusan: would it be possible to capture your v4.4 console screen (with boot failure) with a digital camera pic?

(edit: wish I had looked closer at that code myself!)

olympia Posted December 21, 2008

"I just upgraded my test machine with no problems. FWIW I used the unraid_upgrade.awk script from this post: http://lime-technology.com/forum/index.php?topic=2577.msg20877#msg20877 and it worked properly."

Hi bubbaQ,

Thank you for the script. I used it to upgrade from 4.3.3 to 4.4. The upgrade went well, but then I had some difficulties after the restart.

Unraid booted up and I could access my disks, but the /boot directory was completely empty. I logged in via telnet and checked the directory; it was empty that way too. Then I initiated a new reboot, but it didn't boot up. As this is a headless rig hidden in a closet, I don't know what the problem was, because after I switched it off, connected a display, and started it, it booted up again. It started a new parity build immediately. And... surprisingly, now I have all my stuff in /boot as before, but I am unable to copy, delete, or edit anything in this directory. If I try to edit the go script and save it, it says: this is a read only file system. If I try to copy something over to the flash drive via smb, I get access denied.

Lastly, a minor issue: I had to reapply the time zone, as it seems to have been forgotten...

Certainly this is not a complaint, and I am not sure whether any of the above has anything to do with the script, but I wanted to give feedback.

Could anybody help me resolve the issue of writing to the flash drive?

Thanks!

edit1: wow, it's even worse. After a new reboot it forgot all array information and seems to have reinitialized the array (the drives are configured, but the status is "Stopped. Initial configuration"). That's not so great. It seems to me that unraid cannot write to the flash drive either.

edit2: I am still in the above situation (edit1), but I am able to write to the flash disk again. What is going on here? But if I set the time zone, it is always forgotten at the next reboot.

What should I do now? Should I reinitialize the array? I guess I have to...

Dusan Posted December 21, 2008

"To Dusan: would it be possible to capture your v4.4 console screen (with boot failure) with a digital camera pic?"

Actually, I can do more than that. I have just realized that I was mistaken. I was not patient enough before, because after several minutes the reported error messages went away and the machine continued to boot. Although after the "successful" boot the network is unavailable (due to a DHCP timeout) and, as I understand it, so are the disks, I was able to run several debug commands, and I can also supply a syslog now.

Output lsmod working (4.3.3):

    Module                  Size  Used by
    md_mod                 49992  2
    fuse                   34580  3
    atiixp                  3472  0 [permanent]
    ide_core               88236  1 atiixp
    atl1                   26252  0
    mii                     4096  1 atl1
    ahci                   21508  3
    libata                122552  1 ahci

Output lsmod not working (4.4):

    Module                  Size  Used by
    md_mod                 51816  0
    atiixp                  3716  0
    ide_core               72324  1 atiixp
    ahci                   26248  0
    libata                129492  1 ahci
    atl1                   28552  0

Output ethtool eth0 working (4.3.3):

    Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Link detected: yes

Output ethtool eth0 not working (4.4):

    Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x0000003f (63)
        Link detected: yes

Output ethtool -i eth0 working (4.3.3):

    driver: atl1
    version: 2.0.7
    firmware-version: N/A
    bus-info: 0000:02:00.0

Output ethtool -i eth0 not working (4.4):

    driver: atl1
    version: 2.1.3
    firmware-version: N/A
    bus-info: 0000:02:00.0
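
When comparing two lsmod listings like the ones in the post above, it can help to diff the module names mechanically instead of by eye. A minimal sketch, using the two listings exactly as posted; the helper name `module_names` is my own. Whether the differences matter is a separate question that this does not answer.

```python
# lsmod output as posted for the working (4.3.3) and non-working (4.4) boots.
LSMOD_433 = """\
Module Size Used by
md_mod 49992 2
fuse 34580 3
atiixp 3472 0 [permanent]
ide_core 88236 1 atiixp
atl1 26252 0
mii 4096 1 atl1
ahci 21508 3
libata 122552 1 ahci
"""

LSMOD_44 = """\
Module Size Used by
md_mod 51816 0
atiixp 3716 0
ide_core 72324 1 atiixp
ahci 26248 0
libata 129492 1 ahci
atl1 28552 0
"""


def module_names(lsmod_output):
    """Return the set of module names, skipping the header row."""
    rows = lsmod_output.strip().splitlines()[1:]
    return {row.split()[0] for row in rows if row.strip()}


only_in_433 = module_names(LSMOD_433) - module_names(LSMOD_44)
only_in_44 = module_names(LSMOD_44) - module_names(LSMOD_433)
```

For these two listings, `fuse` and `mii` are loaded under 4.3.3 but absent under 4.4, and nothing is loaded only under 4.4.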

RobJ Posted December 22, 2008

Kernel command line: initrd=bzroot rootdelay=10 pci=noacpi nobiospnp noapic nolapic BOOT_IMAGE=bzimage

Dusan: your syslog shows numerous boot options added. Is there some reason you need them? You do not want those unless you absolutely have to have them in order to run successfully. Otherwise, they are crippling your system somewhat. If you can remove all or some of them and get another syslog, would you please post it?

Dusan Posted December 22, 2008

"Kernel command line: initrd=bzroot rootdelay=10 pci=noacpi nobiospnp noapic nolapic BOOT_IMAGE=bzimage

Dusan: your syslog shows numerous boot options added. Is there some reason you need them? You do not want those unless you absolutely have to have them in order to run successfully. Otherwise, they are crippling your system somewhat. If you can remove all or some of them and get another syslog, would you please post it?"

Unfortunately, I need them. They are a workaround for a known bug related to APIC error messages in the syslog (and freezes during data transfers). In fact, exactly this sequence of boot options comes from info posted on this very forum. Some chipsets need these to run properly. I may try to drop some of them, but it is a tedious process, as the error they work around is random and may take several hours to manifest itself.

Dusan

RobJ Posted December 22, 2008

Dusan: It greatly complicates troubleshooting. With the current pace of kernel development, there have to be numerous changes that have not been fully tested in all environments, especially non-standard ones. Although they may be necessary, each of those boot options cripples the system in some way and creates a non-standard system. By adding them, there is a higher chance of adverse interactions between the various sub-systems, which makes it harder to isolate issues, to determine exactly what is wrong, and to determine whether there is a hardware problem with your system or just a bad combination of drivers, modules, and settings in a possibly untested situation.

Yes, it may be hard or time consuming to resolve. I would use a binary attack, perhaps starting by dropping the first 2 options and monitoring the system.
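
The "binary attack" suggested above can be sketched as a simple halving search over the boot options. This is only an illustration under stated assumptions: `is_stable` is a hypothetical callback standing in for "boot with these options and watch the system for several hours", and the example predicate that only `noapic` is needed is made up for the demonstration, not a finding from this thread. The search also assumes that adding an option never makes things worse.

```python
# The workaround options from the kernel command line quoted in the thread.
OPTIONS = ["rootdelay=10", "pci=noacpi", "nobiospnp", "noapic", "nolapic"]


def minimize_options(options, is_stable):
    """Drop options in shrinking chunks, keeping only those the system needs.

    is_stable(candidate) is a hypothetical test that returns True when the
    system runs correctly with just the candidate options.
    """
    needed = list(options)
    chunk = max(1, len(needed) // 2)
    while True:
        i = 0
        while i < len(needed):
            candidate = needed[:i] + needed[i + chunk:]
            if is_stable(candidate):
                needed = candidate  # this chunk was not required, leave it out
            else:
                i += chunk          # something in this chunk is required, keep it
        if chunk == 1:
            break
        chunk //= 2
    return needed


# Made-up example: pretend only "noapic" is actually required.
example_result = minimize_options(OPTIONS, lambda opts: "noapic" in opts)
# Made-up example: pretend the new kernel needs none of the options.
none_needed = minimize_options(OPTIONS, lambda opts: True)
```

Because each real test may take hours, halving keeps the number of reboots low when most options turn out to be unnecessary; trying with all options removed first is the cheapest single test.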

Dusan Posted December 22, 2008

"Dusan: It greatly complicates troubleshooting. With the current pace of kernel development, there have to be numerous changes that have not been fully tested in all environments, especially non-standard ones. Although they may be necessary, each of those boot options cripples the system in some way and creates a non-standard system. By adding them, there is a higher chance of adverse interactions between the various sub-systems, which makes it harder to isolate issues, to determine exactly what is wrong, and to determine whether there is a hardware problem with your system or just a bad combination of drivers, modules, and settings in a possibly untested situation.

Yes, it may be hard or time consuming to resolve. I would use a binary attack, perhaps starting by dropping the first 2 options and monitoring the system."

I have chosen a more direct approach. First I dropped them altogether, to check whether the problem is related to these boot options. It is: without them the system boots with no error messages and seems to work as it should. As the next step, I will run 4.4 without these boot options and wait to see whether the APIC problem returns or whether the new kernel fixes that particular problem.

I also did basic performance tests, and I can confirm some read performance increase (about 23MB per second for files of a gigabyte or more), while write performance seems worse than ever (stable throughput for big files is just 3MB per second). It is strange, but it seems to me that with each new version, read performance goes up and write performance goes down.

TexasAg Posted December 22, 2008

I think the speeds have something to do with a combo of your board, drives, boot options, etc. I'm experiencing increased speeds all around. I'm seeing 15-20MB/s write speeds with 4.4 Final.

dertbv Posted December 23, 2008

I have developed a weird issue. When I try to create a new share, it disables all of my shares until I reboot. It looks as if I were setting up my shares from scratch. Has anyone seen this?

Thanks

Dusan Posted December 23, 2008

"I think the speeds have something to do with a combo of your board, drives, boot options, etc. I'm experiencing increased speeds all around. I'm seeing 15-20MB/s write speeds with 4.4 Final."

You can achieve a SUSTAINED write speed above 15MB/s? By sustained I mean writing several gigabytes at this speed. I'm asking because systems like UNRAID typically show an initial burst of speed and then a huge speed decrease as the transfer continues. This behavior is common not only for UNRAID, but also for cheap NAS solutions for the home segment. I found an UNRAID review in an online computer magazine some time ago, and they came to the same conclusion, with sustained write speed way below ten megabytes per second.
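
One way to tell an initial burst from sustained speed is to time the write in fixed-size windows and look at how the rate changes, instead of averaging over the whole transfer. A rough sketch under stated assumptions: the function name and window sizes are my own, the path would be a file on a mounted share, and zero-filled data may flatter filesystems or links that compress.

```python
import os
import time


def measure_write(path, total_bytes, chunk_bytes=1 << 20, window_bytes=64 << 20):
    """Write total_bytes to path and return the MB/s achieved in each window."""
    chunk = b"\0" * chunk_bytes
    rates = []
    written = 0
    window_written = 0
    window_start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
            window_written += n
            if window_written >= window_bytes or written == total_bytes:
                # Force the data out so the timing reflects real writes,
                # not just the local buffer being filled.
                f.flush()
                os.fsync(f.fileno())
                elapsed = time.perf_counter() - window_start
                rates.append(window_written / (1 << 20) / max(elapsed, 1e-9))
                window_written = 0
                window_start = time.perf_counter()
    return rates
```

Calling it with a multi-gigabyte `total_bytes` against a file on the server share and comparing the first few entries of the returned list with the last few would show whether the speed holds up or falls off after the initial burst.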

Darts Posted December 23, 2008

I have sustained above 20MBps between my main PC and the unRAID box while transferring 10GB+ files.

Dusan Posted December 23, 2008

I found the test that I mentioned before. It reviewed Unraid 4.2. The most interesting part is this:

Complete test is here: http://www.smallnetbuilder.com/content/view/30247/75/