Micaiah12 Posted January 1, 2017
Hello everyone. New here to unRAID. I have 20 days left in my trial, and I was loving unRAID and considering purchasing it. However, today I've been having massive issues: everything from not being able to use the server unless in GUI mode, to having it lock up on me and give me errors when trying to start and restart my dockers. It is also doing some weird things with hard drives, like "unraid signature on mbr failed" when preclearing drives. Hard drives are also falling out of the array randomly, and I have to reboot to get them back online. Can anyone help me? This just started happening yesterday. Up till that point everything was working flawlessly. Parity checks were clean and dockers were normal, so if anyone could help that would be great!
1812 Posted January 1, 2017
Failing USB or other failing hardware? What do your logs show?
Micaiah12 Posted January 1, 2017 (Author)
The USB is brand new. I'll upload logs as soon as I can. I'm trying to shut it down cleanly right now.
Micaiah12 Posted January 1, 2017 (Author)
Based on the syslogs, this seemed to be the predominant error:
kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
It is followed by "hard resetting link".
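For anyone reading a syslog by hand, a tally per ATA port can make the pattern easier to see. The following is only a rough sketch: it assumes the syslog lines look like the one quoted above, and the function name `count_ata_events` and the regular expression are illustrative, not part of unRAID or any diagnostic tool.

```python
import re
from collections import Counter

# Matches the two messages quoted in the post above: an "exception Emask"
# line and the "hard resetting link" line that follows it.
# The pattern is an assumption based on that single sample line.
ATA_EVENT = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (exception Emask|hard resetting link)"
)

def count_ata_events(lines):
    """Count ATA exception and link-reset messages per port."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            port, event = match.group(1), match.group(2)
            counts[(port, event)] += 1
    return counts

# Usage sketch: pass the lines of a saved syslog file, for example
#   with open("syslog.txt") as handle:      # hypothetical file name
#       print(count_ata_events(handle))
```

A port that shows many exceptions and resets points at that cable, port, or drive; errors spread across several ports point more toward the controller or power.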
trurl Posted January 1, 2017
Go to Tools - Diagnostics and post the complete diagnostics zip.
Micaiah12 Posted January 1, 2017 (Author)
Sorry, I've been running around with a tablet all day. Here is the zip: tower-diagnostics-20161231-2228.zip
JorgeB Posted January 1, 2017
You are having multiple issues:
- Disk1 and the unassigned WD1003FBYX have constant interface issues.
- ATA7 - ST31500341AS_9VS5A127 (sdg) dropped offline.
- The emulated disk3 filesystem is corrupt; you need to run xfs_repair (after fixing the ATA errors).
My first guess is that there's something wrong with the Nvidia onboard SATA controller.
Micaiah12 Posted January 1, 2017 (Author)
So most likely the motherboard, hmm? OK, I'll check it out. Thanks.
Micaiah12 Posted January 1, 2017 (Author)
I've moved my SATA ports around and the server seems to run much better. I can start it up and start the parity check, and the dockers work. However, when I'm using my dockers while the parity check is running, the whole web interface drops its connection. None of the dockers are accessible, and the web interface page loads and loads but never shows up. I can still ping the server; I just can't access the web interface or connect via SSH.
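The symptom described above (ping answers, but the web interface and SSH do not) can be confirmed from another machine with a quick TCP check. This is a minimal sketch, not an unRAID tool: the helper names `port_open` and `check_services` are made up for illustration, and ports 22 and 80 are assumed to be the SSH and web GUI ports on a default setup.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def check_services(host, ports=(22, 80)):
    """Map each port to whether it accepted a connection.

    Ports 22 (SSH) and 80 (web GUI) are assumptions; adjust to your setup.
    """
    return {port: port_open(host, port) for port in ports}

# Usage sketch: check_services("tower"), where "tower" is an assumed hostname
# taken from the diagnostics file name; substitute the server's name or IP.
```

If the machine answers ping while every port times out, the network stack is alive but the services are stuck, which fits a system blocked on disk I/O rather than a network fault.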
Micaiah12 Posted January 2, 2017 (Author)
I had to hard reset the tower today. The diagnostics are attached below. Thanks!
tower-diagnostics-20170102-0646.zip
itimpi Posted January 2, 2017
Micaiah12 wrote: "It seems that when I'm using my dockers and such while the parity check is running the whole web interface drops connection. [...] However I can ping the server I just can't access the web interface nor connect via ssh."
In my experience that often means the system is having problems with one or more drives not reading reliably. Looking at the syslog, you are getting errors reported on ata5:
ata5.00: ATA-8: WDC WD1003FBYZ-010FB0, WD-WCAW33LPFF9Y, 01.01V03, max UDMA/133
Micaiah12 Posted January 2, 2017 (Author)
Hmm, OK, I will look at those two drives. If I remember right, one of those drives is an external. It does seem that once it starts its parity check, about 2-5 hours in it will absolutely freeze everything. Bad hard drives may very well be the problem. I will remedy the situation when I get home and let you know. Thank you so much!
Micaiah12 Posted January 2, 2017 (Author)
itimpi wrote: "In my experience that often means that the system is having problems with one or more drives not reading reliably. Looking at the syslog you are getting errors reported on ata5 ata5.00: ATA-8: WDC WD1003FBYZ-010FB0, WD-WCAW33LPFF9Y, 01.01V03, max UDMA/133"
After taking a look at the dashboard, it does look like that drive is causing some problems. However, it has been disabled by unRAID for quite some time now because of the problems it was having. Is there any way it could still be causing this whole meltdown, even though it was disabled? Thanks.
JorgeB Posted January 2, 2017
You have multiple disks with ATA errors, so the problem is much more likely the onboard controller. Do you have a PCIe controller you could connect all the disks to, leaving the onboard SATA ports free?
Micaiah12 Posted January 2, 2017 (Author)
I have a RocketRAID controller. However, none of the disks are seen when I connect them to it. I have looked all over for a solution for that model and have found none.
Micaiah12 Posted January 3, 2017 (Author)
Just an update. The parity check completed after I weeded out the bad drives. There were a few errors, but it said it corrected them all. So far so good. The data seems to be steady now: I'm not losing any folders or files, and the web portal hasn't frozen up. I'll replace the bad drives with good ones and carry on from there. Thanks all for your help!
tyrindor Posted January 5, 2017
Run the parity check again and make sure it reports 0 errors, to ensure all the problems are gone. Any error during a parity check means something is going on and needs to be addressed, or a data rebuild won't work as intended. unRAID is rock solid as long as your hardware is stable, and you won't regret your purchase.
Micaiah12 Posted January 15, 2017 (Author)
Hello everyone. Ten days after my previous issues, I am having what may be the same problem or a different one. The server worked flawlessly for 10 days with new hard drives. I have also moved some of the drives over to a SATA card to take some load off the chipset on the motherboard. It was working just fine, and now it seems to be having the same problem. Whenever a parity check is started, it runs to about 80-90% and then the entire unRAID system freezes. I can ping it, but I cannot access the web GUI, shares, or dockers. I have to hard reboot it to get everything back online. Does anyone have any idea what is going on? I have looked through the logs, but I am still learning how to read them. I have attached them for further review. My current setup is 3 drives on my onboard SATA controller and one drive on my SATA PCI card. Thanks all!
tower-diagnostics-20170114-1650.zip
RobJ Posted January 18, 2017
Micaiah12 wrote: "Whenever a parity check is started it will go till about 80-90% then the entire unRAID system will freeze. [...] I have attached them for further review. My current setup is 3 drives on my onboard sata controller and then one drive on my sata pci card."
The syslog is only 5 minutes long, and the array isn't started, so there's too little data to conclude much. But just from what I can see, I'll make some comments. I'm afraid there's no easy way to say it: your hardware kind of stacks the deck against you. While unRAID does run on old systems, it's going to be hard to get good performance or reliability from your setup.
* The motherboard is old, nForce based, with a BIOS from 2009. I've had a couple myself, so believe me when I say *please* consider replacing it! Yours is newer than the original awful ones, with many bugs fixed, but it still has some issues. The 2 network ports are prone to failing, and I think one of yours has already failed and the other isn't working right; more on that later. The nForce boards and boards based on derivative chipsets are notorious for spurious IRQ 7's. On one of mine, I was able to reserve IRQ 7, effectively removing it from assignment, which saved me from the failures associated with the kernel noticing a spurious IRQ 7 and shutting it off, effectively disabling every system attached to it! On the newer kernels we use now, I've noticed that the kernel usually recognizes an nForce board and removes IRQ 7 from the available IRQ's, but on yours, for some unknown reason, it's still available, which *may* be a cause of trouble. I don't see anything using it, but not every device reports the IRQ it's using. And I strongly recommend checking for a newer BIOS, which will usually work better with newer technologies like virtualization.
* It's an older CPU but appears fast enough, dual core. But I don't think it will be good enough for virtualization, especially with that old BIOS. I recommend turning off virtualization.
* You have added a HighPoint RocketRAID card, a model I don't see in the Hardware Compatibility wiki. It's either not fully compatible or not configured right, as it's not providing the correct drive identifications and it's not providing SMART access. I don't believe the RocketRAID's have a good reputation here, but then I don't know of anyone with that card. You might try the advice in this thread, perhaps it will help. If it does correct the drive ID's and SMART access, you'll have to do a New Config and reassign the drive (and set 'Parity is already valid'). I don't know anything about that card or how to configure it correctly. The SATA ports on it use the hptiop driver, one I've never seen before, so I have no confidence in it. If it was decent, I would have seen others using it successfully. Doesn't mean it won't work, but... I'd recommend instead an ASM1062 based card; they are cheap (under $15), fast, and fully supported, but have only 2 ports.
* The evidence is odd and conflicting, but bonding is enabled, and it's trying to bond both onboard network ports together. But the second port isn't working, and the first port is only able to do 100 Mbps. Both are supposed to be gigabit ports. Turn bonding off, as it's only complicating the situation, not helping, and see if you can get the first port to do gigabit. Better yet, disable both and add an Intel gigabit network card, highly recommended around here.
* Your Parity drive (the Hitachi) has a very nice SMART report that, even after about 38000 hours, shows no evidence of any mechanical problems and no evidence of ever having to correct bad sectors. Which would be great if we could stop there! But it also has an error log showing multiple bad sectors in the past, and one IDNF! The IDNF was over 450 hours ago, probably before you even thought about unRAID, and the bad sectors were even longer ago. What is odd is that neither is reflected in any way in the SMART stats. An IDNF is a sector ID Not Found, something that should NEVER happen unless something is seriously wrong, either mechanical issues or serious corruption in the low level formatting, something we can't correct. Yet the SMART data shows no evidence of it, or of the previous bad sectors! And you have prepared the drive for unRAID, and it's working fine! I don't understand it, and would definitely monitor this drive very closely.
* Disk 1 (ST31000524AS) has fewer hours (20983) and no bad sectors currently, but has remapped 329, with a Reported_Uncorrect of 9527. It looks mechanically fine at the moment, but has had issues in the past. I would closely monitor this drive too.
* Disk 2 is an unknown drive on the RocketRAID, without SMART, so I cannot say anything about it.
* Disk 3 (ST31500341AS) has 29 remapped sectors, no current bad sectors, but has had a few mechanical issues. And the drive temperature at some point reached 58, over its limit of 55 (100 - 45). I'd monitor it too.
* Disk 4 is an APPLE_HDD_ST1000DM003, an Apple drive made by Seagate. Hopefully it's better than some of the other Seagate DM models! I don't like the drop in the Start_Stop_Count, and the power cycling seems very high for a Seagate, but otherwise the SMART report looks fine.
* The system indicates an unclean shutdown, so once you try to start the array, it's going to want to do a parity check.
* And finally, what we really need to see is the diagnostics after it hangs. So what I would advise is to install the Fix Common Problems plugin first and start its Troubleshooting Mode. That will automatically save syslogs and diagnostics repeatedly to the flash drive, so that once it hangs and you restart, you can retrieve and post them. We can then see what happened at the trouble point.
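The "monitor this drive" advice above can be partly automated. The sketch below is illustrative only: it assumes an attribute table in the layout that `smartctl -A` prints (ten whitespace-separated columns with the raw value last), the function names are hypothetical, and the sample rows reuse only the raw values quoted in this post (329 remapped, Reported_Uncorrect 9527). The VALUE, WORST, and THRESH columns in the sample are placeholders, not data from the actual drive. The last helper reproduces the "(100 - 45)" arithmetic, on the assumption that the normalized temperature value is 100 minus degrees Celsius, as the post implies.

```python
# Attributes whose raw value should normally stay at zero on a healthy drive.
WATCHED = {"Reallocated_Sector_Ct", "Reported_Uncorrect", "Current_Pending_Sector"}

def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from a smartctl -A style table."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return raw

def needs_watching(raw):
    """Return the watched attributes that have a non-zero raw value."""
    return {name: value for name, value in raw.items()
            if name in WATCHED and value > 0}

def celsius_limit_from_threshold(thresh):
    """Temperature limit implied by a normalized threshold (100 - thresh)."""
    return 100 - thresh

# Placeholder table: only the RAW_VALUE column comes from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       329
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       9527
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

print(needs_watching(parse_smart_attributes(SAMPLE)))
print(celsius_limit_from_threshold(45))
```

Comparing the dictionary from one run to the next is the useful part: a remapped-sector count that stays at 329 is far less worrying than one that keeps climbing.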
Micaiah12 Posted January 18, 2017 (Author)
RobJ wrote: "* The motherboard is old, nForce based, with a BIOS from 2009. I've had a couple myself, so believe me when I say *please* consider replacing it! [...] And I strongly recommend checking for a newer BIOS, which will usually work better with newer technologies like virtualization."
I am looking into getting a new motherboard soon. I will most likely rebuild this entire server once I get the funds later this week.
RobJ wrote: "* You have added a HighPoint RocketRAID card, a model I don't see in the Hardware Compatibility wiki. [...] I'd recommend instead an ASM1062 based card, they are cheap (under $15) and fast, fully supported, but only 2 ports."
I am going to replace the card. Is the ASM1062 your only suggestion? I would prefer a four port card.
RobJ wrote: "* The evidence is odd, conflicting, but bonding is enabled, and it's trying to bond both onboard network ports together. [...] Turn bonding off, it's only complicating the situation, not helping, and see if you can get the first port to do gigabit. Better yet, disable both and add an Intel gigabit network card, highly recommended around here."
Is that a setting somewhere in unRAID?
RobJ wrote: "* Your Parity drive (the Hitachi) has a very nice SMART report [...] But it also has an error log showing multiple bad sectors in the past, and one IDNF! [...] I don't understand it, and would definitely monitor this drive very closely."
It is a very old drive. It has passed several diagnostics as well as a surface scan without any errors. Like I said, though, it is an old drive. I have a 4TB drive being tested right now to replace the Hitachi. The other drives are spares that have been sitting around the office, mostly pulled out of donated computers. My whole idea with unRAID was to make use of these kinds of drives. If it is having problems with them, I will most likely go out and get brand new drives.
RobJ wrote: "* The system indicates an unclean shutdown, so once you try to start the array, it's going to want to do a parity check."
Here is the strange part. It actually did the parity check. It ran and finished with only 2 errors. It has also done 2 more since the unclean shutdown. It's been up for 3 whole days now with no problem.
RobJ wrote: "* And finally - what we really need to see is the diagnostics after it hangs, so what I would advise is to install the Fix Common Problems plugin first, and start its Troubleshooting Mode. [...]"
It's running now as we speak, but there have been no errors or problems yet. I will update you if anything happens. In the meantime I am going to replace the motherboard and the CPU, and eventually the RAID card as well. Do you have any suggestions for a motherboard/CPU pairing that would do well for plain NAS operations? I don't really plan to virtualize much on this server aside from a Linux OS to code and develop in. I plan to use this server mainly for storage and for docker containers for home automation, so a good CPU/motherboard pairing for that would be nice. Let me know if you have any suggestions, and I will keep you guys updated if anything goes wrong with the server. So far so good.
RobJ Posted January 20, 2017
Micaiah12 wrote: "I am going to replace the card. Is the ASM1062 your only suggestion? I would prefer a four port card."
There's a popular 4 port card that's widely recommended, the Adaptec 1430SA. There are notes about it in the Hardware Compatibility wiki page. It's not a high end performer, so it's better for 4 spinners than for 4 SSD's.
Micaiah12 wrote (about turning bonding off): "Is that a setting somewhere in unraid?"
It should be in the Network settings.
Micaiah12 wrote: "In the mean time I am going to replace the motherboard and the cpu as well as the raid card eventually. Do you have any suggestions on motherboard/cpu pairing that would do well for just NAS operations. [...]"
I'll have to let others help you with that.