I'm having massive issues.


Hello everyone. New here to unRAID. I have 20 days left in my trial and I was loving unRAID and considering purchasing it. However, today I've been having massive issues: everything from not being able to use the server unless it's in GUI mode, to having it lock up on me and give me errors when trying to start and restart my dockers. It's also doing some weird things with hard drives, like "unraid signature on mbr failed" when preclearing drives. Hard drives are also dropping out of the array randomly, and I have to reboot to get them back online. Can anyone help me? This just started happening yesterday. Up until that point everything was working flawlessly. Parity checks were clean and dockers were normal, so if anyone could help that would be great!

Link to comment

You are having multiple issues:

 

- Disk1 and the unassigned WD1003FBYX have constant interface issues

 

- ATA7 - ST31500341AS_9VS5A127 (sdg) dropped offline.

 

- The emulated disk3 filesystem is corrupt, you need to run xfs_repair (after fixing the ATA errors)

 

My first guess is that there's something wrong with the Nvidia onboard SATA controller.

 

 

Link to comment

So I've moved my SATA ports around and the server seems to run much better. I can start it up and start the parity check, and the dockers work. However, when I'm using my dockers while the parity check is running, the whole web interface drops its connection. None of the dockers are accessible, and the web interface page loads and loads but never shows up. I can still ping the server; I just can't access the web interface or connect via SSH.

Link to comment

In my experience that often means that the system is having problems with one or more drives not reading reliably.

 

Looking at the syslog you are getting errors reported on ata5

ata5.00: ATA-8: WDC WD1003FBYZ-010FB0,      WD-WCAW33LPFF9Y, 01.01V03, max UDMA/133
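For anyone wanting to read their own syslog the same way, here is a rough sketch of tallying suspicious kernel messages per ATA port. Only the ata5.00 identification line comes from this thread; the other sample lines, the keyword list, and the function name are made up for illustration, so treat it as a starting point rather than a proper log analyzer.

```python
import re
from collections import Counter

# Matches the libata prefix in a kernel message, e.g. "ata5:" or "ata5.00:".
ATA_PREFIX = re.compile(r"\b(ata\d+)(?:\.\d+)?:\s*(.*)")

# Assumed list of substrings that hint at link or drive trouble; adjust as needed.
TROUBLE_WORDS = ("error", "exception", "failed", "hard resetting link", "timeout")


def count_ata_trouble(lines):
    """Count suspicious messages per ATA port (ata1, ata2, ...)."""
    counts = Counter()
    for line in lines:
        match = ATA_PREFIX.search(line)
        if not match:
            continue
        port, message = match.group(1), match.group(2).lower()
        if any(word in message for word in TROUBLE_WORDS):
            counts[port] += 1
    return counts


sample = [
    # From the thread: a plain identification line, not an error.
    "ata5.00: ATA-8: WDC WD1003FBYZ-010FB0,      WD-WCAW33LPFF9Y, 01.01V03, max UDMA/133",
    # Made-up lines for illustration only.
    "ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "ata5: hard resetting link",
    "ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
]

print(count_ata_trouble(sample))
```

A port that keeps showing up in the tally is the one to check first for a bad cable, port, or drive.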

 

 

Link to comment

Hmm ok, I will look at those two drives. If I remember right, one of those drives is an external.

It does seem that once it starts its parity check, about 2-5 hours in it will absolutely freeze everything. Bad hard drives may very well be the problem. I will remedy the situation when I get home and let you know. Thank you so much!

Link to comment

After taking a look at the dashboard, it does look like that drive is causing some problems. However, it has been disabled by unRAID for quite some time now due to the problems it was having. Is there any way it could still be causing this whole meltdown, even though it was disabled?

 

Thanks.

Link to comment

Just an update.

 

Parity check is complete after weeding out the bad drives. There were a few errors, but it said it corrected them all. So far so good. The data seems to be steady now. I'm not losing any folders or files, and the web portal hasn't frozen up. I'll replace the drives with good drives and carry on from there. Thanks all for your help!

Link to comment

Run a parity check again and make sure it reports 0 errors to confirm all the problems are gone. Any error during a parity check means something is wrong and needs to be addressed, or a data rebuild won't work as intended.

 

unRAID is rock solid stable assuming your hardware is stable, and you won't regret your purchase.

Link to comment
  • 2 weeks later...

Hello Everyone.

 

10 days after my previous issues, I am having what may be the same problem or a different one. It worked flawlessly for 10 days with new hard drives. I have also moved some of the drives over to a SATA card to take some load off the chipset on the motherboard. It was working just fine, and now it seems to be having the same problem. Whenever a parity check is started, it gets to about 80-90% and then the entire unRAID system freezes. I can ping it, but I cannot access the web GUI, shares, or dockers. I have to hard reboot it to get everything back online. Does anyone have any idea what is going on? I have looked through the logs, but I am still learning how to read them. I have attached them for further review.

 

My current setup is 3 drives on my onboard SATA controller and one drive on my SATA PCI card.

 

Thanks all!

tower-diagnostics-20170114-1650.zip

Link to comment

The syslog is only 5 minutes long, and the array isn't started, so there's little data available to conclude much.  But just on what I can see, I'll make some comments.  I'm afraid there's no easy way to say it, your hardware kind of stacks the deck against you.  While unRAID does run on old systems, it's going to be hard to get good performance or reliability from your setup.

 

* The motherboard is old, nForce based, with a BIOS from 2009.  I've had a couple myself, so believe me when I say *please* consider replacing it!  Yours is newer than the original awful ones, many bugs fixed, but still has some issues.  The 2 network ports are prone to failing, and I think one of yours has already failed and the other isn't working right, more on that later.  The nForce boards and boards based on derivative chipsets are notorious for spurious IRQ 7's.  On one of mine, I was able to reserve IRQ 7, effectively removing it from assignment, which saved me from the failures associated with the kernel noticing a spurious IRQ 7 and shutting it off, effectively disabling every system attached to it!  On the newer kernels we use now, I've noticed that the kernel usually recognizes an NForce board and removes IRQ 7 from the available IRQ's, but on yours, for some unknown reason, it's still available, which *may* be a cause of trouble.  I don't see anything using it, but not every device reports the IRQ it's using.  And I strongly recommend checking for a newer BIOS, which will usually work better with newer technologies like virtualization.

 

* It's an older CPU but appears fast enough, dual core.  But I don't think it will be good enough for virtualization, especially with that old BIOS.  I recommend turning off virtualization.

 

* You have added a HighPoint RocketRAID card, a model I don't see in the Hardware Compatibility wiki.  It's either not fully compatible or it's not configured right, as it's not providing the correct drive identifications, and it's not providing SMART access.  I don't believe the RocketRAID's have a good reputation here, but then I don't know of anyone with that card.  You might try the advice in this thread, perhaps it will help.  If it does correct the drive ID's and SMART access, you'll have to do a New Config and reassign the drive (and set 'Parity is already valid').  I don't know anything about that card, as to how to configure it correctly.  The SATA ports on it use the hptiop driver, one I've never seen before, so I have no confidence in it.  If it was decent, I would have seen others using it successfully.  Doesn't mean it won't work, but...  I'd recommend instead an ASM1062 based card, they are cheap (under $15) and fast, fully supported, but only 2 ports.

 

* The evidence is odd, conflicting, but bonding is enabled, and it's trying to bond both onboard network ports together.  But the second port isn't working, and the first port is only able to do 100mbps.  Both are supposed to be gigabit ports.  Turn bonding off, it's only complicating the situation, not helping, and see if you can get the first port to do gigabit.  Better yet, disable both and add an Intel gigabit network card, highly recommended around here.

 

* Your Parity drive (the Hitachi) has a very nice SMART report, that even after about 38000 hours shows no evidence of any mechanical problems, and no evidence of ever having to correct bad sectors.  Which would be great if we could stop there!  But it also has an error log showing multiple bad sectors in the past, and one IDNF!  The IDNF was over 450 hours ago, probably before you even thought about unRAID, and the bad sectors were even longer before.  But what is odd, is that neither are reflected in any way in the SMART stats.  An IDNF is a sector ID Not Found, something that should NEVER happen, unless something is seriously wrong, either mechanical issues or serious corruption in the low level formatting, something we can't correct.  Yet the SMART data shows no evidence of it, or the previous bad sectors!  And you have prepared the drive for unRAID, and it's working fine!  I don't understand it, and would definitely  monitor this drive very closely.

 

* Disk 1 (ST31000524AS) has fewer hours (20983) and no bad sectors currently, but has remapped 329, with a Reported_Uncorrect of 9527.  It looks mechanically fine at the moment, but has had issues in the past.  I would closely monitor this drive too.

 

* Disk 2 is an unknown drive on the RocketRAID, without SMART so cannot say anything about it.

 

* Disk 3 (ST31500341AS) has 29 remapped sectors, no current bad sectors, but has had a few mechanical issues.  And the drive temperature at some point reached 58, over its limit of 55 (100 - 45).  I'd monitor it too.

 

* Disk 4 is an APPLE_HDD_ST1000DM003, an Apple drive made by Seagate.  Hopefully it's better than some of the other Seagate DM models!  I don't like the drop in the Start_Stop_Count, and the power cycling seems very high for a Seagate, but otherwise the SMART report looks fine.

 

* The system indicates an unclean shutdown, so once you try to start the array, it's going to want to do a parity check.

 

* And finally - what we really need to see is the diagnostics after it hangs, so what I would advise is to install the Fix Common Problems plugin first, and start its Troubleshooting Mode.  That will automatically save syslogs and diagnostics repeatedly to the flash drive, so that once it hangs and you restart, you can retrieve and post them.  We can then see what happened at the trouble point.
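The drive-by-drive SMART comments above follow a simple pattern that can be sketched in code. This is only an illustration under stated assumptions: the class and function names are hypothetical, the numbers are the ones quoted above for Disk 1 and Disk 3, and the temperature limit is computed as 100 minus the attribute threshold, which is how the "55 (100 - 45)" figure above is derived.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SmartSnapshot:
    """A few SMART figures for one drive (hypothetical structure)."""
    name: str
    reallocated: int                          # remapped sectors
    reported_uncorrect: Optional[int] = None  # None when not stated
    peak_temp_c: Optional[int] = None         # highest temperature seen
    temp_threshold: Optional[int] = None      # normalized threshold, e.g. 45


def warnings_for(drive: SmartSnapshot) -> List[str]:
    """Return the reasons a drive deserves close monitoring."""
    notes = []
    if drive.reallocated > 0:
        notes.append(f"{drive.reallocated} remapped sectors")
    if drive.reported_uncorrect:
        notes.append(f"Reported_Uncorrect {drive.reported_uncorrect}")
    if drive.peak_temp_c is not None and drive.temp_threshold is not None:
        # The normalized value is 100 minus degrees C, so the limit in
        # degrees is 100 minus the threshold: 100 - 45 = 55.
        limit = 100 - drive.temp_threshold
        if drive.peak_temp_c > limit:
            notes.append(f"peak {drive.peak_temp_c}C exceeds limit {limit}C")
    return notes


disk1 = SmartSnapshot("Disk 1 (ST31000524AS)", reallocated=329,
                      reported_uncorrect=9527)
disk3 = SmartSnapshot("Disk 3 (ST31500341AS)", reallocated=29,
                      peak_temp_c=58, temp_threshold=45)

for disk in (disk1, disk3):
    print(disk.name, "->", "; ".join(warnings_for(disk)) or "nothing flagged")
```

None of these flags means a drive has failed; they only mark the drives worth watching, which is the same conclusion reached above.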

Link to comment

* The motherboard is old, nForce based, with a BIOS from 2009.  I've had a couple myself, so believe me when I say *please* consider replacing it!  Yours is newer than the original awful ones, many bugs fixed, but still has some issues.  The 2 network ports are prone to failing, and I think one of yours has already failed and the other isn't working right, more on that later.  The nForce boards and boards based on derivative chipsets are notorious for spurious IRQ 7's.  On one of mine, I was able to reserve IRQ 7, effectively removing it from assignment, which saved me from the failures associated with the kernel noticing a spurious IRQ 7 and shutting it off, effectively disabling every system attached to it!  On the newer kernels we use now, I've noticed that the kernel usually recognizes an NForce board and removes IRQ 7 from the available IRQ's, but on yours, for some unknown reason, it's still available, which *may* be a cause of trouble.  I don't see anything using it, but not every device reports the IRQ it's using.  And I strongly recommend checking for a newer BIOS, which will usually work better with newer technologies like virtualization.

 

I am looking into getting a new motherboard soon. I am most likely rebuilding this entire server once I get the funds later this week.

 

* You have added a HighPoint RocketRAID card, a model I don't see in the Hardware Compatibility wiki.  It's either not fully compatible or it's not configured right, as it's not providing the correct drive identifications, and it's not providing SMART access.  I don't believe the RocketRAID's have a good reputation here, but then I don't know of anyone with that card.  You might try the advice in this thread, perhaps it will help.  If it does correct the drive ID's and SMART access, you'll have to do a New Config and reassign the drive (and set 'Parity is already valid').  I don't know anything about that card, as to how to configure it correctly.  The SATA ports on it use the hptiop driver, one I've never seen before, so I have no confidence in it.  If it was decent, I would have seen others using it successfully.  Doesn't mean it won't work, but...  I'd recommend instead an ASM1062 based card, they are cheap (under $15) and fast, fully supported, but only 2 ports.

 

I am going to replace the card. Is the ASM1062 your only suggestion? I would prefer a four port card.

 

* The evidence is odd, conflicting, but bonding is enabled, and it's trying to bond both onboard network ports together.  But the second port isn't working, and the first port is only able to do 100mbps.  Both are supposed to be gigabit ports.  Turn bonding off, it's only complicating the situation, not helping, and see if you can get the first port to do gigabit.  Better yet, disable both and add an Intel gigabit network card, highly recommended around here.

 

Is that a setting somewhere in unraid?

 

* Your Parity drive (the Hitachi) has a very nice SMART report, that even after about 38000 hours shows no evidence of any mechanical problems, and no evidence of ever having to correct bad sectors.  Which would be great if we could stop there!  But it also has an error log showing multiple bad sectors in the past, and one IDNF!  The IDNF was over 450 hours ago, probably before you even thought about unRAID, and the bad sectors were even longer before.  But what is odd, is that neither are reflected in any way in the SMART stats.  An IDNF is a sector ID Not Found, something that should NEVER happen, unless something is seriously wrong, either mechanical issues or serious corruption in the low level formatting, something we can't correct.  Yet the SMART data shows no evidence of it, or the previous bad sectors!  And you have prepared the drive for unRAID, and it's working fine!  I don't understand it, and would definitely  monitor this drive very closely.

 

It is a very old drive. It has passed several diagnostics as well as a surface scan without any errors. Like I said though, it is an old drive. I have a 4TB drive that is being tested right now to replace the Hitachi. The other drives are spares that have been sitting around the office, mostly pulled out of donated computers. My whole idea with unRAID was to make use of these types of drives. If it is having problems with them, I will most likely go out and get brand new drives.

 

* The system indicates an unclean shutdown, so once you try to start the array, it's going to want to do a parity check.

Here is the strange part. It actually did the parity check. It ran and finished with only 2 errors. It has also done 2 more since the unclean shutdown. It's been up for 3 whole days now with no problems.

* And finally - what we really need to see is the diagnostics after it hangs, so what I would advise is to install the Fix Common Problems plugin first, and start its Troubleshooting Mode.  That will automatically save syslogs and diagnostics repeatedly to the flash drive, so that once it hangs and you restart, you can retrieve and post them.  We can then see what happened at the trouble point.

 

It's running now as we speak; however, there have been no errors or problems yet. I will update you if anything happens. In the meantime I am going to replace the motherboard and the CPU, and eventually the RAID card as well. Do you have any suggestions for a motherboard/CPU pairing that would do well for just NAS operations? I don't really plan to virtualize much on this server aside from a Linux OS to code and develop in. I plan to use this server mainly for storage and docker containers for home automation, so a good CPU/mobo pairing for that would be nice. Let me know if you have any suggestions, and I will keep you guys updated if anything goes wrong with the server. So far so good.

Link to comment

I am going to replace the card. Is the ASM1062 your only suggestion? I would prefer a four port card.

There's a popular 4 port card that's widely recommended, the Adaptec 1430SA.  There are notes about it in the Hardware Compatibility wiki page.  It's not high end performance, so better for 4 spinners, not 4 SSD's.

 

* The evidence is odd, conflicting, but bonding is enabled, and it's trying to bond both onboard network ports together.  But the second port isn't working, and the first port is only able to do 100mbps.  Both are supposed to be gigabit ports.  Turn bonding off, it's only complicating the situation, not helping, and see if you can get the first port to do gigabit.  Better yet, disable both and add an Intel gigabit network card, highly recommended around here.

 

Is that a setting somewhere in unraid?

Should be in the Network settings.
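Once bonding is off, the negotiated speed of each port can also be checked from a script. A minimal sketch, assuming the two onboard ports show up as eth0 and eth1 (an assumption, not confirmed from the diagnostics) and that Python 3 is available; it reads the standard Linux sysfs `speed` file, which reports the link speed in Mbps.

```python
from pathlib import Path


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mbps, or None if unavailable."""
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        # Missing interface, or the kernel refuses the read (link down).
        return None
    # Some drivers report -1 when the speed is unknown.
    return value if value > 0 else None


def describe(iface, sysfs_root="/sys/class/net"):
    """Summarize one interface in a single line."""
    speed = link_speed_mbps(iface, sysfs_root)
    if speed is None:
        return f"{iface}: no link or speed unknown"
    if speed < 1000:
        return f"{iface}: {speed} Mbps (below gigabit)"
    return f"{iface}: {speed} Mbps"


# Interface names are assumptions; substitute the names your system uses.
for name in ("eth0", "eth1"):
    print(describe(name))
```

A port that is supposed to be gigabit but reports 100 here is worth testing with a different cable and switch port before blaming the board.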

 

In the meantime I am going to replace the motherboard and the CPU, and eventually the RAID card as well. Do you have any suggestions for a motherboard/CPU pairing that would do well for just NAS operations? I don't really plan to virtualize much on this server aside from a Linux OS to code and develop in. I plan to use this server mainly for storage and docker containers for home automation, so a good CPU/mobo pairing for that would be nice. Let me know if you have any suggestions, and I will keep you guys updated if anything goes wrong with the server.

I'll have to let others help you with that.

Link to comment
