Locking up hard, sync errors


Running 4.7 for a couple of months now.  No changes for weeks if not months.

 

Here's the chain of events as best I can remember.  A couple of nights ago I had problems with Unraid where the shares seemed to become unavailable.  I could still get into the web interface, so I tried to stop the array with the goal of rebooting.  After a couple of hours stuck on "unmounting" I checked which processes were still running and tried to kill them by PID.  The only ones showing up were smbd, and killing them had no effect.  So I powered off and let it sit for a bit.

 

Started it up and now I always get the same thing.  It boots all the way and I'm able to log in.  I usually have about a minute or so to look at the syslog before the system locks up hard.  No error message or warning.  Just frozen.

 

I can look at the GUI during the same interval and see that it starts a parity check.  When it gets that far I get thousands of sync errors before it locks up.

 

I'm now running Memtest and it's ok so far but not finished.

 

Any suggestions if the Memtest doesn't find anything?

Link to comment

Can you point me to where it shows a problem with Disk1?

 

After leaving it off all night I've booted with all of the disks still installed and so far it's running longer than it was yesterday.  The parity check has gotten farther and the sync errors appear to have stopped around 20k.  I'm not going to hold my breath but it seems to have improved at least.

 

If I manage to get through a complete parity check, are there any follow-on diagnostics I should run?

Link to comment

Thanks, I tried booting without Disk1 and it got all the way in, then locked up as soon as I opened a share and tried to copy something.  I rebooted again with Disk1 in and was told it wanted to rebuild onto Disk1.  Maybe that was a mistake, but I let it.  Now I have two disks unavailable, including Disk1.  See the attached.  I see that it can't find the file system on Disk1 and Disk6.  No way am I reformatting now unless someone tells me it's the right thing to do.

syslog_2_down_drives.txt

unraid_unformatted.PNG.a8395b3f66371723b11814e36bc0c773.PNG

Link to comment

Already stopped the parity check before the last post.

 

GA-EP45-UD3R - using 6 onboard SATA ports

2x SiI 2-port SATA cards

You can see the drive brands from the previous picture

 

I'm still in the same state.  The array is online and accessible but I have two drives showing up as unformatted.

 

Link to comment

I'm assuming memtest ran overnight and found no errors?

 

You are correct in not reformatting.  Is there any physical similarity shared between disk1 and disk6?  Do they share a controller card, a power splitter, or anything that could explain why both of them failed so close in time?  I would exhaust all possibilities of hardware failure (loose connection, bad SATA card, etc.) first and foremost.

 

If you are unable to find any hardware reason why these disks failed, then I believe the next step is to run ReiserFSCK on your failed disks (1 and 6).  Here's how:

 

Wiki - Check Disk Filesystems

 

This process isn't too difficult, but there are things that can go wrong.  Please be careful, be patient, stay calm, and take your time.  Double check every command before hitting 'enter'.
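
Since double-checking each command is the point, here is a minimal Python sketch that only builds and prints the read-only check command for each suspect disk so it can be reviewed before anything is run. It assumes data disk N is exposed as /dev/mdN (later in this thread disk 6 is referred to as MD6); the function name is made up for illustration, and the wiki page remains the authority on the actual procedure.

```python
import shlex


def reiserfsck_check_command(disk_number):
    """Build a read-only reiserfsck command for one data disk.

    Assumption: data disk N appears as /dev/mdN while the array is started.
    Only --check is used here; it reports problems without repairing them.
    """
    if disk_number < 1:
        raise ValueError("disk numbers start at 1")
    return ["reiserfsck", "--check", "/dev/md{}".format(disk_number)]


# Print the commands for the two disks discussed in this thread.
for disk in (1, 6):
    print(shlex.join(reiserfsck_check_command(disk)))
```

Nothing is executed by this snippet; it only prints the command lines for inspection.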

Link to comment

Ok, I was able to determine that I have a bad port on one of the 2-port SiI cards.  So, I borrowed a friend's 6-port Adaptec and now have the attached results.

 

Disk1 is in the "working" port of the two port SiI although I'm not sure I can trust it.

 

Disk6 is on the Adaptec and is showing up as blue.  The naming of the disk isn't getting passed through.

 

I tried both disks on the Adaptec but then there were too many unknown disks and I wasn't given the option to start it.

 

Should I start the array and let it try to rebuild the data?

unraid_with_extra_controller.PNG.fcf803d3563cf62fbe703999b782cc27.PNG

Link to comment

Ok, got a replacement SiI card installed and the attached screenshot shows the current state. 

 

I pulled up unmenu and Disk1 is showing "disk_invalid" and Disk6 is showing "disk_dsbl".

 

Smart reports are attached.  I didn't see anything wrong with them but I might be missing it.
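
For anyone skimming SMART reports like these, a rough Python sketch of what to look for: it scans the attribute table printed by smartctl and flags a few attributes whose raw value is non-zero. The list of watched attributes is my own choice, the function name is hypothetical, and the sample text is invented for illustration; it is not taken from the attached reports.

```python
import re

# Attributes worth a second look when their raw value is non-zero.
# The first three relate to the disk surface; CRC errors more often
# point at cabling or the controller than at the disk itself.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def flag_smart_attributes(report_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            match = re.match(r"\d+", fields[9])
            if match and int(match.group()) > 0:
                flagged[fields[1]] = int(match.group())
    return flagged


# Invented sample rows in the usual smartctl attribute layout.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""

print(flag_smart_attributes(SAMPLE))
```

A clean result from a check like this does not prove a disk is healthy; it only means none of the watched counters has moved.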

 

Syslog is also attached and shows some errors.  I tried reiserfsck and got "bread: Cannot read the block (2): (Input/output error)" on both drives.
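
To see which devices the syslog errors cluster on, something like the following Python sketch can count I/O errors per device. The regular expression assumes kernel messages of the form "I/O error, dev sdX, sector N"; the exact wording varies between kernel versions, so treat the pattern as an assumption. The sample lines are invented, not copied from the attached syslog.

```python
import re
from collections import Counter

# Assumed message shape: "... I/O error, dev sdg, sector 128"
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Count I/O error lines per block device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    "May 25 08:01:12 Tower kernel: end_request: I/O error, dev sdg, sector 128",
    "May 25 08:01:12 Tower kernel: end_request: I/O error, dev sdg, sector 136",
    "May 25 08:01:13 Tower kernel: some unrelated message",
    "May 25 08:01:14 Tower kernel: end_request: I/O error, dev sdc, sector 64",
]

for device, count in count_io_errors(sample).most_common():
    print(device, count)
```

If the errors pile up on devices that share one controller or cable run, that points back at hardware rather than the file system.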

 

So, what's my next step?

unraid_with_new_SiI.PNG.e3b8ed11d5741c46cefb34ee232410bd.PNG

disk1_smart.txt

disk6_smart.txt

syslog-2011-05-25.txt

Link to comment

I feel for you.  I've had nothing but problems with my Unraid system: random lock-ups, weird sync issues, slow sync, incredibly slow transfer rates.  All unexplained.

 

This is the fourth NAS/SAN type device I've built and used, and I never had issues running RAID 1 or RAID 5.  The things just ran, and I rarely had hard drive failures.

 

With unraid it seems every time you have a blip, the community will tell you, "bad drive!" or "bad cable".  Power supplies are also a common target. 

 

I've swapped more hardware than I can count in the past six months.  I still have random hangs where I have to hard restart, only to have the system come back up and have a parity check die 7 hours in or slow to a crawl.

 

I'm currently troubleshooting a very similar problem to yours and have a full set of brand new hardware sitting under the desk, ready for a total rebuild and migration.  If it fails again on me, I'm done.

 

best of luck...

Link to comment
  • 2 weeks later...

Back from vacation.  Now, I've tried reiserfsck --check and --rebuild-tree on MD6 and both give me the following:

 

bread: Cannot read the block (2): (Input/output error).

 

The GUI is giving me the option to format the drives, but what I'm not clear on is whether I would in fact lose data at this point.  Obviously I have two drives with problems, but the parity drive is fine and the array is online.  Does that mean I haven't lost anything and I'm just running unprotected?
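
One way to reason about that question is with a toy model. The Python sketch below assumes the array is protected by a single XOR parity disk, which is my understanding of how this Unraid version works rather than something confirmed in this thread. Under that assumption, one missing disk can be recomputed from parity plus the surviving disks, but with two missing disks parity only yields the XOR of the two, not either one individually. The byte values and the helper name are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny made-up "data disks" and their single parity block.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(data)

# One disk missing: XOR of the survivors and parity gives it back exactly.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Two disks missing: the same operation only recovers their XOR,
# which is not enough to separate the two original blocks.
combined = xor_blocks([data[2], parity])
print(combined == xor_blocks([data[0], data[1]]))
```

This says nothing about what the format option would do to the disks; it only illustrates why two unreadable disks are a different situation from one under single parity.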

Link to comment
