Bitbass Posted May 20, 2011 I've been running 4.7 for a couple of months now, with no changes for weeks if not months. Here's the chain of events as best I can remember. A couple of nights ago I had problems with Unraid where the shares seemed to become unavailable. I was able to get into the web interface, so I tried to stop the array with the goal of rebooting. After a couple of hours of being stuck on "unmounting" I checked and tried to kill the PID. The only items showing up were SMBD, and it wouldn't let me kill them. Or rather, killing them had no effect. So I powered off and let it sit for a bit. I started it up, and now I always get the following: it boots all the way and I'm able to log in. I usually have about a minute or so to look at the syslog before the system locks up hard. No error message or warning, just frozen. I can look at the GUI during the same interval and see that it starts a parity check. When it gets that far I get thousands of sync errors before it locks up. I'm now running Memtest and it's OK so far, but not finished. Any suggestions if Memtest doesn't find anything?
opentoe Posted May 20, 2011 Do you have any packages installed that run at boot?
Bitbass Posted May 20, 2011 I've had unmenu installed for quite some time but that's it.
Bitbass Posted May 20, 2011 Here's my syslog. No improvement so far. (Attached: syslog.txt)
dgaschk Posted May 20, 2011 The log indicates a problem with disk1. Try removing disk1 and see if the problem persists.
Bitbass Posted May 21, 2011 Can you point me to where it shows a problem with Disk1? After leaving it off all night I've booted with all of the disks still installed, and so far it's running longer than it was yesterday. The parity check has gotten farther and the sync errors appear to have stopped at around 20k. I'm not going to hold my breath, but it seems to have improved at least. If I manage to get through a complete parity check, are there any follow-on diagnostics I should run?
dgaschk Posted May 21, 2011 The end of the log shows problems with ata3. Starting at the top of the log and tracking ata3 shows that it is disk1.
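The tracing dgaschk describes (follow one ATA port from the top of the log down to the errors at the end) can be sketched roughly as below. This is a minimal illustration, not anything posted in the thread: the function name `trace_ata_port` and the sample log lines, including the model strings, are made up for the example, and it assumes the usual kernel message layout in which a line such as `ataN.00: ATA-8: <model>, ...` identifies the drive on port N. The model string it returns still has to be matched by hand against the drive list in the unRAID GUI.

```python
import re

# Hypothetical sample lines in the style of a Linux kernel syslog.
# They are NOT taken from the attached syslog.txt.
SAMPLE_SYSLOG = [
    "May 20 08:00:01 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "May 20 08:00:01 Tower kernel: ata3.00: ATA-8: EXAMPLE-MODEL-1234, FW01, max UDMA/133",
    "May 20 08:00:02 Tower kernel: ata4.00: ATA-8: OTHER-MODEL-5678, FW02, max UDMA/133",
    "May 20 08:05:00 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
]


def trace_ata_port(lines, port):
    """Collect every log line for one ATA port and the drive model seen on it.

    Returns (matching_lines, model), where model is None if no identify
    line for that port was found.
    """
    # Matches "ata3:" and "ata3.00:" but not "ata30:".
    port_re = re.compile(rf"\bata{port}(?:\.\d+)?: (.*)")
    # Assumed shape of the identify message: "ATA-8: <model>, <firmware>, ..."
    ident_re = re.compile(r"ATA-\d+: ([^,]+),")

    hits = []
    model = None
    for line in lines:
        match = port_re.search(line)
        if not match:
            continue
        hits.append(line.rstrip("\n"))
        ident = ident_re.match(match.group(1))
        if ident and model is None:
            model = ident.group(1).strip()
    return hits, model


hits, model = trace_ata_port(SAMPLE_SYSLOG, 3)
for line in hits:
    print(line)
print("Drive model on ata3:", model)
```

To run it against a real log, something like `trace_ata_port(open("syslog.txt"), 3)` should work, since the function only iterates over lines.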
Bitbass Posted May 21, 2011 Thanks. I tried booting without Disk1; it got all the way in and then locked up as soon as I opened a share and tried to copy something. I tried rebooting again with Disk1 in, and it then said it wanted to rebuild onto Disk1. Maybe that was a mistake, but I let it. Now I have two disks unavailable, including Disk1. See the attached. I see that it can't find the file system on Disk1 and Disk6. No way am I reformatting now unless someone tells me it's the right thing to do. (Attached: syslog_2_down_drives.txt)
Bitbass Posted May 22, 2011 Can someone give me a suggestion for next step?
prostuff1 Posted May 22, 2011 Bring the server up and stop the parity check. You need to check whether there is any filesystem corruption on the drives. Also, if you could give us a complete hardware breakdown, it would be much appreciated.
Bitbass Posted May 23, 2011 I had already stopped the parity check before the last post. Hardware: GA-EP45-UD3R, using the 6 onboard SATA ports, plus 2x SiI 2-port SATA cards. You can see the drive brands in the previous picture. I'm still in the same state: the array is online and accessible, but I have two drives showing up as unformatted.
Bitbass Posted May 24, 2011 Come on guys...can anyone give me some guidance?
Rajahal Posted May 24, 2011 I'm assuming memtest ran overnight and found no errors? You are correct in not reformatting. Is there any physical similarity between disk1 and disk6? Do they share a controller card, a power splitter, or anything else that could explain why both of them failed so close together in time? I would exhaust all possibilities of hardware failure (loose connection, bad SATA card, etc.) first and foremost. If you are unable to find any hardware reason why these disks failed, then I believe the next step is to run reiserfsck on your failed disks (1 and 6). Here's how: Wiki - Check Disk Filesystems. This process isn't too difficult, but there are things that can go wrong. Please be careful, be patient, stay calm, and take your time. Double-check every command before hitting 'enter'.
dgaschk Posted May 24, 2011 Post SMART reports for the drives.
Bitbass Posted May 25, 2011 OK, I was able to determine that I have a bad port on one of the two-port SiI cards. So I borrowed a friend's 6-port Adaptec and now have the attached results. Disk1 is in the "working" port of the two-port SiI card, although I'm not sure I can trust it. Disk6 is on the Adaptec and is showing up as blue. The disk's name isn't getting passed through. I tried both disks on the Adaptec, but then there were too many unknown disks and I wasn't given the option to start the array. Should I start the array and let it try to rebuild the data?
Bitbass Posted May 25, 2011 OK, I got a replacement SiI card installed, and the attached screenshot shows the current state. I pulled up unmenu: Disk1 is showing "disk_invalid" and Disk6 is showing "disk_dsbl". SMART reports are attached. I didn't see anything wrong with them, but I might be missing something. The syslog is also attached and shows some errors. I tried reiserfsck and got the following for both drives: "bread: Cannot read the block (2): (Input/output error)". So, what's my next step? (Attached: disk1_smart.txt, disk6_smart.txt, syslog-2011-05-25.txt)
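For anyone reading SMART reports by eye as above, a small script can help avoid missing a non-zero counter. The sketch below is an assumption-laden illustration rather than anything from the thread: it assumes the report contains the attribute table in the layout `smartctl` normally prints (ten columns, raw value last), the function name `nonzero_watched` is made up, the watch list is a common rule of thumb rather than an official threshold, and the sample table is invented, not taken from disk1_smart.txt or disk6_smart.txt.

```python
# Attributes that are commonly checked first; a non-zero raw value is a
# reason to look closer, not proof of a failing disk. This list is an
# assumption, not something stated in the thread.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)

# Invented sample in the usual smartctl attribute-table layout.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       42
"""


def nonzero_watched(report_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


for name, raw in nonzero_watched(SAMPLE_REPORT).items():
    print(f"{name}: raw value {raw}")
```

A rising `UDMA_CRC_Error_Count` usually points at the link (cable, port, or controller) rather than the platters, which is one way to tell a bad SATA card from a bad disk.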
Bitbass Posted May 27, 2011 One more time before I start taking risks. Suggestions?
mikechy Posted May 28, 2011 I feel for you. I've had nothing but problems with my unraid system: random lockups, weird sync issues, slow syncs, incredibly slow transfer rates. All unexplained. This is the fourth NAS/SAN-type device I've built and used, and I never had issues running RAID 1 or RAID 5. Those things just ran, and I rarely had hard drive failures. With unraid it seems that every time you have a blip, the community will tell you, "bad drive!" or "bad cable". Power supplies are also a common target. I've swapped more hardware than I can count in the past six months. I still have random hangs where I have to hard restart, only to have the system come back up and have a parity check die 7 hours in or slow to a crawl. I'm currently troubleshooting a problem very similar to yours and have a full set of brand new hardware sitting under the desk, ready for a total rebuild and migration. If it fails on me again, I'm done. Best of luck...
dgaschk Posted May 28, 2011 Disk6 looks slightly better than disk1. I would try reiserfsck --rebuild-tree on /dev/md6. Someone with more experience may have better advice.
Bitbass Posted June 6, 2011 Back from vacation. I've now tried reiserfsck --check and --rebuild-tree on md6, and both give me the following: "bread: Cannot read the block (2): (Input/output error)". The GUI is offering me the option to format the drives, but what I'm not clear on is whether I would in fact lose data at this point. Obviously I have two drives with problems, but the parity drive isn't one of them and the array is online. Does that mean I haven't lost anything and I'm just running unprotected?
dgaschk Posted June 6, 2011 See this: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems You can try reiserfsck with --rebuild-sb
Bitbass Posted June 6, 2011 I tried --rebuild-sb on both drives and got the same error. My question still hasn't been answered: with 2 drives down, neither being the parity drive, have I lost data?
dgaschk Posted June 7, 2011 Post the output so we can see the error.
Bitbass Posted June 7, 2011 It's the same error: "bread: Cannot read the block (2): (Input/output error)". And my question is still not answered: with 2 drives down, neither being the parity drive, have I lost data?
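The question of what parity can and cannot recover with two data drives down can be illustrated with a toy model. The sketch below assumes a single parity disk computed as the byte-wise XOR of all data disks, which is how unRAID of this era is generally described; the disk contents and the helper name `xor_blocks` are invented for the example. It says nothing about whether the files on disk1 and disk6 are still intact on the physical disks themselves, only about what parity alone can reconstruct.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny made-up "data disks" and the parity computed from them.
disk1 = b"\x11\x22\x33\x44"
disk2 = b"\xa0\xb0\xc0\xd0"
disk3 = b"\x05\x06\x07\x08"
parity = xor_blocks([disk1, disk2, disk3])

# One data disk missing: parity plus the surviving disks give it back exactly.
rebuilt_disk1 = xor_blocks([parity, disk2, disk3])
print("disk1 recovered from parity:", rebuilt_disk1 == disk1)

# Two data disks missing: parity plus the survivor only yields the XOR of the
# two missing disks, which is not enough to separate them again.
combined = xor_blocks([parity, disk3])
print("only disk1 XOR disk2 is known:", combined == xor_blocks([disk1, disk2]))
```

Under that assumption, a single parity disk covers exactly one missing data disk at a time. With two data disks unreadable, anything recovered has to come from the disks themselves, which is why formatting them before the read errors are understood is risky.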