Getting the same error with two different drives

dougnliz · October 9, 2015

Hi folks,

I just upgraded from v5 to v6 over the weekend and everything has been going really well. However, yesterday morning I noticed I had a disabled drive. It's an older 500GB one, so I figured it was simply time for it to die. Last night when I got home I replaced it with a drive I'm no longer using in my desktop (I have an SSD now), which I know works, and started the rebuild. This morning the same drive is disabled again, and it looks like the exact same error, so I'm wondering if something else is going on. I'm enclosing the diagnostics zip that unRAID now provides. Any ideas?

I had to delete a section of the log file that was just a READ ERROR over and over to get the filesize down to one I could upload.

Thanks,

Doug

tower-diagnostics-20151009-0617.zip

dougnliz · October 9, 2015

I need to pay closer attention. It's not the same drive/location; this is another drive that failed. The first one was disk 6 and now it's disk 7. I noticed it this morning when I was trying some different things. I have a new drive on order, so we'll see if that fixes this. Weird to have two drives fail one right after the other though, isn't it?

Thanks,

Doug

trurl · October 9, 2015

No SMART data for ST3000DM001-9YN166_W1F16TEK. Might be dead but possibly just a bad connection.

WDC WD20EARS-00MVWB0-WD-WMAZA2764693 has 4 pending sectors.

Since we don't have diagnostics from the drive you replaced, there may be nothing wrong with it.

dougnliz · October 9, 2015

I'm actually thinking it's a connection issue as well. I have the drive I already replaced back in the server now running 3 preclear cycles. I'm going to do the same with this second drive once I get the replacement.

We'll see how it goes.

dougnliz · October 13, 2015

Okay, so I thought I had this resolved. It turns out the original problem was a failing Norco cage. I figured that out after 3 more drives disappeared from the array. I moved the drives around and got everything working again. After realizing my drive didn't actually have an issue, and after a couple of days of the array running fine, I decided to reset the array config to clear the error. I reset the array, assigned the drives back, and let it build parity. That all completed fine. After that was done I decided to check parity. I woke up the next morning and Disk 7 was disabled again, with 384 errors reported in the GUI. So after letting the preclear on the new 4TB drive finish, I assigned it to Disk 7 and started the rebuild process. I woke up this morning and Disk 7 is disabled again with 384 errors. The 4TB drive is in a different slot on a different cable than the 3TB drive, so I don't understand what's happening here.

I've enclosed a screenshot of the array and the diagnostics file.

Doug

tower-diagnostics-20151013-1849.zip

dougnliz · October 14, 2015

Any advice here? I now believe the 3TB drive has actually failed, so my decision to reset the array was probably not the right one. Even though the parity build finished, it looks like there are issues. But at this point, how do I get the array back to normal operation?

dougnliz · October 16, 2015

After some more digging around I'm back to thinking my drive is actually okay, but something is going on with the array. I can see missing data in the array (empty folders and such), however when I go to the actual disk the data is there. After looking through the log a bit I found some of these errors which match my data that's missing from the array.

Oct 15 01:35:45 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 1635 does not match to the expected one 1
Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 106496001. Fsck?
Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [687 682 0x0 SD]
Oct 15 01:35:45 Tower shfs/user: shfs_readdir: fstatat: S01E02.Some of the Things That Molecules Do.mkv (13) Permission denied
Oct 15 01:35:45 Tower shfs/user: shfs_readdir: readdir_r: /mnt/disk6/TV/Cosmos- A Spacetime Odyssey/Season 1 (13) Permission denied
Oct 15 01:35:47 Tower kernel: scsi_io_completion: 97 callbacks suppressed
Oct 15 01:35:47 Tower kernel: sd 1:0:0:0: [sdg] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00

I'm not sure how to go about fixing this though. Any help would be greatly appreciated so I can get my server back to normal.

Doug
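For anyone sifting through a long syslog like this one, a minimal sketch of how the REISERFS messages could be tallied per device. The function name `count_reiserfs_messages` and the regular expression are assumptions made for illustration, not anything unRAID ships; the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Hypothetical helper (not part of unRAID): tally REISERFS kernel
# messages by the md device named in them and by severity.
REISERFS_RE = re.compile(
    r"kernel: REISERFS (warning|error)(?: \(device (\w+)\))?"
)

def count_reiserfs_messages(lines):
    counts = Counter()
    for line in lines:
        m = REISERFS_RE.search(line)
        if m:
            severity = m.group(1)
            # Some messages do not name a device at all.
            device = m.group(2) or "unknown"
            counts[(device, severity)] += 1
    return counts

sample = [
    "Oct 15 01:35:45 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 1635 does not match to the expected one 1",
    "Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 106496001. Fsck?",
    "Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [687 682 0x0 SD]",
    "Oct 15 01:35:47 Tower kernel: scsi_io_completion: 97 callbacks suppressed",
]

result = count_reiserfs_messages(sample)
for (device, severity), n in sorted(result.items()):
    print(device, severity, n)
```

In this excerpt every message that names a device points at md6, which corresponds to disk6 in the array.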

Squid · October 16, 2015

It looks like disk6 has some filesystem corruption. You'll need to restart the array in maintenance mode and then run reiserfsck on it (you can do it in the GUI). The last two lines of your log excerpt (the scsi_io_completion and [sdg] entries) also imply a problem with a drive or controller card, so you should post a new diagnostics log from the 15th.
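As a rough illustration of why the [sdg] line points toward hardware, here is a minimal sketch that decodes the `hostbyte` field. The table follows the host byte codes in the Linux kernel's SCSI headers as I understand them; the names `HOST_BYTES` and `decode_result` are made up for this example. The decoded name only says the host adapter could not reach the target; it does not say whether the drive, cable, cage, or controller is at fault.

```python
import re

# Host byte codes as defined by the Linux SCSI layer (assumed from the
# kernel headers; check your kernel version if in doubt).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}

RESULT_RE = re.compile(r"\[(\w+)\].*hostbyte=(0x[0-9a-fA-F]+)")

def decode_result(line):
    """Return (device, host byte name) for a kernel SCSI result line, or None."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    device = m.group(1)
    host = int(m.group(2), 16)
    return device, HOST_BYTES.get(host, "unknown")

line = ("Oct 15 01:35:47 Tower kernel: sd 1:0:0:0: [sdg] tag#0 "
        "UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00")
decoded = decode_result(line)
print(decoded)
```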

dougnliz · October 16, 2015

Thanks for the post. I had found the information about running reiserfsck this morning on the wiki and that's actually running on the disk now, so we'll see how that turns out. I found references to being able to do this in the GUI but I couldn't find it anywhere. Where is it?

Squid · October 16, 2015

Stop the array and restart it in maintenance mode. Then, from the Main tab, select the disk; it's somewhere on that page. (I'm at work right now.)

dougnliz · October 16, 2015

Ah, I didn't think to check there. I was looking under Tools and the different sections. It totally makes sense that it would be there, though.

dougnliz · October 17, 2015

Looks like the --rebuild-tree did the trick. My array looks to be operating normally again. I have a few files I have to sift through in the LOST+FOUND, but overall not bad. Running a parity check now just to be sure.

Thanks for the help!

Doug
