dougnliz Posted October 9, 2015
Hi folks, I just upgraded from v5 to v6 over the weekend and everything has been going really well. However, yesterday morning I noticed I had a disabled drive. It's an older 500GB one, so I figured it was simply time for it to die. Last night when I got home I replaced it with a drive I'm no longer using in my desktop (I have an SSD now), and I know that drive works. I replaced it and started the rebuild. This morning the same drive is disabled again, and it looks like the exact same error, so I'm wondering if something else is going on. I'm enclosing the diagnostics zip that unRAID now provides. Any ideas? I had to delete a section of the log file that was just a READ ERROR repeated over and over to get the file size down to one I could upload.
Thanks, Doug
tower-diagnostics-20151009-0617.zip
dougnliz Posted October 9, 2015
I need to pay closer attention. It's not the same drive or location; this is another drive that failed. The first one was disk 6 and now it's disk 7. I noticed it this morning when I was trying some different things. I've got a new drive on order, so we'll see if that fixes this. It's weird to have two drives fail one right after the other, though, isn't it?
Thanks, Doug
trurl Posted October 9, 2015
No SMART data for ST3000DM001-9YN166_W1F16TEK. It might be dead, but it's possibly just a bad connection. WDC WD20EARS-00MVWB0-WD-WMAZA2764693 has 4 pending sectors. Since we don't have diagnostics from the drive you replaced, there may be nothing wrong with it.
dougnliz Posted October 9, 2015
I'm actually thinking it's a connection issue as well. I have the drive I already replaced back in the server now, running 3 preclear cycles. I'm going to do the same with this second drive once I get the replacement. We'll see how it goes.
dougnliz Posted October 13, 2015
Okay, so I thought I had this resolved. It turns out the original problem was a failing Norco cage. I figured that out after 3 more drives disappeared from the array. I moved the drives around and got everything working again. After realizing my drive didn't actually have an issue, and after a couple of days of the array running fine, I decided to reset the array config to clear the error. I reset the array, assigned the drives back, and let it build parity. That all completed fine. Once that was done I decided to check parity. I woke up the next morning and Disk 7 was disabled again, with 384 errors reported in the GUI. So after letting the preclear on the new 4TB drive finish, I assigned it to Disk 7 and started the rebuild. I woke up this morning and Disk 7 was disabled again with 384 errors. The 4TB drive is in a different slot on a different cable than the 3TB drive, so I don't understand what's happening here. I've enclosed a screenshot of the array and the diagnostics file.
Doug
tower-diagnostics-20151013-1849.zip
dougnliz Posted October 14, 2015
Any advice here? I now believe the 3TB drive has actually failed, so my decision to reset the array was probably not the right one. Even though the parity build finished, it looks like there are issues. At this point, how do I get the array back to normal operation?
dougnliz Posted October 16, 2015
After some more digging around I'm back to thinking my drive is actually okay, but something is going on with the array. I can see missing data in the array (empty folders and such), but when I go to the actual disk the data is there. Looking through the log, I found these errors, which match the data that's missing from the array:
Oct 15 01:35:45 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 1635 does not match to the expected one 1
Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 106496001. Fsck?
Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [687 682 0x0 SD]
Oct 15 01:35:45 Tower shfs/user: shfs_readdir: fstatat: S01E02.Some of the Things That Molecules Do.mkv (13) Permission denied
Oct 15 01:35:45 Tower shfs/user: shfs_readdir: readdir_r: /mnt/disk6/TV/Cosmos- A Spacetime Odyssey/Season 1 (13) Permission denied
Oct 15 01:35:47 Tower kernel: scsi_io_completion: 97 callbacks suppressed
Oct 15 01:35:47 Tower kernel: sd 1:0:0:0: [sdg] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
I'm not sure how to go about fixing this, though. Any help would be greatly appreciated so I can get my server back to normal.
Doug
Squid Posted October 16, 2015
It looks like disk6 has some filesystem corruption. You'll need to restart the array in maintenance mode and then run reiserfsck on it (you can do that from the GUI). The last two lines of your log excerpt, the scsi_io_completion and [sdg] hostbyte=0x04 entries, also imply a problem with a drive or controller card. You should post a new diagnostics log from the 15th.
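For anyone triaging a similar syslog, the split described above (filesystem corruption on one side, drive or controller trouble on the other) can be roughed out with a short script. This is only a sketch: the two regular expressions are assumptions based on the handful of lines quoted in this thread rather than a complete catalogue of kernel messages, and `summarize` is a hypothetical helper name. It treats any non-zero `hostbyte` as a transport-level symptom and does not try to map `sdg` back to an unRAID disk slot.

```python
import re
from collections import Counter

# Kernel log lines copied from the post above (the shfs lines are omitted).
SAMPLE_LOG = """\
Oct 15 01:35:45 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 1635 does not match to the expected one 1
Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 106496001. Fsck?
Oct 15 01:35:45 Tower kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [687 682 0x0 SD]
Oct 15 01:35:47 Tower kernel: scsi_io_completion: 97 callbacks suppressed
Oct 15 01:35:47 Tower kernel: sd 1:0:0:0: [sdg] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
"""

# Assumed patterns, derived only from the lines quoted in this thread.
FS_ERROR = re.compile(r"REISERFS error \(device (\w+)\)")
HOST_ERROR = re.compile(r"\[(sd[a-z]+)\].*hostbyte=(0x[0-9a-fA-F]+)")


def summarize(log_text):
    """Count filesystem errors per md device and non-zero hostbyte results per sd device."""
    fs_errors = Counter()
    host_errors = Counter()
    for line in log_text.splitlines():
        match = FS_ERROR.search(line)
        if match:
            fs_errors[match.group(1)] += 1
            continue
        match = HOST_ERROR.search(line)
        # Assumption: a non-zero hostbyte points at the drive, cable, or controller.
        if match and int(match.group(2), 16) != 0:
            host_errors[match.group(1)] += 1
    return fs_errors, host_errors


fs_errors, host_errors = summarize(SAMPLE_LOG)
print("filesystem errors:", dict(fs_errors))
print("host/transport errors:", dict(host_errors))
```

Errors under an `md` device suggest a filesystem check on that disk, while errors under an `sd` device suggest looking at cabling, the cage, or the controller before blaming the drive itself.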
dougnliz Posted October 16, 2015
Thanks for the post. I found the information about running reiserfsck on the wiki this morning, and it's actually running on the disk now, so we'll see how that turns out. I found references to being able to do this in the GUI, but I couldn't find it anywhere. Where is it?
Squid Posted October 16, 2015
Stop the array and restart it in maintenance mode. Then, from the Main tab, select the disk; it's somewhere in there. (I'm at work right now.)
dougnliz Posted October 16, 2015
Ah, I didn't think to check there. I was looking under Tools and the different sections. It totally makes sense that it would be there, though.
dougnliz Posted October 17, 2015
Looks like running reiserfsck with --rebuild-tree did the trick. My array looks to be operating normally again. I have a few files to sift through in LOST+FOUND, but overall it's not bad. I'm running a parity check now just to be sure. Thanks for the help!
Doug