
v5.0.6 and 6TB drives



I've been running a pair of 6TB drives (one parity, a Seagate Desktop; one data, a WD Red) in a fully loaded 24-drive setup for several months now with no problems.  I got a third 6TB WD Red, ran a parity check with no issues, swapped in the new 6TB drive, and successfully completed a data rebuild/expansion.  Then I started moving large files from drive to drive via the telnet command line, and a completely different drive in the array (a Hitachi Deskstar 4TB) started coughing up errors into the thousands.  I stopped the array, ran a SMART test that showed zero hardware problems, then started the array in maintenance mode to run reiserfsck, which recommended --rebuild-tree.

 

I commenced the tree rebuild; after the first pass, the telnet session froze with these syslog messages:

 

Linux 3.9.11p-unRAID.
root@UnRAID:~# reiserfsck /dev/md13
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md13
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Jan 19 15:10:20 2015
###########
Replaying journal: Done.
Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. \/  1 (of  41|/  1 (of 154// 29 (of 146-block 901775361: The level of the node (0) is not correct, (1) expected
the problem in the internal node occured (901775361), whole subtree is skipped
/  2 (of 154\block 901811789: The level of the node (8466) is not correct, (2) expected
the problem in the internal node occured (901811789), whole subtree is skipped
/  2 (of  41|/ 13 (of 155\/  3 (of 168-block 955285505: The level of the node (0) is not correct, (1) expected
the problem in the internal node occured (955285505), whole subtree is skipped
/135 (of 155-/  8 (of 106-block 823409822: The level of the node (59748) is not correct, (1) expected
the problem in the internal node occured (823409822), whole subtree is skipped
/  3 (of  41|/ 49 (of  87//  3 (of  85|block 878530878: The level of the node (8966) is not correct, (1) expected
the problem in the internal node occured (878530878), whole subtree is skipped
/ 50 (of  87/block 829670944: The level of the node (18829) is not correct, (2) expected
the problem in the internal node occured (829670944), whole subtree is skipped
/  4 (of  41-/ 11 (of  92-block 976390105: The level of the node (56870) is not correct, (2) expected
the problem in the internal node occured (976390105), whole subtree is skipped
/  5 (of  41\/ 40 (of 170// 30 (of  88\block 911754612: The level of the node (36899) is not correct, (1) expected
the problem in the internal node occured (911754612), whole subtree is skipped
/167 (of 170-/122 (of 128|block 876981361: The level of the node (36338) is not correct, (1) expected
the problem in the internal node occured (876981361), whole subtree is skipped
/168 (of 170//  1 (of 149-block 896106497: The level of the node (0) is not correct, (1) expected
the problem in the internal node occured (896106497), whole subtree is skipped
/  6 (of  41// 68 (of 170\/162 (of 170/block 911919319: The level of the node (26663) is not correct, (1) expected
the problem in the internal node occured (911919319), whole subtree is skipped
/ 69 (of 170-/  1 (of 170\block 911928423: The level of the node (20605) is not correct, (1) expected
the problem in the internal node occured (911928423), whole subtree is skipped
/ 70 (of 170|block 912060140: The level of the node (54955) is not correct, (2) expected
the problem in the internal node occured (912060140), whole subtree is skipped
/  7 (of  41// 12 (of 170-/ 47 (of 131/block 830930955: The level of the node (36553) is not correct, (1) expected
the problem in the internal node occured (830930955), whole subtree is skipped
/ 47 (of 170|/156 (of 159|block 893485059: The level of the node (0) is not correct, (1) expected
the problem in the internal node occured (893485059), whole subtree is skipped
/ 48 (of 170//  1 (of  85-block 923205665: The level of the node (32534) is not correct, (1) expected
the problem in the internal node occured (923205665), whole subtree is skipped
/ 51 (of 170//137 (of 170-block 953745426: The level of the node (0) is not correct, (1) expected
the problem in the internal node occured (953745426), whole subtree is skipped
/ 52 (of 170\block 953768738: The level of the node (5301) is not correct, (2) expected
the problem in the internal node occured (953768738), whole subtree is skipped
/  8 (of  41|/ 13 (of 151// 68 (of  86/block 823409792: The level of the node (45337) is not correct, (1) expected
the problem in the internal node occured (823409792), whole subtree is skipped
/ 14 (of 151-/ 21 (of  85\block 823409817: The level of the node (34870) is not correct, (1) expected
the problem in the internal node occured (823409817), whole subtree is skipped
/ 83 (of 151|/ 77 (of  86/block 896106502: The level of the node (51456) is not correct, (1) expected
the problem in the internal node occured (896106502), whole subtree is skipped
/ 84 (of 151-/  1 (of 170\block 896113600: The level of the node (18524) is not correct, (1) expected
the problem in the internal node occured (896113600), whole subtree is skipped
/116 (of 151// 71 (of  86|block 829600305: The level of the node (23230) is not correct, (1) expected
the problem in the internal node occured (829600305), whole subtree is skipped
/117 (of 151/block 829600306: The level of the node (29734) is not correct, (2) expected
the problem in the internal node occured (829600306), whole subtree is skipped
/ 10 (of  41|/ 89 (of 167-/ 74 (of  89|block 960669156: The level of the node (65485) is not correct, (1) expected
the problem in the internal node occured (960669156), whole subtree is skipped
/ 90 (of 167/block 960669157: The level of the node (44947) is not correct, (2) expected
the problem in the internal node occured (960669157), whole subtree is skipped
/ 12 (of  41//110 (of 170\/ 22 (of 170/block 955285512: The level of the node (0) is not correct, (1) expected
the problem in the internal node occured (955285512), whole subtree is skipped
/111 (of 170-block 955325781: The level of the node (61706) is not correct, (2) expected
the problem in the internal node occured (955325781), whole subtree is skipped
/ 13 (of  41\/ 14 (of 128\/157 (of 170|block 956704846: The level of the node (32477) is not correct, (1) expected
the problem in the internal node occured (956704846), whole subtree is skipped
/ 15 (of 128/block 956704854: The level of the node (20546) is not correct, (2) expected
the problem in the internal node occured (956704854), whole subtree is skipped
/ 14 (of  41-block 961210160: The level of the node (40442) is not correct, (3) expected
the problem in the internal node occured (961210160), whole subtree is skipped
finished     
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
31 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Mon Jan 19 15:59:29 2015
###########
root@UnRAID:~# reiserfsck --rebuild-tree /dev/md13
reiserfsck 3.6.24

*************************************************************
** Do not  run  the  program  with  --rebuild-tree  unless **
** something is broken and MAKE A BACKUP  before using it. **
** If you have bad sectors on a drive  it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive  to the good one -- dd_rescue is  a good tool for **
** that -- and only then run this program.                 **
*************************************************************

Will rebuild the filesystem (/dev/md13) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Mon Jan 19 16:07:51 2015
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 968504505 blocks marked used
Skipping 38019 blocks (super block, journal, bitmaps) 968466486 blocks will be read
0%.block 143556206: The number of items (644) is incorrect, should be (1) - corrected
block 143556206: The free space (4105) is incorrect, should be (209) - corrected
pass0: vpf-10110: block 143556206, item (0): Unknown item type found [2214592900 17828098 0x2040501  (15)] - deleted
block 144990516: The number of items (643) is incorrect, should be (1) - corrected
block 144990516: The free space (33792) is incorrect, should be (209) - corrected
pass0: vpf-10110: block 144990516, item (0): Unknown item type found [42139649 2264924417 0x101ff  (15)] - deleted
block 356483528: The number of items (65412) is incorrect, should be (1) - corrected
block 356483528: The free space (23895) is incorrect, should be (3793) - corrected
pass0: vpf-10110: block 356483528, item (0): Unknown item type found [143327325 2231392093 0x75d088b005d57ff  (5)] - deleted
block 434313585: The number of items (65279) is incorrect, should be (1) - corrected
block 434313585: The free space (65023) is incorrect, should be (3280) - corrected
pass0: vpf-10110: block 434313585, item (0): Unknown item type found [4261381888 33618687 0x2000a00  (15)] - deleted
block 434970917: The number of items (29710) is incorrect, should be (1) - corrected
block 434970917: The free space (62977) is incorrect, should be (2416) - corrected
pass0: vpf-10110: block 434970917, item (0): Unknown item type found [2906740068 54575346 0x320a83ffb4060  (] - deleted
block 666515829: The number of items (643) is incorrect, should be (1) - corrected
block 666515829: The free space (8243) is incorrect, should be (1233) - corrected
verify_directory_item: block 666515829, item 234980096 506135064 0xf010e18022a050f DIR (3), len 2815, location 1281 entry count 165, fsck need 0, format new: All entries were deleted from the directory
block 870973441: The number of items (1) is incorrect, should be (0) - corrected
block 870973441: The free space (0) is incorrect, should be (4072) - corrected
block 878009699: The free space (61166) is incorrect, should be (0) - corrected
pass0: vpf-10200: block 878009699, item 0: The item [4093200062 1050670404 0x1ef00002552d87d IND (1)] with wrong offset is deleted
block 911639975: The number of items (1) is incorrect, should be (0) - corrected
block 911639975: The free space (0) is incorrect, should be (4072) - corrected
                                                           left 0, 20157 /sec
33531 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 968466486
	Leaves among those 812392
		- corrected leaves 69
		- leaves all contents of which could not be saved and deleted 9
	pointers in indirect items to wrong area 12884 (zeroed)
	Objectids found 34861

Pass 1 (will try to insert 812383 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%                                                        left 809732, 176 /sec
Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: EIP: [<f84703b0>] mvs_slot_task_free+0xf/0x139 [mvsas] SS:ESP 0068:f7715e38

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Stack:

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Code: 41 10 b9 00 00 02 00 89 04 24 89 d8 ff 96 c0 00 00 00 31 c0 83 c4 34 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 89 d6 53 89 cb 83 ec 14 <83> 79 08 00 0f 84 18 01 00 00 f6 42 14 05 75 48 8b 49 0c 85 c9

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Call Trace:

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Process scsi_eh_1 (pid: 862, ti=f7714000 task=ed2ab600 task.ti=f7714000)

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Process scsi_eh_9 (pid: 1065, ti=f7732000 task=ed2fdb00 task.ti=f7732000)

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: EIP: [<f84703b0>] mvs_slot_task_free+0xf/0x139 [mvsas] SS:ESP 0068:f7733e38

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Code: 41 10 b9 00 00 02 00 89 04 24 89 d8 ff 96 c0 00 00 00 31 c0 83 c4 34 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 89 d6 53 89 cb 83 ec 14 <83> 79 08 00 0f 84 18 01 00 00 f6 42 14 05 75 48 8b 49 0c 85 c9

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Call Trace:

Message from syslogd@UnRAID at Tue Jan 20 05:31:09 2015 ...
UnRAID kernel: Stack:
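
A capture this long is hard to eyeball, so here is a minimal Python sketch for tallying it. It is not part of reiserfsprogs or unRAID. The function name `summarize`, the `SAMPLE` excerpt, and the regular expressions are assumptions made for illustration, written only against the message formats visible in the output above. Other reiserfsck or kernel versions may word these lines differently.

```python
import re

# Patterns written against the reiserfsck 3.6.24 and kernel 3.9.11 messages
# quoted in this thread; they are illustrative assumptions, not a stable format.
LEVEL_RE = re.compile(
    r"block (\d+): The level of the node \((\d+)\) is not correct, \((\d+)\) expected"
)
FIXABLE_RE = re.compile(
    r"(\d+) found corruptions can be fixed only when running with --rebuild-tree"
)
EIP_RE = re.compile(r"EIP: \[<[0-9a-f]+>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")

# A few lines copied from the capture above, used as a self-contained example.
SAMPLE = """\
/ 50 (of  87/block 829670944: The level of the node (18829) is not correct, (2) expected
the problem in the internal node occured (829670944), whole subtree is skipped
/168 (of 170//  1 (of 149-block 896106497: The level of the node (0) is not correct, (1) expected
31 found corruptions can be fixed only when running with --rebuild-tree
UnRAID kernel: EIP: [<f84703b0>] mvs_slot_task_free+0xf/0x139 [mvsas] SS:ESP 0068:f7715e38
"""


def summarize(log_text):
    """Summarize a pasted reiserfsck --check / syslog capture."""
    # Block numbers whose node level did not match what the tree expected.
    bad_blocks = [int(m.group(1)) for m in LEVEL_RE.finditer(log_text)]

    # The count reiserfsck itself reports as needing --rebuild-tree, if present.
    fixable = FIXABLE_RE.search(log_text)
    fixable_count = int(fixable.group(1)) if fixable else None

    # Unique (kernel module, function) pairs named on EIP lines of an oops.
    oops_sites = sorted({(m.group(2), m.group(1)) for m in EIP_RE.finditer(log_text)})

    return {
        "bad_blocks": bad_blocks,
        "fixable_count": fixable_count,
        "oops_sites": oops_sites,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

Run against the full capture, the `oops_sites` entry shows which driver the kernel was executing when the session froze. In the messages above that is the `mvsas` module, which is separate from the filesystem errors reiserfsck reported.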

 

So now I'm at a crossroads: do I A) attempt another reiserfsck, or B) RMA the drive and then rebuild the data onto a replacement drive?

 

--

UPDATE:  After a forced reset due to unresponsiveness, unRAID reports an unformatted drive in the target slot (sigh)...  This will be my very first complete loss of all the data on an unRAID drive since I started with version 4.x five years ago.  It's a media server, so I can recover the lost videos (time-consuming and laborious), but what could I have done differently to avoid this?  I've run several reiserfsck --rebuild-tree commands in the past with success: should I have instead installed a brand-new drive and rebuilt the data?  I could find no indication of a detected hardware failure of any sort, nor had the drive approached or exceeded its maximum recommended power-on hours.

syslog-2015-01-15.txt.zip

Link to comment

Note that if you have started the reiserfsck --rebuild-tree then until it completes unRAID will always show the drive as unformatted if you start the array.  This is because mounts will be failing.  If you can later get the reiserfsck to complete then it will show up again as formatted.

 

As to why you should have had a problem in the first place I have no idea.

Link to comment

Note that if you have started the reiserfsck --rebuild-tree then until it completes unRAID will always show the drive as unformatted if you start the array.  This is because mounts will be failing.  If you can later get the reiserfsck to complete then it will show up again as formatted.

 

Hmmm...  So the possibility exists that I may still be able to recover at least some of the data off the drive then.

 

I had just packaged the drive, but I will reinstall it and initiate another reiserfsck.  Hopefully it will complete successfully, but regardless, I will then RMA the drive.

Link to comment

Well, the first problem I encountered is that unRAID no longer sees the drive as part of the array ("not installed"), though it acknowledges its presence as "unformatted".

 

I'm starting the reiserfsck "check disk" option, but I'm afraid that if unRAID does not treat the drive as part of the array, any tree rebuilding may not be reflected on the parity drive (if needed).  Maybe the files will still be recovered and viewable, but outside of the array, requiring that I manually transfer them to a new replacement drive added to the array itself (taking up slot 13, which the suspect drive previously held).

Link to comment

So far, no good:

 

root@UnRAID:~# reiserfsck /dev/md13
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md13
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Tue Jan 20 07:01:39 2015
###########
Replaying journal: Done.
Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..  

Bad root block 0. (--rebuild-tree did not complete)

Aborted (core dumped)

 

I'm hoping that it just spat out the result of the previous reiserfsck run, which ended in a complete lockup of the computer (something I had never experienced before).  I'm running it a second time, hoping it will initiate a brand-new analysis and give subsequent recovery recommendations...

 

--

UPDATE: Nope.  Same result and process aborted.  So basically all data is now gone from the drive.
