Replaced bad drive - now getting "can't read superblock"

ScoHo · March 18, 2012

So I replaced a seemingly bad drive (details on that here; for some reason the forum doesn't seem to like http links, so you might have to copy and paste): lime-technology.com/forum/index.php?topic=18934.0

After doing the rebuild, I wasn't able to write to any drive on the unRAID server, everything was read-only.

So I did a filesystem check (via unMenu), and it returned the following:

Samba Stopped
/dev/md1 Unmounted

Checking /dev/md1 (/dev/sdd)
reiserfsck 3.6.21 (2009 www.namesys.com)

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'
###########
reiserfsck --check started at Fri Mar 16 09:06:13 2012
###########
Replaying journal: Trans replayed: mountid 257, transid 523003, desc 2005, len 1, commit 2007, next trans offset 1990
Replaying journal: | | 0.1% 1 trans Trans replayed: mountid 257, transid 523004, desc 2008, len 1, commit 2010, next trans offset 1993
Replaying journal: | / 0.2% 2 trans Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 2 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Fri Mar 16 10:05:20 2012
###########
block 29907467: The level of the node (2319) is not correct, (1) expected
the problem in the internal node occured (29907467), whole subtree is skipped
vpf-10640: The on-disk and the correct bitmaps differs.

/dev/md1 mounted on /mnt/disk1

Samba Started

So I ran --rebuild-tree:

root@fileserver:~# cd
root@fileserver:~# samba stop
root@fileserver:~# umount /dev/md1
root@fileserver:~# reiserfsck --rebuild-tree /dev/md1
reiserfsck 3.6.21 (2009 www.namesys.com)

Will rebuild the filesystem (/dev/md1) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Trans replayed: mountid 258, transid 523005, desc 2011, len 1, commit 2013, next trans offset 1996
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 1 transactions replayed
###########
reiserfsck --rebuild-tree started at Fri Mar 16 11:09:12 2012
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 457791619 blocks marked used
Skipping 23115 blocks (super block, journal, bitmaps) 457768504 blocks will be read
Killed

Wasn't really sure at that point if everything was good, so I tried to mount the new drive:

root@fileserver:~# mount /dev/md1 /mnt/disk1
mount: /dev/md1: can't read superblock

So does this mean I have to run --rebuild-sb? Now I'm nervous, because all of my data seemed to be there: after the rebuild, once everything was online, all my data was there, including on the new disk. I just couldn't save anything to the server.

ScoHo · March 19, 2012

I rebooted the server from the command line and now that it's back up it's saying disk1 is unformatted. Did I lose everything on that disk?

Joe L. · March 19, 2012

No, it means the rebuild did not complete, since it was "Killed". "Unformatted" simply indicates the reiser file system could not be mounted. "Unmounted" would be a better error message, but improving it is a topic for another day. It should say the disk could not be mounted.

You still need to complete a repair of the file system.

Typically, reiserfsck will be killed by the OS if it does not have enough memory to run. I think you missed that innocent-looking "Killed" in the output.

In some cases, disabling user shares will help free up some memory, as will adding a swap file. Certainly, terminating any add-ons that use memory is in order. Disable cache_dirs if you are running it.
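A minimal sketch for checking free memory and swap before starting reiserfsck, and for confirming that a "Killed" came from the kernel's out-of-memory killer. The helper names (avail_mib, swap_mib, oom_messages) are hypothetical, and the exact wording of OOM messages in the kernel log varies between kernel versions, so treat the grep pattern as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print available RAM in MiB, read from /proc/meminfo.
# Newer kernels report MemAvailable; older ones only report MemFree,
# so fall back to MemFree when MemAvailable is missing.
avail_mib() {
    awk '/^MemAvailable:/ { a = $2 }
         /^MemFree:/      { f = $2 }
         END { print int((a ? a : f) / 1024) }' /proc/meminfo
}

# Hypothetical helper: print total active swap in MiB (0 if none).
# /proc/swaps has one header line, then one line per swap area
# with its size in KiB in the third column.
swap_mib() {
    awk 'NR > 1 { s += $3 } END { print int(s / 1024) }' /proc/swaps
}

# Hypothetical helper: show the last few out-of-memory killer messages.
# Reading the kernel log may require root; errors are discarded.
oom_messages() {
    dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' | tail -n 5
}

echo "Available RAM: $(avail_mib) MiB"
echo "Active swap:   $(swap_mib) MiB"
oom_messages
```

If oom_messages shows reiserfsck being killed and swap_mib prints 0, that matches Joe's diagnosis: the repair ran out of RAM with no swap to fall back on.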

ScoHo · March 19, 2012

Thanks Joe. I was definitely wondering about that "killed" return.

I disabled the C compiler and development tools add-on (gcc, glibc, binutils), and it looks like it's running now, although it's taking forever. It's been running for about 10 hours. How long does a rebuild-tree normally take? It's a 2TB drive that's about 90% full.

For the life of me I can't figure out why I had that gcc plug-in installed. Is there some other add-on that requires it that may have prompted me to install it?

UGH...in an attempt to copy and paste a line from my telnet session I just killed the rebuild-tree. Back to square one...

ScoHo · March 19, 2012

After about 8 hours, it got through Pass 0, but then "Killed" again right after it started Pass 1:

root@fileserver:~# reiserfsck --rebuild-tree /dev/md1
reiserfsck 3.6.21 (2009 www.namesys.com)

Will rebuild the filesystem (/dev/md1) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Mon Mar 19 09:53:29 2012
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 457791619 blocks marked used
Skipping 23115 blocks (super block, journal, bitmaps) 457768504 blocks will be read
0%block 1933853: The number of items (80) is incorrect, should be (1) - corrected
block 1933853: The free space (0) is incorrect, should be (4048) - corrected
pass0: vpf-10110: block 1933853, item (0): Unknown item type found [80 0 0x50000000  (15)] - deleted
..block 212733938: The number of items (2) is incorrect, should be (0) - corrected
block 212733938: The free space (32) is incorrect, should be (4072) - corrected
block 228283671: The number of items (2) is incorrect, should be (0) - corrected
block 228283671: The free space (10) is incorrect, should be (4072) - corrected
block 228322973: The number of items (4) is incorrect, should be (0) - corrected
block 228322973: The free space (1) is incorrect, should be (4072) - corrected
block 228851584: The number of items (1) is incorrect, should be (0) - corrected
block 228851584: The free space (6) is incorrect, should be (4072) - corrected
block 238391861: The number of items (3) is incorrect, should be (0) - corrected
block 238391861: The free space (0) is incorrect, should be (4072) - corrected
block 398033057: The number of items (1) is incorrect, should be (0) - corrected
block 398033057: The free space (6) is incorrect, should be (4072) - corrected
block 399964486: The number of items (65024) is incorrect, should be (1) - corrected
block 399964486: The free space (65535) is incorrect, should be (2256) - corrected
pass0: vpf-10110: block 399964486, item (0): Unknown item type found [4261413121 184549375 0x9e000a00  (15)] - deleted
block 400158938: The number of items (1) is incorrect, should be (0) - corrected
block 400158938: The free space (65482) is incorrect, should be (4072) - corrected
block 400812381: The number of items (1) is incorrect, should be (0) - corrected
block 400812381: The free space (17) is incorrect, should be (4072) - corrected
block 401326673: The number of items (1) is incorrect, should be (0) - corrected
block 401326673: The free space (65481) is incorrect, should be (4072) - corrected
block 488332902: The number of items (3) is incorrect, should be (0) - corrected
block 488332902: The free space (0) is incorrect, should be (4072) - corrected
                                                          left 0, 19666 /sec
35643 directory entries were hashed with "r5" hash.
        "r5" hash is selected
Flushing..finished
        Read blocks (but not data blocks) 457768504
                Leaves among those 459718
                        - leaves all contents of which could not be saved and deleted 80
                Objectids found 35603

Pass 1 (will try to insert 459638 leaves):
####### Pass 1 #######
Killed
root@fileserver:~#

So do I need to run this thing yet again? I'm not sure how to free up more memory than I already have. I have 1GB of memory and not many add-ons. I do have CrashPlan installed, should I disable that? I see unMenu has a way to install a swap file via package manager. Would really suck to run it for 8 hours only to have it crash again.

Joe L. · March 21, 2012

You MUST either get more memory, or stop ALL your add-ons, and probably also install a swap file. The 1GB of memory is apparently not enough when dealing with the larger disks these days.

It is possible installing a swap-file will do it on its own. But certainly stop ALL other processes you've added.

Whatever you do, try to minimize memory use while the repair is running.

Don't play media, as playing a 1 GB file will use up all your RAM as cache. Don't use cache_dirs either; you cannot spare the memory for it right now.

You might even reboot to clear out as much from RAM as you can.

If still not enough, you might need to stop the array to free up more memory from the user-shares and then perform the repair on the /dev/sdX1 partition of the affected disk. If you do that you will then need to perform a correcting parity sync once you get the repair completed, as parity will be out of sync since you could not use the /dev/mdX device with the array stopped.
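For the swap-file route, here is a generic Linux sketch using the standard dd, mkswap, swapon and swapoff tools. It is not how the unMenu swap file package does it, and the helper names, the /mnt/disk2 path and the 2048 MiB size are hypothetical placeholders. The file should live on a healthy, mounted disk, never on the disk being repaired.

```shell
#!/bin/sh
# Hypothetical helper: create and enable a swap file (run as root).
# $1 = path of the swap file, $2 = size in MiB.
make_swapfile() {
    path=$1
    size_mib=$2
    # Write real zeros; the kernel refuses swap files that have holes.
    dd if=/dev/zero of="$path" bs=1M count="$size_mib" || return 1
    chmod 600 "$path" || return 1
    mkswap "$path" >/dev/null || return 1
    swapon "$path"
}

# Hypothetical helper: disable and delete the swap file after the repair.
remove_swapfile() {
    swapoff "$1" && rm -f "$1"
}

# Example (path and size are placeholders):
#   make_swapfile /mnt/disk2/repair.swap 2048
#   reiserfsck --rebuild-tree /dev/md1
#   remove_swapfile /mnt/disk2/repair.swap
```

Each step returns early on failure, so a swap file that could not be written or formatted is never handed to swapon.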

Joe L.

ScoHo · March 21, 2012

The rebuild finally finished on the third try, once I enabled the swap file. Good news: the data looks good. There are two small files in the lost+found directory with no extensions and cryptic names; I have no idea what they are.

Should I go ahead and run a parity check now?

Also, I disabled the swapfile...is it okay to remove that .unraid.swapfile from my flash drive now?

Thanks a lot for your help, Joe. To be safe, I went ahead and ordered 4GB of memory to replace the old makeshift 1GB I have in there now. I'm hoping it will also fix the random, intermittent crashes I've been having.

I do have one general question about one thing you said - about not playing media, etc while the rebuild was happening. If you're required to stop samba while doing the rebuild, how would I have been able to access any of the files on the server? My server has basically been offline for the last week since this all started.
