Crashes since beta9


Dimtar


Attempting to reboot manually:

/root/samba stop

- appears to work, command prompt returns
- umount /dev/md1 hangs ssh session
- umount /dev/md2 results in

umount: /mnt/disk2: device is busy.
        (In some cases useful info about processes that use
        the device is found by lsof(8) or fuser(1))

- umount /dev/md3 hangs ssh session
- umount /dev/md4 hangs ssh session
- umount /dev/md5 hangs ssh session
- umount /dev/md6 hangs ssh session
- umount /dev/sdl1 hangs ssh session (this is cache)
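
A note for anyone reproducing this: a stuck umount blocks the terminal it was started from, so it can help to run each attempt in the background and stop waiting after a fixed time. The sketch below is one way to do that in plain sh. run_with_deadline is a hypothetical helper name, not part of unRAID, and the return code 124 is just borrowed from the coreutils timeout convention. It does not kill or unstick the command, it only gives up waiting for it.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and stop waiting
# after N seconds, so a stuck command cannot block the session.
# Returns the command's own exit status, or 124 if it is still running.
run_with_deadline() {
    secs=$1; shift
    "$@" &
    pid=$!
    while :; do
        # kill -0 only tests whether the process still exists
        if ! kill -0 "$pid" 2>/dev/null; then
            rc=0
            wait "$pid" || rc=$?
            return "$rc"
        fi
        [ "$secs" -gt 0 ] || break
        sleep 1
        secs=$((secs - 1))
    done
    echo "still running after deadline: $* (pid $pid)" >&2
    return 124
}

# Example (not executed here): run_with_deadline 30 umount /dev/md1
```

With this, a call such as run_with_deadline 30 umount /dev/md1 would hand the prompt back after about 30 seconds even if the umount never returns. The 30 seconds is an arbitrary choice.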

 

Running df -h results in:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       3.9G  293M  3.6G   8% /boot
/dev/md1         16G  432M   16G   3% /mnt/disk1
/dev/md2        1.9T  1.5T  332G  83% /mnt/disk2
shfs            1.9T  1.5T  407G  80% /mnt/user0
shfs            2.0T  1.5T  422G  79% /mnt/user

 

So some drives unmounted. Now I try umount /dev/md1, which results in:

 

umount: /mnt/disk1: not mounted

 

I re-run df -h:

 

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       3.9G  293M  3.6G   8% /boot
/dev/md2        1.9T  1.5T  332G  83% /mnt/disk2
/dev/md5         16G  432M   16G   3% /mnt/disk5
shfs            1.9T  1.5T  407G  80% /mnt/user0
shfs            2.0T  1.5T  422G  79% /mnt/user

 

So I try to umount /dev/md2 - it's still busy

So I try to umount /dev/md5 - now I get:

 

umount: /mnt/disk5: not mounted

 

I re-run df -h and get:

 

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       3.9G  293M  3.6G   8% /boot
/dev/md2        1.9T  1.5T  332G  83% /mnt/disk2
/dev/md6         16G  432M   16G   3% /mnt/disk6
shfs            1.9T  1.5T  407G  80% /mnt/user0
shfs            2.0T  1.5T  422G  79% /mnt/user

 

After re-running the umount commands, df -h shows:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       3.9G  295M  3.6G   8% /boot
/dev/md2        1.9T  1.5T  332G  83% /mnt/disk2
shfs            1.9T  1.5T  407G  80% /mnt/user0
shfs            2.0T  1.5T  422G  79% /mnt/user
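
As an aside, df has to query every mounted filesystem, so it may itself stall if one of them is stuck. A lighter check is to read the kernel's mount table from /proc/mounts and filter for the array devices. The sketch below does that; still_mounted_md is a hypothetical helper name, and matching on /dev/mdN assumes the device naming seen in the df output above.

```shell
#!/bin/sh
# Hypothetical helper: print "device mountpoint" for every array device
# (/dev/mdN) found in /proc/mounts-format text read from stdin.
# In that format, field 1 is the device and field 2 is the mount point.
still_mounted_md() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }'
}

# Example (not executed here): still_mounted_md < /proc/mounts
```

An empty result would mean none of the array disks are mounted any more. It says nothing about the shfs user shares, which would need their own check.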

 

So disk2 isn't going to unmount. Fine. I'll try to proceed, so I run:

 

/root/mdcmd stop

 

I get: /root/mdcmd: line 11: echo: write error: Device or resource busy
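
The umount message for disk2 points to lsof(8) and fuser(1) for finding what keeps a device busy. If neither is to hand, roughly the same information can be dug out of /proc. The following is a minimal sketch, assuming a Linux /proc layout. holders_of is a hypothetical name, and it only sees open files and working directories (not memory maps or kernel-side users), so an empty result does not prove the mount is free.

```shell
#!/bin/sh
# Hypothetical helper: list PIDs that have an open file or a working
# directory under the given mount point, using only /proc symlinks.
# Processes owned by other users are only visible when run as root.
holders_of() {
    mnt=${1%/}
    for link in /proc/[0-9]*/fd/* /proc/[0-9]*/cwd; do
        # Entries can vanish while we scan; skip anything unreadable
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$mnt"|"$mnt"/*)
                pid=${link#/proc/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}

# Example (not executed here): holders_of /mnt/disk2
```

holders_of /mnt/disk2 would print one PID per line, and ps -p with each PID then shows what it is. Where the tools are installed, fuser -vm /mnt/disk2 or lsof /mnt/disk2 should give a fuller answer.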

 

So I issue powerdown - to no avail, the system just sits there.  So I hit the power button, and the system hangs on unmounting remote filesystems.  There are a bunch of repeating lines on the console like the following:

 

Cannot stat file /proc/25200/fd/32: No such file or directory
Cannot stat file /proc/25202/fd/26: No such file or directory
Cannot stat file /proc/25202/fd/26: No such file or directory
Cannot stat file /proc/25206/fd/31: No such file or directory
Cannot stat file /proc/25206/fd/31: No such file or directory
Cannot stat file /proc/25209/fd/26: No such file or directory
Cannot stat file /proc/25209/fd/26: No such file or directory

etc.

 

In the end I did a hard shutdown by holding the power button down. I closed all PuTTY sessions, but no logs were saved (it's entirely possible I missed a message about this, as I had more than 10 PuTTY sessions open from trying to unmount drives). Attached is the last backup of the syslog, taken about 20 minutes before I gave up completely (before the manual shutdown procedure).

 

syslog.sept272014.7.zip


I have been having similar crashes, and for some reason I've found that if I boot into Xen and run as normal with no VMs, it hasn't crashed.

I'm trying to find a window to swap my I/O card to rule that out, but it might be worth checking whether yours still crashes under the Xen boot.

 

Funnily enough, I was the person who suggested swapping the cards between your servers. I'll see what happens on 6b9 in Xen mode. I suggest you try a normal 6b6 boot on your problematic system.


I thought that was you, but didn't go back and look.. :)

 

I am running one more parity check right now under non-Xen with the current card, just to make sure it crashes consistently, so that when I swap it I can do a good test. I don't have much hope, since this is already the second card I have tried. Before it I had a SATA (non-SAS) card from a different manufacturer that also used a Marvell chip, and it was crashing the same way. Robj had suggested replacing the card, so I replaced it with a model I knew was working.

  • 3 weeks later...

Server crashed again last night, the first time in weeks, but I didn't get a log. It feels like it hits a corrupt file of some kind?

There were errors on the screen, but I couldn't enlarge the view to read the text. The only thing I noticed was that there were a good 8 errors and they all looked nearly identical.
