"Powerdown" script failure

FreeMan · February 3, 2014

Running unRAID 5.0.4 Pro. I installed Dynamix & the email notification plugin, and suddenly both Dynamix and unMENU stopped responding (posted about that issue here: http://lime-technology.com/forum/index.php?topic=30939.msg287337#msg287337). The server itself was still up: I could browse files and stream audio & video, and CouchPotato, Sick Beard & SABnzbd continued to work just fine. There was just no web interface to the server.

I decided to telnet in to attempt to power down the server in the hopes that a reboot would at least get me back to my interface, but the powerdown script has failed.

I executed powerdown and got this:

root@NAS:~# powerdown
Capturing information to syslog. Please wait...
version[11326]: Linux version 3.9.11p-unRAID (root@Develop) (gcc version 4.4.4 (GCC) ) #4 SMP Sat Nov 23 11:30:35 PST 2013
ls: cannot access /dev/hd[a-z]: No such file or directory
ls: cannot access /dev/hd[a-z]: No such file or directory
/etc/rc.d/rc.unRAID: line 84: ${FILE}: ambiguous redirect

I've attached /etc/rc.d/rc.unRAID (as .txt so the board's happy with it) and /var/log/syslog (as a zip so it will fit). I have not touched rc.unRAID, though I don't know if anything else has. Does anybody see anything wrong with rc.unRAID, or have any suggestions on what to do next?

Thanks,

FreeMan

rc.unRAID.txt

syslog.zip

dgaschk · February 3, 2014

Need to see the entire syslog.

WeeboTech · February 3, 2014


    BC=/boot/config
    for FILE in ${BC}/*.cfg ${BC}/shares/*
    do  BFILE=${FILE##*/}  # Basename of FILE
        logger -t${BFILE} < ${FILE}
    done

Dump the two directories in question (/boot/config and /boot/config/shares); something odd might be in there throwing it all off.

It's a simple loop to capture the config files into the syslog.

FreeMan · February 4, 2014

Thanks guys.

root@NAS:/boot/config# ls -la
total 896
drwxrwxrwx 4 root root 32768 2014-01-29 23:30 ./
drwxrwxrwx 9 root root 32768 2014-02-03 19:34 ../
-rwxrwxrwx 1 root root   256 2013-01-18 21:00 Pro.key*
-rwxrwxrwx 1 root root  8203 2014-01-15 15:22 disk.cfg*
-rwxrwxrwx 1 root root   619 2014-01-27 00:13 go*
-rwxrwxrwx 1 root root   376 2013-12-23 22:24 go.BAK*
-rwxrwxrwx 1 root root   247 2013-01-09 00:00 go~*
-rwxrwxrwx 1 root root   326 2014-01-29 23:30 ident.cfg*
-rwxrwxrwx 1 root root   173 2012-12-29 19:04 network.cfg*
-rwxrwxrwx 1 root root   800 2013-12-25 19:39 passwd*
-rwxrwxrwx 1 root root   920 2012-12-30 09:24 passwd.OLD*
drwxrwxrwx 9 root root 32768 2014-01-30 20:09 plugins/
-rwxrwxrwx 1 root root   526 2013-12-25 19:39 shadow*
-rwxrwxrwx 1 root root   499 2014-01-29 23:30 share.cfg*
drwxrwxrwx 2 root root 32768 2014-01-16 20:30 shares/
-rwxrwxrwx 1 root root   150 2013-12-25 16:45 smb-extra.conf*
-rwxrwxrwx 1 root root   196 2013-12-25 16:33 smb-extra.conf.Dec25-2013-163326*
-rwxrwxrwx 1 root root   150 2013-12-27 09:16 smb-extra.conf.Dec27-2013-091607*
-rwxrwxrwx 1 root root   150 2013-12-27 09:31 smb-extra.conf.Dec27-2013-093128*
-rwxrwxrwx 1 root root   150 2013-12-27 09:54 smb-extra.conf.Dec27-2013-095402*
-rwxrwxrwx 1 root root   150 2013-12-27 13:43 smb-extra.conf.Dec27-2013-134325*
-rwxrwxrwx 1 root root   150 2014-01-04 22:10 smb-extra.conf.Jan04-2014-221010*
-rwxrwxrwx 1 root root   117 2013-01-09 22:10 smb-extra.conf.Jan09-2013-221022*
-rwxrwxrwx 1 root root   196 2013-10-27 10:19 smb-extra.conf~*
-rwxrwxrwx 1 root root   204 2013-12-25 19:39 smbpasswd*
-rwxrwxrwx 1 root root   101 2012-12-30 09:24 smbpasswd.OLD*
-rwxrwxrwx 1 root root  4096 2014-01-30 00:24 super.dat*
-rwxrwxrwx 1 root root  4096 2013-12-31 12:20 super.old*
root@NAS:/boot/config#

root@NAS:/boot/config/shares# ls -la
total 384
drwxrwxrwx 2 root root 32768 2014-01-16 20:30 ./
drwxrwxrwx 4 root root 32768 2014-01-29 23:30 ../
-rwxrwxrwx 1 root root   449 2014-01-30 15:56 Audio.cfg*
-rwxrwxrwx 1 root root   484 2014-01-30 14:40 Backups.cfg*
-rwxrwxrwx 1 root root   254 2013-01-30 22:25 F1.cfg*
-rwxrwxrwx 1 root root   449 2014-01-30 14:40 Home\ Movies.cfg*
-rwxrwxrwx 1 root root   457 2014-01-30 14:41 Movies.cfg*
-rwxrwxrwx 1 root root   260 2013-03-30 11:48 Photo.cfg*
-rwxrwxrwx 1 root root   449 2014-01-30 14:41 Photos.cfg*
-rwxrwxrwx 1 root root   456 2014-01-30 14:42 Sport.cfg*
-rwxrwxrwx 1 root root   456 2014-01-30 16:39 TV.cfg*
-rwxrwxrwx 1 root root   465 2014-01-06 16:28 apps.cfg*

and the complete syslog (from boot sometime in Jan) is attached.

syslog.zip

WeeboTech · February 4, 2014

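It's a minor issue.

The Home Movies.cfg file is messing things up because ${FILE} is not quoted in the script. The unquoted name splits at the space into two words, which is what produces the "ambiguous redirect" error.

Does the script abend or just keep going?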
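To see why the space matters, here is a minimal sketch that reproduces the error in a scratch directory. The directory, the file contents and the variable names `demo`, `unquoted` and `quoted` are made up for illustration; only the unquoted `< ${FILE}` pattern comes from the script. The unquoted case is run in a child bash so that the message is the one bash itself prints.

```shell
#!/bin/bash
# Scratch directory standing in for /boot/config/shares (hypothetical data).
demo=$(mktemp -d)
printf 'shareComment="demo"\n' > "${demo}/Home Movies.cfg"

for FILE in "${demo}"/*
do
    # Unquoted: bash splits the path at the space, sees two words after '<',
    # and refuses to run the command at all.
    unquoted=$(FILE="${FILE}" bash -c 'cat < ${FILE}' 2>&1)

    # Quoted: the path stays a single word and the file is read normally.
    quoted=$(cat < "${FILE}")
done

echo "unquoted: ${unquoted}"
echo "quoted:   ${quoted}"
rm -rf "${demo}"
```

The first line of output should carry the same "ambiguous redirect" text that powerdown printed; the second should show the file's contents.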

FreeMan · February 4, 2014

It's been hung since 7ish this morning. A Ctrl-C didn't break it, either.

Do I need to modify the script to quote the directories? I thought the \ escaped the space to make it work - it does from the command line...

It would probably be easier to take the space out of the directory name than mess with scripts. Spaces are highly overrated anyway, unlike commas.

WeeboTech · February 4, 2014

You can try pressing Ctrl-D; not sure if that will work.

You can try to telnet in from somewhere else.

Do a ps -ef

look for a logger process

then kill its pid.

Here's a brief example.

root@unRAID:~# ps -ef | grep logger
root 5643 5634 0 20:09 pts/0 00:00:00 logger -tHome
root 5655 5645 0 20:10 pts/1 00:00:00 grep logger
root@unRAID:~# kill 5643
root@unRAID:~# ps -ef | grep logger
root 5657 5645 0 20:10 pts/1 00:00:00 grep logger

This might be the issue of 'hung' rc.unRAID scripts.
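The PID lookup in the example above can also be done without reading the list by eye. The sketch below runs on the two sample lines from that example rather than on a live system, and it assumes the command shows up in the eighth column as plain `logger`, as it does there; `sample` and `logger_pids` are illustrative names.

```shell
#!/bin/bash
# Sample "ps -ef" lines copied from the example above.
sample='root 5643 5634 0 20:09 pts/0 00:00:00 logger -tHome
root 5655 5645 0 20:10 pts/1 00:00:00 grep logger'

# In "ps -ef" output field 2 is the PID and field 8 the command name.
# Matching on the command field skips the "grep logger" line.
logger_pids=$(printf '%s\n' "${sample}" | awk '$8 == "logger" { print $2 }')
echo "${logger_pids}"

# On a live system the same filter would read from ps directly:
#   ps -ef | awk '$8 == "logger" { print $2 }'
```

If ps reports the command with a full path, the comparison would need adjusting.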

The correct code fix would be:

    BC=/boot/config
    for FILE in ${BC}/*.cfg ${BC}/shares/*
    do  BFILE="${FILE##*/}"  # Basename of FILE
        logger -t"${BFILE}" < "${FILE}"
    done
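The quotes around `-t"${BFILE}"` matter as much as the ones around the redirect. A small sketch, using a hypothetical `count_args` function in place of logger, shows how many arguments each form produces and what `${FILE##*/}` leaves behind:

```shell
#!/bin/bash
# count_args is a hypothetical stand-in for logger: it reports how many
# arguments it was given.
count_args() { echo "$#"; }

FILE="/boot/config/shares/Home Movies.cfg"
BFILE="${FILE##*/}"            # strips everything up to the last "/"

unquoted_count=$(count_args -t${BFILE})     # split into -tHome and Movies.cfg
quoted_count=$(count_args -t"${BFILE}")     # one argument: -tHome Movies.cfg

echo "basename: ${BFILE}"
echo "unquoted: ${unquoted_count} argument(s)"
echo "quoted:   ${quoted_count} argument(s)"
```

The unquoted form matches the `logger -tHome` seen in the ps example: the tag is cut at the space and the rest becomes a separate argument.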

FreeMan · February 4, 2014

Unfortunately, killing the logger process didn't do anything to restart the script - it still seems to be sitting there doing nothing.

ps -ef|grep power

shows the powerdown script process is still running, too. Should I kill that and attempt to restart after applying the change you suggested?

WeeboTech · February 4, 2014

Killing the logger will not restart the script.

It will kill the logger process that is waiting for input.

The diagnostic dump routine loops over every config file, so if more than one has a space in its name, it will happen over and over.

If you start it again from the console, do it with stdin redirected from /dev/null:

/etc/rc.d/rc.unRAID stop </dev/null
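The reason the redirect helps can be sketched with any command that reads standard input. Here `read_stdin` is a hypothetical stand-in for a logger call that ended up without an input file; with stdin tied to /dev/null it sees end-of-file immediately instead of waiting on the terminal.

```shell
#!/bin/bash
# read_stdin is a hypothetical stand-in for any command that reads standard
# input, such as a logger call that was started without an input file.
read_stdin() { wc -c; }

# With stdin tied to /dev/null the read hits end-of-file at once, so the
# command returns instead of waiting on the terminal.
bytes=$(read_stdin </dev/null | tr -d ' ')
echo "bytes read: ${bytes}"
```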

Frankly, if the whole server stopped responding, there's something else also gumming things up.

You can try and edit the script before doing anything else.

You can try and umount each filesystem manually

then issue the /root/mdcmd stop command

This is a quick script that will show what is active on the array.

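#!/bin/bash
# List every process that has files open on the array filesystems.

for fs in /mnt/user /mnt/disk*
do  [ ! -d "${fs}" ] && continue
    # fuser prints the PIDs on stdout; everything else goes to stderr
    for pid in $(fuser -cu "${fs}" 2>/dev/null)
    do  ps --no-headers -fp "${pid}"
        # kill -0 only tests that the PID exists; nothing is signalled yet
        kill -0 "${pid}" 2>/dev/null && kill -0 "${pid}"
    done
done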

Right now it does a kill -0 which tests if the pid is active.

When you are actually ready to kill the suckers, change the second kill -0 to be a kill -TERM

if that doesn't work, do a kill -9

You can run this lil scriptlet called /tmp/psmounts multiple times.

Keep in mind it will kill anything on the array indiscriminately.

-TERM is what the system normally sends when shutting down.

Try that first.

-9 is an untrappable signal, that's a last resort.
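The test-then-escalate sequence can be tried safely on a throwaway process first. This is only a sketch: a background `sleep` stands in for whatever is holding the array, and `alive_before` and `alive_after` are illustrative names.

```shell
#!/bin/bash
# A throwaway background process stands in for something holding the array.
sleep 300 &
pid=$!

# Signal 0 sends nothing; it only reports whether the PID can be signalled.
kill -0 "${pid}" 2>/dev/null && alive_before=yes || alive_before=no

# Ask politely first, then reap the child so the PID really goes away.
kill -TERM "${pid}"
wait "${pid}" 2>/dev/null

kill -0 "${pid}" 2>/dev/null && alive_after=yes || alive_after=no

# Only if the process were still alive would the last resort apply:
#   kill -9 "${pid}"
echo "before: ${alive_before}, after: ${alive_after}"
```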

This is how to unmount the disks in a loop:

#!/bin/bash

for disk in /mnt/disk* /mnt/cache
do  /bin/umount -v "${disk}"
done

and finally to stop the array.

echo stop > /proc/mdcmd
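Before issuing the stop it may be worth confirming that nothing under /mnt is still mounted. The sketch below filters a made-up snapshot in /proc/mounts format; the device names and filesystem types in `mounts` are hypothetical and will differ on a real server.

```shell
#!/bin/bash
# Hypothetical snapshot in /proc/mounts format:
#   device  mount-point  fstype  options  dump  pass
mounts='rootfs / rootfs rw 0 0
/dev/md1 /mnt/disk1 reiserfs rw,noatime 0 0
shfs /mnt/user fuse.shfs rw,nosuid,nodev 0 0'

# Print every mount point that sits under /mnt (field 2).
still_mounted=$(printf '%s\n' "${mounts}" | awk '$2 ~ /^\/mnt\// { print $2 }')
echo "${still_mounted}"

# On a live system the same filter would read the real table:
#   awk '$2 ~ /^\/mnt\// { print $2 }' /proc/mounts
```

An empty result would suggest the unmount loop finished its job.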

FreeMan · February 4, 2014

I tried updating rc.unRAID, and it's still not shutting down.

root@NAS:/etc/rc.d# ls
rc.0@            rc.avahidnsconfd*   rc.inetd*           rc.syslog*
rc.4*            rc.cachedirs*       rc.local*           rc.sysstat*
rc.6*            rc.couchpotato_v2*  rc.local_shutdown*  rc.sysvinit*
rc.K*            rc.darkstat*        rc.messagebus*      rc.udev*
rc.M*            rc.emailnotify*     rc.nfsd*            rc.unRAID*
rc.S*            rc.fuse*            rc.ntpd*            rc.unRAID.OLD*
rc.acpid*        rc.ifplugd*         rc.rpc*             unraid.d/
rc.apcupsd*      rc.inet1*           rc.samba*
rc.atalk*        rc.inet1.conf       rc.sickbeard*
rc.avahidaemon*  rc.inet2*           rc.subsonic*
root@NAS:/etc/rc.d# diff rc.unRAID rc.unRAID.OLD
83,84c83,84
<     do  BFILE="${FILE##*/}"  # Basename of FILE
<         logger -t"${BFILE}" < "${FILE}"
---
>     do  BFILE=${FILE##*/}  # Basename of FILE
>         logger -t${BFILE} < ${FILE}
root@NAS:/etc/rc.d#

It does seem to have hung in a different place now.

root@NAS:/# powerdown
Capturing information to syslog. Please wait...
version[21673]: Linux version 3.9.11p-unRAID (root@Develop) (gcc version 4.4.4 (GCC) ) #4 SMP Sat Nov 23 11:30:35 PST 2013
ls: cannot access /dev/hd[a-z]: No such file or directory
ls: cannot access /dev/hd[a-z]: No such file or directory

syslog attached

WeeboTech wrote: "Killing the logger will not restart the script."

I knew that, I was hoping terminating the logger process would allow whatever part of the script that kicked it off to continue...

Before I posted, I got to thinking (I manage that on occasion). The two "ls:" errors are probably because I have no PATA drives attached, so I shouldn't be worried about them - normally processing would continue and I'd never even see them.
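That reading of the two "ls:" messages fits how shell globs behave: when a pattern matches nothing, the shell passes it to ls unchanged, and ls then complains about a file literally named `hd[a-z]`. A sketch in an empty scratch directory (a hypothetical stand-in for a /dev with no hd* nodes):

```shell
#!/bin/bash
# Empty scratch directory standing in for a /dev with no hd* nodes.
dev=$(mktemp -d)

# With no match, the shell hands the pattern to ls unchanged and ls complains.
ls_msg=$(ls "${dev}"/hd[a-z] 2>&1)

# Guarding each candidate with -e skips the unmatched pattern quietly.
count=0
for disk in "${dev}"/hd[a-z]
do  [ -e "${disk}" ] || continue
    count=$((count + 1))
done

echo "ls said: ${ls_msg}"
echo "PATA disks found: ${count}"
rm -rf "${dev}"
```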

I think that it's hung on the first of these lines:

    lspci          2>&1 | logger -tpspci
    lsmod          2>&1 | logger -tlsmod
    ifconfig eth0  2>&1 | logger -tifconfig

since I don't see anything about them in the syslog. Am I on the right track?

I ran your first little code loop and this is what it shows me:

root@NAS:/boot# show_active.sh
nobody   12438 24236  0 17:54 ?        00:00:04 /usr/sbin/smbd -D
root     24361 24236  0 Jan29 ?        00:00:32 /usr/sbin/smbd -D
nobody   30679 24236  0 Feb03 ?        00:01:18 /usr/sbin/smbd -D
root     24222     1  2 Jan29 ?        03:21:14 /usr/local/sbin/shfs /mnt/user -
root@NAS:/boot#

but I don't know what to make of it. I've got two files downloading to the server at the moment, but I'm not too concerned if they die & I have to restart them. Otherwise, it's pretty idle.

syslog.zip

WeeboTech · February 5, 2014

The output of these commands is in the syslog.

lspci          2>&1 | logger -tpspci
lsmod          2>&1 | logger -tlsmod
ifconfig eth0  2>&1 | logger -tifconfig

This is above the diagnostic_dump and capture of the .cfg files.

which show.

Feb  4 18:11:25 NAS apps.cfg[21743]: shareReadListAFP=""^M
Feb  4 18:11:25 NAS apps.cfg[21743]: shareWriteListAFP=""^M
Feb  4 18:11:25 NAS apps.cfg[21743]: shareVolsizelimitAFP=""^M

What I do not see is the output of these commands.

    if [ -e /proc/mdcmd ]
       then echo status > /proc/mdcmd
           sleep 1
            logger -tmdcmd < /proc/mdcmd
    fi

    unRAID_status       | logger -tstatus -s

perhaps it hung during the loop or a lil further down.

when doing the ps -ef what does the logger line say?

ps -ef | grep logger

You can comment out the diagnostic_dump call in the unRAID_stop function if you need to shut down.

# Stop unraid:
unRAID_stop() 
{
    logger "Stopping unRAID."


    diagnostic_dump

or you can debug the script with tracing like this.

DEBUG=3 /etc/rc.d/rc.unRAID status

or

DEBUG=3 /etc/rc.d/rc.unRAID stop

The first one only runs the diagnostic_dump function.
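What DEBUG=3 does inside rc.unRAID is not shown in this thread. Assuming it switches on the shell's ordinary `-x` tracing, the output would look like the sketch below, where each command is echoed with a leading `+` before it runs. The two-command snippet is illustrative and is not taken from rc.unRAID.

```shell
#!/bin/bash
# Run a two-command snippet in a child shell with -x and capture its trace.
# The snippet is illustrative and is not taken from rc.unRAID.
trace=$(sh -xc 'MDCMD=/proc/mdcmd; echo "checking ${MDCMD}"' 2>&1)
echo "${trace}"
```

The last `+` line before a hang is the command that never returned, which is what makes the trace useful here.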

FreeMan · February 5, 2014

Ah, OK. My first thought was the /proc/mdcmd loop, but I didn't recognize the output of the lines above it, so I thought maybe it died earlier.

The logger is not running at all at this point.

ps -ef|grep logger

shows nothing but itself

It must be the /proc/mdcmd that is dying. DEBUG=3 /etc/rc.d/rc.unRAID status shows:

+ MDCMDTMP=/tmp/mdcmd.31211
+ touch /tmp/mdcmd.31211
+ trap 'rm -f /tmp/mdcmd.31211' EXIT HUP INT QUIT TERM
+ '[' '!' -z '' ']'
+ MDCMD=/proc/mdcmd
+ '[' -e /proc/mdcmd ']'
+ echo status

Unfortunately, that's another PuTTY session hung - it hasn't returned to a prompt, either. Ctrl-C, Ctrl-Break and Ctrl-D don't do anything. (Yeah, I can kill the windows session by hitting that big X, just throwing a little extra info your way...)

The show active loop you gave me in the previous message showed three /usr/sbin/smbd -D commands. Can those be safely killed?

Is /etc/rc.d/rc.unRAID something that is normally modified by various and sundry package installs? If not, is mine standard (other than the quotes for the space in the share name)?

WeeboTech · February 5, 2014

For some reason the echo status > /proc/mdcmd is hung

Try this in another session, maybe it will free up some resource somewhere.

cat < /proc/mdcmd

You can manually stop Samba with the following in another session.

/etc/rc.d/rc.samba stop

As I mentioned, if you comment out the diagnostic_dump in the stop section, it may proceed further through.

however at some point in the stop script, we will encounter the following chunklet.

    if [ -e /proc/mdcmd ] 
       then logger "Stopping the Array"
            echo status > /proc/mdcmd
            cat < /proc/mdcmd | tr -d '\000' > /tmp/mdcmd.$$.1
            echo stop > /proc/mdcmd
            sleep 3
            echo status > /proc/mdcmd
            cat < /proc/mdcmd | tr -d '\000' > /tmp/mdcmd.$$.2
            diff -u /tmp/mdcmd.$$.1 /tmp/mdcmd.$$.2 | logger -t mdstatusdiff
            rm   -f /tmp/mdcmd.$$.1 /tmp/mdcmd.$$.2
    fi

which will probably hang again.
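The before/after comparison in that chunk can be sketched without touching /proc/mdcmd. The status text here is hypothetical (real output has many more fields), and the NUL byte is only there to show what the script's `tr -d '\000'` strips.

```shell
#!/bin/bash
# Hypothetical before/after status text; real /proc/mdcmd output has many
# more fields. The NUL byte shows what tr -d '\000' removes.
before=$(mktemp)
after=$(mktemp)
printf 'mdState=STARTED\000\n' | tr -d '\000' > "${before}"
printf 'mdState=STOPPED\000\n' | tr -d '\000' > "${after}"

# diff exits non-zero when the files differ, which is expected here.
# tail drops the two header lines that carry temp-file names and timestamps.
changes=$(diff -u "${before}" "${after}" | tail -n +3)
echo "${changes}"
rm -f "${before}" "${after}"
```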

If I were messing around, I would do this a few times to see if things get freed up.

cat < /proc/mdcmd

I would do it for however many times and probably a few more to see if the md driver hit some kind of deadlock.

stop samba (as above).

kill the shfs with the other scriptlet I provided.

cat < /proc/mdcmd

a few times

What I might do is put the trace on with

/root/mdcmd set md_trace 1

and then do the /root/mdcmd stop

if it hangs, there's nothing we can do, it's some kind of driver issue.

let's see if the cat < /proc/mdcmd frees some resources
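Since each of these reads can itself hang the session, one option is to put a time limit on it. This sketch assumes the GNU coreutils `timeout` command is available on the server, which has not been confirmed in this thread. It uses a FIFO with no writer as a stand-in for a read that never returns, so /proc/mdcmd is not touched.

```shell
#!/bin/bash
# A FIFO with no writer stands in for a read that never returns
# (hypothetical; /proc/mdcmd itself is not touched here).
dir=$(mktemp -d)
mkfifo "${dir}/stuck"

# Give the read two seconds. timeout exits with 124 when it had to step in.
timeout 2 cat "${dir}/stuck"
status=$?

if [ "${status}" -eq 124 ]
then echo "read timed out"
else echo "read finished with status ${status}"
fi
rm -rf "${dir}"
```

On the server the equivalent would be something like `timeout 5 cat /proc/mdcmd`. If the read is blocked inside the driver in a state that ignores signals, timeout may not be able to free it either.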

WeeboTech · February 5, 2014

FreeMan wrote: "Is /etc/rc.d/rc.unRAID something that is normally modified by various and sundry package installs? If not, is mine standard (other than the quotes for the space in the share name)?"

It's part of the powerdown package. Not a standard unRAID script, but the powerdown unRAID script.

It was part of an ambitious project to add some sort of start/stop/diagnostic plugin functionality before 5.x announced plugins.

FreeMan · February 5, 2014

OK, I tried your suggestions, and now I can't log on to the server from a PuTTY session...

In the 4 session screen shot, they are arranged in order left to right top to bottom of what I did. Now I get

NAS login: root
Password:
Login incorrect

NAS login:

when I try to log in. Yes, I've tried several times, and typed my password very slowly and carefully - I'm sure I got it right.

Also, despite my attempt to stop samba, I can still browse the server from my Win machine.

Not at all angry at you, just annoyed by the situation. Appreciative of your attempts to help.

WeeboTech · February 5, 2014

If you cannot log in, there's more going wrong here than the powerdown script.

if cat < /proc/mdcmd doesn't return anything, then there are other issues with the md driver and/or kernel and/or memory.

Maybe you should reach out to Tom.

I fear if you cannot login you cannot do much.

FreeMan · February 6, 2014

Thanks for your help, Weebo. I've PM'd Tom.

FreeMan · February 9, 2014

Disappointed, I never heard back from Tom. I realize he's got a lot on his plate.

I discovered that I could log in from the console, so I tried a couple of your suggestions there, but powerdown still hung. I ended up holding the power switch until it shut down.

Parity check is running. A couple of plugins didn't seem to start up properly, but that's really the minor issue. I can get access to unMENU, but Dynamix still isn't responding - I'll head back over to that thread to see if I can get it sorted.

Well... I must have spoken too soon... Dynamix is responding now. sigh...