FreeMan Posted February 3, 2014

Running unRAID 5.0.4 Pro. I installed Dynamix & the email notification plugin, and suddenly both Dynamix and unMENU stopped responding (posted about that issue here: http://lime-technology.com/forum/index.php?topic=30939.msg287337#msg287337). The server itself was still up: I could browse files and stream audio & video, and CP, Sick & SAB continued to work just fine. Just no web interface to the server.

I decided to telnet in and power down the server in the hope that a reboot would at least get me back to my interface, but the powerdown script failed. I executed powerdown and got this:

    root@NAS:~# powerdown
    Capturing information to syslog. Please wait...
    version[11326]: Linux version 3.9.11p-unRAID (root@Develop) (gcc version 4.4.4 (GCC) ) #4 SMP Sat Nov 23 11:30:35 PST 2013
    ls: cannot access /dev/hd[a-z]: No such file or directory
    ls: cannot access /dev/hd[a-z]: No such file or directory
    /etc/rc.d/rc.unRAID: line 84: ${FILE}: ambiguous redirect

I've attached /etc/rc.d/rc.unRAID (as .txt so the board's happy with it) and /var/log/syslog (as a zip so it will fit). I have not touched rc.unRAID, though I don't know if anything else has.

Does anybody see anything wrong with rc.unRAID, or have any suggestions on what to do next?

Thanks,
FreeMan

Attachments: rc.unRAID.txt, syslog.zip

dgaschk Posted February 3, 2014

Need to see the entire syslog.

WeeboTech Posted February 3, 2014

    BC=/boot/config
    for FILE in ${BC}/*.cfg ${BC}/shares/*
    do
        BFILE=${FILE##*/}    # Basename of FILE
        logger -t${BFILE} < ${FILE}
    done

Dump a listing of the two directories in question; something odd in there might be throwing it all off. It's a simple loop to capture the config files into the syslog.

FreeMan Posted February 4, 2014 (Author)

Thanks guys.

    root@NAS:/boot/config# ls -la
    total 896
    drwxrwxrwx 4 root root 32768 2014-01-29 23:30 ./
    drwxrwxrwx 9 root root 32768 2014-02-03 19:34 ../
    -rwxrwxrwx 1 root root   256 2013-01-18 21:00 Pro.key*
    -rwxrwxrwx 1 root root  8203 2014-01-15 15:22 disk.cfg*
    -rwxrwxrwx 1 root root   619 2014-01-27 00:13 go*
    -rwxrwxrwx 1 root root   376 2013-12-23 22:24 go.BAK*
    -rwxrwxrwx 1 root root   247 2013-01-09 00:00 go~*
    -rwxrwxrwx 1 root root   326 2014-01-29 23:30 ident.cfg*
    -rwxrwxrwx 1 root root   173 2012-12-29 19:04 network.cfg*
    -rwxrwxrwx 1 root root   800 2013-12-25 19:39 passwd*
    -rwxrwxrwx 1 root root   920 2012-12-30 09:24 passwd.OLD*
    drwxrwxrwx 9 root root 32768 2014-01-30 20:09 plugins/
    -rwxrwxrwx 1 root root   526 2013-12-25 19:39 shadow*
    -rwxrwxrwx 1 root root   499 2014-01-29 23:30 share.cfg*
    drwxrwxrwx 2 root root 32768 2014-01-16 20:30 shares/
    -rwxrwxrwx 1 root root   150 2013-12-25 16:45 smb-extra.conf*
    -rwxrwxrwx 1 root root   196 2013-12-25 16:33 smb-extra.conf.Dec25-2013-163326*
    -rwxrwxrwx 1 root root   150 2013-12-27 09:16 smb-extra.conf.Dec27-2013-091607*
    -rwxrwxrwx 1 root root   150 2013-12-27 09:31 smb-extra.conf.Dec27-2013-093128*
    -rwxrwxrwx 1 root root   150 2013-12-27 09:54 smb-extra.conf.Dec27-2013-095402*
    -rwxrwxrwx 1 root root   150 2013-12-27 13:43 smb-extra.conf.Dec27-2013-134325*
    -rwxrwxrwx 1 root root   150 2014-01-04 22:10 smb-extra.conf.Jan04-2014-221010*
    -rwxrwxrwx 1 root root   117 2013-01-09 22:10 smb-extra.conf.Jan09-2013-221022*
    -rwxrwxrwx 1 root root   196 2013-10-27 10:19 smb-extra.conf~*
    -rwxrwxrwx 1 root root   204 2013-12-25 19:39 smbpasswd*
    -rwxrwxrwx 1 root root   101 2012-12-30 09:24 smbpasswd.OLD*
    -rwxrwxrwx 1 root root  4096 2014-01-30 00:24 super.dat*
    -rwxrwxrwx 1 root root  4096 2013-12-31 12:20 super.old*
    root@NAS:/boot/config#

    root@NAS:/boot/config/shares# ls -la
    total 384
    drwxrwxrwx 2 root root 32768 2014-01-16 20:30 ./
    drwxrwxrwx 4 root root 32768 2014-01-29 23:30 ../
    -rwxrwxrwx 1 root root   449 2014-01-30 15:56 Audio.cfg*
    -rwxrwxrwx 1 root root   484 2014-01-30 14:40 Backups.cfg*
    -rwxrwxrwx 1 root root   254 2013-01-30 22:25 F1.cfg*
    -rwxrwxrwx 1 root root   449 2014-01-30 14:40 Home\ Movies.cfg*
    -rwxrwxrwx 1 root root   457 2014-01-30 14:41 Movies.cfg*
    -rwxrwxrwx 1 root root   260 2013-03-30 11:48 Photo.cfg*
    -rwxrwxrwx 1 root root   449 2014-01-30 14:41 Photos.cfg*
    -rwxrwxrwx 1 root root   456 2014-01-30 14:42 Sport.cfg*
    -rwxrwxrwx 1 root root   456 2014-01-30 16:39 TV.cfg*
    -rwxrwxrwx 1 root root   465 2014-01-06 16:28 apps.cfg*

and the complete syslog (from boot sometime in Jan) is attached.

Attachments: syslog.zip

WeeboTech Posted February 4, 2014

It's a minor issue. The "Home Movies.cfg" file is messing things up because ${FILE} is not quoted in the script, so the space in the name splits it into two words. Does the script abort (abend) or just keep going?

FreeMan Posted February 4, 2014 (Author)

It's been hung since 7ish this morning. A Ctrl-C didn't break it, either.

Do I need to modify the script to quote the file names? I thought the \ escaped the space to make it work - it does from the command line... It would probably be easier to take the space out of the share name than mess with scripts. Spaces are highly overrated anyway, unlike commas.

WeeboTech Posted February 4, 2014

You can try pressing Ctrl-D; not sure if that will work. You can also telnet in from somewhere else, do a ps -ef, look for a logger process, then kill its pid. Here's a brief example.

    root@unRAID:~# ps -ef | grep logger
    root      5643  5634  0 20:09 pts/0    00:00:00 logger -tHome
    root      5655  5645  0 20:10 pts/1    00:00:00 grep logger
    root@unRAID:~# kill 5643
    root@unRAID:~# ps -ef | grep logger
    root      5657  5645  0 20:10 pts/1    00:00:00 grep logger

This might be the cause of 'hung' rc.unRAID scripts. The correct code fix would be:

    BC=/boot/config
    for FILE in ${BC}/*.cfg ${BC}/shares/*
    do
        BFILE="${FILE##*/}"    # Basename of FILE
        logger -t"${BFILE}" < "${FILE}"
    done
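
To illustrate why the added quotes matter, here is a minimal sketch (not taken from rc.unRAID) that counts how many words the shell sees. The glob expands to a name containing a literal space (the backslash in the ls listing is presumably just how ls displays it), and an unquoted ${BFILE} is then split at that space. The count_words helper and the hard-coded path are assumptions for illustration only.

```shell
#!/bin/bash
# count_words is a hypothetical helper: it prints how many arguments it received.
count_words() { echo "$#"; }

# Path as the glob in the loop would expand it: a literal space, no backslash.
FILE="/boot/config/shares/Home Movies.cfg"
BFILE="${FILE##*/}"    # Basename of FILE

unquoted_count=$(count_words ${BFILE})     # word-split at the space
quoted_count=$(count_words "${BFILE}")     # kept together as one word

echo "unquoted: ${unquoted_count} word(s), quoted: ${quoted_count} word(s)"
```

With the unquoted form, the command the shell builds is effectively "logger -tHome Movies.cfg", which lines up with the "logger -tHome" seen in the ps output above. For the redirect, bash refuses a target that expands to more than one word and reports "ambiguous redirect".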

FreeMan Posted February 4, 2014 (Author)

Unfortunately, killing the logger process didn't do anything to restart the script - it still seems to be sitting there doing nothing.

ps -ef|grep power shows the powerdown script process is still running, too. Should I kill that and attempt to restart after applying the change you suggested?

WeeboTech Posted February 4, 2014

Killing the logger will not restart the script. It only kills the logger process that is waiting for input. The diagnostic dump routine loops over every config file, so if more than one has a space in its name, it will happen over and over.

If you start it again from the console, do it with stdin redirected from /dev/null:

    /etc/rc.d/rc.unRAID stop </dev/null

Frankly, if the whole server stopped responding, there's something else gumming things up as well.

You can try to edit the script before doing anything else. You can also try to umount each filesystem manually and then issue the /root/mdcmd stop command.

This is a quick script that will show what is active on the array.

    #!/bin/bash
    # List every process that has files open on the array mounts.
    for fs in /mnt/user /mnt/disk*
    do
        [ ! -d "${fs}" ] && continue
        for pid in $(fuser -cu "${fs}" 2>/dev/null)
        do
            ps --no-headers -fp "${pid}"
            # kill -0 only tests that the pid exists; it sends no signal.
            kill -0 "${pid}" 2>/dev/null && kill -0 "${pid}"
        done
    done

Right now it does a kill -0, which tests whether the pid is active. When you are actually ready to kill the suckers, change the second kill -0 to a kill -TERM. If that doesn't work, do a kill -9.

You can run this lil scriptlet, called /tmp/psmounts, multiple times. Keep in mind it will kill anything on the array indiscriminately. -TERM is what the system normally sends when shutting down; try that first. -9 is an untrappable signal; that's a last resort.

This is how to unmount the disks in a loop:

    #!/bin/bash
    for disk in /mnt/disk* /mnt/cache
    do
        /bin/umount -v "${disk}"
    done

And finally, to stop the array:

    echo stop > /proc/mdcmd
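
The "TERM first, -9 as a last resort" advice above could be wrapped in a small helper so the escalation happens in one step. This is only a sketch under assumptions: stop_pid and its grace-period argument are hypothetical names, not part of rc.unRAID or the powerdown package.

```shell
#!/bin/bash
# stop_pid PID [GRACE_SECONDS]
# Hypothetical helper: send TERM, wait up to GRACE_SECONDS for the process
# to exit, and only then fall back to the untrappable KILL signal.
stop_pid() {
    local pid=$1
    local grace=${2:-5}
    local waited=0

    kill -0 "${pid}" 2>/dev/null || return 0    # nothing to do, already gone
    kill -TERM "${pid}" 2>/dev/null

    while kill -0 "${pid}" 2>/dev/null && [ "${waited}" -lt "${grace}" ]
    do
        sleep 1
        waited=$((waited + 1))
    done

    if kill -0 "${pid}" 2>/dev/null
    then
        kill -KILL "${pid}" 2>/dev/null         # last resort
    fi
    return 0
}
```

In the loop from the post, the second kill -0 would then become something like stop_pid "${pid}" 10. A process stuck inside the kernel may ignore both signals, so this is no guarantee.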

FreeMan Posted February 4, 2014 (Author)

Quoting WeeboTech:

    You can try pressing Ctrl-D; not sure if that will work. You can also telnet in from somewhere else, do a ps -ef, look for a logger process, then kill its pid. Here's a brief example.

        root@unRAID:~# ps -ef | grep logger
        root      5643  5634  0 20:09 pts/0    00:00:00 logger -tHome
        root      5655  5645  0 20:10 pts/1    00:00:00 grep logger
        root@unRAID:~# kill 5643
        root@unRAID:~# ps -ef | grep logger
        root      5657  5645  0 20:10 pts/1    00:00:00 grep logger

    This might be the cause of 'hung' rc.unRAID scripts. The correct code fix would be:

        BC=/boot/config
        for FILE in ${BC}/*.cfg ${BC}/shares/*
        do
            BFILE="${FILE##*/}"    # Basename of FILE
            logger -t"${BFILE}" < "${FILE}"
        done

I tried updating rc.unRAID, and it's still not shutting down.

    root@NAS:/etc/rc.d# ls
    rc.0@            rc.avahidnsconfd*   rc.inetd*           rc.syslog*
    rc.4*            rc.cachedirs*       rc.local*           rc.sysstat*
    rc.6*            rc.couchpotato_v2*  rc.local_shutdown*  rc.sysvinit*
    rc.K*            rc.darkstat*        rc.messagebus*      rc.udev*
    rc.M*            rc.emailnotify*     rc.nfsd*            rc.unRAID*
    rc.S*            rc.fuse*            rc.ntpd*            rc.unRAID.OLD*
    rc.acpid*        rc.ifplugd*         rc.rpc*             unraid.d/
    rc.apcupsd*      rc.inet1*           rc.samba*
    rc.atalk*        rc.inet1.conf       rc.sickbeard*
    rc.avahidaemon*  rc.inet2*           rc.subsonic*
    root@NAS:/etc/rc.d# diff rc.unRAID rc.unRAID.OLD
    83,84c83,84
    < do BFILE="${FILE##*/}" # Basename of FILE
    < logger -t"${BFILE}" < "${FILE}"
    ---
    > do BFILE=${FILE##*/} # Basename of FILE
    > logger -t${BFILE} < ${FILE}
    root@NAS:/etc/rc.d#

It does seem to have hung in a different place now.

    root@NAS:/# powerdown
    Capturing information to syslog. Please wait...
    version[21673]: Linux version 3.9.11p-unRAID (root@Develop) (gcc version 4.4.4 (GCC) ) #4 SMP Sat Nov 23 11:30:35 PST 2013
    ls: cannot access /dev/hd[a-z]: No such file or directory
    ls: cannot access /dev/hd[a-z]: No such file or directory

syslog attached

Quoting WeeboTech:

    Killing the logger will not restart the script.

I knew that; I was hoping that terminating the logger process would allow whatever part of the script kicked it off to continue...

Before I posted, I got to thinking (I manage that on occasion). The two "ls:" errors are probably because I have no PATA drives attached, so I shouldn't be worried about them - normally processing would continue and I'd never even see them.

I think that it's hung on the first of these lines:

    lspci 2>&1 | logger -tpspci
    lsmod 2>&1 | logger -tlsmod
    ifconfig eth0 2>&1 | logger -tifconfig

since I don't see anything about them in the syslog. Am I on the right track?

I ran your first little code loop and this is what it shows me:

    root@NAS:/boot# show_active.sh
    nobody   12438 24236  0 17:54 ?        00:00:04 /usr/sbin/smbd -D
    root     24361 24236  0 Jan29 ?        00:00:32 /usr/sbin/smbd -D
    nobody   30679 24236  0 Feb03 ?        00:01:18 /usr/sbin/smbd -D
    root     24222     1  2 Jan29 ?        03:21:14 /usr/local/sbin/shfs /mnt/user -
    root@NAS:/boot#

but I don't know what to make of it. I've got two files downloading to the server at the moment, but I'm not too concerned if they die & I have to restart them. Otherwise, it's pretty idle.

Attachments: syslog.zip

WeeboTech Posted February 5, 2014

The output of these commands is in the syslog.

    lspci 2>&1 | logger -tpspci
    lsmod 2>&1 | logger -tlsmod
    ifconfig eth0 2>&1 | logger -tifconfig

They run before the capture of the .cfg files in diagnostic_dump, and that capture also shows up in the syslog:

    Feb 4 18:11:25 NAS apps.cfg[21743]: shareReadListAFP=""^M
    Feb 4 18:11:25 NAS apps.cfg[21743]: shareWriteListAFP=""^M
    Feb 4 18:11:25 NAS apps.cfg[21743]: shareVolsizelimitAFP=""^M

What I do not see is the output of these commands.

    if [ -e /proc/mdcmd ]
    then
        echo status > /proc/mdcmd
        sleep 1
        logger -tmdcmd < /proc/mdcmd
    fi

    unRAID_status | logger -tstatus -s

Perhaps it hung during the loop or a lil further down. When doing the ps -ef, what does the logger line say?

    ps -ef | grep logger

You can comment out the diagnostic_dump call in the unRAID_stop function if you need to shut down.

    # Stop unraid:
    unRAID_stop() {
        logger "Stopping unRAID."

        diagnostic_dump

Or you can debug the script with tracing like this:

    DEBUG=3 /etc/rc.d/rc.unRAID status

or

    DEBUG=3 /etc/rc.d/rc.unRAID stop

The first one only runs the diagnostic_dump function.

FreeMan Posted February 5, 2014 (Author)

Ah, OK. My first thought was the /proc/mdcmd loop, but I didn't recognize the output of the lines above it, so I thought maybe it died earlier.

The logger is not running at all at this point. ps -ef|grep logger shows nothing but itself.

It must be the /proc/mdcmd that is dying. DEBUG=3 /etc/rc.d/rc.unRAID status shows:

    + MDCMDTMP=/tmp/mdcmd.31211
    + touch /tmp/mdcmd.31211
    + trap 'rm -f /tmp/mdcmd.31211' EXIT HUP INT QUIT TERM
    + '[' '!' -z '' ']'
    + MDCMD=/proc/mdcmd
    + '[' -e /proc/mdcmd ']'
    + echo status

Unfortunately, that's another PuTTY session hung - it hasn't returned to a prompt, either. Ctrl-C, Ctrl-Break and Ctrl-D don't do anything. (Yeah, I can kill the Windows session by hitting that big X, just throwing a little extra info your way...)

The show active loop you gave me in the previous message showed three /usr/sbin/smbd -D commands. Can those be safely killed?

Is /etc/rc.d/rc.unRAID something that is normally modified by various and sundry package installs? If not, is mine standard (other than the quotes for the space in the share name)?

WeeboTech Posted February 5, 2014

For some reason the

    echo status > /proc/mdcmd

is hung. Try this in another session; maybe it will free up some resource somewhere.

    cat < /proc/mdcmd

You can manually stop samba with the following in another session.

    /etc/rc.d/rc.samba stop

As I mentioned, if you comment out the diagnostic_dump in the stop section, it may proceed further. However, at some point in the stop script we will encounter the following chunklet.

    if [ -e /proc/mdcmd ]
    then
        logger "Stopping the Array"
        echo status > /proc/mdcmd
        cat < /proc/mdcmd | tr -d '\000' > /tmp/mdcmd.$$.1
        echo stop > /proc/mdcmd
        sleep 3
        echo status > /proc/mdcmd
        cat < /proc/mdcmd | tr -d '\000' > /tmp/mdcmd.$$.2
        diff -u /tmp/mdcmd.$$.1 /tmp/mdcmd.$$.2 | logger -t mdstatusdiff
        rm -f /tmp/mdcmd.$$.1 /tmp/mdcmd.$$.2
    fi

which will probably hang again.

If I were messing around, I would do this a few times to see if things get freed up.

    cat < /proc/mdcmd

I would do it however many times it takes, and probably a few more, to see if the md driver hit some kind of deadlock. Then:

    stop samba (as above)
    kill the shfs with the other scriptlet I provided
    cat < /proc/mdcmd a few times

What I might do is put the trace on with

    /root/mdcmd set md_trace 1

and then do the

    /root/mdcmd stop

If it hangs, there's nothing we can do; it's some kind of driver issue.

Let's see if the cat < /proc/mdcmd frees some resources.
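
Since each of these probes has been costing a telnet session when it hangs, one option is to wrap the read in a time limit. The sketch below assumes GNU coreutils timeout is installed on the server, and try_read is a hypothetical helper name. If the md driver really is deadlocked, the reader may be stuck in uninterruptible sleep and will not respond to the signal, so this may not help in that case.

```shell
#!/bin/bash
# try_read FILE [SECONDS]
# Hypothetical helper: cat FILE, but give up after SECONDS (default 5).
# Returns cat's exit status, or 124 if the time limit was hit.
try_read() {
    local file=$1
    local limit=${2:-5}
    local rc

    timeout "${limit}" cat "${file}"
    rc=$?
    if [ "${rc}" -eq 124 ]
    then
        echo "read of ${file} timed out after ${limit}s" >&2
    fi
    return "${rc}"
}
```

Usage would be along the lines of try_read /proc/mdcmd 10, repeated a few times as suggested above.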

WeeboTech Posted February 5, 2014

Quoting FreeMan:

    Is /etc/rc.d/rc.unRAID something that is normally modified by various and sundry package installs? If not, is mine standard (other than the quotes for the space in the share name)?

It's part of the powerdown package. It is not a standard unRAID script; it is the powerdown package's unRAID script. It was part of an ambitious project to add some sort of start/stop/diagnostic plugin functionality before 5.x announced plugins.

FreeMan Posted February 5, 2014 (Author)

OK, I tried your suggestions, and now I can't log on to the server from a PuTTY session...

In the 4 session screen shot, the sessions are arranged left to right, top to bottom, in the order I ran them.

Now I get

    NAS login: root
    Password:
    Login incorrect
    NAS login:

when I try to log in. Yes, I've tried several times, and typed my password very slowly and carefully - I'm sure I got it right.

Also, despite my attempt to stop samba, I can still browse the server from my Win machine.

Not at all angry at you, just annoyed by the situation. Appreciative of your attempts to help.

WeeboTech Posted February 5, 2014

If you cannot log in, there's more going wrong here than the powerdown script.

If cat < /proc/mdcmd doesn't return anything, then there are other issues with the md driver and/or kernel and/or memory. Maybe you should reach out to Tom. I fear that if you cannot log in, you cannot do much.

FreeMan Posted February 6, 2014 (Author)

Thanks for your help, Weebo. I've PM'd Tom.

FreeMan Posted February 9, 2014 (Author)

Disappointed that I never heard back from Tom, though I realize he's got a lot on his plate.

I discovered that I could log in from the console, so I tried a couple of your suggestions there, but powerdown still hung. I ended up holding the power switch until it shut down.

Parity check is running. A couple of plugins didn't seem to start up properly, but that's really the minor issue. I can get access to unMENU, but Dynamix still isn't responding - I'll head back over to that thread to see if I can get it sorted.

Well... I must have spoken too soon... Dynamix is responding now. sigh...