theone Posted June 26, 2013

I decided to stop and restart the array to minimize the impact of the problem mentioned here (I haven't stopped the array in over 80 days): http://lime-technology.com/forum/index.php?topic=28142.0

But now the WebUI has been showing this for over 5 minutes:

    Stop SMB...Spinning up all drives...Sync filesystems...

It has not reached the unmount stage, so I don't think it is a disk-in-use problem (yet). What does this mean? How can I correct this state?
WeeboTech Posted June 26, 2013

Look at the syslog (capture it and post it if you can), and check which filesystems are mounted. You can see what is holding your filesystems active with fuser, as in the following example:

    root@unRAID2:~# fuser -c /mnt/disk*
    (no output)
    root@unRAID2:~# cd /mnt/disk1
    root@unRAID2:/mnt/disk1# fuser -c /mnt/disk*
    /mnt/disk1:  1919c
    root@unRAID2:/mnt/disk1# ps -fp 1919
    UID        PID  PPID  C STIME TTY          TIME CMD
    root      1919  1918  0 16:00 pts/0    00:00:00 -bash
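As a minimal sketch of the fuser step above, the helper below turns fuser's on-screen lines into bare PIDs so each one can be passed to ps. The function name fuser_pids is hypothetical and not part of unRAID or psmisc; it only parses text and does not call fuser itself.

```shell
# Hypothetical helper: read fuser-style lines such as "/mnt/disk1:  1919c"
# on stdin and print one bare PID per line.
fuser_pids() {
    awk '{
        sub(/^[^:]*:/, "")               # drop the "path:" prefix, if any
        for (i = 1; i <= NF; i++) {      # fields are re-split after sub() on $0
            pid = $i
            sub(/[a-zA-Z]+$/, "", pid)   # strip access codes such as "c" or "e"
            if (pid ~ /^[0-9]+$/) print pid
        }
    }'
}
```

One possible use is `fuser -c /mnt/disk* 2>&1 | fuser_pids | while read -r pid; do ps -fp "$pid"; done`. Empty output means nothing is holding the disks, which matches the "(no output)" case in the example.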
theone Posted June 26, 2013 (Author)

My syslog is below. As you can see, it has reached the sync command and has not gone on to try unmounting the disks.

The output of fuser -c /mnt/disk* is empty. I had manually run all the rc.PLUGINS stop commands beforehand.

    cd /mnt/disk1
    fuser -c /mnt/disk*
    /mnt/disk1:  4176c
    ps -fp 4176
    UID        PID  PPID  C STIME TTY          TIME CMD
    root      4176  4175  0 23:04 pts/0    00:00:00 -bash

This is true for all disks. What is the problem? All I tried to do was avoid one problem, and I got myself into another... :'(
WeeboTech Posted June 26, 2013

Sorry, my cd example was only meant to show how the output would look if there were a busy process. Since PID 4176 is your own shell after the cd, nothing else is holding the drives mounted. If you are still logged in, move out of the disk:

    cd /tmp

To be frank, I'm not sure what is going on, and Tom may ask you to review something. What version of unRAID are you running? Type the following command and post the results:

    cat /etc/unraid-version

You can also run top, then press i, to see which processes may be busy:

    top

I sent Tom a direct email. He may ask you to type in some commands.

As a last-ditch effort, if you have to crash the system, you can try to umount your disks first for safety. List them with:

    mount | grep disk

Then try to umount each disk manually:

    umount /mnt/disk1
    umount /mnt/disk2

and so on. If you have a cache drive, also run:

    umount /mnt/cache
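The manual unmount steps above could be sketched as a dry run that only prints the umount commands, so they can be reviewed before anything is touched. The function name umount_plan is hypothetical, and the sketch assumes the /mnt/diskN and /mnt/cache mount points used in this thread.

```shell
# Hypothetical helper: read `mount`-style lines on stdin and print one umount
# command per /mnt/diskN or /mnt/cache mount point. Nothing is unmounted here.
umount_plan() {
    awk '$2 == "on" && ($3 ~ /^\/mnt\/disk[0-9]+$/ || $3 == "/mnt/cache") {
        print "umount " $3
    }'
}
```

Running `mount | umount_plan` shows the plan. Unlike `mount | grep disk`, it also picks up /mnt/cache, whose mount line may not contain the word "disk".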
theone Posted June 26, 2013 (Author)

I am running 5.0-rc12a.

What is the potential damage/loss if I unmount the drives and then hard reset?

In top (after pressing i), sync is the process shown running:

    1911 root  20  0  1848  244  204 D  0  0.0  0:00.01 sync
WeeboTech Posted June 26, 2013

theone wrote: "What is the potential damage/loss if I unmount the drives and then hard reset?"

I don't know what the issue could be. You could possibly lose some files that were recently written to the filesystem. How much memory do you have?

There are a few options:

Write a huge file that you don't care about, larger than memory, to force flushing. Then remove it after all disk activity stops.

A better option might be to use the remount option for each filesystem. Supposedly that forces a superblock flush, so remounting them rw should force the flush. If possible, they could be remounted ro (read-only), and then you should be fine too. But this is only if they cannot be umounted manually.

To be frank, though, I'm not sure that's what this problem is. rc12a is a different version, and no one has reported the corruption issue with it.
theone Posted June 26, 2013 (Author)

Can I kill the sync process? Unmount all disks? Would that get the WebUI working again and allow me to restart the array?
WeeboTech Posted June 26, 2013

theone wrote: "Can I kill the sync process?"

You can, but the system call is still active in the kernel. If it weren't, sync would exit. Wait a few minutes, though.

Umounting the filesystems manually is the safest thing to do first. emhttp may complain, but it should ensure the filesystems are in a good state.

How many disks do you have? Post the output of the mount command.
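The sync line from top posted earlier in the thread shows state D (uninterruptible sleep), which is consistent with a system call still being active in the kernel: a process in that state does not act on a kill until the call returns. A small sketch for listing such processes follows; the function name d_state is hypothetical, and it assumes input in the three-column layout produced by `ps -eo pid,stat,comm`.

```shell
# Hypothetical helper: read `ps -eo pid,stat,comm` output on stdin and print
# the PID and command name of every process whose state starts with D.
d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'   # NR > 1 skips the header row
}
```

Run it as `ps -eo pid,stat,comm | d_state`. If sync is still listed after several minutes, it is still blocked inside the kernel.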
nars Posted June 26, 2013

Did you check for open files on the cache disk? Something like:

    fuser -c /mnt/cache

Also, if it freezes on sync, I guess umount will probably freeze as well, at least on the 'problematic' disk...
theone Posted June 26, 2013 (Author)

Only 4 disks plus cache:

    root@Tower:~# mount | grep disk
    /dev/md1 on /mnt/disk1 type reiserfs (rw,noatime,nodiratime,user_xattr,acl)
    /dev/md2 on /mnt/disk2 type reiserfs (rw,noatime,nodiratime,user_xattr,acl)
    /dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime,user_xattr,acl)
    /dev/md4 on /mnt/disk4 type reiserfs (rw,noatime,nodiratime,user_xattr,acl)

There are no open files on the cache disk.
limetech Posted June 26, 2013

I've seen 'sync' hang plenty of times during development. In each case I just wait "a while" (a few minutes) and reset. The server comes back up with a parity check started. Most of the time it shows zero parity-sync errors, so I just cancel it.
theone Posted June 26, 2013 (Author)

OK, so I tried to unmount all the disks and the cache. The disks unmounted fine, but the cache didn't:

    root@Tower:/var/log# umount /mnt/cache
    umount: /mnt/cache: device is busy.
            (In some cases useful info about processes that use
             the device is found by lsof(8) or fuser(1))
    root@Tower:/var/log# fuser -mv /mnt/cache
    root@Tower:/var/log# lsof /mnt/cache

Nothing seems to be holding it..?!
WeeboTech Posted June 26, 2013

If your data disks umounted OK, you're probably good. If you want, you can attempt a remount to see what happens:

    mount -t reiserfs -o remount,user_xattr,acl,noatime,nodiratime /dev/??? /mnt/cache

There is also the read-only option:

    mount -t reiserfs -o remount,ro,user_xattr,acl,noatime,nodiratime /dev/??? /mnt/cache

If either of these finishes without hanging, then a hard reboot will probably be OK.
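The device is left as /dev/??? above because it differs per system. One way to avoid typing it by hand, sketched here under the assumption that the cache is still listed by mount, is to build the read-only remount command from the mount table. The function name remount_ro_cmd is hypothetical, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print (do not run) a read-only remount command for the
# mount point given as $1, taking the device and filesystem type from
# `mount`-style lines on stdin.
remount_ro_cmd() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" {
        print "mount -t " $5 " -o remount,ro " $1 " " $3
    }'
}
```

Running `mount | remount_ro_cmd /mnt/cache` prints the command for review. The extra options from the post (user_xattr, acl, noatime, nodiratime) can be appended to the -o list by hand if desired.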
theone Posted June 26, 2013 (Author)

I rebooted and now a parity check is running. I hope I didn't lose any data.
WeeboTech Posted June 26, 2013

theone wrote: "I rebooted and now a parity check is running. I hope I didn't lose any data."

I'm confident that if you were able to umount the data disks, you are fine there.