mattkhan Posted January 6, 2016
Since upgrading to unRAID 6, I've noticed that stopping the array takes an awfully long time, and the sync command seems to be the offender. Checking /sys/block/sd[a-h]/stat gives me output like this:
    # for dev in /sys/block/sd[a-h]; do echo "$dev : $(awk '{print $9}' "$dev/stat")"; done
    /sys/block/sda : 0
    /sys/block/sdb : 0
    /sys/block/sdc : 0
    /sys/block/sdd : 0
    /sys/block/sde : 0
    /sys/block/sdf : 0
    /sys/block/sdg : 0
    /sys/block/sdh : 141
(The 9th column of the stat file is the number of in-flight I/O requests, as per https://www.kernel.org/doc/Documentation/block/stat.txt.)
This shows that sdh is the only drive with anything to do, and that is the drive being precleared. The stats for the actual drives in the array are not moving.
Is preclear holding up the array stop, or is something else going on?
To give an example, I triggered an array stop at 0817 today and it's still going 50 minutes later. The web UI is unresponsive throughout this time, but the system is up and running and I can ssh in and look at what is going on.
mattkhan (Author) Posted January 6, 2016
Diagnostics attached. sync is still running and the UI is unresponsive. The disk errors logged are from the preclear drive and are mentioned in another thread: https://lime-technology.com/forum/index.php?topic=45236.msg431867#msg431867
zalaga-unraid-diagnostics-20160106-0909.zip
itimpi Posted January 6, 2016
In my experience, any process doing I/O can cause the sync command to take forever to complete, regardless of which disk the I/O is happening on. Killing the preclear process would probably allow the system to stop the array.
mattkhan (Author) Posted January 6, 2016
It seems unfortunate that preclear affects stopping the array (and makes the web UI completely unresponsive to boot).
Is there any reason why preclear has to be run on the unRAID host as opposed to some random Linux box? I've read through the script and it seems to just make use of a few unRAID config files in a few places, which would be easy enough to stub.
itimpi Posted January 6, 2016
> Is there any reason why preclear has to be run on the unRAID host as opposed to some random Linux box? I've read through the script and it seems to just make use of a few unRAID config files in a few places, which would be easy enough to stub.
Preclear can be run on any system. It is common practice to boot a version of unRAID on another system for exactly this purpose. It can also be run on a vanilla Linux system as long as you make sure any dependencies of the script are present.
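
Before running the script on a vanilla Linux box, the "dependencies are present" check could be done up front. This is a minimal sketch, not part of the preclear script: the function name missing_commands is made up for illustration, and which commands to pass it has to come from reading the script itself.

```shell
# Print each named command that cannot be found on this system.
# The list of names is supplied by the caller; nothing here knows what
# the preclear script actually needs.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

For example, `missing_commands dd awk smartctl` prints only the names that are not installed. That particular list is illustrative, not a statement of what the script requires.
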
mattkhan (Author) Posted January 6, 2016
> Preclear can be run on any system. It is common practice to boot a version of unRAID on another system for exactly this purpose. It can also be run on a vanilla Linux system as long as you make sure any dependencies of the script are present.
OK thanks, I'll go that route in future then.
SSD Posted January 6, 2016
>> Preclear can be run on any system. It is common practice to boot a version of unRAID on another system for exactly this purpose. It can also be run on a vanilla Linux system as long as you make sure any dependencies of the script are present.
> OK thanks, I'll go that route in future then.
Sync is a bit of a pig. If the array is spun down it can take a minute or more to come back, spinning up all drives in the process (the newperms script calls sync and has been my main experience with this irritating behavior).
I've never had a preclear prevent the array from being stopped (maybe I never tried) and am a little skeptical that it is the reason.
I have had open Windows Explorer sessions with array drives open, and telnet sessions with the current directory set to an array disk location, hold up array shutdown. You get a pretty unhelpful stream of messages at the bottom of the web GUI screen, which are at least a tickler to go find what is holding up the shutdown. If you don't find it, the array will never stop. I expect you'd also not be able to bring up a new web GUI session, although an existing session will continue to be updated. If it were closed, though, it would likely just appear to hang with the symptoms you describe.
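
The "go find what is holding up the shutdown" step can be partly automated by looking for processes whose current directory is on an array disk. This is a minimal sketch, assuming a Linux /proc filesystem. The function name procs_with_cwd_under is made up for illustration, and /mnt/disk1 in the usage note is only an example path.

```shell
# List "PID cwd" for every process whose current working directory is at
# or below a given path.
# $1: path prefix to look for
# $2: proc root, defaults to /proc (overridable only for experimenting)
procs_with_cwd_under() {
    prefix=${1%/}
    proc_root=${2:-/proc}
    for cwd_link in "$proc_root"/[0-9]*/cwd; do
        # readlink fails for processes we are not allowed to inspect; skip them.
        target=$(readlink "$cwd_link" 2>/dev/null) || continue
        case "$target" in
            "$prefix"|"$prefix"/*)
                pid=${cwd_link%/cwd}
                echo "${pid##*/} $target"
                ;;
        esac
    done
}
```

Running `procs_with_cwd_under /mnt/disk1` as root would list shells, screen sessions and the like that are sitting in that directory tree. It only checks the working directory, so open files held by other processes would not show up here.
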
mattkhan (Author) Posted January 6, 2016
> Never had a preclear prevent array being stopped (maybe never tried) and am a little skeptical that it is the reason.
FWIW, I checked the logs this evening and can see that zeroing the drive completed at 0950 this morning:
    # stat /tmp/zerosdh
      File: ‘/tmp/zerosdh’
      Size: 231873    Blocks: 456    IO Block: 4096    regular file
    Device: 2h/2d    Inode: 123856    Links: 1
    Access: (0666/-rw-rw-rw-)    Uid: ( 0/ root)    Gid: ( 0/ root)
    Access: 2016-01-06 20:43:32.097103305 +0000
    Modify: 2016-01-06 09:50:50.200307361 +0000
    Change: 2016-01-06 09:50:50.200307361 +0000
and at the same time in /var/log/syslog we see:
    Jan 6 09:50:49 zalaga-unraid emhttp: shcmd (122): rm -f /boot/config/plugins/dynamix/mover.cron
    Jan 6 09:50:49 zalaga-unraid emhttp: shcmd (123): /usr/local/sbin/update_cron &> /dev/null
    Jan 6 09:50:49 zalaga-unraid emhttp: Unmounting disks...
    Jan 6 09:50:49 zalaga-unraid kernel: mdcmd (131): stop
    Jan 6 09:50:49 zalaga-unraid kernel: md1: stopping
    Jan 6 09:50:49 zalaga-unraid kernel: md2: stopping
    Jan 6 09:50:49 zalaga-unraid kernel: md3: stopping
    Jan 6 09:50:49 zalaga-unraid kernel: md4: stopping
    Jan 6 09:50:49 zalaga-unraid kernel: md5: stopping
    Jan 6 09:50:49 zalaga-unraid emhttp: shcmd (124): rmmod md-mod |& logger
    Jan 6 09:50:49 zalaga-unraid kernel: md: unRAID driver removed
    Jan 6 09:50:49 zalaga-unraid emhttp: shcmd (125): modprobe md-mod super=/boot/config/super.dat slots=24 |& logger
The array stop only went through at the moment zeroing finished, which looks pretty conclusive that the array shutdown sync covers all disks in the system, not just array disks.
SSD Posted January 7, 2016
If the current directory of the preclear command was on an array disk, that would explain it too. I've had shutdowns hang because I had an old screen session whose current directory was on the array.
Sync is a Linux command. It works on all disks. I'm just not sure why it would hang on a disk under heavy I/O. We may need a Linux expert to weigh in.
mattkhan (Author) Posted January 7, 2016
Fair point, I was thinking of it syncing a disk at a time which, as you say, it doesn't.
Well, that would explain it then: preclear zeroing is constantly reading from urandom to generate data to write to the disk, so attempting to sync is doomed to sit there forever. In other words, sync is trying to flush memory to disk while another process is constantly generating more data in memory to write to disk.
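
One way to see what sync is up against is the kernel's own count of data waiting to be written out. This is a minimal sketch, assuming a Linux system whose /proc/meminfo exposes the Dirty and Writeback fields. The function name pending_writeback_kb and its optional file argument are made up for illustration.

```shell
# Print the Dirty and Writeback totals (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument exists only so the
# function can be tried against a saved copy.
pending_writeback_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2 }' "${1:-/proc/meminfo}"
}
```

Calling `pending_writeback_kb` every few seconds while the array is stopping gives a rough picture: if Dirty never drops toward zero while preclear is writing, that would be consistent with sync being unable to catch up.
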
JorgeB Posted January 7, 2016
I do sometimes stop the array during a preclear on my test server. It takes longer than usual to stop if preclear is zeroing a disk, but it does stop, and preclear continues in the background. If I stop the array during a preclear post-read, it works normally.