Safe Powerdown


jonp


 

What the emhttp powerdown sequence does is:

1) Stop all network protocol components (SMB, NFS, AFP, FTP) - this ensures no new files can be opened via the network.

2) Invoke all the plugin 'unmounting_disks' events - this lets any and all plugins run their specific 'stop' code.

3) Unmount all the mounted file systems (user shares, then disk and cache) - this ensures the unRAID driver will see no more writes.

4) 'Stop' the array - this commits the super.dat file on the flash, which holds a flag that says "cleanly shut down".
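For illustration, the four steps could be sketched as a dry-run shell outline. The helper function and log array are scaffolding for the sketch only, not actual emhttp code; each step is recorded rather than executed so the ordering is easy to see:

```shell
#!/bin/bash
# Dry-run sketch of the powerdown sequence above (not actual emhttp code).
POWERDOWN_LOG=()
step() { POWERDOWN_LOG+=("$*"); }

powerdown_sequence() {
  step "stop network services (SMB, NFS, AFP, FTP)"               # 1) no new files via the network
  step "fire plugin 'unmounting_disks' events"                    # 2) plugins run their 'stop' code
  step "unmount user shares, then disk and cache file systems"    # 3) unRAID driver sees no more writes
  step "stop the array (commit clean-shutdown flag to super.dat)" # 4) mark a clean shutdown
}

powerdown_sequence
```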

 

Where does stopping Xen/KVM virtual machines fall into that mix?

Does it happen before step 1, so that any VMs which use SMB or NFS are able to power down properly? It seems like stopping VMs should happen before step 1. Or is unRAID going to force the use of local loopback filesystem drivers such as virtio-9p instead of NFS/SMB/FTP/AFP?


 


 

I glossed over some of the details.  Here is the actual emhttp_event handler script:

 

#!/bin/bash

# emhttp_event script (a work in progress).

# This script is called by the emhttp process as a result of various events that take place.
# Caution: the 'emhttp' process will hang until this script completes!

# The first argument to the script is a string indicating the event:

# driver_loaded
#   Occurs early in emhttp initialization.
#   Can also occur as a result of init-config and device slot change.
#   Status information is valid.

# array_started
#   Occurs during cmdStart execution.
#   The 'md' devices are valid.

# disks_mounted
#   Occurs during cmdStart execution.
#   The disks and user shares (if enabled) are mounted.

# started
#   Signals end of cmdStart execution.

# svcs_restarted
#   Occurs as a result of changing/adding/deleting a share.
#   The network services are started and may be exporting different share(s).

# stopping_svcs
#   Occurs during cmdStop execution.
#   Nothing has actually been stopped yet, about to stop network services.

# unmounting_disks
#   Occurs during cmdStop execution.
#   The network services have been stopped, about to unmount the disks and user shares.
#   The disks have been spun up and a "sync" executed, but no disks un-mounted yet.

# stopping_array
#   Occurs during cmdStop execution.
#   The disks and user shares have been unmounted, about to stop the array.

# stopped
#   Occurs at end of cmdStop execution.
#   The array has been stopped.

# Log the event (more: add way to enable/disable event logging?)
logger -t emhttp_event "$1"

# Invoke all 'any_event' scripts that might exist
# (paths are quoted so plugin names with spaces can't break the dispatch)
for Dir in /usr/local/emhttp/plugins/*
do
  if [ -x "$Dir/event/any_event" ]; then
    "$Dir/event/any_event" "$1"
  fi
done

# Invoke specific event scripts that might exist for this event
for Dir in /usr/local/emhttp/plugins/*
do
  if [ -x "$Dir/event/$1" ]; then
    "$Dir/event/$1" "$1"
  fi
done

 

The VM managers, and maybe even Docker, should be tied into the "stopping_svcs" event.  That event fires first, before anything else is shut down.
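As a sketch of what that could look like, a VM-manager plugin might drop a 'stopping_svcs' script into its event directory. Everything here is hypothetical - the plugin name, the timeout, and the shutdown logic are my own assumptions, not actual unRAID code:

```shell
#!/bin/bash
# Hypothetical /usr/local/emhttp/plugins/vmmanager/event/stopping_svcs
# Gracefully shut down all running Xen guests before network services stop,
# so guests using SMB/NFS shares can still flush their files.

XL=${XL:-xl}              # xl binary; overridable for testing
TIMEOUT=${TIMEOUT:-60}    # seconds to wait for clean guest shutdown

running_domains() {
  # 'xl list' prints a header line, then Domain-0, then the guests
  "$XL" list | awk 'NR > 2 { print $1 }'
}

shutdown_all_guests() {
  local dom
  for dom in $(running_domains); do
    "$XL" shutdown "$dom"             # ask each guest to power off cleanly
  done
  local deadline=$(( SECONDS + TIMEOUT ))
  while [ -n "$(running_domains)" ] && [ "$SECONDS" -lt "$deadline" ]; do
    sleep 1
  done
  for dom in $(running_domains); do
    "$XL" destroy "$dom"              # last resort for guests that ignored it
  done
}

# emhttp passes the event name as $1
if [ "${1:-}" = "stopping_svcs" ]; then
  shutdown_all_guests
fi
```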

 

BTW, I just need to add a handful of other event types for notification purposes:

- disk_disabled

- disk_overtemp

- parity_sync_started

- parity_sync_complete

- coffee_done

- etc.
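A plugin could then react to these through the same 'any_event' hook. A minimal sketch - the event names come from the list above, but the classification levels and the commented dispatch are my own illustration:

```shell
#!/bin/bash
# Hypothetical any_event handler: map each event to a notification level.
# A real handler would pass the result to logger, email, etc.

classify_event() {
  case "$1" in
    disk_disabled|disk_overtemp)              echo alert  ;;  # needs immediate attention
    parity_sync_started|parity_sync_complete) echo info   ;;  # routine status
    *)                                        echo ignore ;;  # events we don't notify on
  esac
}

# Example dispatch, as emhttp would invoke it with the event name in $1:
#   level=$(classify_event "$1")
#   [ "$level" != ignore ] && logger -t notify-plugin "$level: $1"
```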

The webgui shutdown stops SMB, VMs and docker.

 

I presume that the webgui shutdown does much the same as a webgui 'stop array'.  If that is true, then the webgui is also stopping NFS.  However, there is a problem with the sequence in which these processes are stopped.  My experience suggests that NFS is being stopped BEFORE the VMs are shut down.  Since the VMs may well be accessing the host drives via NFS, the processes running in the VM stall because the NFS share has vanished, preventing the VM from shutting down.  Once this has happened, the only way I have found to escape an unclean powerdown is to 'xl destroy' the VM.

 

The other way of stalling the webgui stop is to have a terminal session (telnet/ssh/...) open with its current working directory set to an array drive.

 

The powerdown plugin prevents these two problems ... with the help of an 'xl shutdown' command in a Knn rc script.
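One way to see such blockers before unmounting is to check each mount point with fuser. This is a generic sketch of the idea, not the plugin's actual code; the mount list and the commented kill step are illustrative:

```shell
#!/bin/bash
# Sketch: report which mount points still have processes using them,
# e.g. a shell whose current directory sits on an array disk.

busy_mounts() {
  local m
  for m in "$@"; do
    [ -d "$m" ] || continue              # skip mounts that don't exist
    if fuser -m "$m" >/dev/null 2>&1; then
      echo "$m"                          # something still holds files here
    fi
  done
}

# Typical use before unmounting:
#   busy_mounts /mnt/user /mnt/disk* /mnt/cache
#   fuser -km /mnt/user    # last resort: kill the holders
```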



 

Did you read any of my previous posts in this thread?  :o



 

Yes, I did ... but there are some things which don't add up, or aren't clear.

 

First, I'm puzzled about this "webgui shutdown".  As far as I'm aware, it is not possible to invoke shutdown from the webgui until the array has been stopped.  Therefore if, as dlandon states, "The webgui shutdown stops SMB, VMs and docker.", then it is rather late!  However, it may be that there is some emhttp code, of which I'm not aware, which can handle a shutdown without first invoking "Stop Array".

 

What the emhttp powerdown sequence does is:

1) Stop all network protocol components (SMB, NFS, AFP, FTP) - this ensures no new files can be opened via the network.

2) Invoke all the plugin 'unmounting_disks' events - this lets any and all plugins run their specific 'stop' code.

...

 

This is the problem, right here.  If the VMs are accessing files over a network protocol (which is how Xen has to work, isn't it?), then removing the network protocol daemons before asking the VMs to tidy up their files is going to cause problems.

The VM managers, and maybe even Docker, should be tied into the "stopping_svcs" event.  That event fires first, before anything else is shut down.

This, or some alternative, is essential for any process which is accessing the unRAID drives over a network protocol (which Dockers don't need to, and shouldn't, do).  Perhaps the VM and Docker managers should allow the user to specify whether individual VMs or Dockers should terminate on the "stopping_svcs" event.

 

I certainly agree that using the "Events" handlers is the correct way to go about performing an orderly shutdown.

 

The other common cause is a user who has ssh'd or telnetted into a shell, leaving their 'current directory' on an array/cache disk.  I guess my attitude toward that has been "too bad, don't do that next time".

 

... but do you intend to address this?  It is a very real problem and, until facilities built into unRAID can deal with it, the powerdown plugin is the only workable solution.

 

The other reason emhttp doesn't always work is because it's "hanging" on some kind of internal action it's taking.  This shouldn't happen - if it does, it means there's a bug.  I've been rather stubborn in the past, wanting to root out such bugs rather than work around them, but it has obviously proven too difficult to find all the possible problems, because they are by nature hard to reproduce.

I would agree with your continuing to be "rather stubborn".  I dislike workarounds simply to avoid what are, clearly, bugs.  What is needed is some mechanism to identify whether, and where, the hanging occurs.

 

I would like to suggest two new 'events'.  Sometimes processes are launched which are known to run for a little while - it makes sense to prevent these from starting if an emergency powerdown is imminent.  The events I'm suggesting are "power_failed" and "power_restored".  This would allow, for instance, my mpop mail fetcher to suspend operation while running on UPS batteries, and to start again if/when power returns.
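Those two events could be handled with something as simple as a flag file that long-running jobs check before starting. A sketch - the flag path, the handler shape, and the mpop tie-in are my own assumptions:

```shell
#!/bin/bash
# Hypothetical handlers for the proposed power_failed / power_restored events.
# A flag file gates long-running jobs (like an mpop fetch) while on battery.

FLAG=${FLAG:-/var/run/on_battery}

handle_event() {
  case "$1" in
    power_failed)   touch "$FLAG" ;;   # UPS on battery: defer new long jobs
    power_restored) rm -f "$FLAG" ;;   # mains power back: resume normal work
  esac
}

ok_to_start_job() {                    # e.g. checked before launching mpop
  [ ! -e "$FLAG" ]
}
```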

