
Keep Warm Spare Spun Down


mikedpitt420


I know this has been a failed feature request and was discussed back in v5, but I can't find anything in the forums that actually answers the question. I can't imagine I'm the only person wanting to keep a warm, precleared, ready spare drive in their unRAID system, so there has to be a "right" way to do this. For anyone who doesn't have easy, ready access to their NAS, or who does a lot of remote watching while on the road, this would be an absolutely necessary feature, and I'm surprised no one has really tackled this issue yet. I'll also note that some of the info may be different in v6 and above, as most of what I can find on this topic is from v5.

What I'd like is to keep a precleared warm spare plugged into my NAS but not added to my array, and to have this drive spin down after boot. It would be great if unRAID would still run SMART tests on it and let me know that the drive is still in good status, but simply having it plugged in and remaining spun down would be sufficient.

What would I need to add to the go script to make sure this drive spins down after everything has booted? Also, some of the solutions I've seen use commands that reference /dev/sdX, which seems incorrect, as those names can be rearranged at reboot. Any help on this topic would be greatly appreciated :-D

Link to comment

Excuse the noob question :-) I had assumed that if the drives were set to spin down after x hours, and the precleared warm spare drive was plugged in but not in the array, then this drive would spin down after x hours and not spin back up, since nothing would be trying to access it.

 

Is my logic faulty?

 

Thanks :-)

Link to comment

I believe that only drives included in the array are controlled by the spin-down logic.

 

I'm sure there's a Linux command that would do this, but I'm not a "Linux guy", so I'm really not sure just what it would be -- and it couldn't be directly invoked from the Web GUI anyway.

 

The suggestion to make it an option in the Unassigned Devices plugin seems like the best way to achieve this.

 

Link to comment

I believe the webGui spins up all non-array devices on every refresh.

Link to comment

I have a spare 4TB that I precleared sitting on a shelf in case of an emergency. If a drive fails, I will swap it out and rebuild. What is the advantage of having a warm spare over what I do?

With a warm spare, no need to do anything physical to the server. The whole "drive fails, swap out" operation theoretically could be done from anywhere you have access to the webgui.
Link to comment

If desired, I can post a very rudimentary shell script that I use in cron to set the hdparm automated firmware spindown of every drive in the system to some value.

 

In my particular case, I have unassigned drives in warm standby.

 

I run smartd in onecheck mode to do a smartd analysis of each drive (assigned or unassigned), save the state, and email me if there are issues.

It has the side effect of spinning all drives up once a day.

 

Thus I have another shell script to set the spindown timer of every drive using hdparm -S2## (greater than the highest unRAID value).

This causes the drive's firmware to spin the drive down when inactivity reaches this timer.

 

This allows the drives to spin down automatically even if emhttp is not running (which is sometimes the case when I do maintenance).

 

If this value is too low, there will be collisions between what unRAID is doing and the drive's firmware, so this is a stopgap.
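
For example (timer encodings per the hdparm manpage, quoted further down; in practice use a persistent /dev/disk/by-id path as shown later in this thread rather than a bare /dev/sdX):

hdparm -S242 /dev/sdX   # 242 - 240 = 2 units of 30 minutes = 1 hour
hdparm -S251 /dev/sdX   # 251 - 240 = 11 units of 30 minutes = 5.5 hours (max of this range)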

 

I used to have a shell script that would apply specific hdparm options to specific drives; that was helpful for customized hdparm settings in the go script.

That's another option for those looking to explore possibilities.

 

Ideally the webGui should have the ability to manage unassigned drives; after all, our licensing is based on how many drives are attached, whether they are in the array or not.

Therefore this should probably be requested in the Feature Request area.

Link to comment

I have a spare 4TB that I precleared sitting on a shelf in case of an emergency. If a drive fails, I will swap it out and rebuild. What is the advantage of having a warm spare over what I do? I use unRAID for media storage... nothing mission critical.

 

Thanks.

 

The advantages only come in the time-to-recovery and physical-effort-required-for-recovery department. I think it's a good feature to add, but it's not huge for me, since I don't really have mounting space for (or money to justify) an unused drive. This seems like a situationally good thing.

 

The benefit would increase somewhat if logic were added that detected a failure, automatically switched the failed drive to the hot spare, and performed a rebuild. Of course, you could also make arguments as to why this could be a bad thing. (People who don't like automatically correcting parity checks probably wouldn't want this.)

Link to comment

Weebotech, that would be exactly what I was looking for, although I only want the hdparm command to affect the warm spare drive. I definitely see why a hot spare and auto rebuild would not be preferable in a lot of situations, and that's not what I want. What I do want, as mentioned above, is the ability to have the drive ready, precleared, and KNOWN to be in good working order. Then when I get a bad drive, I can start a rebuild immediately from anywhere. I have an OpenVPN server on my router that gives me a local subnet IP and access to my unRAID webGui from anywhere, so if I'm on the road I can start a rebuild right away and not have to worry that another drive will fail before I can get back to the server.

 

Thanks Weebotech :D

Link to comment

Currently my script affects all connected drives.

I do this for a few reasons, one of which is that the timer is set to a default value, so even if emhttp is not running, the drive will spin down after a set period of time.

My script is probably not what you want as you can add a single command to the go script and achieve the desired results.

 

On the hdparm manpage, scroll to the -S section.

http://linux.die.net/man/8/hdparm

 

I'm including it here for quick access.

-S

Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive. This timeout value is used by the drive to determine how long to wait (with no disk activity) before turning off the spindle motor to save power. Under such circumstances, the drive may take as long as 30 seconds to respond to a subsequent disk access, though most drives are much quicker. The encoding of the timeout value is somewhat peculiar. A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes. A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. Note that some older drives may have very different interpretations of these values.
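
Since the encoding is peculiar, here is a small helper one could use (my own illustrative sketch, not part of unRAID or hdparm) to convert a timeout in minutes to the corresponding -S value:

#!/bin/bash
# Illustrative sketch: convert a timeout in minutes to an hdparm -S value,
# covering the 5-second (1-240) and 30-minute (241-251) ranges described above.
spindown_value() {
    local min=$1
    if [ "${min}" -le 20 ]; then
        echo $(( (min * 60) / 5 ))          # 1-240: multiples of 5 seconds
    elif [ "${min}" -le 330 ]; then
        echo $(( 240 + (min + 29) / 30 ))   # 241-251: units of 30 minutes, rounded up
    else
        echo 0                              # out of range: disable the timer entirely
    fi
}

spindown_value 60    # prints 242 (1 hour)
spindown_value 90    # prints 243 (1.5 hours, the value used below)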

 

 

Find the device serial you want. Example:

root@unRAIDm:~# ls -l /dev/disk/by-id | egrep -v '\-part' | egrep 'scsi-|ata-'

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-HGST_HDN726060ALE610_NAG1D7TP -> ../../sdk

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-HGST_HDN726060ALE610_NAG1DEKP -> ../../sdj

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST3000DM001-1CH166_W1F1GTFJ -> ../../sdd

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST3000DM001-1CH166_Z1F2WFKV -> ../../sdg

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST4000VN000-1H4168_S3012W7N -> ../../sdh

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST4000VN000-1H4168_S3012WS6 -> ../../sdi

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST4000VN000-1H4168_S301HS8H -> ../../sdc

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST6000DX000-1H217Z_Z4D0EE7M -> ../../sde

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-ST6000DX000-1H217Z_Z4D0EEDV -> ../../sdf

lrwxrwxrwx 1 root root  9 Oct 22 09:06 ata-Samsung_SSD_840_PRO_Series_S1AXNSAF701196M -> ../../sdb

 

 

Add a call to hdparm like this in your go script, using the device/serial of the respective drive.

This path does not change across reboots the way /dev/sd[a-z] can.

hdparm -S243 /dev/disk/by-id/ata-ST4000VN000-1H4168_S301HS8H

 

Set the numeric value according to the calculated values in the manpage.
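
Putting it together, the go script addition could be as small as this (the serial below is just the example drive from the listing above; substitute your spare's id — the existence check is my own addition in case the spare is ever removed):

# warm spare: set a 1.5 hour firmware spindown at boot (sketch)
SPARE=/dev/disk/by-id/ata-ST4000VN000-1H4168_S301HS8H
[ -e "${SPARE}" ] && hdparm -S243 "${SPARE}"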

Link to comment

Awesome! So when does unRAID actually do the SMART tests on a disk drive? At boot? With the drive spun down, those results will be inaccessible, correct?

 

unRAID does not do automated SMART tests. However, unRAID does inspect attributes when the drive is spun up and part of the array.

I'm unsure of the current status of unassigned devices.

I believe you can see the attributes if the drive is spinning, but the attributes are not monitored.

 

My smartd shell script checks the attributes once a day, with the side effect of spinning all drives up.

It is separate from emhttp's attribute monitoring.

Link to comment

 

/boot/local/bin/smartd.sh

#!/bin/bash

[ ${DEBUG:=0} -gt 0 ] && set -x -v

P=${0##*/}              # basename of program
R=${0%%/$P}             # dirname of program
P=${P%.*}               # strip off after last . character

# if fd1(stdout) is not connected to terminal. 
# redirect to logger coprocess.
# COPROC[0] is connected to the standard output of the co-process
# COPROC[1] is connected to the standard input of the co-process.
if [ ! -t 1 ]
   then coproc /usr/bin/logger -t${P}[$$]
        # Redirect stdout/stderr to logger
        eval "exec 1>&${COPROC[1]} 2>&1 ${COPROC[0]}>&-"
fi

[ ! -d /var/lib/smartd ] && mkdir -p /var/lib/smartd

if grep -wq '^DEVICESCAN$' /etc/smartd.conf
   then sed -i -e 's#^DEVICESCAN$#DEVICESCAN -m root#g' /etc/smartd.conf
fi

renice -n 19 $$ >/dev/null
exec /usr/sbin/smartd --savestates='/var/lib/smartd/' --quit=onecheck 
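
To try this by hand (assuming the path above), run it from a terminal: the [ ! -t 1 ] test then fails, so output stays on screen instead of going to the logger coprocess. Setting DEBUG=1 additionally traces every command:

DEBUG=1 /boot/local/bin/smartd.sh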

 

 

/boot/local/bin/hdparm_set_default_standby.sh

#!/bin/bash

[ ${DEBUG:=0} -gt 0 ] && set -x -v

P=${0##*/}              # basename of program
R=${0%%/$P}             # dirname of program
P=${P%.*}               # strip off after last . character

# From hdparm settings
# -s Enable/disable the power-on in standby feature, if supported by the drive. 
# VERY DANGEROUS. Do not use unless you are absolutely certain that both the system BIOS (or firmware) 
# and the operating system kernel (Linux >= 2.6.22) support probing for drives that use this feature. 
# When enabled, the drive is powered-up in the standby mode to allow the controller to sequence the spin-up of devices
# , reducing the instantaneous current draw burden when many drives share a power supply. 
# Primarily for use in large RAID setups. This feature is usually disabled and the drive is powered-up in the active mode 
# (see -C above). Note that a drive may also allow enabling this feature by a jumper. 
# Some SATA drives support the control of this feature by pin 11 of the SATA power connector. 
# In these cases, this command may be unsupported or may have no effect.
#
# -S Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive. 
# This timeout value is used by the drive to determine how long to wait (with no disk activity) 
# before turning off the spindle motor to save power. Under such circumstances, 
# the drive may take as long as 30 seconds to respond to a subsequent disk access, 
# though most drives are much quicker. The encoding of the timeout value is somewhat peculiar. 
# A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. 
# Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. 
# Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. 
# A value of 252 signifies a timeout of 21 minutes. 
# A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, 
# and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. 
# Note that some older drives may have very different interpretations of these values.
# 
S=243

# if fd1(stdout) is not connected to terminal. 
# redirect to logger coprocess.
# COPROC[0] is connected to the standard output of the co-process
# COPROC[1] is connected to the standard input of the co-process.
if [ ! -t 1 ]
   then coproc /usr/bin/logger -t${P}[$$]
        # Redirect stdout/stderr to logger
        eval "exec 1>&${COPROC[1]} 2>&1 ${COPROC[0]}>&-"
fi

# ls -l /dev/disk/by-id | grep -v "\-part"  | egrep 'ata-|scsi-' | cut -d" " -f10 | while read device
ls -L1 /dev/disk/by-id | grep -v "\-part"  | egrep 'ata-|scsi-' | while read device
do 
   [ "${device}" == "" ] && continue

   STATE=`hdparm -C /dev/disk/by-id/${device}`
   STATE=${STATE//$'\n'/} # Remove all newlines.
   echo ${STATE}

   STATE="${STATE##* }"
   if  [[ "${STATE}" =~ "standby" ]]
       then : 
            # echo "Skipping set of standby timer for ${device}"
            continue
   fi 

   # -s   Set power-up in standby flag (0/1) (DANGEROUS)
   # -S   Set standby (spindown) timeout

   hdparm -S${S} /dev/disk/by-id/${device}
done

if [ ! -t 1 ]
   then eval "exec ${COPROC[1]}>&-"
fi

 

 

and the cron entry that triggers these daily.

 

/etc/cron.d/smartd (rsynced from /boot/local/etc/cron.d/smartd in go script)

50 05 * * * /boot/local/bin/smartd.sh 2>&1 | exec /usr/bin/logger -tsmartd[$$]
05 06 * * * /boot/local/bin/hdparm_set_default_standby.sh 2>&1 | exec /usr/bin/logger -thdparm_set_default_standby[$$]
#
# * * * * * <command to be executed>
# | | | | |
# | | | | |
# | | | | +---- Day of the Week   (range: 0-7, 0 and 7 both mean Sunday)
# | | | +------ Month of the Year (range: 1-12)
# | | +-------- Day of the Month  (range: 1-31)
# | +---------- Hour              (range: 0-23)
# +------------ Minute            (range: 0-59)
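
For completeness, the go script lines that install this cron entry at boot could look something like the following (a sketch of what the note above describes; the exact rsync flags are my own choice):

# install the cron entry from flash at boot (sketch)
rsync -a /boot/local/etc/cron.d/smartd /etc/cron.d/smartd
chmod 644 /etc/cron.d/smartd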

Link to comment

>> Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes.

 

6 seconds may be too low, and the firmware may be overriding it.

I've never gone below 241. My drives have spun down and show the same in the webGui.

 

There may also be a timeout or webGui attribute update based on the polling time of 1800 seconds (half an hour).

 

root@unRAIDm:/usr/local/sbin# cat /etc/unraid-version

version="6.1.3"

root@unRAIDm:/usr/local/sbin# /boot/local/bin/hdparm_set_default_standby.sh

/dev/disk/by-id/ata-HGST_HDN726060ALE610_NAG1D7TP: drive state is: standby

/dev/disk/by-id/ata-HGST_HDN726060ALE610_NAG1DEKP: drive state is: standby

/dev/disk/by-id/ata-ST3000DM001-1CH166_W1F1GTFJ: drive state is: standby

/dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F2WFKV: drive state is: standby

/dev/disk/by-id/ata-ST4000VN000-1H4168_S3012W7N: drive state is: standby

/dev/disk/by-id/ata-ST4000VN000-1H4168_S3012WS6: drive state is: standby

/dev/disk/by-id/ata-ST4000VN000-1H4168_S301HS8H: drive state is: standby

/dev/disk/by-id/ata-ST6000DX000-1H217Z_Z4D0EE7M: drive state is: standby

/dev/disk/by-id/ata-ST6000DX000-1H217Z_Z4D0EEDV: drive state is: standby

/dev/disk/by-id/ata-Samsung_SSD_840_PRO_Series_S1AXNSAF701196M: drive state is: active/idle

 

/dev/disk/by-id/ata-Samsung_SSD_840_PRO_Series_S1AXNSAF701196M:

setting standby to 243 (1 hours + 30 minutes)

Link to comment
