unRAID Server Release 5.0-rc16c Possible Spindown issues under ESX and possible enhancements regardi


TheDragon

Recommended Posts

due to lack of temperatures and spindown support for RDM'd drives. I'll give this a go tomorrow... will be great if this works!

You will need to be near your machine to see if they spin down or spin up.

I cannot tell at the current moment, the AC is on, I'm blasting Santana and programming.

 

While the drives did not show spin-up or spin-down status before in my HP MicroServer,

I could tell they were spinning down from the access delay and from hearing them spin up.

Too much going on for me to verify this right now.

 

Just in case this helps anyone else, I've just tested this without updating the version of hdparm and smartmontools... and I'm able to see drive temperatures and spin up/down RDM'd drives!! I'll have to test it out a little more to make sure everything still works as expected, but I'm hopeful now that I may be able to make the leap to ESXi without needing new hardware!  ;D

 

Maybe this is a result of one of the newer kernel versions, since I last tested under RC8A?

 

EDIT: Okay, it's not as clear-cut as I first thought: the drives connected to my SAS card seem to work perfectly, while the drives connected to the onboard controller don't.

Just to follow up, after doing some testing: 

 

With RC16c & my M1015 - all attached drives spin down manually and automatically; temperatures are also displayed in the unRAID GUI and unMENU GUI.

 

With RC16c & onboard controller - all attached drives will spin down manually using unMENU, but the spindown drives option in the unRAID GUI doesn't have any effect, and they don't spin down automatically according to the delays set in unRAID. Additionally, the graphics that indicate whether a disk is spinning do not reflect whether the drive is actually spinning. Temperatures are displayed correctly in both the unRAID GUI and unMENU GUI.

 

I'm not sure why the behaviour varies between onboard and SAS controller - could it be something to do with drivers in the kernel? My motherboard is a C2SEA, with an ICH10 SATA controller.

It's also curious that different behaviour is observed between emhttp and unMENU - are different methods used to control the drives?

Link to comment
  • Replies 64
  • Created
  • Last Reply

jack0w, you can check if you can manually spin down the drives by disabling spindown in emhttp, then

running hdparm -y /dev/sd?, where ? is the drive you want to test.

 

If hdparm can spin the drive up and down, then emhttp can too, and the problem isn't the interface but the detection.

 

As I mentioned earlier, smartctl is used to detect temperatures. Refreshing the GUI could have the effect of keeping the drive in an accessed state.

 

i.e., the -n standby option may not be effective via the smartctl command line

(for certain controller interfaces, e.g. RDM drives).
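For anyone wanting to script that manual test, here's a minimal sketch. The helper name `spintest` is mine, not part of unRAID; it just wraps the hdparm -y / hdparm -C sequence described above and needs a real drive to do anything useful.

```shell
# Sketch of the manual spindown test described above.
# spintest is a hypothetical helper name; pass it the device to test.
spintest() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "usage: spintest /dev/sdX" >&2
        return 1
    fi
    hdparm -y "$dev"        # issue STANDBY IMMEDIATE (spin the drive down)
    sleep 2
    hdparm -C "$dev"        # report drive state: active/idle or standby
}
# Example (requires a real drive): spintest /dev/sdb
```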

 


With RC16c & onboard controller - all attached drives will spin down manually using unMENU, but the spindown drives option in the unRAID GUI doesn't have any effect, and they don't spin down automatically according to the delays set in unRAID.

Which is why I've always said that the spinup/spindown functionality should not be buried in the md-mod driver, but should be a plain Bash daemon. We could have debugged the hell out of it by now.

 

Here I volunteer my own spindown daemon, which I have been using for years. If you want to try it, don't forget to disable unRaid's spindown functionality.

How about a quick guide - e.g. change only the xyz... value and nothing else - and how to get this to load (how, and where/placement)?

 

MINUTES=${MINUTES:-60} ?

diskTimeout=$(($MINUTES*60)) # seconds ?

loopDelay=60 # seconds ?

 

No grouping ability with this script at the moment, I assume - right?

 

I want to see if disk #1 will spin up when the mover is executed even though there is no data slated for it; this has been an issue for some time now.


Keep in mind VMware RDM has limitations, as it's not a pass-through technology; there is a virtual layer in between. I posted, a long time ago, a snippet from VMware on how it works.

 

The limitations can be found pretty easily by running the hdparm and smartctl commands against the RDMs from the command line and seeing what comes back, if anything. The updated components are worth trying with the same tests as well.

 

Personally, I do not favor RDMs for a production system.


Ok thanks, scenario: if I start via the GO script but the array is not auto-started, it will still function and spin down the disks based on the MINUTES=${MINUTES:-60} value (unlike unRAID, which would not until you start the array, right?)

 

Do/can I change the MINUTES=${MINUTES:-60} to MINUTES=${MINUTES:-05} or MINUTES=${MINUTES:-5} for 5 minutes?

 

You forgot to answer "No Disk Grouping ability I assume at the moment with this script, right?"


Maybe you could externalize the variables with a source line.

 

i.e. something like this:

 

CONFDIR="${LOCAL}/etc"

if [ -f ${CONFDIR}/${P}.conf ]; then

  source ${CONFDIR}/${P}.conf

fi

 

It could even be shortened to a one-liner:

 

[ -f ${CONFDIR}/${P}.conf ] && source ${CONFDIR}/${P}.conf
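To see the override pattern in action, here's a self-contained sketch. The directory comes from mktemp and the values are hypothetical; `.` is the POSIX spelling of `source`.

```shell
# Demonstrate the conf-file override pattern with a throwaway directory.
LOCAL=$(mktemp -d)
CONFDIR="${LOCAL}/etc"
P=spind
mkdir -p "${CONFDIR}"
echo 'MINUTES=30' > "${CONFDIR}/${P}.conf"    # pretend user config

MINUTES=60                                    # script's built-in default
[ -f "${CONFDIR}/${P}.conf" ] && . "${CONFDIR}/${P}.conf"   # the one-liner

echo "MINUTES=${MINUTES}"                     # prints MINUTES=30
rm -rf "${LOCAL}"
```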

 

 

I use this at the top of all scripts so I can get the basename of the running script for log, tmp, pid, and conf files.

 

[ ${DEBUG:=0} -gt 0 ] && set -x -v

 

P=${0##*/}              # basename of program

R=${0%%$P}              # dirname of program

E=${P#*.}              # Ext

P=${P%.*}              # strip off after last . character

 

TMPFILE=/tmp/${P}.$$

trap "rm -f ${TMPFILE}" EXIT HUP INT QUIT TERM
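Those parameter expansions can be checked against a sample path; a literal stands in for $0 here so the sketch runs anywhere.

```shell
# Same expansions as above, applied to a sample script path.
arg0=/usr/local/bin/spind.sh   # stand-in for $0

P=${arg0##*/}   # spind.sh         - basename
R=${arg0%%$P}   # /usr/local/bin/  - dirname, with trailing slash
E=${P#*.}       # sh               - extension (text after the first dot)
P=${P%.*}       # spind            - strip from the last dot onward

echo "P=$P R=$R E=$E"   # P=spind R=/usr/local/bin/ E=sh
```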

 


Ok thanks, scenario: if I start via the GO script but the array is not auto-started

It spins down ALL disks on the server, regardless of whether they are part of the unRaid array or not.

That is great. One other question, then: will this run on only parity/array/cache disks, and hopefully not on a disk outside of what's assigned?

Reasons being: a drive being precleared, and I have a VMDK mounted that's not part of the array. The script should not make attempts on them at all - is that the case? I can't read Linux scripts clearly enough to understand.

 

Do/can I change the MINUTES=${MINUTES:-60} ....

You don't have to change anything in the script. If you do: `spind --help`, as I mentioned before, it will show you how you can start it in different ways by prefixing the command.

Can't load it at the moment so wanted to kind of be prepared... can you post the --help?  ;D please

 


jack0w, you can check if you can manually spin down the drives by disabling spindown in emhttp, then

running hdparm -y /dev/sd?, where ? is the drive you want to test.

 

If hdparm can spin the drive up and down, then emhttp can too, and the problem isn't the interface but the detection.

 

As I mentioned earlier, smartctl is used to detect temperatures. Refreshing the GUI could have the effect of keeping the drive in an accessed state.

 

i.e., the -n standby option may not be effective via the smartctl command line

(for certain controller interfaces, e.g. RDM drives).

 

Just given this a go with and without the updated hdparm and smartctl.

 

Without the updated hdparm and smartctl

  • Drives connected to onboard controller - Running 'hdparm -y /dev/sdb' from the console does spin down the drive, but the spin status isn't accurately reflected in emhttp (they are always shown as spun down). Temperatures are not displayed, whether the drive is spinning or not.
  • Drives connected to SAS controller - Running 'hdparm -y /dev/sdb' from the console does spin down the drive, and the spin status is accurately reflected in emhttp. Temperatures are displayed only when the drive is spinning.

 

With the updated hdparm and smartctl

  • Drives connected to onboard controller - Running 'hdparm -y /dev/sdb' from the console does spin down the drive, but the spin status isn't accurately reflected in emhttp (they are always shown as spun up). Temperatures are displayed whether the drive is spinning or not.
  • Drives connected to SAS controller - Running 'hdparm -y /dev/sdb' from the console does spin down the drive, and the spin status is accurately reflected in emhttp. Temperatures are displayed only when the drive is spinning.

 

 

I do also see the following response in the console after issuing the 'hdparm -y' command: 'SG_IO: bad/missing sense data, sb[]: 70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 00 00 00 00 00 00 00 40 50'. I only see this on drives connected to the onboard controller; it does NOT appear on drives connected to the M1015.


With RC16c & onboard controller - all attached drives will spin down manually using unMENU, but the spindown drives option in the unRAID GUI doesn't have any effect, and they don't spin down automatically according to the delays set in unRAID.

Which is why I've always said that the spinup/spindown functionality should not be buried in the md-mod driver, but should be a plain Bash daemon. We could have debugged the hell out of it by now.

 

Here I volunteer my own spindown daemon, which I have been using for years. If you want to try it, don't forget to disable unRaid's spindown functionality.

 

Thanks for posting this xnas! I'll certainly give it a test drive!  ;)

 

My only question: will it log its activity to the syslog? Just so I can be aware of the effect of the spindown delay I've set.


How do I find which device in /dev is my mounted VMDK, so I can manually execute hdparm against it and see what hdparm thinks of this disk?

 

Have you personally tried to preclear a drive while running this daemon? (just to know)

 

Does unRAID work off of /dev as well to spin down/up drives? (For knowledge's sake, if you know.)

 

Great question @jack0w (about the logging)


jack0w,

 

Spin down your drives in question manually.

Make sure all are down.

 

Then refresh the emhttp console.

 

If you hear the drives spin up, that's the problem: smartctl is making them come back up.

 

Just tried this.  Refreshing emhttp doesn't cause any drives to spin up after I've manually spun them all down.

 

Just out of curiosity I repeated this procedure, but substituted unMENU for emhttp. I noticed that as soon as I open unMENU after they've all been spun down, one disk does spin up - my only Hitachi drive.


Have you personally tried to preclear a drive while running this daemon? (just to know)

Preclearing a disk means constant I/O activity on that disk. Spind won't touch that disk until there's been at least 60 minutes of complete inactivity on it -- no physical reads, and no physical writes.

I understand that, but things don't always work as anticipated; that's why I asked whether you've personally tried it. For testing this I would like to set it to 5 mins.

I normally have mine set to 30 mins with unRAID and would like at some point to use spin groups as well (not an emergency at the moment)

 

spind --help

Usage: spind

Or: MINUTES=45 spind (Sets it for 45 minutes. Default is 60 minutes)

To stop the daemon: spind -q

 

So to set it for 5 mins, do I execute:

 

MINUTES=05 spind

 

or

 

MINUTES=5 spind

 

or will either one work?
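For what it's worth, a quick shell check suggests either spelling gives the same timeout, since the script multiplies by 60 arithmetically; a leading zero makes the number octal in bash, but that only matters for the digits 8 and 9.

```shell
# Both spellings of the variable produce the same timeout in seconds.
MINUTES=5
echo $((MINUTES*60))    # 300

MINUTES=05
echo $((MINUTES*60))    # 300 (leading zero means octal, but octal 5 is still 5)
```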

 


Refreshing emhttp doesn't cause any drives to spin up after I've manually spun them all down.

 

Just out of curiosity I repeated this procedure, but substituted unMENU for emhttp. I noticed that as soon as I open unMENU after they've all been spun down, one disk does spin up - my only Hitachi drive.

 

Apparently, smartctl is unnecessarily waking up some Hitachi disks.

Test this: spin it down manually, and see if it will wake up if you do a `hdparm -C` on it.

 

 

Emhttp could fix this:

Instead of a straight...

smartctl -n sleep,standby -a $dev

...emhttp should do something like...

hdparm -C $dev | grep -q active && smartctl -a $dev

Or, better yet, leave all that crap to external scripts.
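The suggested guard can be demonstrated with a stubbed hdparm, so nothing touches real hardware. The stub and the `smart_poll` name are mine; a real version would actually call smartctl instead of echoing.

```shell
# Stub hdparm for demonstration; echoes whatever state we set.
hdparm() { echo "drive state is:  ${FAKE_STATE}"; }

smart_poll() {
    dev="$1"
    # Only touch SMART (which may wake some drives) if the drive is active:
    hdparm -C "$dev" | grep -q active && echo "would run: smartctl -a $dev"
    return 0
}

FAKE_STATE="active/idle"
smart_poll /dev/sdz     # prints: would run: smartctl -a /dev/sdz

FAKE_STATE="standby"
smart_poll /dev/sdz     # prints nothing - the drive stays asleep
```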

 

I've just tried 'hdparm -C' - this doesn't cause the Hitachi drive to spin up!  :)

 

Although to be fair, emhttp wasn't causing that drive to spin up anyway - that was only true when I tried opening unMENU.

 

 

 

Regarding the other issue of RDM'd drives connected to the onboard SATA controller - I think I can see why the status indicators in emhttp don't accurately show the disks' spin status: if I run hdparm -C on these disks, they always come back as 'drive state is:  standby' even if they're actually spinning. Here is the full output of the command:

 

Disk is spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 ff 00 00 00 00 00 00 40 50
drive state is:  standby

 

 

Disk is NOT spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 00 00 00 00 00 00 00 40 50
drive state is:  standby


Apparently, smartctl is unnecessarily waking up some Hitachi disks.
Test this: spin it down manually, and see if it will wake up if you do a `hdparm -C` on it.

 

Huh, I have all Hitachi drives (less one Seagate 250GB drive); `hdparm -C` never wakes any of them.

 

The only Hitachi-specific option with hdparm is "-H  Read temperature from drive (Hitachi only)", which is sweet.

Jul  9 11:45:51 Tower kernel: mdcmd (160): set md_num_stripes 2560

Jul  9 11:45:51 Tower kernel: mdcmd (161): set md_write_limit 1024

Jul  9 11:45:51 Tower kernel: mdcmd (162): set md_sync_window 1024

Jul  9 11:45:51 Tower kernel: mdcmd (163): set spinup_group 0 0

Jul  9 11:45:51 Tower kernel: mdcmd (164): set spinup_group 1 0

Jul  9 11:45:51 Tower kernel: mdcmd (165): set spinup_group 2 0

Jul  9 11:45:51 Tower kernel: mdcmd (166): set spinup_group 3 0

Jul  9 11:45:51 Tower kernel: mdcmd (167): set spinup_group 4 0

Jul  9 11:45:51 Tower kernel: mdcmd (168): set spinup_group 5 0

Jul  9 11:45:51 Tower kernel: mdcmd (169): set spinup_group 6 0

Jul  9 11:45:51 Tower kernel: mdcmd (170): set spinup_group 7 0

Jul  9 11:45:51 Tower kernel: mdcmd (171): set spinup_group 11 0

Jul  9 11:45:51 Tower kernel: mdcmd (172): set spinup_group 12 0

Jul  9 11:45:51 Tower kernel: mdcmd (173): set spinup_group 13 0

Jul  9 11:45:51 Tower kernel: mdcmd (174): set spinup_group 14 0

Jul  9 11:45:51 Tower kernel: mdcmd (175): set spinup_group 15 0

Jul  9 11:45:51 Tower kernel: mdcmd (176): set spinup_group 16 0

Jul  9 11:45:51 Tower kernel: mdcmd (177): set spinup_group 17 0

Jul  9 11:45:51 Tower kernel: mdcmd (178): set spinup_group 18 0

Jul  9 11:47:17 Tower spind: Version 3.4 <c> 2013 by Pourko Balkanski (For personal use only, not for redistribution!)

Jul  9 11:47:17 Tower spind: Idle timeouts set for 5 minutes.

Jul  9 11:47:17 Tower spind: Monitoring: sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds

Jul  9 11:53:18 Tower spind: spinning down /dev/sdc

Jul  9 11:53:18 Tower spind: spinning down /dev/sdd

Jul  9 11:53:19 Tower spind: spinning down /dev/sdf

Jul  9 11:53:20 Tower spind: spinning down /dev/sdo

Jul  9 11:53:20 Tower spind: spinning down /dev/sdr

Jul  9 11:53:21 Tower spind: spinning down /dev/sds

No flash drive nor VMDK mount being monitored - very good (I can tell from the count).


Apparently, smartctl is unnecessarily waking up some Hitachi disks.

Test this: spin it down manually, and see if it will wake up if you do a `hdparm -C` on it.

 

Huh, I have all Hitachi drives (less one Seagate 250GB drive); `hdparm -C` never wakes any of them.

 

The only Hitachi-specific option with hdparm is "-H  Read temperature from drive (Hitachi only)", which is sweet.

Right. You didn't read what you quoted.

What do you mean? You are not calling -H in your script, and no one is using it to get temps?

 


Regarding the other issue of RDM'd drives connected to the onboard SATA controller - I think I can see why the status indicators in emhttp don't accurately show the disks' spin status: if I run hdparm -C on these disks, they always come back as 'drive state is:  standby' even if they're actually spinning. Here is the full output of the command:

 

Disk is spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 ff 00 00 00 00 00 00 40 50
drive state is:  standby

 

 

Disk is NOT spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 00 00 00 00 00 00 00 40 50
drive state is:  standby

 

@xnas  I've just been looking at your spind script. Sadly, I'm not sure it'll get me out of this hole... since the hdparm -C command doesn't return an accurate status for drives connected to the onboard SATA controller, I think I'm right in saying that your script won't attempt to spin down the drive, since it will think it is already spun down. I hope that makes sense!

 

 

 

I've also checked the BIOS to see if there are any SATA controller settings that might change this behaviour - sadly no joy there.

 

I still can't quite understand why RDM'd drives connected to the onboard SATA controller and the SAS controller behave differently under ESXi. You never used to be able to get spindown support or temps at all, whereas now it seems to work perfectly for SAS controllers, but not for anything onboard.

 

Does anyone have any suggestions, or other tests I can perform to help further?

 

EDIT: One further thought - is there a way, other than 'hdparm -C', to determine the spin state of a drive?


EDIT: One further thought - is there a way, other than 'hdparm -C', to determine the spin state of a drive?

 

There used to be a program called hddtemp that could get the temperature and the spin status of a drive.

I have not used it in years. Not sure if it will work correctly in ESX either.

 

Also, are we in agreement that this is more of an ESX situation?

If that is the case, would another SAS controller resolve this issue for those who can migrate to one?


Regarding the other issue of RDM'd drives connected to the onboard SATA controller - I think I can see why the status indicators in emhttp don't accurately show the disks' spin status: if I run hdparm -C on these disks, they always come back as 'drive state is:  standby' even if they're actually spinning. Here is the full output of the command:

 

Disk is spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 ff 00 00 00 00 00 00 40 50
drive state is:  standby

 

 

Disk is NOT spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 00 00 00 00 00 00 00 40 50
drive state is:  standby

 

@xnas  I've just been looking at your spind script.  Sadly I'm not sure it'll get me out of this hole... since the hdparm -C command doesn't return an accurate status for drives connected to onboard SATA controller...

 

Does anyone have any suggestions, or other tests I can perform to help further?

Did you try the same with the newer version of hdparm?

 

I did - sadly the behaviour is the same with both versions  :(

 

I have just found this, however, which seems to suggest another means of determining the spin state without using hdparm -C. Quite how/if this could be leveraged in a script or unRAID, I'm not sure. I'll leave that to the far more knowledgeable people in the room ...  ;)

http://forums.freenas.org/threads/how-to-find-out-if-a-drive-is-spinning-down-properly.2068/


EDIT: One further thought - is there a way, other than 'hdparm -C', to determine the spin state of a drive?

 

There used to be a program called hddtemp that could get the temperature and the spin status of a drive.

I have not used it in years. Not sure if it will work correctly in ESX either.

 

Also, are we in agreement that this is more of an ESX situation?

If that is the case, would another SAS controller resolve this issue for those who can migrate to one?

 

I think from what I've seen this is a situation limited to ESXi. Moving all the disks to SAS controllers would sidestep the issue, though this isn't an option for me at the moment. 

 

If it could be cracked, it could be very useful to those of us who don't have passthrough-capable hardware. Passthrough used to be the only way to run unRAID under ESXi, and from what I can see this one issue is the only thing stopping me from doing this with RDM'd drives. I think it just needs some logic added, so that the string of numbers in my previous post can be interpreted to determine the spin state. If I understand correctly from the post I linked to on the FreeNAS forums, this would be a more reliable way of checking and would work for both bare-metal and virtualized environments.

 


I think from what I've seen this is a situation limited to ESXi.

Does the kernel even see that disk as rotational?  Try:

cat /sys/block/sdb/queue/rotational

If that's zero, then no hdparm will help you. Or anything else for that matter. Blame the way ESXi presents the disk.
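A small helper makes that check easy to repeat per drive. The function name is mine; the sysfs path is standard Linux.

```shell
# Print the kernel's rotational flag for a named block device:
# 1 = seen as a spinning disk, 0 = SSD/virtual, "unknown" if absent.
is_rotational() {
    f="/sys/block/${1}/queue/rotational"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
    fi
}

is_rotational sdb
```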

 

Looks like it may not be completely hopeless!!

 

root@Atlas:/boot/custom# cat /sys/block/sdb/queue/rotational
1

 

 

 

 

Also, this was the comment from the FreeNAS forum I was referring to (it may not be the 10th hexadecimal value, but the rest holds true):

 

Take a look at the 10th hexadecimal value in the output. If this says "FF", the drive is spinning, if it says "00" the drive is spun down. It's that easy!!

 

Disk is spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb
/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 ff 00 00 00 00 00 00 40 50
drive state is:  standby

 

 

Disk is NOT spinning

root@Atlas:/boot/custom# hdparm -C /dev/sdb
/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 00 00 00 00 00 00 00 40 50
drive state is:  standby

 

Not sure if that helps at all?
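If that byte really does track spin state on these controllers, here's a sketch of the kind of parsing a script (or emhttp) could do, using the two sample sense strings above. The byte position is taken from these samples, not from any spec, so treat it as an observation; the `spin_state` helper is hypothetical.

```shell
# Infer spin state from hdparm -C's SG_IO sense bytes. In the samples
# above, the 24th hex value is 'ff' when the disk is spinning and '00'
# when it is spun down. This offset is observed, not documented.
spin_state() {
    byte=$(echo "$1" | awk '{print $24}')
    if [ "$byte" = "ff" ]; then echo spinning; else echo standby; fi
}

SPINNING="70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 ff 00 00 00 00 00 00 40 50"
STOPPED="70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 09 0c 00 00 00 00 00 00 00 00 00 00 40 50"

spin_state "$SPINNING"   # spinning
spin_state "$STOPPED"    # standby
```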


I have just found this, however, which seems to suggest another means of determining the spin state without using hdparm -C. Quite how/if this could be leveraged in a script or unRAID, I'm not sure. I'll leave that to the far more knowledgeable people in the room ...  ;)

http://forums.freenas.org/threads/how-to-find-out-if-a-drive-is-spinning-down-properly.2068/

 

That uses the tool camcontrol, which appears to be strictly a FreeBSD tool for the FreeBSD CAM subsystem. I'm no expert here, but it does not look in any way adaptable for us.


Archived

This topic is now archived and is closed to further replies.

