Lime Technology - unRAID Server Community
Author Topic: cache_dirs - an attempt to keep directory entries in RAM to prevent disk spin-up  (Read 4753 times)
Joe L.
Hero Member
*****
Posts: 7545


View Profile
« on: October 15, 2009, 10:27:34 AM »

cache_dirs is a script to attempt to keep directory entries in memory to prevent disks from spinning up just to get a directory listing.

The Linux kernel keeps the most recently accessed disk buffers and directory entries in memory, and flushes out the least recently accessed ones. If we can 'trick' the kernel into keeping our directory entries in that memory cache, then a directory scan will find what it is looking for already in memory and will not need to access the physical disk. Since the physical disk is not accessed, Linux will (after the defined time-out delay) let the drive spin down, which saves on power costs and removes the drive as a heat source. This is especially useful when media files are spread across multiple drives and a media player begins to scan for a particular media file to play. You want the scan to look at all of the relevant directories, but only spin up the one drive containing the desired media file.

Since the cache management decision process tries to keep the most recently accessed disk buffers and directory entries, we need to 'trick' it by constantly accessing the directories of the folders we want to keep in the cache, so that they will always appear to be the most recently accessed.  I've developed, with the help of lots of suggestions and feedback, an easily customizable script called cache_dirs to do this.
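To make the idea concrete, here is a minimal sketch of such a "keep it recent" loop. It is not the cache_dirs script itself; the function names scan_dirs and keep_warm are made up for illustration, and the only thing borrowed from the real script is the use of "find" with a -maxdepth limit.

```shell
#!/bin/bash
# Illustrative sketch only; scan_dirs and keep_warm are hypothetical names,
# not functions taken from cache_dirs.

# scan_dirs DEPTH DIR...
# Walk each directory tree so its directory entries are read again and
# therefore look "recently used" to the kernel. The listing is discarded.
scan_dirs() {
    local depth=$1
    shift
    local dir
    for dir in "$@"; do
        # Skip anything that is not a directory (for example a disk that
        # is not mounted yet).
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth "$depth" >/dev/null 2>&1
    done
}

# keep_warm PASSES DELAY DEPTH DIR...
# Repeat the scan PASSES times, sleeping DELAY seconds between passes.
# A real daemon would loop until told to stop instead of counting passes.
keep_warm() {
    local passes=$1 delay=$2 depth=$3
    shift 3
    local i
    for ((i = 0; i < passes; i++)); do
        scan_dirs "$depth" "$@"
        sleep "$delay"
    done
}
```

For example, keep_warm 3 10 999 /mnt/disk1 /mnt/disk2 would scan both disks three times, ten seconds apart.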

cache_dirs is described in the wiki here

A long series of posts describing its evolution over time is here in this thread

It has many possible tunable options, but most people can simply invoke it as
cache_dirs -w

The "-w" option will cause it to wait if unRAID is not yet started.  

If you have a folder or folders you wish to exclude, there is a -e option.  This option can be used multiple times.  To exclude a "data" and "old-stuff" directory, you would use
cache_dirs -w -e "data" -e "old-stuff"
Always use quote marks around the folders you wish to exclude. This is especially important if the folder name contains a space or other special character that might be interpreted by the Linux shell.

By default, all top-level folders on the disks and everything under them in sub-folders are scanned.  If you only want a subset of the top-level folders scanned, you can supply an "include" list using the "-i include_dir" option.  Again, it may be repeated on the command line multiple times.  If using the "include" feature, only those directories included are scanned.  There is no need to use an exclude as well, unless you use a wild-card for the include directory and the "include" wild-card matches more than you want cached.  (The include and exclude options work on top-level folders only.  They may not be used to include or exclude specific sub-folders.  You can use a different option (-a) if you wish to exclude a sub-folder, as shown in this post.)

For example, let's say you have folders like this:
Movies-Comedy-Bad
Movies-Comedy-Good
Movies-Chick-Flicks-Good
Movies-Chick-Flicks-Bad
Movies-Adventure-Good
Movies-Adventure-Bad
Movies-Drama-Good
Movies-Drama-Bad
Movies-Kids-Good
Movies-Kids-Bad
Movies-Junk-Good
Movies-Junk-Bad
Data1
Data2
Data3

...
You could use an include rule like this
-i "Movies*"

and an exclude rule like this in combination with it
-e "*Bad"

You would cache only those directories that start with "Movies" and do not have "Bad" at the end of their name.

If you added one more exclude like this:
-e "*Junk*"
You would not scan either of the folders with Junk in their name.   Using a combination of include and exclude directories makes it pretty flexible if you have the need.  For most people, one or two exclude folders might be all that is needed, if any at all.  If you have enough RAM, just let it scan and cache everything.  My "data" folder holds a directory with a backup of an old Windows system and has at least several hundred thousand files and folders under it.  I always exclude it, as it is never needed by my media players in their listing of movies.
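The include/exclude behaviour described above can be sketched in a few lines of bash. This is only an illustration of the matching rules, not code lifted from cache_dirs; select_folders is a hypothetical helper that takes the same -i and -e style arguments followed by a list of top-level folder names.

```shell
#!/bin/bash
# Illustrative sketch only; select_folders is a hypothetical helper and is
# not part of cache_dirs.
# Usage: select_folders [-i pattern]... [-e pattern]... -- folder...
# Prints the folders that match an include pattern (or all folders when no
# include pattern is given) and do not match any exclude pattern.
select_folders() {
    local -a includes=() excludes=()
    while [ $# -gt 0 ]; do
        case $1 in
            -i) includes+=("$2"); shift 2 ;;
            -e) excludes+=("$2"); shift 2 ;;
            --) shift; break ;;
            *)  break ;;
        esac
    done

    local name pat keep
    for name in "$@"; do
        # With no include patterns, every folder starts out selected.
        if [ ${#includes[@]} -eq 0 ]; then keep=yes; else keep=no; fi
        for pat in "${includes[@]}"; do
            # An unquoted pattern on the right of == is matched as a glob.
            [[ $name == $pat ]] && keep=yes
        done
        # Excludes are applied last, so they win over includes.
        for pat in "${excludes[@]}"; do
            [[ $name == $pat ]] && keep=no
        done
        if [ "$keep" = yes ]; then printf '%s\n' "$name"; fi
    done
}
```

Called as select_folders -i "Movies*" -e "*Bad" -e "*Junk*" -- followed by the folder names from the example, it prints only the "Movies-...-Good" folders other than the Junk one.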

If you add /boot/cache_dirs -w to your "go" script, it will run each time you re-start your server.

To stop cache_dirs from running, type
cache_dirs -q

To see all the options, type
cache_dirs -h

To run it in the foreground, so you can see what it is doing, use the -F option.  As it loops and scans it will print statistics on how long each scan is taking.  It will adjust the scan rate based on the activity on the server.  You can set the min and max delay times of the scan rate using the -m min-time and -M max-time options.
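How the delay is adjusted is not spelled out here, so the following is only a guess at the general shape of such logic, not the algorithm cache_dirs actually uses. The assumption is that a scan that takes longer than the recent average means the server is busy (or the cache went cold), so the delay grows, and a fast scan lets it shrink again. next_delay and its arguments are hypothetical.

```shell
#!/bin/bash
# Illustrative sketch only; next_delay is a hypothetical helper and the
# back-off rule is an assumption, not the rule used by cache_dirs.
# Usage: next_delay CURRENT MIN MAX SCAN_SECS AVG_SECS
# Prints the delay (in whole seconds) to use before the next scan.
next_delay() {
    local current=$1 min=$2 max=$3 scan_secs=$4 avg_secs=$5
    local next
    if [ "$scan_secs" -gt "$avg_secs" ]; then
        # Slower than usual: assume the server is busy and back off.
        next=$((current + 1))
    else
        # No slower than usual: scan a little sooner next time.
        next=$((current - 1))
    fi
    # Clamp to the range given by the -m (min) and -M (max) options.
    [ "$next" -lt "$min" ] && next=$min
    [ "$next" -gt "$max" ] && next=$max
    echo "$next"
}
```

With the defaults of -m 1 and -M 10, a delay of 5 would move to 6 after a slow scan and to 4 after a normal one, and it would never leave the 1 to 10 second range.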

Usage: cache_dirs [-m min_seconds] [-M max_seconds] [-F] [-d maxdepth] [-c command] [-a args] [-e exclude_dir] [-i include_dir] [-w]
       cache_dirs -V      = print program version
       cache_dirs -q
       cache_dirs -l on   = turn on logging to /var/log/cache_dirs.log
       cache_dirs -l off  = turn off logging to /var/log/cache_dirs.log
 -w       =   wait for array to come online before start of cache scan of directories
 -m NN    =   minimum seconds to wait between directory scans (default=1)
 -M NN    =   maximum seconds to wait between directory scans (default=10)
 -F       =   do NOT run in background, run in Foreground and print statistics as it loops and scans
 -v       =   when used with -F, verbose statistics are printed as directories are scanned
 -s       =   shorter-log - print count of directories scanned to syslog instead of their names
 -d NN    =   use "find -maxdepth NN" instead of "find -maxdepth 999"
 -c command   = use command instead of "find"
              (command should be quoted if it has embedded spaces)
 -a args    = append args to command
 -u       =   also scan /mnt/user (scan user shares)
 -e exclude_dir  (may be repeated as many times as desired)
 -i include_dir  (may be repeated as many times as desired)
 -B       =   do not force disks busy (to prevent unmounted disks showing as unformatted)
 -S       =   do not suspend scan during 'mover' process
 -z       = concise log (log run criteria on one line)
 -q       = terminate any background instance of cache_dirs


cache_dirs will force all the data disks to be "busy" to prevent them from being un-mounted.  This will prevent un-mounted disks appearing as un-formatted in the unRAID management console.   If you are using any release prior to 4.5beta7, this will prevent you from "Stopping" the array the first time you press the "Stop" button.  Simply wait a few seconds and then press "Stop" a second time within 2 minutes of the first attempt to stop the array.  If you have no other processes keeping disks busy, it will then stop.

On release 4.5b7, it is no longer necessary to press stop a second time, and in fact you cannot, as the management console will show "Unmounting" until all processes holding disks busy are terminated and only the "Refresh" button is active.

If you are on 4.5b7 or later, you can optionally use the -B option to not force the disks to be busy.

The 1.6.4 version of cache_dirs is attached.  It is now coded to sleep while the "mover" process moves files from your cache drive.  
# Version 1.6.4 - Modified to suspend scan during time "mover" script is running to prevent
#                 DuplicateFile messages from occurring as file is being copied.
#               - Added -S option to NOT suspend scan during mover process.
#               - Added logic to re-invoke cache_dirs if array is stopped and then re-started
#                 by submitting command string to "at" to re-invoke in a minute.
#               - Added entry to "usage()" function for -B

# Version 1.6.5 - Fixed what I broke in looking for "mover" pid to suspend during the "mover"
#                 to eliminate warnings in syslog about duplicate files detected while files were
#                 being copied.

The full revision history is as follows:
Code:
####################################################################################
# cache_dirs
# A utility to attempt to keep directory entries in the linux
# buffer cache to allow disks to spin down and no need to spin-up
# simply to get a directory listing on an unRAID server.
#
# Version 1.0   Initial proof of concept using "ls -R"
# Version 1.1   Working version, using "ls -R" or "find -maxdepth"
# Version 1.2   Able to be used with or without presence of user-shares.
#               Removed "ls -R" as it was too easy to run out of ram. (ask me how I know)
#               Added -i include_dir to explicitly state cached directories
#               Added -v option, verbose statistics when run in foreground
#               Added -q option, to easily terminate a process run in the background
#               Added logging of command line parameters to syslog
# Version 1.3   Added -w option, to wait till array comes online before starting scan
#               of /mnt/disk* share folders.
#               Changed min-seconds delay between scans to 1 instead of 0.
#               Moved test of include/exclude directories to after array is on-line
#               Added logging of mis-spelled/missing include/exclude dirs to syslog
#               Added ability to have shell wildcard expansion in include/exclude names
# Version 1.4   Fix bug with argument order passed to find when using -d option
#               Fixed command submitted to "at" to use full path. Should not need to
#              set PATH variable in "go" script.
#               Added ability to also cache scan /mnt/user with -u option
# Version 1.4.1 Fixed version comment so it is actually a comment.
# Version 1.5   Added -V to print version number.
#               Added explicit cache of root directories on disks and cache drive
#               Modified "average" scan time statistic to be weighted average with a window
#               of recent samples.
#               Added -a args option to allow entry of args to commands after dir/file name
#                 example: cache_dirs -a "-ls" -d 3
#                 This will execute "find disk/share -ls -maxdepth 3"
# Version 1.6   - Fixed bug... if -q was used, and cache_dirs not currently running,
#               it started running in error. OOps... Added the missing "exit"
#               - Changed vfs_cache_pressure setting to be 1 instead of 0 by default.
#               - Added "-p cache_pressure" to allow experimentation with vfs_cache_pressure values
#                (If not specified, default value of 1 will be used)
#               - Made -noleaf the default behavior for the "find" command (use -a "" to disable).
#               - Added logic to force all disks "busy" by starting a process with each as their
#               current working directory.   This will prevent a user from seeing a frightening
#               Unformatted description if they attempt to stop the array.  A second "Stop" will
#               succeed (the scan is paused for 2 minutes, so it may be stopped cleanly)
#               - Added new -B option to revert to the old behaviour and not force disks busy if by
#               chance this new feature causes problems for some users.
#               - Allow min seconds to be equal to max seconds in loop delay range.
#               - Added run-time-logging, log name = /var/log/cache_dirs.log
# Version 1.6.1 - Fixed bug. Added missing /mnt/cache disk to scanned directories
# Version 1.6.2 - Added trap to clean up processes after kill signal when run in background
# Version 1.6.3 - Modified to deal with new un-mounting message in syslog in 4.5b7 to
#                 allow array shutdown to occur cleanly.
# Version 1.6.4 - Modified to suspend scan during time "mover" script is running to prevent
#                 DuplicateFile messages from occurring as file is being copied.
#               - Added -S option to NOT suspend scan during mover process.
#               - Added logic to re-invoke cache_dirs if array is stopped and then re-started
#                 by submitting command string to "at" to re-invoke in a minute.
#               - Added entry to "usage()" function for -B
# Version 1.6.5 - Fixed what I broke in looking for "mover" pid to suspend during the "mover"
#                 to eliminate warnings in syslog about duplicate files detected while files were
#                 being copied.

Joe L.
« Last Edit: May 15, 2010, 03:03:50 AM by Joe L. » Logged

jupilerman
Full Member
***
Posts: 157



View Profile Email
« Reply #1 on: October 16, 2009, 12:37:28 AM »

Nice.
Thanx, you're the greatest. Smiley
Logged

mobo : ASUSTeK M3A76-CM / cpu : AMD Athlon II X4 620 - Quad Core / 4 Gb G.Skill PK DDR2-SDRAM PC2-6400 / 10 WD HD from 500 Gb to 1 Tb / 2 pci sata cards / 1 WB HD 250 Gb for virtual machines
drealit
Jr. Member
**
Posts: 98


View Profile
« Reply #2 on: October 16, 2009, 05:31:16 AM »

New ver. working great Joe! The dupes workaround is working perfectly.
Logged
kapperz
Full Member
***
Posts: 184



View Profile
« Reply #3 on: October 19, 2009, 11:19:33 AM »

Thanks Joe. Why do I see 8 separate processes when this is started?

Quote
root      2618     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2622     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2628     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2633     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2637     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2643     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2648     1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w
root      2652     1  0 11:37 ?        00:00:02 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

My go script looks like this...

Quote
/boot/scripts/cache_dirs.sh  -d  2  -m  3  -M  5  -w
Logged

My Rig (~4.5TB): CM Centurion 590 - GIGABYTE GA-MA785G-UD3H - 3 x Rosewill RC-207 - 3 x CM STB-3T4-E3-GP 4-in-3 - CORSAIR CMPSU-650TX 650W - AMD Sempron 140 - G.SKILL 4GB DDR2
Pictures Here
Joe L.
Hero Member
*****
Posts: 7545


View Profile
« Reply #4 on: October 19, 2009, 12:30:22 PM »

Easy, there is one "child" process keeping each of your data disks busy to prevent you from seeing an "Un-formatted" description when you attempt to stop the array while a directory scan is in progress.

The child process basically does this
   changes directory to /mnt/diskX
   then invokes a loop waiting for the lock-file used by cache_dirs to not exist
   while lock-file-exists
   do
     sleep 2 seconds
   end while 


Every 2 seconds it wakes up from its sleep and checks if the lock-file is still there. If it is, it sleeps another 2 seconds and then looks once more, etc... Since each child process has one of your data disks as its "current directory", it will not be possible for unRAID to un-mount them, as they will be "busy".

The main cache_dirs process removes the lock-file when it notices an attempt to stop the array allowing the child processes to stop and the disks to be un-mounted.
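In shell terms, one of those child processes could look roughly like the sketch below. It follows the description above but is not copied from cache_dirs; hold_busy, HOLDER_PID and the optional poll interval argument are names invented for the example.

```shell
#!/bin/bash
# Illustrative sketch only; hold_busy and HOLDER_PID are hypothetical names,
# not taken from cache_dirs.
# Usage: hold_busy MOUNT_POINT LOCK_FILE [POLL_SECONDS]
# Starts a background child whose working directory is MOUNT_POINT and
# which exits once LOCK_FILE disappears. The child's PID is left in
# HOLDER_PID.
hold_busy() {
    local mount_point=$1 lock_file=$2 interval=${3:-2}
    (
        # Having this directory as the working directory is what makes
        # the filesystem "busy" and stops it from being un-mounted.
        cd "$mount_point" || exit 1
        while [ -e "$lock_file" ]; do
            sleep "$interval"
        done
    ) &
    HOLDER_PID=$!
}
```

The parent would call hold_busy once per data disk (for example hold_busy /mnt/disk1 with whatever lock-file path it uses) and remove the lock-file when it wants all of the children to exit.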

So... it is normal to see what you are seeing.   Once you upgrade to unRAID 4.5beta7, or beyond, you will not need those extra processes, as unRAID is smart enough now to not show "un-formatted" on disks it already un-mounted. Earlier versions showed "Unformatted" on all the disks that could be un-mounted and also showed a "Format" button you might accidentally use on a disk with your data.

Once you upgrade to 4.5b7, you can use the "-B" option to cache_dirs if you want to clean up your process listing, as those child processes will no longer be needed on your server to force disks to be "busy".

Joe L.
Logged

kapperz
Full Member
***
Posts: 184



View Profile
« Reply #5 on: October 19, 2009, 01:19:14 PM »

Ah ok. I haven't upgraded to 4.5b7 yet. I try to stay a little behind the bleeding edge since I'm new to unRAID. Beta 6 was being used for many months with little issue. I feel it's stable enough until I get things how I like. The "un-formatted" issue is not that troublesome now.
Logged

betaman
Jr. Member
**
Posts: 99


View Profile
« Reply #6 on: October 19, 2009, 04:24:18 PM »

Hi Joe, quick question about the benefits of cache_dirs in my particular situation.  Is cache_dirs worthwhile if I'm  using my UnRAID server to feed an NMT Popcorn Hour where I use a movie jukebox and skin that is stored locally on a drive in the NMT?  I could see the benefit if I'm pointing my NMT to the user share of the UnRAID server and I'm using the standard UI of the NMT but I'm struggling to rationalize if I'm getting any benefit with my current setup?  Any insight would be greatly appreciated!
Logged
Joe L.
Hero Member
*****
Posts: 7545


View Profile
« Reply #7 on: October 19, 2009, 05:51:23 PM »

If the popcorn hour does not need to perform directory listings for you to choose a movie to view, then it is of less use to you.

It won't hurt anything, but you'll know if you need it.... (because your family will ask, why does it take so long to get a listing of our movies/music/pictures when I press a button?)

Joe L.
Logged

olympia
Full Member
***
Posts: 207


View Profile
« Reply #8 on: October 20, 2009, 12:30:42 AM »

Hi Joe L.,

I am wondering if it would be easy for you to include an option for completely wiping the disk instead of preclearing it.
Or can the preclearing method in its current form be used for this purpose? (At least the readback seems unnecessary in this case.)

That would be extremely useful on a disk replacement, when the old disk is going to be sold.

Thank you in advance for your feedback.
Logged
Joe L.
Hero Member
*****
Posts: 7545


View Profile
« Reply #9 on: October 20, 2009, 01:38:10 AM »

I think you intended this question to be posted to the thread on the preclear_disk.sh script,
so I'll post my answer there.

Joe L.
Logged

KentBrockman
Newbie
*
Posts: 32


View Profile
« Reply #10 on: October 29, 2009, 12:29:54 PM »

Joe,

I really want to use this script as I am a user of MediaBrowser in Windows Media Center.
The application scans my movie folder every time to refresh the metadata and with 8 disks this causes an annoying delay as well as spins up my disks unnecessarily.

I installed the script and it seemed to work fine, but the disks showing unformatted really made me nervous.
It seems like I could accidentally do a lot of damage to my server.

Would upgrading to 4.5b7 change this in any way ?
Is 4.5b7 good enough to trust with my data ?

Any help you could provide would be greatly appreciated.

Thanks,
Kent
Logged
Joe L.
Hero Member
*****
Posts: 7545


View Profile
« Reply #11 on: October 29, 2009, 12:58:25 PM »

The current version of cache_dirs explicitly keeps all your disks "busy", so no disk should show as "unformatted".   Are you using the current version of cache_dirs?

The 4.5b7 version of unRAID will no longer show "Unformatted" for the disks it was able to un-mount successfully when it is unable to un-mount a busy disk.  For that reason, it is safer.   You will need to use the current version of cache_dirs with it, otherwise you will not be permitted to stop the array until you stop the cache_dirs process (with cache_dirs -q)

The most current version of cache_dirs will work with both versions of unRAID.  Both versions of unRAID are safe for your data.  The "bugs" are in Active Directory support and in adding more than 18 data drives... (You have neither in your version, so you are not affected by those issues when you upgrade)

To see the version of cache_dirs you are running, type
cache_dirs -V

Current version is 1.6.4 as of today.
Logged

KentBrockman
Newbie
*
Posts: 32


View Profile
« Reply #12 on: October 29, 2009, 08:26:12 PM »

Joe,

I upgraded to 4.5B7 and I have got the script running now.  Thanks for your help.
I'm not so sure it's working though.  When I browse to the folders my disks still spin up.

This is the command I used to start it:
/boot/cache_dirs -w -i "Movies" -i "KidsMovies" -i "TV"

Movies, KidsMovies and TV are the user shares I want cached.

Am I missing anything obvious here ?

Thanks again for your help,

Kent
Logged
dvd.collector
Jr. Member
**
Posts: 74


View Profile
« Reply #13 on: October 30, 2009, 07:20:02 AM »

Won't MediaBrowser be accessing the mymovies.xml files in each folder to verify it is current?  If so, just caching the folder names isn't going to help with this.
Logged
purko
Hero Member
*****
Posts: 1407


View Profile WWW
« Reply #14 on: October 30, 2009, 08:14:33 AM »

Won't MediaBrowser be accessing the mymovies.xml files in each folder to verify it is current?

That is correct. My media frontend (SageTV) also reads the cover images of all my videos.
So the caching script that I am using copies all .jpg files it finds to /dev/null
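A minimal sketch of that idea, assuming the cover images all end in ".jpg" (lower case) and sit somewhere under one directory. This is not Purko's actual script; warm_images is a made-up name for the example.

```shell
#!/bin/bash
# Illustrative sketch only; warm_images is a hypothetical helper, not the
# script referred to in the post.
# Usage: warm_images DIR
# Reads every .jpg file under DIR and throws the bytes away. The read
# itself is what pulls the file contents into the kernel's buffer cache.
warm_images() {
    local dir=$1
    find "$dir" -type f -name '*.jpg' -exec cat {} + >/dev/null
}
```

Unlike a plain directory scan, this reads file contents, so it needs correspondingly more RAM to keep everything cached.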

Purko
Logged

A complex system that does not work is invariably found to have evolved from a simpler system that worked just fine.