cache_dirs is a script to attempt to keep directory entries in memory to prevent disks from spinning up just to get a directory listing.
The Linux kernel keeps the most recently accessed disk buffers and directories in memory, and flushes out the least recently accessed entries. If we can 'trick' the kernel into keeping our directory entries in that memory cache, then directory scans will find what they are looking for already in memory and will not need to access the physical disk to get it. Since the physical disk is not accessed, Linux will (after the defined time-out delay) let the physical drives spin down, which saves on power costs and removes the drives as heat sources. This is especially useful when media files are spread across multiple drives and a media player begins to scan for a particular media file to play. You want the scan to look at all of the relevant directories, but only spin up the one drive containing the desired media file.
Since the cache management decision process tries to keep the most recently accessed disk buffers and directory entries, we need to 'trick' it by constantly accessing the directories of the folders we want to keep in the cache, so that they will always appear to be the most recently accessed. I've developed, with the help of lots of suggestions and feedback, an easily customizable script called cache_dirs to do this.
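To make the idea concrete, here is a minimal bash sketch of the trick. It is an illustration, not the cache_dirs script itself: walk each directory tree with `find`, throw the listing away, sleep, and repeat. The function names `scan_once` and `cache_loop` are hypothetical, and `-noleaf` assumes GNU find.

```shell
#!/bin/bash
# Minimal sketch of the caching trick (illustrative, not the real
# cache_dirs code): repeatedly walk the directory trees so their
# entries keep looking "recently used" to the kernel.

# Walk each directory tree once. The listing itself is discarded;
# only the side effect of reading the directory entries matters.
scan_once() {
  local dir
  for dir in "$@"; do
    find "$dir" -noleaf > /dev/null 2>&1
  done
}

# Rescan forever, pausing $1 seconds between passes.
cache_loop() {
  local delay=$1
  shift
  while true; do
    scan_once "$@"
    sleep "$delay"
  done
}
```

Something like `cache_loop 10 /mnt/disk1 /mnt/disk2 &` would start such a loop in the background. The real script adds the options described below on top of this basic loop.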
It is described in the wiki here. A long series of posts describing its evolution over time is here in this thread.
It has many possible tunable options, but most people can simply invoke it as
cache_dirs -w
The "-w" option will cause it to wait until the unRAID array is online before it starts scanning.
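How cache_dirs decides that the array is online is not spelled out here. A plausible bash sketch of a "-w" style wait, assuming (hypothetically) that "online" means the mount point /mnt/disk1 is listed in /proc/mounts; `is_mounted` and `wait_for_array` are illustrative names:

```shell
#!/bin/bash
# Hypothetical readiness test for a -w style wait: the array counts as
# "online" once the given mount point is listed in /proc/mounts.
is_mounted() {
  grep -qsF " $1 " /proc/mounts
}

# Wait until $1 is mounted, checking every $2 seconds. A positive $3
# gives up (returns 1) after that many failed checks; 0 waits forever.
wait_for_array() {
  local mnt=${1:-/mnt/disk1} interval=${2:-10} tries=${3:-0} n=0
  until is_mounted "$mnt"; do
    n=$((n + 1))
    if [ "$tries" -gt 0 ] && [ "$n" -ge "$tries" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```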
If you have a folder or folders you wish to exclude, there is an -e option. This option can be used multiple times. To exclude the "data" and "old-stuff" directories, you would use
cache_dirs -w -e "data" -e "old-stuff"
Always put quote marks around the folders you wish to exclude. This is especially important if the folder name contains a space or another special character that might be interpreted by the Linux shell.
By default, all top-level folders on the disks and everything under them in sub-folders are scanned. If you only want a subset of the top-level folders scanned, you can supply an "include" list using the "-i include_dir" option. Again, it may be repeated on the command line multiple times. When the "include" feature is used, only the included directories are scanned. There is no need to use an exclude as well, unless you use a wild-card for the include directory and that wild-card matches more than you want cached. (The include and exclude options work on top-level folders only; they cannot be used to include or exclude specific sub-folders. If you wish to exclude a sub-folder, you can use a different option (-a), as shown in this post.)
For example, let's say you have folders like this:
Movies-Comedy-Bad
Movies-Comedy-Good
Movies-Chick-Flicks-Good
Movies-Chick-Flicks-Bad
Movies-Adventure-Good
Movies-Adventure-Bad
Movies-Drama-Good
Movies-Drama-Bad
Movies-Kids-Good
Movies-Kids-Bad
Movies-Junk-Good
Movies-Junk-Bad
Data1
Data2
Data3...
You could use an include rule like this
-i "Movies*"
and an exclude rule like this in combination with it
-e "*Bad"
You would then cache only those directories that start with "Movies" and do not have "Bad" at the end of their name.
If you added one more exclude like this:
-e "*Junk*"
you would not scan either of the folders with "Junk" in their name. Using a combination of include and exclude directories makes it pretty flexible if you have the need. For most people, one or two exclude folders might be all that is needed, if any. If you have enough RAM, just let it scan and cache everything.
My "data" folder holds a directory with a backup of an old Windows system, with at least several hundred thousand files and folders under it. I always exclude it, as it is never needed by my media players in their listing of movies.
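The include/exclude rules in the example above can be sketched in bash with shell wildcard matching. This is only an illustration of the logic, not cache_dirs' own implementation; the `INCLUDE`/`EXCLUDE` arrays and the `wanted` function are hypothetical names, filled in to mirror `-i "Movies*" -e "*Bad" -e "*Junk*"`.

```shell
#!/bin/bash
# Sketch of include/exclude filtering of top-level folder names with
# shell wildcards (illustrative; not cache_dirs' own implementation).
# These arrays mirror: -i "Movies*" -e "*Bad" -e "*Junk*"
INCLUDE=("Movies*")
EXCLUDE=("*Bad" "*Junk*")

# Succeeds (returns 0) if the folder name should be cached.
wanted() {
  local name=$1 pat included=0
  if [ "${#INCLUDE[@]}" -eq 0 ]; then
    # No include list: every top-level folder is a candidate.
    included=1
  else
    for pat in "${INCLUDE[@]}"; do
      # $pat is left unquoted so [[ == ]] treats it as a glob pattern.
      if [[ $name == $pat ]]; then
        included=1
      fi
    done
  fi
  [ "$included" -eq 1 ] || return 1
  # Any exclude match drops the folder again.
  for pat in "${EXCLUDE[@]}"; do
    if [[ $name == $pat ]]; then
      return 1
    fi
  done
  return 0
}
```

With these rules, "Movies-Comedy-Good" is kept while "Movies-Comedy-Bad", "Movies-Junk-Good" and "Data1" are skipped. It also shows why the quotes matter: an unquoted `-e *Bad` would be expanded by your login shell against the current directory before cache_dirs ever sees the pattern.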
If you add
/boot/cache_dirs -w
to your "go" script, it will run each time you restart your server.
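A minimal sketch of that addition to the "go" script, assuming cache_dirs was copied to /boot. The existence check is an extra safeguard (an assumption, not something cache_dirs requires) so the rest of the boot carries on if the file is missing; no trailing "&" is needed because cache_dirs puts itself in the background unless -F is given.

```shell
# Added near the end of the "go" script.
if [ -f /boot/cache_dirs ]; then
  /boot/cache_dirs -w
fi
```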
To stop cache_dirs from running, type
cache_dirs -q
To see all the options, type
cache_dirs -h
To run it in the foreground, so you can see what it is doing, use the -F option. As it loops and scans, it will print statistics on how long each scan is taking. It will adjust the scan rate based on the activity on the server. You can set the minimum and maximum delay between scans using the -m min-time and -M max-time options.
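The exact adjustment rule is not described here; only the bounds are (1 and 10 seconds by default, per the usage text). One hypothetical rule, sketched in bash to show how the -m/-M bounds clamp whatever the rule computes. The function name `next_delay` and the 2-second "slow scan" threshold are assumptions.

```shell
#!/bin/bash
# Hypothetical delay-adjustment rule (an assumption; cache_dirs' real
# rule may differ). A slow scan suggests entries had been evicted and
# disks were read, so rescan sooner; a fast scan suggests everything
# was cached, so back off. The result is always clamped to [min, max],
# the range that -m and -M control. Whole seconds only.
next_delay() {
  local delay=$1 scan_secs=$2 min=$3 max=$4 slow=${5:-2}
  if [ "$scan_secs" -ge "$slow" ]; then
    delay=$((delay - 1))
  else
    delay=$((delay + 1))
  fi
  if [ "$delay" -lt "$min" ]; then delay=$min; fi
  if [ "$delay" -gt "$max" ]; then delay=$max; fi
  echo "$delay"
}
```

Inside a scan loop this would be called as `delay=$(next_delay "$delay" "$elapsed" 1 10)` after timing each pass.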
Usage: cache_dirs [-m min_seconds] [-M max_seconds] [-F] [-d maxdepth] [-c command] [-a args] [-e exclude_dir] [-i include_dir] [-w]
cache_dirs -V = print program version
cache_dirs -q
cache_dirs -l on = turn on logging to /var/log/cache_dirs.log
cache_dirs -l off = turn off logging to /var/log/cache_dirs.log
-w = wait for array to come online before start of cache scan of directories
-m NN = minimum seconds to wait between directory scans (default=1)
-M NN = maximum seconds to wait between directory scans (default=10)
-F = do NOT run in background, run in Foreground and print statistics as it loops and scans
-v = when used with -F, verbose statistics are printed as directories are scanned
-s = shorter-log - print count of directories scanned to syslog instead of their names
-d NN = use "find -maxdepth NN" instead of "find -maxdepth 999"
-c command = use command instead of "find"
(command should be quoted if it has embedded spaces)
-a args = append args to command
-u = also scan /mnt/user (scan user shares)
-e exclude_dir (may be repeated as many times as desired)
-i include_dir (may be repeated as many times as desired)
-B = do not force disks busy (to prevent unmounted disks showing as unformatted)
-S = do not suspend scan during 'mover' process
-z = concise log (log run criteria on one line)
-q = terminate any background instance of cache_dirs
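The -d and -a options change the command each scan runs. A bash sketch of how such a command line could be assembled, using the defaults stated above (-maxdepth 999, and -noleaf unless -a overrides it, with -a "" disabling it). It is an illustration only; `find_cmd` is a hypothetical name, and it prints the command rather than running it. Note that it places -maxdepth before the appended args, because GNU find warns when a global option such as -maxdepth follows an action such as -ls.

```shell
#!/bin/bash
# Sketch of how -d and -a could shape the scan command (illustrative;
# not cache_dirs' actual code).
find_cmd() {
  local dir=$1 depth=${2:-999} args="-noleaf" extra=()
  # A third argument, even an empty one, replaces the default args,
  # mirroring how -a "" disables -noleaf.
  if [ "$#" -ge 3 ]; then
    args=$3
  fi
  # -maxdepth first: GNU find warns if it follows an action like -ls.
  local cmd=(find "$dir" -maxdepth "$depth")
  if [ -n "$args" ]; then
    # Split the args string on whitespace so "-noleaf -xdev" is two words.
    read -r -a extra <<< "$args"
    cmd+=("${extra[@]}")
  fi
  echo "${cmd[*]}"
}
```

For example, `find_cmd /mnt/disk1 3 -ls` prints `find /mnt/disk1 -maxdepth 3 -ls`.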
cache_dirs will force all the data disks to be "busy" to prevent them from being un-mounted. This will prevent un-mounted disks appearing as un-formatted in the unRAID management console. If you are using any release prior to 4.5beta7, this will prevent you from "Stopping" the array the first time you press the "Stop" button. Simply wait a few seconds and then press "Stop" a second time within 2 minutes of the first attempt to stop the array. If you have no other processes keeping disks busy, it will then stop.
Starting with release 4.5b7, it is no longer necessary to press "Stop" a second time, and in fact you cannot: the management console shows "Unmounting", with only the "Refresh" button active, until all processes holding the disks busy are terminated.
If you are on 4.5b7 or later, you can, if you wish, use the -B option to not force the disks to be busy.
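The revision notes describe forcing the disks busy by starting a process with each disk as its current working directory. A bash sketch of that idea, assuming an idle `sleep` as the holder process (the real script may use something else); `hold_busy` and `release_busy` are hypothetical names.

```shell
#!/bin/bash
# Sketch of "forcing disks busy" (illustrative, not cache_dirs' code):
# park one idle process inside each mount point. A filesystem cannot
# be unmounted while some process has its current directory on it.
hold_busy() {
  local dir
  for dir in "$@"; do
    # cd in a subshell, then exec sleep so the long-lived process keeps
    # that working directory. Redirecting stdio stops it from holding a
    # caller's pipe (e.g. $(hold_busy ...)) open. Prints each pid.
    ( cd "$dir" && exec sleep 1000000 ) < /dev/null > /dev/null 2>&1 &
    echo "$!"
  done
}

# Release the disks again by killing the holder processes.
release_busy() {
  kill "$@" 2>/dev/null
}
```

Something like `pids=$(hold_busy /mnt/disk1 /mnt/disk2)` followed later by `release_busy $pids` would bracket the holding period; skipping `hold_busy` entirely corresponds to what -B does.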
The 1.6.4 version of cache_dirs is attached. It is now coded to sleep while the "mover" process moves files from your cache drive.
# Version 1.6.4 - Modified to suspend scan during time "mover" script is running to prevent
# DuplicateFile messages from occurring as file is being copied.
# - Added -S option to NOT suspend scan during mover process.
# - Added logic to re-invoke cache_dirs if array is stopped and then re-started
# by submitting command string to "at" to re-invoke in a minute.
# - Added entry to "usage()" function for -B
# Version 1.6.5 - Fixed what I broke in looking for "mover" pid to suspend during the "mover"
# to eliminate warnings in syslog about duplicate files detected while files were
# being copied.
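The notes say the scan is suspended by looking for the mover's pid, but not how the pid is found. A bash sketch, assuming (hypothetically) that the mover writes its pid to a pid file; the path /var/run/mover.pid and the function names are assumptions and may not match your release.

```shell
#!/bin/bash
# Hypothetical pid-file location; whether and where the mover records
# its pid depends on the unRAID release.
MOVER_PIDFILE=${MOVER_PIDFILE:-/var/run/mover.pid}

# Succeeds if the pid file names a live process.
mover_running() {
  local pid
  [ -r "$MOVER_PIDFILE" ] || return 1
  pid=$(< "$MOVER_PIDFILE")
  # kill -0 sends no signal; it only tests that the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Pause the caller (e.g. the scan loop) while the mover is running,
# rechecking every $1 seconds.
wait_for_mover() {
  local interval=${1:-10}
  while mover_running; do
    sleep "$interval"
  done
}
```

Calling `wait_for_mover` at the top of each pass of the scan loop gives the suspend behaviour; -S would simply skip the call.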
The full revision history is as follows:
####################################################################################
# cache_dirs
# A utility to attempt to keep directory entries in the Linux
# buffer cache, so disks can spin down and need not spin up
# simply to get a directory listing on an unRAID server.
#
# Version 1.0 Initial proof of concept using "ls -R"
# Version 1.1 Working version, using "ls -R" or "find -maxdepth"
# Version 1.2 Able to be used with or without presence of user-shares.
# Removed "ls -R" as it was too easy to run out of ram. (ask me how I know)
# Added -i include_dir to explicitly state cached directories
# Added -v option, verbose statistics when run in foreground
# Added -q option, to easily terminate a process run in the background
# Added logging of command line parameters to syslog
# Version 1.3 Added -w option, to wait till array comes online before starting scan
# of /mnt/disk* share folders.
# Changed min-seconds delay between scans to 1 instead of 0.
# Moved test of include/exclude directories to after array is on-line
# Added logging of mis-spelled/missing include/exclude dirs to syslog
# Added ability to have shell wildcard expansion in include/exclude names
# Version 1.4 Fix bug with argument order passed to find when using -d option
# Fixed command submitted to "at" to use full path. Should not need to
# set PATH variable in "go" script.
# Added ability to also cache scan /mnt/user with -u option
# Version 1.4.1 Fixed version comment so it is actually a comment.
# Version 1.5 Added -V to print version number.
# Added explicit cache of root directories on disks and cache drive
# Modified "average" scan time statistic to be weighted average with a window
# of recent samples.
# Added -a args option to allow entry of args to commands after dir/file name
# example: cache_dirs -a "-ls" -d 3
# This will execute "find disk/share -ls -maxdepth 3"
# Version 1.6 - Fixed bug... if -q was used, and cache_dirs not currently running,
# it started running in error. Oops... Added the missing "exit"
# - Changed vfs_cache_pressure setting to be 1 instead of 0 by default.
# - Added "-p cache_pressure" to allow experimentation with vfs_cache_pressure values
# (If not specified, default value of 1 will be used)
# - Made -noleaf the default behavior for the "find" command (use -a "" to disable).
# - Added logic to force all disks "busy" by starting a process with each as their
# current working directory. This will prevent a user from seeing a frightening
# Unformatted description if they attempt to stop the array. A second "Stop" will
# succeed (the scan is paused for 2 minutes, so it may be stopped cleanly)
# - Added new -B option to revert to the old behaviour and not force disks busy if by
# chance this new feature causes problems for some users.
# - Allow min seconds to be equal to max seconds in loop delay range.
# - Added run-time-logging, log name = /var/log/cache_dirs.log
# Version 1.6.1 - Fixed bug. Added missing /mnt/cache disk to scanned directories
# Version 1.6.2 - Added trap to clean up processes after kill signal when run in background
# Version 1.6.3 - Modified to deal with new un-mounting message in syslog in 4.5b7 to
# allow array shutdown to occur cleanly.
# Version 1.6.4 - Modified to suspend scan during time "mover" script is running to prevent
# DuplicateFile messages from occurring as file is being copied.
# - Added -S option to NOT suspend scan during mover process.
# - Added logic to re-invoke cache_dirs if array is stopped and then re-started
# by submitting command string to "at" to re-invoke in a minute.
# - Added entry to "usage()" function for -B
# Version 1.6.5 - Fixed what I broke in looking for "mover" pid to suspend during the "mover"
# to eliminate warnings in syslog about duplicate files detected while files were
# being copied.
Joe L.