Native Cache_Dirs Support


jonp


I think this should be considered for prioritization for 6.0's release.  In all my testing with 6.0 without this feature, it's very annoying to not have it, especially in larger arrays where data is spread across a variety of drives.  I'm not sure how much effort it would take to make this happen, but I would like us to consider it...

Link to comment

Agree this would be a nice feature to have "baked in" to the core.

 

One suggestion ...

 

If this is a native feature, it'd be nice if it was automatically toggled off during parity checks.  [Or, alternatively, if it had a fixed memory buffer which, once filled, wasn't impacted by other system operations].

 

The filling of the cache has a notable impact on other system operations.  A simple way to show that:

 

=>  With cache_dirs enabled, boot your system and immediately start a parity check.  Refresh the screen once a minute and watch the speed of the check for a while.  Then stop the check.

 

=>  Disable cache_dirs in your go script, then reboot the system and, again, immediately start a parity check.  Refresh the screen once a minute and watch the speed of the check now.

 

The difference is significant.  Once the cache is filled (5-30 minutes, depending on the number of files you have), the parity check runs at normal speed.  But if you're still using the system, the cache gets refilled and the slowdown will repeat itself numerous times during the check.

 

Alternatively, a simple checkbox on the Web GUI that enabled/disabled cache_dirs would be nice ... and would let anyone disable it during other disk-intensive operations (parity checks, disk rebuilds, etc.).

 

Link to comment

If this is a native feature, it'd be nice if it was automatically toggled off during parity checks.  [Or, alternatively, if it had a fixed memory buffer which, once filled, wasn't impacted by other system operations].

 

During a parity sync all drives are spun up (at least in the beginning), so cache-dirs doesn't need to run.  I would probably tie it to spindown: just prior to spinning a drive down, if the drive is marked for "cache-dir", add it to an 'active cache-dir'ing' list and fire off a scan, then spin the drive down after the scan completes.

Link to comment

I don't get it. All the testing in the past showed that for cache_dirs to work it needs to continually perform the recursive ls regardless of drive spin state. Tying it to spin down, due to the kernel's natural tendency to drop caches, will just result in cache_dirs causing the drives to spin up again.

 

This is doubly true during very busy operations, which typically cause more caches to be dropped. This is not to say you shouldn't stop cache_dirs during certain operations, but you can't link that action to spin state.

 

cache_dirs is a very, very clever kludge fix for the inherent problem that the kernel considers inode entries cheap and disposable. It should be possible to tune the kernel to not drop these entries, but no amount of fettling seems to do it, and it can result in unpredictable behaviour (cache pressure etc.).
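
For reference, the main knob here is vm.vfs_cache_pressure; a minimal sketch of the sort of fettling I mean (the values are illustrative only, not a recommendation):

# Check the current dentry/inode reclaim bias (the default is 100).
cat /proc/sys/vm/vfs_cache_pressure

# Lower values make the kernel prefer keeping dentries/inodes over page cache;
# 0 is documented as risky (it can OOM the box), so small-but-nonzero is typical.
sysctl -w vm.vfs_cache_pressure=10

# Watch how many dentries/inodes are actually being held.
cat /proc/sys/fs/dentry-state
cat /proc/sys/fs/inode-nr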

 

The perfect solution would be a semi-permanent inode cache where the directory listing, which is actually a very small amount of data, could be stored in both RAM and on disk, making it non-volatile, rather than cache_dirs tricking the kernel into keeping it in RAM.

 

Update:

 

Whilst this is getting some attention, and assuming we can't find some magic kernel fix, the current cache_dirs implementation is lacking in a couple of areas:

 

1. Visibility of RAM usage and inode count per directory. 99% of your cache might come from one folder three levels deep that you don't actually care about that much.

2. Setting folder depth limits. Yes, I want to cache one folder deep on all of them so that an errant click doesn't cause a disk spin-up, but I don't want to go beyond two deep on folder Y. I believe this ties in with the use of "find" in cache_dirs; it just needs better control (see the sketch after this list).

3. WebUI config (obviously) and control.
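
On point 2, the depth control I'm after is really just find's -maxdepth; a rough sketch of the idea (share names hypothetical, and not cache_dirs' actual options):

# Cache only one level deep on every share, so an errant click doesn't spin up a disk...
find /mnt/user/*/ -maxdepth 1 > /dev/null 2>&1

# ...but go deeper only on the shares where it's worth the RAM.
find /mnt/user/Movies/ -maxdepth 4 > /dev/null 2>&1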

 

Link to comment

Actually, the idea of a dedicated, permanent memory area for the directory cache would be even better ... although clearly this could only be done on systems with sufficient RAM (which most new systems will have).

 

I'd be quite happy with a simple Cache_Dirs on/off toggle on the Web GUI.  The only operations I've personally found that it notably impacts are parity checks and drive rebuilds.  I suppose some plugins could also be negatively impacted, but I don't think it'd be nearly as bad as with those two functions.

 

 

Link to comment

I don't get it. All the testing in the past showed that for cache_dirs to work it needs to continually perform the recursive ls regardless of drive spin state. Tying it to spin down, due to the kernel's natural tendency to drop caches, will just result in cache_dirs causing the drives to spin up again.

If drive spinning => no scanning that drive

If drive about to be spun down by emhttp => first scan (might take a while), then spin down (see the sketch below)

If drive not spinning => scan periodically

If drive spun up => quit scanning this drive
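
A minimal sketch of that pre-spindown step, with hdparm standing in for emhttp's own spindown call (names and paths are hypothetical):

# Warm the dentry/inode cache for a disk, then spin it down.
scan_then_spindown() {
    mountpoint="$1"   # e.g. /mnt/disk3
    device="$2"       # e.g. /dev/sdc
    find "$mountpoint" > /dev/null 2>&1   # refresh the cached directory tree first
    hdparm -y "$device"                   # then put the drive into standby
}

scan_then_spindown /mnt/disk3 /dev/sdc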

 

The perfect solution would be a semi-permanent inode cache where the directory listing, which is actually a very small amount of data, could be stored in both RAM and on disk, making it non-volatile, rather than cache_dirs tricking the kernel into keeping it in RAM.

Unfortunately this is not possible without LOTS of kernel-level programming, and wouldn't stand a chance of getting merged into upstream.

 

Also, it's no good to just store directory listings... you need all the stuff in the inode too: permissions, type, etc.

Link to comment

I am too lazy to quote all this properly...

 

If drive spinning => no scanning that drive ... I know why this seems logical, and perhaps with SSDs it will be fine, but with traditional spinners, even when a drive is spun up, cache_dirs significantly improves the perceived responsiveness when browsing and searching drives.

 

If drive about to be spun down by emhttp => first scan (might take a while), then spin down ... sensible

 

If drive not spinning => scan periodically ... aka cache_dirs

 

If drive spun up => quit scanning this drive ... aka drop the cache and rebuild it on demand. This is the crux of this change in approach and the bit that doesn't make sense to me.

 

So the drive is spun down and you are maintaining a cache for it. Then it spins up and you allow the cache to expire. Just before the drive spins down you recreate the cache. That seems sensible, but in reality all you gain is a reduction in RAM usage for the time the drive is spun up. Before the spin-up and just before the spin-down you use that RAM up again, and it costs a bunch of disk I/O to get it back. In fact, net disk I/O is increased using this approach, all to save a bit of RAM for a short period of time.

 

Is this worth it?

 

 

Unfortunately this is not possible without LOTS of kernel-level programming....

 

I assumed as much. But perhaps we can do a poor man's version of this with no kernel hacking. If we had a sacrificial SSD drive for the page file and tuned the kernel tunables relating to inodes, we might be able to achieve a semi-permanent cache.
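
Something like this is what I mean by the poor man's version (device name hypothetical, and no claim it will actually stop the slab-cached inodes from being reclaimed):

# Put swap on a sacrificial SSD partition...
mkswap /dev/sdX1
swapon -p 10 /dev/sdX1

# ...and bias the kernel towards holding on to dentry/inode entries.
sysctl -w vm.vfs_cache_pressure=10
sysctl -w vm.swappiness=100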

Link to comment

Most pressure on RAM is going to be when a drive is spun up, because we're doing transfers with it (or else it wouldn't be spun up).  Why not let Linux go ahead and claim those inode pages if they age off the LRU list?

 

Is it worth it?  If memory pressure gets high enough those inode pages will get ejected anyway and now you have even more pressure because cache-dirs will be trying to bring those inodes back into memory.

 

The argument against it would be:

 

- having the inodes cached makes for a snappier experience - huh, well if you say so, but it seems like there's "snappy" and "snappy enough" - anyway, this would be configurable, so if you really wanted to maintain caching while spun up, then sure

 

- might have a further delay before the drive gets spun down - well, spin-down inactivity is measured in hours; a few more seconds at most to scan the dirs seems meaningless

 

Last time I looked (back in the 2.6 kernel) those 'tunables' relating to inode aging didn't do anything.

Link to comment

 

- having the inodes cached makes for a snappier experience ...

 

...a few more seconds at most to scan the dirs seems meaningless

 

 

Here is a rather unscientific example of a folder taken out of cache dirs but spun up to make a point...

 

Edit: I got bored waiting. I took a folder I know has a ridiculous number of files out of cache_dirs and then ran an ls -R on it. Normally this takes a few seconds, but out of cache_dirs it has been going for 5+ minutes now. This is neither "just a bit snappier" nor "a few meaningless seconds"; these are big, slow, kludge-sized numbers.

 

Update: it completed

 

Out of cache dirs

 

time ls -R /mnt/user/comics/

real    11m23.293s

user    0m1.200s

sys    0m2.020s

 

In cache dirs

 

real    0m1.920s

user    0m0.340s

sys    0m0.270s

 

That's 683 seconds versus 1.92 seconds - roughly 360 times faster. Just a bit of a difference.

Link to comment

parity check AND the mover. Joe's version also monitors the mover.

 

At one time I tried increasing the dentry queue size; it helped, but since I had so many files, it would cause the system to crash with an OOM.

 

Last time I looked (back in the 2.6 kernel) those 'tunables' relating to inode aging didn't do anything.

 

When I looked I saw they did, just not very obviously.

 

In my early tests I set it so they were last to be ejected.

It helped with directory sweeps, but again, I had so many files, I had all kinds of OOM crashes.

 

I think with 64bit this will be less of a problem.

 

FWIW, here is my test on one of my MP3 disks. I should reiterate: this is only one of them; I have many. Plus I have tons of source code files I've collected over the years. You can quickly see how I ended up with millions and millions of files.

 

root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt

real    32m46.917s
user    0m4.030s
sys     0m33.110s

2nd test immediately after.
root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt

real    3m15.613s
user    0m0.890s
sys     0m5.520s

root@unRAID:/mnt/disk3# wc -l /mnt/disk3/filelist.txt
307013 /mnt/disk3/filelist.txt

 

I'm hoping the move to 64bit will be better for me.

Link to comment

parity check AND the mover. Joe's version also monitors the mover.

 

That's surely the best approach -- if Cache_Dirs monitors parity checks, the Mover, and drive rebuilds, then that's most of the activities that it could interfere with.

 

Note that Tom's thoughts re: automatically throttling parity checks based on other system activity could also impact this.  With Cache_Dirs "built in" as a core feature, any throttling can be coordinated between the various activities.  The reality is that it's in the "no big deal" category for most things -- i.e. if a parity check takes an extra 20-30 minutes due to Cache_Dirs activity, I don't really care.  But if I'm watching a video and it starts to stutter due to other system activity (e.g. an automated parity check kicking off), THAT is much more frustrating.  [Although I no longer use scheduled parity checks -- I just start one myself at the beginning of each month, so there's never any interference with other activity.]

 

Note that Cache_Dirs activity can also cause a bit of stuttering if you happen to be streaming a movie from a particular disk when it starts buffering the directory info from that disk (easy to force this to demonstrate it, but not a very likely scenario).

 

I'm not sure just how automated this all needs to be -- I think a simple On/Off for Cache_Dirs would be fine, but I DO like the idea of automatic throttling of parity checks and rebuilds based on other system activity.  [But that's off-topic for this thread.]

Link to comment

I did not know that. However, surely a swap file would reduce the risk of the entries needing to be dropped, giving the same net result, albeit from a less direct/elegant angle?

 

If this was an efficient way to do what you are trying to do, the kernel developers would have done it.

 

Is it faster to swap pages in and walk through them, or just go out to disk and re-read the information?

 

 

What could possibly be done is to have the usershare/FUSE shfs use some kind of mmap'ed file that lives on a cache disk.

This cached/mmap file could contain all the stat information for all visited files.

 

 

The downside is that you would be duplicating what the kernel does.

 

The upside is that you can keep the information around longer, outside of real RAM requirements, and connect some kind of inotify watch so that when files are opened/closed/added/removed, the mmap cache for that directory is updated.

 

If the device being reviewed is spun down, use the data in the stat cache rather than what is actually on the disk.

 

A lot more work for the FUSE layer.

 

The data can be cached in a mmap file or a .gdbm file. I'm testing how long it takes to store all the stat blocks in a gdbm file now.
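
As a crude flat-file stand-in for the same idea (GNU find assumed; the gdbm version stores the full struct stat per path):

# path | type | perms | size | mtime -- one line of stat data per file
find /mnt/disk3 -printf '%p|%y|%m|%s|%T@\n' > /tmp/disk3.statcache

# A lookup is then a grep against the flat file instead of touching the disk:
grep -F '/mnt/disk3/Music/' /tmp/disk3.statcache | head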

Link to comment

My data must already be cached. It should have taken much much longer.

 

time /mnt/disk1/home/rcotrone/src.slacky/ftwcache/ftwstatcache /tmp/disk3.gdbm /mnt/disk3

 

 

files processed: 324600, stores: 298901, duplicates: 0, errors: 0

fetched 0 records, deleted 0 records, stored 298965 records

 

real    2m22.192s

user    0m2.830s

sys    0m8.660s

 

root@unRAID:~# ls -l /tmp/disk3.gdbm

-rw-rw-r-- 1 root root 50897023 2014-08-16 18:24 /tmp/disk3.gdbm

 

root@unRAID:~# ls -l --si /tmp/disk3.gdbm

-rw-rw-r-- 1 root root 51M 2014-08-16 18:24 /tmp/disk3.gdbm

 

 

The gdbm file uses the full path as the key and a struct stat as the data.

Link to comment

I did the ftw across 3 drives of data, storing all stat structures in a .gdbm file.

 

fetched 0 records, deleted 0 records, stored 1151935 records

 

real    71m26.353s

user    0m17.040s

sys    2m4.010s

 

root@unRAID:~# ls -l --si /tmp/statcache.gdbm

-rw-rw-r-- 1 root root 179M 2014-08-16 19:38 /tmp/statcache.gdbm

 

 

The issue with this approach is that scanning through all the keys to find a match can take time as the number of files increases.

 

 

Here's an example.

 

 

Using a bash loadable library, I'm able to access the gdbm file directly at the bash level.

 

 

A single key lookup is pretty fast.

 

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# enable -f ./bash/bash-4.1/examples/loadables/gdbm gdbm

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# gdbm

gdbm: usage: gdbm [-euikvr] [-KVW array] file [key | key value ...]

 

time gdbm /tmp/statcache.gdbm /mnt/disk3/Music/music.mp3/Jazz/Various\ Artists/The\ Art\ Of\ Electro\ Swing/01\ Tape\ Five\ -\ Madame\ Coquette\ \(Feat.\ Yuliet\ Topaz\).mp3

<binary data here>

 

real    0m0.003s

user    0m0.010s

sys    0m0.000s

 

 

Yet traversing all the keys of 1 million files takes time.

 

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | grep 'Jazzy Lounge - The Electro Swing Session' | wc -l

56

 

real    0m11.225s

user    0m14.280s

sys    0m2.290s

 

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | wc -l                     

1,151,935

 

real    0m7.236s

user    0m3.930s

sys    0m8.350s

 

 

This might be faster with an sqlite table or mmap'ed file.

 

Point is, there's quite a bit that would have to go into this to cache the stat data outside of the kernel.

So kernel patches may be better than doing this at an application level. I don't know for sure.

 

When the next unRAID release is available with SQLite compiled into PHP, we can build a table with filenames and some stat data for a browser-based locate function. We'll also be able to store MD5s in there too.
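
A rough sketch of what that could look like from the shell, using the sqlite3 CLI (schema and paths are just illustrative):

# A simple stat table keyed on the full path.
sqlite3 /tmp/statcache.db "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, mtime REAL);"

# Bulk-load it from a find dump (pipe-separated; paths containing '|' would need escaping).
find /mnt/disk3 -type f -printf '%p|%s|%T@\n' > /tmp/files.psv
printf '.separator |\n.import /tmp/files.psv files\n' | sqlite3 /tmp/statcache.db

# A lookup is then a single indexed query, e.g. against a file we know exists:
sqlite3 /tmp/statcache.db "SELECT size, mtime FROM files WHERE path = '/mnt/disk3/filelist.txt';"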

Link to comment

If this was an efficient way to do what you are trying to do, the kernel developers would have done it.

I think the issue is that what we want to do is inherently very inefficient and so specific a use case that the kernel developers wouldn't do it. Note this is not the same as couldn't do it.

 

The fact that cache_dirs works at all shows it can be done to some extent. The question is just how we scale it whilst remaining reliable.

 

Is it faster to swap pages in and walk through them, or just go out to disk and re-read the information?

 

Million-dollar question, although with SSDs/RAM drives we could possibly skew the results in our favour.

 

What could possibly be done is to have the usershare/FUSE shfs use some kind of mmap'ed file that lives on a cache disk.

 

I did the ftw across 3 drives of data, storing all stat structures in a .gdbm file.

 

fetched 0 records, deleted 0 records, stored 1151935 records

 

real    71m26.353s

user    0m17.040s

sys    2m4.010s

 

root@unRAID:~# ls -l --si /tmp/statcache.gdbm

-rw-rw-r-- 1 root root 179M 2014-08-16 19:38 /tmp/statcache.gdbm

...

 

This is obviously a clever approach, and what is especially interesting is that the statcache file you create is relatively small, considering there could be a more efficient storage mechanism.

 

Even as is - that works out to roughly 155 bytes per record - 7M records would be about 1GB of RAM, and I am pretty sure we could better that.

 

However, what I don't want to end up happening is that we pre-design this so much we end up not getting anything. Even cache_dirs added as is, with a few tweaks to show better feedback about memory usage, per-folder inode usage, and cache_dirs potentially causing spin-ups, would be a big step.

 

Maybe split this into phase 1 and phase 2, and we can play about experimenting with the phase 2 stuff whilst the path to phase 1 would be quite clear and straightforward.

Link to comment
This is obviously a clever approach, and what is especially interesting is that the statcache file you create is relatively small, considering there could be a more efficient storage mechanism.

 

This was pretty much an academic exercise in feasibility.

According to my numbers, when extracted the keys (full path filenames) are 121MB.

Calculating from the stat struct size and the number of records, I get 165MB.

 

The statcache.gdbm file is 179MB, so somewhere the keys are being compressed into hashes.

Therefore it's pretty efficient.

 

While the 179MB seems relatively small, we have to consider that if something like this is on disk, then when it is read it goes into the buffer cache. Therefore it could feasibly take up twice that amount of RAM.
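
Easy to see that effect on any box (sizes will vary; the file name here is just the one from the test above):

free -m                                # note the "cached" figure before
cat /tmp/statcache.gdbm > /dev/null    # read the 179MB file once
free -m                                # "cached" grows by roughly the file size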

 

Would I give up 512MB to cache the stat information on my array? I sure would.

 

In comparison, a mmap'ed file accesses the filesystem as if it were real RAM, i.e. the array is stored as a file.

This means traversing a million entries in the array requires reading the file from somewhere: SSD, tmpfs/rootfs, etc.

Which brings some kind of time delay.

 

Again, since it's a file, it will need to use disk space and RAM in the buffer cache.

 

So in comparison, it may be better to review the kernel code, expand the dentry or other hash tables, and see how to better preserve these entries in RAM.

Link to comment
