RFC: 6.2-rc2 experimental shfs hard-link support & appdata share discussion



Certain docker applications can have problems because the unRAID user share file system (shfs) does not support hard-links.

 

Ok, in studying this issue we have concluded there are two ways to solve this.  The first way is to implement hard-link support in shfs and the 'mover'.  Implementing hard-link support in shfs is not difficult and has been integrated starting with the 6.2.0-rc2 release.  To enable hard-link support, however, you must create a file on your USB flash boot device:

 

config/extra.cfg

 

with these two lines in there:

 

shfsExtra=-o use_ino
shfsExtra0=-o use_ino

 

Next, Stop the array and then Start the array (umount shfs then mount shfs).  Now hard-links can be created.
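As a quick sanity check (the share and file names below are just placeholders, not part of the release), you can try creating a hard link on a user share after restarting the array:

touch /mnt/user/SomeShare/original.txt
ln /mnt/user/SomeShare/original.txt /mnt/user/SomeShare/hardlink.txt
ls -li /mnt/user/SomeShare/

With hard-link support active, both names should report the same inode number and a link count of 2; without it, the ln command simply fails.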

 

The problem, however, is with the 'mover' script.  The 'mover' does not take into account multiple links to a file.  For example, if you have a share where "Use cache disk" is "Yes", and you have a file on the cache with two hard-links, that file will be copied twice and the hard-links will not be preserved (you now have two separate files).  To fix the 'mover' would involve quite a bit of work (probably have to code our own "rsync-lite" utility).
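To make the failure mode concrete, here is a rough illustration (paths are made up, and this is not the actual mover code) of why handling each name independently breaks the link:

ls -li /mnt/cache/SomeShare/          # the two names share one inode, link count 2
cp /mnt/cache/SomeShare/file.bin      /mnt/disk1/SomeShare/file.bin
cp /mnt/cache/SomeShare/file-link.bin /mnt/disk1/SomeShare/file-link.bin
ls -li /mnt/disk1/SomeShare/          # now two independent inodes, twice the space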

 

Fixing the 'mover' is certainly doable, but we believe there is a better solution: get rid of hard-link support in shfs and instead create a loopback file image for each container to store its appdata.

 

For example, if you create the 'Plex' container, the docker manager would do something like this (a rough command-level sketch of these steps follows the list):

 

- Create an appdata share path as usual, eg:

/mnt/user/appdata/plex

- Create a loopback image file there:

/mnt/user/appdata/plex/config.img

- When creating the container, mount the dereferenced image:

mount -o loop /mnt/user/appdata/plex/config.img /var/docker/appdata/plex/config

- Finally, pass the mount point to the container:

docker ... -v /var/docker/appdata/plex/config:/config

...
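Putting those steps together, the underlying commands might look roughly like this (the size, filesystem choice, and exact paths are illustrative, not the final design):

mkdir -p /mnt/user/appdata/plex
truncate -s 10G /mnt/user/appdata/plex/config.img       # sparse backing file
mkfs.btrfs /mnt/user/appdata/plex/config.img            # default filesystem per the proposal
mkdir -p /var/docker/appdata/plex/config
mount -o loop /mnt/user/appdata/plex/config.img /var/docker/appdata/plex/config
docker run -d --name plex -v /var/docker/appdata/plex/config:/config <plex-image>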

 

There are several advantages to this approach:

- mover will work (because it won't move loopback-mounted files)

- no more permissions issues in the appdata file tree

- the amount of storage space consumed by a container's appdata is fixed (but can be easily extended)

- easy to make backups and/or snapshots of the appdata

- the container can do anything it wants with permissions, ownership, hard links, etc. within its appdata file system

- the container appdata file system would default to 'btrfs' but a container author could override this

- the container appdata initial file system image size would be specified by container author

- it's the "unRAID way" ;)

 

Implementing this along with a script to convert existing installations is beyond the scope of 6.2, but we think this work needs to be accomplished.

 

Back to the hard-link support: it's in there for you to tinker with, but we're pretty sure we're going to remove it.

Link to comment

"- the amount of storage space consumed by a containers appdata is fixed (but can be easily extended)"

 

it better be, cos about the only thing i know about plex is massive amounts of metadata.

Mine is 110GB and growing

 

My question about the image-based approach is: what happens to containers that share their local files and folders? I don't think they would have access to each other's images, so they would be very isolated.

 

Another issue I can think of off the top of my head is that often (very often with some), the user has to manually modify the local files. Currently, it is not easy for the common user to modify files that are hosted within the docker.img file, but that's OK because the user rarely has to modify those (changes are not retained during update or reinstall anyway). With the config folder it is different.

Link to comment

"- the amount of storage space consumed by a containers appdata is fixed (but can be easily extended)"

 

it better be, cos about the only thing i know about plex is massive amounts of metadata.

 

The image file size can be extended with 'fallocate' or 'truncate' and then the contained file system expanded.
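For instance, assuming a btrfs image loop-mounted at the hypothetical path from the earlier example, growing it could look roughly like this (the exact sequence depends on whether the image is mounted at the time):

truncate -s +10G /mnt/user/appdata/plex/config.img               # grow the backing file
losetup -c /dev/loopX                                            # refresh the loop device's capacity (loopX = whichever loop device backs the image)
btrfs filesystem resize max /var/docker/appdata/plex/config      # expand the filesystem to fill the new space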

 

The disadvantage is that the metadata would exist in a single image file and thus could live on only one device, limited to the size of that device; whereas a traditional directory tree in a share could be spread among several devices.

 

The way we have in mind to set this up in the UI is to make it optional whether a loopback file is used for any given mapping to a container (not just /config).  The user could choose to simply map a directory on a user share or disk share, as is done now.

Link to comment

"- the amount of storage space consumed by a containers appdata is fixed (but can be easily extended)"

 

it better be, cos about the only thing i know about plex is massive amounts of metadata.

Mine is 110GB and growing

Storage is cheap  ;)

 

My question about the image-based approach is: what happens to containers that share their local files and folders? I don't think they would have access to each other's images, so they would be very isolated.

It should work to specify the same /var/docker/... mount point for any container.  There is some trickiness in deciding when to mount/unmount, but that can be solved.
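Roughly, with illustrative names, the same loop-mounted directory could simply be handed to both containers (the second one could also be given just a subdirectory, or a read-only mapping):

mount -o loop /mnt/user/appdata/plex/config.img /var/docker/appdata/plex/config
docker run -d --name plex   -v /var/docker/appdata/plex/config:/config <plex-image>
docker run -d --name plexpy -v /var/docker/appdata/plex/config:/plex-config:ro <plexpy-image>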

 

Another issue I can think of off the top of my head is that often (very often with some), the user has to manually modify the local files. Currently, it is not easy for the common user to modify files that are hosted within the docker.img file, but that's OK because the user rarely has to modify those (changes are not retained during update or reinstall anyway). With the config folder it is different.

Again, if the image file is mounted you will be able to browse /var/docker/... like any other file system.

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

Sorry, I don't have write access to the other thread.  If you want to move this there and reply that is fine.

 

The loopback image looks interesting.  So I'm imagining that from the GUI you would specify the img file (/mnt/user/appdata/plex/config.img) and behind the scenes unRAID would deal with mounting the image when the container is started, and passing the mounted dir (/var/docker/appdata/Plex) to the Plex docker.  That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

For this to work reliably, we'd need to make sure that Plex always starts before PlexPy and that PlexPy always shuts down before Plex.  Can we add this sort of parent-child relationship to dockerman?

 

 

Another issue: can /var/docker/appdata/ be shared on the network? People will want to be able to directly access their appdata over the network.

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

 

That's exactly what links are for. Proper container design should expose the data volume which can then be accessed in plexpy by linking to the plex container.

 

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

 

That's exactly what links are for. Proper container design should expose the data volume which can then be accessed in plexpy by linking to the plex container.

 

Wonder if you could elaborate a bit on this?

 

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

 

That's exactly what links are for. Proper container design should expose the data volume which can then be accessed in plexpy by linking to the plex container.

 

Yes, that would work, but there are enough issues with less technical members using straightforward mounted volumes; they are going to have a nightmare with linked containers.

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

 

That's exactly what links are for. Proper container design should expose the data volume which can then be accessed in plexpy by linking to the plex container.

 

Wonder if you could elaborate a bit on this?

 

Whoops, sorry, I should have said that's exactly what --volumes-from is for. https://docs.docker.com/engine/tutorials/dockervolumes/#/creating-and-mounting-a-data-volume-container

 

Say docker A has a Dockerfile with VOLUME /data.

 

Container B is then run with docker run -d --name "container_b" --volumes-from "container_a" <image>

 

Container B will have a directory /data with the same data as in /data in container a.

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

 

That's exactly what links are for. Proper container design should expose the data volume which can then be accessed in plexpy by linking to the plex container.

 

Yes, that would work, but there are enough issues with less technical members using straightforward mounted volumes; they are going to have a nightmare with linked containers.

 

Yeah, that's probably true. That's why I've been thinking for a while that unRAID's template/docker interface system might be much better suited as a GUI for docker-compose instead of for docker run. That would make it VERY easy to set newbies up with complicated container structures consisting of links, builds, volumes, etc.

 

For example, this is one of the more complicated docker-compose setups, but doing something similar for plex, and extending that to include plexpy and the other plex add-ons, could easily be done using docker-compose.
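As a very rough, hypothetical sketch (image names, paths, and the compose file version are illustrative, not a recommendation), something along these lines could wire plexpy to plex's volumes and start order in one file:

# docker-compose.yml (sketch only)
version: "2"
services:
  plex:
    image: some/plex-image
    volumes:
      - /mnt/user/appdata/plex:/config
      - /mnt/user/Movies:/data/movies:ro
  plexpy:
    image: some/plexpy-image
    volumes:
      - /mnt/user/appdata/plexpy:/config
    volumes_from:
      - plex
    depends_on:
      - plex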

Link to comment

Experimental user share file system 'hard link' support.  But please do not discuss this here, instead we have created a new thread in the Programming board.

 

That should work fine for Plex, but PlexPy also needs access to files from Plex's appdata.  Would we pass the Plex log file to PlexPy using the mounted dir like this?

  /var/docker/appdata/Plex/Library/Application Support/Plex Media Server/Logs/

 

 

That's exactly what links are for. Proper container design should expose the data volume which can then be accessed in plexpy by linking to the plex container.

 

Wonder if you could elaborate a bit on this?

 

Whoops, sorry, I should have said that's exactly what --volumes-from is for. https://docs.docker.com/engine/tutorials/dockervolumes/#/creating-and-mounting-a-data-volume-container

 

Say docker A has a Dockerfile with VOLUME /data.

 

Container B is then run with docker run -d --name "container_b" --volumes-from "container_a" <image>

 

Container B will have a directory /data with the same data as in /data in container a.

 

So it looks like --volumes-from would give PlexPy access to all the volumes that Plex uses?  That is significant overkill when it just needs access to a very small portion of Plex's config volume and not all the movies/photos/etc.  I'd prefer to pass in just the log dir as it greatly reduces the damage that PlexPy could do if it were hacked or whatever.
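A hedged sketch of that narrower mapping, assuming Plex's appdata (or its mounted image) is visible on the host at an illustrative path:

docker run -d --name plexpy \
  -v "/var/docker/appdata/plex/config/Library/Application Support/Plex Media Server/Logs":/plex_logs:ro \
  <plexpy-image>

The :ro suffix makes the mapping read-only inside the container, which limits the damage a compromised PlexPy could do.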

Link to comment

The problem, however, is with the 'mover' script.  The 'mover' does not take into account multiple links to a file.  For example, if you have a share where "Use cache disk" is "Yes", and you have a file on the cache with two hard-links, that file will be copied twice and the hard-links will not be preserved (you now have two separate files).  To fix the 'mover' would involve quite a bit of work (probably have to code our own "rsync-lite" utility).

 

I am a little confused by this. I don't use the mover script and cache drive, well, at least not for cache anyway. The cache drive isn't in a user share so hasn't it always had the ability to have hard links? If so, this isn't a problem that was introduced by adding hard linking to shfs. I'm probably missing something so please let me know if I am.

 

Secondly, would fixing the mover script be that big of an undertaking? Again, I don't use it, so it's been a while since I've looked at it, but my solution (since I personally prefer to keep hard linking) would be to modify the mover script so that before it copies files over, it searches and compares the inodes of the files. If it finds multiple files with the same inode, it only copies the first and hard-links the rest.
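For what it's worth, rsync can already do this inode matching itself: its -H/--hard-links option recreates hard links on the destination, provided all of the linked names are part of the same transfer. A rough sketch of what a single-pass move might look like (paths illustrative, and not how the current mover actually works):

rsync -aH --remove-source-files /mnt/cache/SomeShare/ /mnt/disk1/SomeShare/

The catch is that the mover may need to spread a share's files across several array disks (hard links cannot span disks), which is presumably part of why a custom "rsync-lite" is mentioned above.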

Link to comment

The problem, however, is with the 'mover' script.  The 'mover' does not take into account multiple links to a file.  For example, if you have a share where "Use cache disk" is "Yes", and you have a file on the cache with two hard-links, that file will be copied twice and the hard-links will not be preserved (you now have two separate files).  To fix the 'mover' would involve quite a bit of work (probably have to code our own "rsync-lite" utility).

 

I am a little confused by this. I don't use the mover script and cache drive, well, at least not for cache anyway. The cache drive isn't in a user share so hasn't it always had the ability to have hard links? If so, this isn't a problem that was introduced by adding hard linking to shfs. I'm probably missing something so please let me know if I am.

 

Secondly, would fixing the mover script be that big of an undertaking? Again, I don't use it, so it's been a while since I've looked at it, but my solution (since I personally prefer to keep hard linking) would be to modify the mover script so that before it copies files over, it searches and compares the inodes of the files. If it finds multiple files with the same inode, it only copies the first and hard-links the rest.

 

Right, if you use "disk paths" to reference directories passed to docker, then everything is OK.  What's a disk path?

/mnt/cache/...

or

/mnt/disk<N>/...

 

However, we are moving to default paths that specify "user shares":

/mnt/user/appdata/...

 

If the path passed to the docker container is on a user share, then I/O passes through shfs.

 

The reason for this is flexibility: if a user starts out without a cache disk, appdata etc. get created on the parity-protected array.  Later, to improve performance, they may add a cache disk/pool.  All the paths still refer to user shares, so nothing needs to be edited; behind the scenes we can move their appdata and loopback image files (and VM vdisk files) to the higher-performance cache disk/pool.

Link to comment

The problem, however, is with the 'mover' script.  The 'mover' does not take into account multiple links to a file.  For example, if you have a share where "Use cache disk" is "Yes", and you have a file on the cache with two hard-links, that file will be copied twice and the hard-links will not be preserved (you now have two separate files).  To fix the 'mover' would involve quite a bit of work (probably have to code our own "rsync-lite" utility).

 

I am a little confused by this. I don't use the mover script and cache drive, well, at least not for cache anyway. The cache drive isn't in a user share so hasn't it always had the ability to have hard links? If so, this isn't a problem that was introduced by adding hard linking to shfs. I'm probably missing something so please let me know if I am.

 

Secondly, would fixing the mover script be that big of an undertaking? Again, I don't use it, so it's been a while since I've looked at it, but my solution (since I personally prefer to keep hard linking) would be to modify the mover script so that before it copies files over, it searches and compares the inodes of the files. If it finds multiple files with the same inode, it only copies the first and hard-links the rest.

 

Right, if you use "disk paths" to reference directories passed to docker, then everything is OK.  What's a disk path?

/mnt/cache/...

or

/mnt/disk<N>/...

 

However, we are moving to default paths that specify "user shares":

/mnt/user/appdata/...

 

If the path passed to the docker container is on a user share, then I/O passes through shfs.

 

The reason for this is flexibility: if a user starts out without a cache disk, appdata etc. get created on the parity-protected array.  Later, to improve performance, they may add a cache disk/pool.  All the paths still refer to user shares, so nothing needs to be edited; behind the scenes we can move their appdata and loopback image files (and VM vdisk files) to the higher-performance cache disk/pool.

 

Sorry, I'm still not getting it. Are we talking about the same thing? I'm talking about the mover script, not Docker. My understanding is the cache drive is not on a user share and as such has never had a problem hard linking. Is that right? If so, I don't see how adding hard linking to shfs introduces a problem with the mover script. Even before hard linking was added to shfs, if you had hard linked files on the cache drive the mover script would have still made two copies. What am I missing?

Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

Link to comment

I'm not familiar with loopback-mounted images. You listed their advantages, but what exactly do they do and what are they used for? I understand there are many benefits with Docker containers, but what about outside of that? There are other reasons to want hard links outside of Docker: for one, plugins, and for another, just plain old file management. I don't imagine using the loopback image would address those. Would it be possible to keep both? Conflicts may be an issue, but since hard linking is an advanced feature that requires manual file editing, I think the risk is negligible, as long as you put in a big fat warning.

Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

The loopback image is mounted, so for you there will be no difference when it comes to editing appdata.

Both the docker.img and libvirt.img are loopback images. If you ssh into your server and run the below command, you will see where libvirt.img is mounted.

 

mount

 

The result is this (Filtered out everything but the libvirt.img).

 

/mnt/cache/system/libvirt/libvirt.img on /etc/libvirt type btrfs (rw)

 

If you want you can try to make a file in /etc/libvirt and do a reboot. The file should still be there.
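Concretely (hypothetical file name), that test is just:

touch /etc/libvirt/testfile
# reboot the server, then:
ls -l /etc/libvirt/testfile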

Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

The loopback image is mounted, so for you there will be no difference when it comes to editing appdata.

Both the docker.img and libvirt.img are loopback images. If you ssh into your server and run the below command, you will see where libvirt.img is mounted.

 

mount

 

The result is this (Filtered out everything but the libvirt.img).

 

/mnt/cache/system/libvirt/libvirt.img on /etc/libvirt type btrfs (rw)

 

If you want you can try to make a file in /etc/libvirt and do a reboot. The file should still be there.

I understand that, but my SageTV docker stores the install directory in appdata, not docker.img, and I edit the property files with MC from unRAID directly or across the network with Notepad on a Windows box.  If dockers are forced to store their appdata INSIDE an image like docker.img, then I won't be able to edit the properties files that way, and as far as I know the only editor available in the SageTV docker is vi, which is NOT my favorite editor.
Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

The loopback image is mounted, so for you there will be no difference when it comes to editing appdata.

Both the docker.img and libvirt.img are loopback images. If you ssh into your server and run the below command, you will see where libvirt.img is mounted.

 

mount

 

The result is this (Filtered out everything but the libvirt.img).

 

/mnt/cache/system/libvirt/libvirt.img on /etc/libvirt type btrfs (rw)

 

If you want you can try to make a file in /etc/libvirt and do a reboot. The file should still be there.

I understand that, but my SageTV docker stores the install directory in appdata, not docker.img, and I edit the property files with MC from unRAID directly or across the network with Notepad on a Windows box.  If dockers are forced to store their appdata INSIDE an image like docker.img, then I won't be able to edit the properties files that way, and as far as I know the only editor available in the SageTV docker is vi, which is NOT my favorite editor.

 

The image is MOUNTED so you see the content at /mnt/cache/appdata/SageTV (or something along those lines) and can still edit files.

Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

The loopback image is mounted, so for you there will be no difference when it comes to editing appdata.

Both the docker.img and libvirt.img are loopback images. If you ssh into your server and run the below command, you will see where libvirt.img is mounted.

 

mount

 

The result is this (Filtered out everything but the libvirt.img).

 

/mnt/cache/system/libvirt/libvirt.img on /etc/libvirt type btrfs (rw)

 

If you want you can try to make a file in /etc/libvirt and do a reboot. The file should still be there.

No difference through command line, but most users edit the config files through samba.

 

Can those mounted images be exported like the user shares? (I know technically they can be with changes to dynamix gui or by manually editing the samba config, but any reason they shouldn't be?)
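For reference, the manual samba route would presumably mean adding a share stanza along these lines to config/smb-extra.conf on the flash (the share name, path, and options here are purely hypothetical):

[plex-appdata]
    path = /var/docker/appdata/plex/config
    browseable = yes
    writeable = yes
    valid users = someuser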

Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

The loopback image is mounted, so for you there will be no difference when it comes to editing appdata.

Both the docker.img and libvirt.img are loopback images. If you ssh into your server and run the below command, you will see where libvirt.img is mounted.

 

mount

 

The result is this (Filtered out everything but the libvirt.img).

 

/mnt/cache/system/libvirt/libvirt.img on /etc/libvirt type btrfs (rw)

 

If you want you can try to make a file in /etc/libvirt and do a reboot. The file should still be there.

I understand that, but my SageTV docker stores the install directory in appdata, not docker.img, and I edit the property files with MC from unRAID directly or across the network with Notepad on a Windows box.  If dockers are forced to store their appdata INSIDE an image like docker.img, then I won't be able to edit the properties files that way, and as far as I know the only editor available in the SageTV docker is vi, which is NOT my favorite editor.

 

The image is MOUNTED so you see the content at /mnt/cache/appdata/SageTV (or something along those lines) and can still edit files.

Ok.
Link to comment

I hope we don't have to use the loopback image.  I don't use mover; I have it renamed in the go file, so that isn't a problem for me.  I also don't use any dockers that do hard links, so I don't have the problem in the first place.  But I do edit files in appdata for the docker I do use, and I want to keep on using MC to do said editing.  Having a loopback image sounds like I wouldn't be able to do that.

The loopback image is mounted, so for you there will be no difference when it comes to editing appdata.

Both the docker.img and libvirt.img are loopback images. If you ssh into your server and run the below command, you will see where libvirt.img is mounted.

 

mount

 

The result is this (Filtered out everything but the libvirt.img).

 

/mnt/cache/system/libvirt/libvirt.img on /etc/libvirt type btrfs (rw)

 

If you want you can try to make a file in /etc/libvirt and do a reboot. The file should still be there.

No difference through command line, but most users edit the config files through samba.

 

Can those mounted images be exported like the user shares? (I know technically they can be with changes to dynamix gui or by manually editing the samba config, but any reason they shouldn't be?)

 

If the image is mounted inside the path of a user share, it should be available through samba. But mounting it in a share might defeat the original purpose of allowing hard links? I'm no expert on this, so I might of course be wrong.

In the OP the image is mounted at /var/docker, so we might not be allowed to specify where it gets mounted. Hopefully Tom will chime in.

Link to comment

Q1. When will the container specific images be mounted?

Q1a. Will they only be mounted when the Docker container is running?

Q1b. Will they be unmounted when the Docker container is stopped?

 

If the answer to both Q1a and Q1b is yes, then the system creates quite a pickle. Imagine that a file the user needs to edit can only be edited when the application (inside the Docker container) is NOT running.

 

8)

 

 

Link to comment
