
Stale NFS File Handles



I had been using unRAID 4.7 until recently with NFS for most of my client mounting.

 

I have had very few issues with stale handles.

 

But I needed to be able to use larger-than-2TB drives, so that drove a need to update to a 5.0 RC.

 

I have been using the "share" concept for mounts in two ways.

 

1. Using shares that spanned disks. This is by far the largest amount of data: the TV and Movies shares.

2. Using shares confined to one (1) single disk (2TB), a general data disk. This was for every computer in the house: several Linux clients and 1 Windows client (Windows mounts using SMB). On this disk are several standard Linux folders (being used as mount points): Documents, Downloads, Videos, etc.

 

Stated another way:

 

1. Spans

 

TV = disk2, disk3, disk5, disk6, disk7, disk12

Movies = disk1, disk8, disk9

 

2. Non-span (all on disk4)

 

Videos

Pictures

Music

Documents

Downloads

backups

Archive

Frostwire

 

Under unRAID 4.7, this worked great.

 

Not so great under unRAID 5.0-rc11.

 

I immediately began to get "stale NFS file handle" errors all the time. I had to umount/mount a dozen times a day, on every folder I was working in. This was unworkable. I read on the board that rc10 was better at this than rc11, so I reverted to rc10. It was better, but not by much.

 

So I decided to make a change on how I would mount the non-span shares on my client computers.

 

The fstab under the old way would read something like this:

 

10.0.1.10:/mnt/user/Pictures /home/bkasten/Pictures nfs defaults,soft,nolock,nfsvers=3 0 0

 

Under the new way, the fstab reads something like this:

 

10.0.1.10:/mnt/disk4/Pictures /home/bkasten/Pictures nfs defaults,soft,nolock,nfsvers=3 0 0
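For anyone with several such entries, the switch from the user-share path to the direct disk path can be sketched as a one-off sed pass. This is only a sketch: it assumes, as in my setup, that every non-span share lives on disk4; adjust the disk number per share before applying it for real.

```shell
# Sketch only: rewrite user-share fstab entries to the direct disk path.
# Assumes all non-span shares live on disk4 (an assumption of this setup).
sed 's|:/mnt/user/|:/mnt/disk4/|' /etc/fstab
```

After updating fstab, each affected share still needs an umount/mount on the client for the new path to take effect.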

 

For the spanned shares I was going to switch to cifs if needed, but I wanted to get the non-span shares working first.

 

The effect was immediate: the stale handle messages stopped as soon as the shares were unmounted and remounted. Strangely enough, even the "shares" that spanned multiple disks stopped producing the errors, even though I had NOT made any changes to their mounts.

 

So far, I have been working this way for several days, and no "stale" incidents. I don't know if this has been posted anywhere else, but I thought I would share my experience.

 

Bruce

Link to comment
  • 2 weeks later...

Please try this mount and let me know if stale file handles also go away:

 

10.0.1.10:/mnt/user/Pictures  /home/bkasten/Pictures  nfs  defaults,soft,nolock,nfsvers=3,lookupcache=none,noac  0  0

 

OK, took me a while to get back to this, work has been keeping me busy.

 

I tried the suggested fstab changes, and the short answer is it works.  :)

 

One of the differences I have found is that if I use /mnt/user/mount.point I get an error using certain Linux commands, namely vobcopy and rsync:

 

nfsd: non-standard errno: -38

 

If I go back to /mnt/disk#/mount.point, the errors are gone. To be fair, I did have these errors under unRAID 4.7 as well; I just did not know how to get rid of them.

 

For now I will stick with the disk# method.

 

Thank you

 

Bruce

Link to comment

I hate to say it, but adding "lookupcache=none,noac" to my mounts doesn't appear to improve matters at all. :(

 

peter@desktop:~$ mount
/dev/sda7 on / type ext4 (rw,errors=remount-ro)
.
.
.
tower:/mnt/user/Movies on /net/tower/mnt/user/Movies type nfs (rw,nosuid,nodev,vers=3,hard,intr,nolock,udp,lookupcache=none,noac,sloppy,addr=10.2.0.100)
.
.
.
tower:/mnt/user/UMC on /net/tower/mnt/user/UMC type nfs (rw,nosuid,nodev,vers=3,hard,intr,nolock,udp,lookupcache=none,noac,sloppy,addr=10.2.0.100)
peter@desktop:~$ ls /net/tower/mnt/user/Movies
ls: cannot access /net/tower/mnt/user/Movies: Stale NFS file handle
peter@desktop:~$ 

 

This is on rc12.
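For what it's worth, which mounts have gone stale can be checked across all NFS mounts in one go. This is only a sketch (not from the thread), assuming a Linux client with /proc/mounts; a directory listing on a stale handle fails with ESTALE, so a silent ls works as the probe:

```shell
# Sketch only: flag every NFS mount whose handle has gone stale.
awk '$3 == "nfs" {print $2}' /proc/mounts | while read -r mp; do
    ls "$mp" >/dev/null 2>&1 || echo "stale: $mp"
done
```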

Link to comment

I hate to say it, but adding "lookupcache=none,noac" to my mounts doesn't appear to improve matters at all. :(

What is your client system, ubuntu?

 

Here's something to try.  Save the attached file in the 'config' directory of your flash device and then Stop/Start the array (for the option contained therein to take effect).  This should eliminate all stale file handle issues, with these exceptions:

 

a) If you Stop/Start the array while some client(s) have an active NFS mount, those clients will start seeing stale file handles. In this case you must unmount and remount on the client side.  The same applies if the server is restarted.

 

b) Certain config changes on the server will have the same effect as above because they unmount/remount the user share file system.  For example, any change to the Settings/Share settings page.

 

c) This is the killer: with this option, it's possible to run out of memory  :P  Explanation follows.

 

The problem with NFS (and also AFP) is that at its core, it's an antiquated design.  NFS relies on the notion of "file handles" to identify files on the server (AFP calls these "CNIDs", but it's the same concept).  A file handle is basically just a number, like 1, 2, 3...  When NFS was designed, these numbers referred to "inodes" in the mounted file system, and the file system provides a nice fast way to convert an inode number to an actual "inode". An inode is a data structure that describes a file.  Hence if you have an inode number you can quickly find a file.

 

This falls apart if the file system cannot quickly translate a number to a file.  In other words, NFS requires this functionality, which imposes a serious design limitation on modern file systems.  By contrast, SMB is entirely path-based.

 

The unRaid user share file system is in essence a "stacked" file system implemented using FUSE.  When the linux kernel needs to translate an NFS file handle to an inode, it passes the file handle to FUSE.  FUSE maintains a set of in-memory "nodes": each time a file in the unRaid user share file system is referenced, a new FUSE node is created.  These nodes serve as "inodes", so we can quickly translate the NFS file handle to a FUSE node.

 

The problem is that these FUSE nodes are in-memory-only data structures created on the fly as files are accessed.  The "node numbers" assigned to any particular node (which corresponds to a particular file or directory) are assigned in the order those files and directories are accessed, and thus can change from mount to mount (which NFS doesn't like).  In addition, FUSE includes a background daemon that deletes nodes that are no longer active (to reclaim memory).  This is the reason you get stale file handles.

 

Starting around -rc10 or so, I increased this daemon timeout to 6 minutes.  Most NFS clients will only cache attributes for 5 minutes, so the client should "expect" to get stale file handles from time to time and deal with them by traversing the file's directory structure again, starting at the mount point.  For some reason it seems that your NFS client isn't doing this, or is doing it differently than other NFS clients... hard to say wth is happening.

 

The option in the attached config file tells FUSE to never release nodes.  This will result in more and more memory being taken up by these nodes, which may never actually be a problem, depending on how many files you have and how much memory you have.  Ultimately I think the solution to this issue will be to make use of a swap file.

 

BTW, netatalk solves this problem by maintaining a CNID-to-filepath database.  Each time an AFP request is received, netatalk grabs the CNID from the request packet and consults the database to translate this 32-bit number to a file path which it can then use to find the file.  You think stale file handles are a bitch, try looking at all the threads where there are AFP database issues, or ask madburg, LOL.

extra.cfg

Link to comment

Tom, thank you for the detailed and informative reply.  That's given me a better feel for what is happening.

 

Edit:

Yes, I'm using Ubuntu.

 

However, I think I need to do a little more investigation - umount/re-mount doesn't remove the stale file handle error.

 

As I've said elsewhere, I'm puzzled as to why the shfs fix you implemented in rc4 was so effective at eliminating most stale file handle problems, yet that improvement vanished in rc11.  For me, rc4-10 are very usable, whereas rc11/12 are not.  I still wonder whether the fix you implemented in rc4 has been reverted as a result of other shfs changes you made for rc11.

 

Anyway, I will return to rc12, try setting the shfsExtra parameter, and report back.

Link to comment
Anyway, I will return to rc12, try setting the shfsExtra parameter, and report back.

 

Well, I'm afraid that didn't help at all.

 

Running rc12a, with the extra.cfg placed in my config folder, the very first mkvmerge I ran, writing its output to my Movies share, resulted in an inaccessible Movies folder and a 'stale NFS file handle' error (the command actually ran to successful completion, and the output file is good).

 

I have to say that I'm not totally surprised at this.  Having thought at some length about your explanation of what is happening, and the precise symptoms I see, it doesn't quite make sense, because I don't have to wait six minutes for the problem to occur.  Also, the trick of umounting and remounting (at the Ubuntu client) doesn't fix it.

 

I would still like you to confirm that your shfs fix, implemented in rc4, is still in effect.

Link to comment

I've got tons and tons of files shared so I'd rather not have my memory be eaten up by infinite nodes stored in memory.

 

It's not perfect, but for the time being I just have a cron job on my guest OS that runs every 4 hours and unmounts and remounts all my NFS shares. At some point this is probably going to break a file transfer in progress when the script runs, but I'd rather deal with that than have to reboot my unraid VM because services are shutting down for lack of memory.
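A sketch of such a remount script (the script name, schedule, and options are placeholders, not from this thread): every NFS entry in fstab is lazily unmounted, so an in-flight transfer doesn't wedge the script, then mounted again with the options from fstab.

```shell
#!/bin/sh
# Sketch of the cron-driven workaround; installed as e.g.
# /usr/local/bin/remount-nfs.sh with a crontab line like:
#   0 */4 * * * /usr/local/bin/remount-nfs.sh
awk '$3 == "nfs" {print $2}' /etc/fstab | while read -r mp; do
    umount -l "$mp" 2>/dev/null   # lazy unmount: don't hang on busy mounts
    mount "$mp"                   # remount using the options from fstab
done
```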

Link to comment
  • 1 month later...

Think I might be experiencing a similar issue… can anyone advise before I screw things up totally by making any of the changes proposed elsewhere in this thread?

 

I have a cached NFS share (multi-disk share) on my unRaid server (5.0 rc12a) - "TVRecordings".

I mount this as "recordings" on my OpenElec XBMC via the command:

 

mount -t nfs [myunRaidIPAddress]:/mnt/user/TVRecordings ~openelec/recordings -o nolock

My plan is to use TVHeadEnd on OpenElec XBMC to make TV recordings to the "TVRecordings" unRaid share (which, since it is cached, should appear to XBMC to be permanently spinning, so no delays when recordings start) and then watch these recordings from any XBMC install on my home network.

 

It works fine after a reboot of my OpenElec XBMC - but only for a few hours - and there does not appear to be any trigger that stops it from working!

 

After a reboot if I ssh in to OpenElec/storage I can see these folders:

 

downloads    logfiles      music        recordings    videos

emulators    lost+found    pictures      screenshots  tvshows

An hour or so later I can ssh in and see the following:

 

ls: ./recordings: Stale NFS file handle

downloads    logfiles      music        screenshots  tvshows

emulators    lost+found    pictures    videos

 

Note - Stale NFS file handle message - plus "recordings" has disappeared.

OpenElec does not appear to realise this and carries on recording to the dropped mount - but, of course, if I try to play any of these recordings they give an error, as the files do not exist. A reboot of OpenElec will fix everything for a few more hours.

 

Can anyone advise which of the potential fixes in the thread above I should try?… either changes to my mount command?… or use of "extra.cfg"? (I have 4GB of RAM on my unRaid, so perhaps enough to cope with the memory overhead.)

 

Thanks in anticipation!

Dave K

 

Link to comment
ls: ./recordings: Stale NFS file handle

downloads    logfiles      music        screenshots  tvshows

emulators    lost+found    pictures    videos

 

Note - Stale NFS file handle message - plus "recordings" has disappeared.

 

For me, the stale nfs file handle problem is a lot less severe in rc10.

 

I'd be interested to know whether your experience is the same, if you revert to rc10.

Link to comment

Thanks for this - going to try some experiments with the mount command extensions suggested earlier in this thread; if they don't help then I'll roll back to rc10 at the weekend and report back. Thanks again - any other advice on this issue gratefully accepted!

 

Link to comment

OK - so using the information above I tried changing my mount command to:

 

mount -t nfs [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings -o nfs defaults,soft,nolock,nfsvers=3,lookupcache=none,noac 0 0

 

But I simply get a long "BusyBox" error/help message.

Anyone around who can advise re what I have done incorrectly?

Link to comment

What is the error message?

 

Are you taking that mount line from your fstab?

 

Try :

 

mount -t nfs [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings -o defaults,soft,nolock,nfsvers=3,lookupcache=none,noac

 

instead. Though I haven't checked that all those options are valid. I'd change ~openelec/recordings to the fully qualified path as well, though you probably don't have to.

Link to comment

OK - Tried the new command and got the same response:

 

BusyBox v1.21.0 (2013-04-25 23:06:23 CEST) multi-call binary.

 

Usage: mount [OPTIONS] [-o OPTS] DEVICE NODE

 

Mount a filesystem. Filesystem autodetection requires /proc.

 

-a Mount all filesystems in fstab

-f Dry run

-i Don't run mount helper

-v Verbose

-r Read-only mount

-w Read-write mount (default)

-t FSTYPE[,...] Filesystem type(s)

-O OPT Mount only filesystems with option OPT (-a only)

-o OPT:

loop Ignored (loop devices are autodetected)

[a]sync Writes are [a]synchronous

[no]atime Disable/enable updates to inode access times

[no]diratime Disable/enable atime updates to directories

[no]relatime Disable/enable atime updates relative to modification time

[no]dev (Dis)allow use of special device files

[no]exec (Dis)allow use of executable files

[no]suid (Dis)allow set-user-id-root programs

[r]shared Convert [recursively] to a shared subtree

[r]slave Convert [recursively] to a slave subtree

[r]private Convert [recursively] to a private subtree

[un]bindable Make mount point [un]able to be bind mounted

[r]bind Bind a file or directory [recursively] to another location

move Relocate an existing mount point

remount Remount a mounted filesystem, changing flags

ro/rw Same as -r/-w

 

There are filesystem-specific -o flags.

 

Which seems like an error/help handler to me (but it doesn't actually help me!)

Any thoughts?

 

Link to comment

I don't know what BusyBox gives you, so go back to the basic mount with no options and work up from there:

 

mount -t nfs [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings

 

Again get rid of the ~ shorthand as well.

Link to comment

Great - so I figured out your response (I am not particularly good at this sort of thing) and:

 

mount -t nfs 192.168.x.x:/mnt/user/TVRecordings /storage/recordings

 

Works!

In that it does what it is supposed to do: when I list the "recordings" directory on my OpenElec XBMC box, it shows me what is actually in the "TVRecordings" share on unRaid.

I'll check it every 30 minutes or so now - but expect it to become a "stale NFS file handle" within the next hour or so at most.

Thanks for your patience in persevering with this.

What do I try next? (The command also works with -o nolock at the end - but similarly becomes a stale NFS file handle in about an hour or so.)

Dave

Link to comment

What is the error message?

 

Are you taking that mount line from your fstab?

 

Try :

 

mount -t nfs [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings -o defaults,soft,nolock,nfsvers=3,lookupcache=none,noac

 

instead. Though I havn't checked all those options are valid. I'd change ~openelec/recordings to the fully qualified path as well, though you probably don't have to.

The DEVICE and MOUNT-POINT must be the last two parameters on the command line.  All options must come first.

 

Something like this

mount -t nfs  -o defaults,soft,nolock,nfsvers=3,lookupcache=none,noac  [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings

Link to comment

What is the error message?

 

Are you taking that mount line from your fstab?

 

Try :

 

mount -t nfs [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings -o defaults,soft,nolock,nfsvers=3,lookupcache=none,noac

 

instead. Though I havn't checked all those options are valid. I'd change ~openelec/recordings to the fully qualified path as well, though you probably don't have to.

The DEVICE and MOUNT-POINT must be the last two parameters on the command line.  All options must come first.

 

Something like this

mount -t nfs  -o defaults,soft,nolock,nfsvers=3,lookupcache=none,noac  [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings

 

Is that just a BusyBox (or OpenElec, or distro-specific) thing? It's not true for 'general' Linux NFS client mount operations.

Link to comment

Great - so figured out your response (I am not particularly good at this sort of thing) and:

 

mount -t nfs 192.168.x.x:/mnt/user/TVRecordings /storage/recordings

 

Works!

In that it does what it is supposed to do and when I do a directory of "recordings" on my OpenElec XBMC box it shows me what is actually in the "TVRecordings" share on unRaid.

I'll take a look at it every 30 minutes or so now - but expect it to become a "stale nfs file handle" in the next hour or so max.

Thanks for patience/persevering with this

What do I try next? (the command also works with -o nolock at the end too - but similarly becomes a stale nfs file handle in about an hour or so)

Dave

 

I'm muddying the waters now, I think. I'm not following the NFS problems much (I'm waiting for someone else to tell me it's all working, as I'm dying to go back to using NFS). But I think the pertinent options you want are:

 

lookupcache=none,noac

 

Though, as suggested, they may not make much of a difference. I think you want, as has been previously suggested in this thread, to roll back to an earlier RC to see if it behaves better for you.

 

You could also avoid using user shares for your NFS mounts, which should sidestep the issue (I believe), if that's a practical setup for you.

 

PeterB is the authority as far as I'm concerned on the ongoing nfs issues so do whatever he suggests :)

Link to comment

Wow - you guys are flying way over my head.

Anyway - I get:

mount: unknown nfs mount option: defaults

When I try Joe L's mount command:

mount -t nfs  -o defaults,soft,nolock,nfsvers=3,lookupcache=none,noac  [my unraid IP address]:/mnt/user/TVRecordings ~openelec/recordings

Which suggests a tiny bit of progress (in that I no longer get the BusyBox response).

Whereas:

mount -t nfs lookupcache=none,noac 192.168.0.10:/mnt/user/TVRecordings /storage/recordings

Takes me to the BusyBox message again.

So - maybe almost there (assuming of course that this will stop my stale NFS file handles).
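Putting Joe L's option ordering together with the 'defaults' keyword dropped (since that's the one BusyBox complained about), I guess the command to try next would be something like the one echoed below - this is only my guess, untested on BusyBox; drop the echo to actually run it:

```shell
# Guess only: Joe L's ordering minus the 'defaults' option that
# BusyBox mount rejected. echo prints the command rather than running it.
OPTS='soft,nolock,nfsvers=3,lookupcache=none,noac'
echo mount -t nfs -o "$OPTS" 192.168.0.10:/mnt/user/TVRecordings /storage/recordings
```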

I plan on rolling back to rc10 at the weekend if none of this works... but I really do like to be running the latest unRaid when I can.

Thanks again

Link to comment
