BTRFS errors and Plex Server unreachable


ElJimador

Hi.  Can anyone take a look at the log and tell me what's going on here?  Last night my Plex server was unreachable.  After rebooting and updating everything I could (which turned out to be just unRAID itself, to 6.2.4), my Plex and PlexWatch dockers will start but BTSync won't, and I'm seeing all these BTRFS errors in the log.  Thanks.

syslog.txt

Link to comment

The BTRFS errors all relate to the loop0 device, which is where the docker image is mounted.  I'm not sure why the errors occur.  Has your docker image filled up for some reason (which would suggest a configuration error in one of the apps)?

 

The way forward in such a case is normally to stop docker; delete the existing docker.img file; restart docker to create a new image file; and reinstall the docker apps using the my-templates option, which brings back the apps and your settings.

Link to comment

 

Thanks for the reply.  I did the above and reinstalled Plex from my templates, but it's still telling me the server is unreachable.  I've attached a new log and copied below what I assume is the relevant portion (which is also where the log ends, btw).  Any other ideas?

 

 

Nov 26 18:50:57 JBOX root: starting docker ...

Nov 26 18:50:57 JBOX kernel: ip_tables: © 2000-2006 Netfilter Core Team

Nov 26 18:50:57 JBOX avahi-daemon[1833]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.

Nov 26 18:50:57 JBOX avahi-daemon[1833]: New relevant interface docker0.IPv4 for mDNS.

Nov 26 18:50:57 JBOX avahi-daemon[1833]: Registering new address record for 172.17.0.1 on docker0.IPv4.

Nov 26 18:50:58 JBOX root: PlexMediaServer: started succesfully!

Nov 26 18:50:58 JBOX emhttp: shcmd (139): /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/dockerupdate.php |& logger

Nov 26 18:51:02 JBOX root:  Updating templates...  Updating info...  Done.

Nov 26 18:51:02 JBOX emhttp: nothing to sync

Nov 26 18:51:02 JBOX kernel: BTRFS error (device sdk1): parent transid verify failed on 1734586941440 wanted 542969 found 540656

Nov 26 18:51:02 JBOX kernel: BTRFS info (device sdk1): read error corrected: ino 1 off 1734586941440 (dev /dev/sdk1 sector 532512)

Nov 26 18:51:02 JBOX kernel: BTRFS info (device sdk1): read error corrected: ino 1 off 1734586945536 (dev /dev/sdk1 sector 532520)

Nov 26 18:51:02 JBOX kernel: BTRFS info (device sdk1): read error corrected: ino 1 off 1734586949632 (dev /dev/sdk1 sector 532528)

Nov 26 18:51:02 JBOX kernel: BTRFS info (device sdk1): read error corrected: ino 1 off 1734586953728 (dev /dev/sdk1 sector 532536)

Nov 26 18:51:02 JBOX kernel: BTRFS info (device sdk1): no csum found for inode 269 start 40960

Nov 26 18:51:02 JBOX kernel: BTRFS warning (device sdk1): csum failed ino 269 off 40960 csum 2225587593 expected csum 0

Nov 26 18:51:02 JBOX kernel: BTRFS info (device sdk1): no csum found for inode 269 start 147456

Nov 26 18:51:02 JBOX kernel: BTRFS warning (device sdk1): csum failed ino 269 off 147456 csum 3864140711 expected csum 0

 

syslog.txt

Link to comment

Those BTRFS errors refer to the cache drive, not to the docker.img per se.
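
To see at a glance which device the kernel is complaining about (loop0 for the docker image versus sdk1 for the cache drive in the excerpt above), a rough sketch like the following can tally BTRFS messages per device. The function name `btrfs_errors_by_device` is made up for illustration, and it assumes the log lines carry the `(device NAME)` tag in the same form as the ones pasted in this thread.

```shell
#!/bin/bash
# Hypothetical helper: count BTRFS kernel messages per device.
# Reads the files given as arguments, or stdin if none are given.
# Assumes each message contains "(device NAME)" as in the log above.
btrfs_errors_by_device() {
    grep 'BTRFS' "$@" \
        | sed -n 's/.*(device \([^)]*\)).*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'   # normalize to "COUNT DEVICE"
}
```

Something like `btrfs_errors_by_device syslog.txt` would then print one line per device, busiest first.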

 

You need to check the file system on the cache.

 

Interesting.  It never occurred to me that there was something I needed to do to maintain the integrity of the file system on the cache pool.  I thought that once I threw in a second 500GB SSD and unRAID showed me just the 500GB total cache capacity and green dots by both, the drives were mirrored and everything was working as it should.  Apparently not, though: clicking on the cache device from the main page and then running a BTRFS scrub, with the option to correct file system errors checked, produced these results:

 

 

scrub status for a5d441db-5eaf-41bf-8cbb-33ff7dce48da

scrub started at Sat Nov 26 21:55:07 2016 and finished after 02:33:14

total bytes scrubbed: 258.48GiB with 16239401 errors

error details: verify=5152 csum=16234249

corrected errors: 16238345, uncorrectable errors: 1056, unverified errors: 0

 

Since 16m+ of the writes were to the 2nd cache drive and only 160k to the original, I assume this means the 2nd drive was basically empty until I ran this scrub and nothing had ever been mirrored before?  My fault for not understanding how a cache pool works I guess, but it sure would have been nice to see some kind of flag to alert me there was an issue.

 

Anyway, any advice on what to do now?  Thanks again for your help.  I really do appreciate it.

Link to comment

I would do one or two more scrubs to see if all errors are fixed; if not, it's probably best to redo the pool.
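
For repeating the scrub from the command line and checking whether anything is still uncorrectable, here is a minimal sketch. It assumes the summary line looks like the one pasted earlier in this thread ("corrected errors: ..., uncorrectable errors: ..."); other btrfs-progs versions may format the summary differently. The function name `scrub_uncorrectable` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pull the uncorrectable-error count out of
# `btrfs scrub status` text supplied on stdin.
# Prints nothing if no "uncorrectable errors: N" field is present.
scrub_uncorrectable() {
    sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p'
}
```

Assuming the pool is mounted at /mnt/cache, `btrfs scrub start -B /mnt/cache` runs a scrub in the foreground, and `btrfs scrub status /mnt/cache | scrub_uncorrectable` would then print the remaining count.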

 

Hi Johnnie. I ran 2 more scrubs that both completed within a few minutes and found the same 1056 uncorrectable errors.  So how exactly do I go about re-doing the cache pool?  Can any of the data on the cache (Plex server especially) be saved?  Thanks.

Link to comment

The easiest way is probably to follow the replace cache procedure.  After the move to the array, when the pool is empty, stop the array and wipe its filesystem:

 

wipefs -a /dev/sdX

 

Replace X with the device letter of each of your SSDs, and wipe both.

 

Recreate the pool and continue the procedure to move cache files back.
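
Since wipefs -a is destructive, one cautious way to handle both SSDs is a small wrapper that only prints the commands unless told otherwise. This is just a sketch: the function name `wipe_pool_devices` and the `WIPE_FOR_REAL` switch are invented for illustration, and the device names are placeholders you would have to substitute yourself.

```shell
#!/bin/bash
# Hypothetical wrapper around `wipefs -a` for several devices.
# By default it only echoes the commands it would run; set
# WIPE_FOR_REAL=1 in the environment to actually execute them.
wipe_pool_devices() {
    local dev
    for dev in "$@"; do
        if [ "${WIPE_FOR_REAL:-0}" = "1" ]; then
            wipefs -a "$dev"
        else
            echo "wipefs -a $dev"
        fi
    done
}
```

Calling `wipe_pool_devices /dev/sdX /dev/sdY` first lets you double-check the device names before running it again with `WIPE_FOR_REAL=1`.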

 

Thanks Johnnie.  Did all that and I no longer see any errors on the log or running a new BTRFS scrub, and I got the BTSync docker restored and working again too.

 

Unfortunately Plex still doesn't work.  I recreated the docker from my templates and it started just fine, but clicking the Web UI returns "page unavailable", and opening a new tab to access it from the Plex web app doesn't work either.  I'm not sure what's going on there, but I assume it's a Plex issue at this point and will take it up on their forum.

 

Back to the BTRFS errors though.  Am I supposed to be running BTRFS scrubs once in a while just like I run parity checks?  And what the heck is BTRFS balance supposed to do?  I tried running it once I got everything copied back to the cache pool and it did the same thing as when I tried running it many months ago:  it keeps showing progress every time you refresh the page until you get to the point where it should be completing, and then it stops and says "No balance found".  If that can be safely ignored, fine.  But if it's indicating that there is still some problem, I'd like to know what I'm supposed to do about that.

 

Thanks again.

Link to comment

According to those at LimeTech, any errors users have with BTRFS are caused by hardware issues. You should be sure to validate that your hardware is fine before proceeding.

Pretty sure they are calling unclean shutdowns "hardware issues" in this context. I guess holding down the power button can be classified as hardware.
Link to comment

One of the SSDs in my cache pool dropped off some weeks back due to a loose SATA cable, so I suppose the problem could have started there.  The thing is, after I reconnected it everything was fine again for some time, and the errors only became evident last week.  So maybe it was something else entirely; if it was hardware related, I just have no idea what else it could have been.

Link to comment

I run a BTRFS scrub every once in a while. Unlike an XFS check, a scrub runs with the file system mounted, so it's no great trouble to do. It isn't something that unRAID runs automatically. Perhaps it should, or perhaps it isn't really needed until there's an actual problem.

 

BTRFS balance is used to balance the files across the devices according to the RAID level you have configured. Typically, you would run it if you reconfigured your cache pool by adding/removing a device or changing the RAID level. You can run it whenever you want without causing any harm. The "No balance found" message is confusing and could be worded better. It means no balance is currently running, i.e. the one you started has finished, so it's the normal status and can be safely ignored.
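
Besides an occasional scrub, BTRFS also keeps per-device error counters that can be checked at any time with `btrfs device stats`. As a rough sketch of the kind of "flag" asked about earlier, the filter below prints only the counters that are not zero. The function name `nonzero_device_stats` is hypothetical, and it assumes the usual two-column output of the form `[/dev/sdX1].corruption_errs 0`.

```shell
#!/bin/bash
# Hypothetical filter: read `btrfs device stats` output on stdin and
# print only the counters whose value is not zero.
# Assumes two whitespace-separated columns: "[DEVICE].COUNTER VALUE".
nonzero_device_stats() {
    awk 'NF == 2 && $2 != 0 {print $1, $2}'
}
```

Assuming the pool is mounted at /mnt/cache, `btrfs device stats /mnt/cache | nonzero_device_stats` printing nothing means no errors have been recorded; `btrfs balance status /mnt/cache` is the command-line counterpart of the "No balance found" message.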

 

Link to comment

Good info.  Thanks John! 

Link to comment
