Ned Posted March 31, 2016
I have been having some csum issues and it looks as though my docker.img container is corrupt. What is the correct procedure to delete the current docker.img, create a new one, and get my dockers up and running again?
Dockers I use are:
- AirVideo HD
- Crashplan
- MariaDB
I'm guessing that I should do the following, but can someone please confirm? I don't want to lose any data or the current state of my dockers.
1) take array offline
2) delete docker.img file
3) will unraid now think there are no dockers enabled, so that I need to re-enable it from the docker tab and re-create a new virtual disk at this point?
4) after re-creating docker.img, re-install the docker apps with the same config parameters
5) the dockers will resume operation using the existing appdata folder and data?
Prior to doing any of this, should I remove all of the docker containers and images I'm using? Is this the correct procedure? Thanks!
Squid Posted March 31, 2016
No, you don't need to remove the containers. And re-adding them is a snap. Either use CA's previous apps section, or add a container via the docker tab and select the user template. All your mappings, etc. will be exactly the same as they were.
Ned Posted March 31, 2016 Author
Thanks! So just to confirm then, I should do the following:
1) take array offline
2) delete docker.img
3) bring array back online
4) go to docker tab and re-enable docker and create new docker.img file
5) re-install dockers using user templates to retain my mappings, etc.
6) start up the dockers
That's it?
Squid Posted March 31, 2016
You don't have to take the array offline. Go to Settings, docker: disable docker, delete the image, re-enable docker, add the containers.
Ned Posted March 31, 2016 Author
So I just re-built my docker.img and re-installed everything, which went perfectly. HOWEVER, shortly after the last docker came online I noticed a couple of random csum errors showed up again in my log! I then went to settings -> docker and ran a scrub command and instantly my log screen was flooded with csum errors! What is going on here??? Running a scrub on the cache disk itself yields 0 errors, so I don't understand what is wrong or how to fix this.
Mar 31 19:02:48 Tower emhttp: need_authorization: getpeername: Transport endpoint is not connected
Mar 31 19:03:05 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 21794816 csum 2563185076 expected csum 2324397979
Mar 31 19:03:11 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin checkall
Mar 31 19:03:12 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 138301440 csum 282748589 expected csum 40004289
Mar 31 19:03:24 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin checkall
Mar 31 19:03:32 Tower kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Mar 31 19:03:36 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin update community.applications.plg
Mar 31 19:03:36 Tower logger: plugin: running: anonymous
Mar 31 19:03:36 Tower logger: plugin: running: anonymous
Mar 31 19:03:36 Tower logger: plugin: creating: /boot/config/plugins/community.applications/community.applications-2016.03.31.txz - downloading from URL https://raw.github.com/Squidly271/community.applications/master/archive/community.applications-2016.03.31.txz
Mar 31 19:03:37 Tower logger: plugin: checking: /boot/config/plugins/community.applications/community.applications-2016.03.31.txz - MD5
Mar 31 19:03:37 Tower logger: plugin: running: /boot/config/plugins/community.applications/community.applications-2016.03.31.txz
Mar 31 19:03:37 Tower logger: plugin: running: anonymous
Mar 31 19:03:42 Tower kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Mar 31 19:03:52 Tower kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Mar 31 19:03:53 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 490090496 csum 256138570 expected csum 723321813
Mar 31 19:04:12 Tower php: /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker 'stop' 'AirVideoHD'
Mar 31 19:04:13 Tower kernel: docker0: port 1(veth9e07c17) entered disabled state
Mar 31 19:04:13 Tower kernel: vethf4d40ff: renamed from eth0
Mar 31 19:04:13 Tower kernel: docker0: port 1(veth9e07c17) entered disabled state
...
Mar 31 19:05:00 Tower php: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/var/lib/docker' '-r'
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 328
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 327
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 326
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 325
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 324
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 323
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 322
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 321
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 320
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 319
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 318
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 317
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 316
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 315
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 314
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 313
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 312
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 311
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 310
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 309
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 308
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 307
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 306
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 305
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 304
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 303
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 302
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 301
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 299
Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 299
Mar 31 19:05:02 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 289, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)
Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 288, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)
Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 287, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)
Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 286, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)
Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 285, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)
...
I cancelled it right away and this was the output from the command in the GUI window:
scrub status for 2eec4a9b-d079-4ce6-a510-350a24858b34
    scrub started at Thu Mar 31 19:05:00 2016 and finished after 00:00:13
    total bytes scrubbed: 3.35GiB with 6 errors
    error details: csum=6
    corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
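The scrub summary at the end of the post above can also be read programmatically, which helps when comparing runs before and after a fix. The Python below is only a minimal sketch: it assumes the summary keeps the layout pasted in the post, and `parse_scrub_status` is a hypothetical helper written for illustration, not part of btrfs-progs or Unraid.

```python
import re

# Scrub summary as pasted in the post above.
SCRUB_OUTPUT = """\
scrub status for 2eec4a9b-d079-4ce6-a510-350a24858b34
    scrub started at Thu Mar 31 19:05:00 2016 and finished after 00:00:13
    total bytes scrubbed: 3.35GiB with 6 errors
    error details: csum=6
    corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
"""


def parse_scrub_status(text):
    """Pull the error counters out of a scrub summary (hypothetical helper)."""
    patterns = {
        "total_errors": r"with (\d+) errors",
        "csum": r"\bcsum=(\d+)",
        "corrected": r"\bcorrected errors: (\d+)",
        "uncorrectable": r"\buncorrectable errors: (\d+)",
        "unverified": r"\bunverified errors: (\d+)",
    }
    counters = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:  # Skip counters that are missing from the text.
            counters[name] = int(match.group(1))
    return counters


counters = parse_scrub_status(SCRUB_OUTPUT)
print(counters)
```

For this run the counters show 6 checksum errors found and 0 corrected. The logged command passed `-r`, which in `btrfs scrub` requests a read-only pass, so a count of zero corrected errors is what one would expect here rather than a sign that repair was attempted and failed.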
Ned Posted April 1, 2016 Author
Will post shortly as soon as I get back in front of my computer... thanks
Ned Posted April 1, 2016 Author
Diagnostics file attached. Also FYI, since my second-to-last post where I reported the errors, I have not seen any more csum errors in the log file, but I suspect that if I tried to do the scrub on the docker volume again I'd get the same result. I would try it right now but the system is doing its monthly parity check so I don't want to push my luck at the moment.
** edit *** spoke too soon... just got the following in the log file right at the same time as I launched the WebUI for the crashplan docker:
Apr 1 00:15:34 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 202776576 csum 604878402 expected csum 1288708544
Apr 1 00:16:02 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 684728320 csum 3214513585 expected csum 3532612319
Apr 1 00:16:02 Tower kernel: BTRFS: read error corrected: ino 101233 off 684728320 (dev /dev/sdi1 sector 262307968)
Looking forward to seeing what you can find in the diagnostics file, and thank you!!
tower_log.zip
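The "csum failed" warnings in the post above follow a fixed pattern, so they can be tallied by device and inode to see whether the failures are scattered or concentrated. The Python below is a rough sketch under the assumption that the kernel lines look exactly like the ones pasted in this thread; `count_csum_failures` and the regular expression are illustrative names, not an existing tool.

```python
import re
from collections import Counter

# Log lines as pasted in the post above.
LOG = """\
Apr 1 00:15:34 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 202776576 csum 604878402 expected csum 1288708544
Apr 1 00:16:02 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 684728320 csum 3214513585 expected csum 3532612319
Apr 1 00:16:02 Tower kernel: BTRFS: read error corrected: ino 101233 off 684728320 (dev /dev/sdi1 sector 262307968)
"""

# Matches only the "csum failed" warnings; other BTRFS lines are ignored.
CSUM_FAILED = re.compile(
    r"BTRFS warning \(device (?P<device>\w+)\): csum failed "
    r"ino (?P<inode>\d+) off (?P<offset>\d+) "
    r"csum (?P<found>\d+) expected csum (?P<expected>\d+)"
)


def count_csum_failures(log_text):
    """Count csum failures per (device, inode) pair (hypothetical helper)."""
    return Counter(
        (m["device"], int(m["inode"])) for m in CSUM_FAILED.finditer(log_text)
    )


failures = count_csum_failures(LOG)
for (device, inode), hits in failures.items():
    print(f"{device} inode {inode}: {hits} csum failure(s)")
```

In the lines posted so far, every failure names the same inode (101233) on sdh1, which suggests a single file on the cache pool is affected. The thread does not say which file that inode belongs to.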
BRiT Posted April 1, 2016
According to LimeTech's jonp, any and all BTRFS errors are the result of the users having faulty equipment.
Ned Posted April 1, 2016 Author
Are you saying one (or both) of my cache disks are bad? Where do you conclude that from the diagnostics? Their SMART reports are perfect and a scrub on the disk file systems also returns no errors...
BRiT Posted April 1, 2016
Quote (Ned): "Are you saying one (or both) of my cache disks are bad? Where do you conclude that from the diagnostics? Their SMART reports are perfect and a scrub on the disk file systems also returns no errors..."
No. I'm just reposting the opinion of jonp from LimeTech. My opinion is that BTRFS is immature and still riddled with issues, especially if your server has uncontrolled shutdowns or power glitches, and the recovery tools are immature too. It's why it will never be on my data disks before they address those issues, say maybe around 2018.
CHBMB Posted April 1, 2016
Quote (BRiT): "No. I'm just reposting the opinion of jonp from LimeTech. My opinion is that BTRFS is immature and still riddled with issues, especially if your server has uncontrolled shutdowns or power glitches, and the recovery tools are immature too. It's why it will never be on my data disks before they address those issues, say maybe around 2018."
I have the same opinion. I just don't think btrfs is quite there yet.
Ned Posted April 1, 2016 Author
Ok in hindsight, I should have gone with a single XFS cache disk instead of a BTRFS cache pool but that's what I have now... So.. based on the diagnostics I posted, are you guys able to help me pinpoint what is the cause of the csum errors I'm getting?
Ned Posted April 1, 2016 Author
Let me ask another question... how would I go about getting rid of my cache pool and converting it to a single XFS cache drive? What are the steps to do that?
lionelhutz Posted April 1, 2016
If your cache drive pool isn't showing any errors then the errors are in the virtual docker filesystem. You can try re-creating it again until it starts to behave. The docker image can still become corrupt on an XFS drive. My cache is currently formatted XFS and my docker image is currently corrupt with 4 errors. Fortunately, the dockers are all still working so I haven't bothered to fix it.
BRiT Posted April 2, 2016
Quote (lionelhutz): "If your cache drive pool isn't showing any errors then the errors are in the virtual docker filesystem."
Unfortunately I don't believe his BTRFS issues are confined to the docker loopback image, because of the following entries:
Apr 1 00:15:34 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 202776576 csum 604878402 expected csum 1288708544
Apr 1 00:16:02 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 684728320 csum 3214513585 expected csum 3532612319
Apr 1 00:16:02 Tower kernel: BTRFS: read error corrected: ino 101233 off 684728320 (dev /dev/sdi1 sector 262307968)
Ned Posted April 2, 2016 Author
Thanks guys... I had two issues, both of which are now resolved. I had a bad SATA cable on the primary cache disk and I'm guessing that may have had something to do with my docker.img going corrupt. Cable was replaced and docker.img re-created and issues seem to have gone away.