How to rebuild docker.img?


Ned


I have been having some csum issues and it looks as though my docker.img container is corrupt.  What is the correct procedure to delete the current docker.img and create a new one and get my dockers up and running again?  Dockers I use are:

 

- AirVideo HD

- Crashplan

- MariaDB

 

I'm guessing that I should do the following but can someone please confirm?  I don't want to lose any data or the current state of my dockers.

 

1) take array offline

2) delete docker.img file

3) will unRAID now think Docker is disabled, so that I need to re-enable it from the Docker tab and create a new virtual disk at that point?

4) after re-creating docker.img, re-install the docker apps with the same config parameters

5) the dockers will resume operation using the existing appdata folder and data?

 

Prior to doing any of this, should I remove all of the docker containers and images I'm using?  Is this the correct procedure?

 

Thanks!

Link to comment

Thanks!

 

So just to confirm then, I should do the following:

 

1) take array offline

2) delete docker.img

3) bring array back online

4) go to docker tab and re-enable docker and create new docker.img file

5) re-install dockers using user templates to retain my mappings, etc.

6) start up the dockers

 

That's it?
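
The delete step in the list above could be guarded so the image is never removed while it is still in use. A minimal sketch, assuming docker.img is loop-mounted at /var/lib/docker while the Docker service is running (that mount point shows up in the scrub command later in this thread); the function names and the image path in the usage line are hypothetical:

```python
from pathlib import Path

# Assumption: the mount point seen in this thread's scrub command.
DOCKER_MOUNT = "/var/lib/docker"


def is_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def delete_image(image: Path, mounts_text: str) -> None:
    """Delete the image file, refusing while it still appears to be mounted."""
    if is_mounted(DOCKER_MOUNT, mounts_text):
        raise RuntimeError("docker.img still looks mounted; stop the Docker service first")
    image.unlink()
```

On a live system this might be called as `delete_image(Path("/mnt/cache/docker.img"), Path("/proc/mounts").read_text())`, where the image path is only a guess; the Docker settings page shows the real location. The sketch touches only the image file, not the appdata folders.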

Link to comment

So I just rebuilt my docker.img and re-installed everything, which went perfectly. HOWEVER, shortly after the last docker came online I noticed a couple of random csum errors in my log again! I then went to Settings -> Docker and ran a scrub, and instantly my log screen was flooded with csum errors! What is going on here? Running a scrub on the cache disk itself yields 0 errors, so I don't understand what is wrong or how to fix it.

 

Mar 31 19:02:48 Tower emhttp: need_authorization: getpeername: Transport endpoint is not connected

Mar 31 19:03:05 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 21794816 csum 2563185076 expected csum 2324397979

Mar 31 19:03:11 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin checkall

Mar 31 19:03:12 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 138301440 csum 282748589 expected csum 40004289

Mar 31 19:03:24 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin checkall

Mar 31 19:03:32 Tower kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1

Mar 31 19:03:36 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin update community.applications.plg

Mar 31 19:03:36 Tower logger: plugin: running: anonymous

Mar 31 19:03:36 Tower logger: plugin: running: anonymous

Mar 31 19:03:36 Tower logger: plugin: creating: /boot/config/plugins/community.applications/community.applications-2016.03.31.txz - downloading from URL https://raw.github.com/Squidly271/community.applications/master/archive/community.applications-2016.03.31.txz

Mar 31 19:03:37 Tower logger: plugin: checking: /boot/config/plugins/community.applications/community.applications-2016.03.31.txz - MD5

Mar 31 19:03:37 Tower logger: plugin: running: /boot/config/plugins/community.applications/community.applications-2016.03.31.txz

Mar 31 19:03:37 Tower logger: plugin: running: anonymous

Mar 31 19:03:42 Tower kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1

Mar 31 19:03:52 Tower kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1

Mar 31 19:03:53 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 490090496 csum 256138570 expected csum 723321813

Mar 31 19:04:12 Tower php: /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker 'stop' 'AirVideoHD'

Mar 31 19:04:13 Tower kernel: docker0: port 1(veth9e07c17) entered disabled state

Mar 31 19:04:13 Tower kernel: vethf4d40ff: renamed from eth0

Mar 31 19:04:13 Tower kernel: docker0: port 1(veth9e07c17) entered disabled state

 

...

 

Mar 31 19:05:00 Tower php: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/var/lib/docker' '-r'

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 328

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 327

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 326

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 325

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 324

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 323

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 322

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 321

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 320

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 319

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 318

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 317

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 316

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 315

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 314

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 313

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 312

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 311

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 310

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 309

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 308

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 307

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 306

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 305

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 304

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 303

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 302

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 301

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 299

Mar 31 19:05:02 Tower kernel: BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector 2323392: metadata leaf (level 0) in tree 299

Mar 31 19:05:02 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0

Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 289, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)

Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 288, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)

Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 287, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)

Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 286, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)

Mar 31 19:05:06 Tower kernel: BTRFS: checksum error at logical 1734238208 on dev /dev/loop0, sector 5500720, root 285, inode 21020, offset 532480, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/dri/vmwgfx_dri.so)

 

...

 

I cancelled it right away and this was the output from the command in the GUI window:

 

scrub status for 2eec4a9b-d079-4ce6-a510-350a24858b34

scrub started at Thu Mar 31 19:05:00 2016 and finished after 00:00:13

total bytes scrubbed: 3.35GiB with 6 errors

error details: csum=6

corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
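
For comparing runs, the counters in a summary like the one above can be pulled out with a few regular expressions. A minimal sketch, assuming the output keeps exactly the wording pasted here (it may differ between btrfs-progs versions); `parse_scrub_summary` is a made-up helper name. Note that the log shows the scrub was started with `-r`, which is btrfs scrub's read-only mode, so "corrected errors: 0" is what one would expect from this run:

```python
import re

# One pattern per counter, matching the wording of the pasted summary.
FIELDS = {
    "total": r"with (\d+) errors",
    "csum": r"csum=(\d+)",
    "corrected": r"\bcorrected errors: (\d+)",
    "uncorrectable": r"\buncorrectable errors: (\d+)",
    "unverified": r"\bunverified errors: (\d+)",
}


def parse_scrub_summary(text):
    """Return the counters found in the summary text as a dict of ints."""
    result = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            result[name] = int(match.group(1))
    return result
```

Counters that are missing from the text are simply left out of the result rather than reported as zero.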

Link to comment

Diagnostics file attached.

 

Also FYI, since my second last post where I reported the errors I have not seen any more csum errors in the log file but I suspect that if I tried to do the scrub on the docker volume again I'd get the same result.  I would try it right now but the system is doing its monthly parity check so I don't want to push my luck at the moment.

 

** edit ***

Spoke too soon... I just got the following in the log file at the same moment I launched the WebUI for the Crashplan docker:

 

Apr 1 00:15:34 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 202776576 csum 604878402 expected csum 1288708544

Apr 1 00:16:02 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 684728320 csum 3214513585 expected csum 3532612319

Apr 1 00:16:02 Tower kernel: BTRFS: read error corrected: ino 101233 off 684728320 (dev /dev/sdi1 sector 262307968)

 

 

Looking forward to seeing what you can find in the diagnostics file, and thank you!!

tower_log.zip

Link to comment

Are you saying one (or both) of my cache disks are bad?  Where do you conclude that from the diagnostics?  Their SMART reports are perfect and a scrub on the disk file systems also returns no errors...

 

No. I'm just reposting the opinion of jonp from Limetech.

 

My opinion is that BTRFS is immature and still riddled with issues, especially if your server has uncontrolled shutdowns or power glitches, and the recovery tools are immature too. That's why it will never be on my data disks until they address those issues, maybe around 2018.

Link to comment

No. I'm just reposting the opinion of jonp from Limetech.

 

My opinion is that BTRFS is immature and still riddled with issues, especially if your server has uncontrolled shutdowns or power glitches, and the recovery tools are immature too. That's why it will never be on my data disks until they address those issues, maybe around 2018.

 

I have the same opinion.  I just don't think btrfs is quite there yet.

Link to comment

OK, in hindsight I should have gone with a single XFS cache disk instead of a BTRFS cache pool, but that's what I have now...

 

So.. based on the diagnostics I posted, are you guys able to help me pinpoint what is the cause of the csum errors I'm getting?

Link to comment

If your cache drive pool isn't showing any errors then the errors are in the virtual docker filesystem. You can try re-creating it again until it starts to behave. The docker image can still become corrupt on an XFS drive. My cache is currently formatted XFS and my docker image is currently corrupt with 4 errors. Fortunately, the dockers are all still working so I haven't bothered to fix it.

Link to comment

If your cache drive pool isn't showing any errors then the errors are in the virtual docker filesystem.

 

Unfortunately I don't believe his BTRFS issues are confined to the docker loopback image because of the following entries:

 

Apr 1 00:15:34 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 202776576 csum 604878402 expected csum 1288708544

Apr 1 00:16:02 Tower kernel: BTRFS warning (device sdh1): csum failed ino 101233 off 684728320 csum 3214513585 expected csum 3532612319

Apr 1 00:16:02 Tower kernel: BTRFS: read error corrected: ino 101233 off 684728320 (dev /dev/sdi1 sector 262307968)
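
One way to see which layer the errors come from is to tally the checksum lines by device. A minimal sketch, assuming loop devices (such as loop0 in the scrub log earlier in the thread) back docker.img and any other device name (such as sdh1) is a member of the cache pool; the function names are made up for illustration:

```python
import re
from collections import Counter

# e.g. "BTRFS warning (device sdh1): csum failed ino 101233 off ..."
CSUM_FAILED = re.compile(r"BTRFS warning \(device (?P<dev>[^)\s]+)\): csum failed")
# e.g. "BTRFS: checksum error at logical 107446272 on dev /dev/loop0, sector ..."
SCRUB_ERROR = re.compile(r"BTRFS: checksum error at logical \d+ on dev /dev/(?P<dev>[^,\s]+)")


def count_csum_errors(lines):
    """Return a Counter mapping device name to number of checksum error lines."""
    counts = Counter()
    for line in lines:
        match = CSUM_FAILED.search(line) or SCRUB_ERROR.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


def layer_for(device):
    """Guess the layer a device belongs to (assumption: loop devices back docker.img)."""
    if device.startswith("loop"):
        return "docker.img (loop device)"
    return "cache pool device"
```

Passing the syslog lines to `count_csum_errors` and mapping each device through `layer_for` shows at a glance whether errors are reported only against the loop device or also against the underlying disks, as in the sdh1 entries quoted above.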

Link to comment

Thanks guys... I had two issues, both of which are now resolved.  I had a bad SATA cable on the primary cache disk and I'm guessing that may have had something to do with my docker.img going corrupt.  Cable was replaced and docker.img re-created and issues seem to have gone away.

Link to comment
