Corrupt docker.img



I'm seeing this error in my log

 

Sep 12 03:07:28 HunterNAS rsyncd[10804]: rsync: get_xattr_names: llistxattr("/mnt/cache/appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Metadata/Albums/0/a2afe4e8de3cdaf8c0c1646f5fa9458d559d25c.bundle/Contents/_combined/posters",1024) failed: Input/output error (5)

 

Does this mean that my docker needs to be enlarged?  I'm seeing this error over and over...filling my log...

 

The log is too big to post here, but it's available from this link:

https://dl.dropboxusercontent.com/u/35207785/hunternas-diagnostics-20160913-1808.zip

Link to comment

Possibly a loose cable on the cache drive.

Sep 12 02:05:45 HunterNAS kernel: ata7: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Sep 12 02:05:45 HunterNAS kernel: ata7: irq_stat 0x80400000, PHY RDY changed
Sep 12 02:05:45 HunterNAS kernel: ata7: SError: { PHYRdyChg }
Sep 12 02:05:45 HunterNAS kernel: ata7: hard resetting link
Sep 12 02:05:46 HunterNAS kernel: ata7: SATA link down (SStatus 0 SControl 300)
Sep 12 02:05:51 HunterNAS kernel: ata7: hard resetting link
Sep 12 02:05:51 HunterNAS kernel: ata7: SATA link down (SStatus 0 SControl 300)
Sep 12 02:05:51 HunterNAS kernel: ata7: limiting SATA link speed to 1.5 Gbps
Sep 12 02:05:56 HunterNAS kernel: ata7: hard resetting link
Sep 12 02:05:57 HunterNAS kernel: ata7: SATA link down (SStatus 0 SControl 310)
Sep 12 02:05:57 HunterNAS kernel: ata7.00: disabled
Sep 12 02:05:57 HunterNAS kernel: ata7: EH complete
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] killing request
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: ata7.00: detaching (SCSI 8:0:0:0)
Sep 12 02:05:57 HunterNAS kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 12 02:05:57 HunterNAS kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 12 02:05:57 HunterNAS kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] CDB: opcode=0x2a 2a 00 04 6e 48 68 00 05 40 00
Sep 12 02:05:57 HunterNAS kernel: blk_update_request: I/O error, dev sde, sector 74336360
Sep 12 02:05:57 HunterNAS kernel: BTRFS: bdev /dev/sde1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] Synchronizing SCSI cache
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] Stopping disk
Sep 12 02:05:57 HunterNAS kernel: sd 8:0:0:0: [sde] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
Sep 12 02:05:59 HunterNAS kernel: ata7: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Sep 12 02:05:59 HunterNAS kernel: ata7: irq_stat 0x80000040, connection status changed
Sep 12 02:05:59 HunterNAS kernel: ata7: SError: { DevExch }
Sep 12 02:05:59 HunterNAS kernel: ata7: hard resetting link
Sep 12 02:06:00 HunterNAS kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 12 02:06:00 HunterNAS kernel: ata7.00: native sectors (1) is smaller than sectors (468862128)
Sep 12 02:06:00 HunterNAS kernel: ata7.00: ATA-8: OCZ-AGILITY3, OCZ-2W4T6Y3445O71DAW, 2.22, max UDMA/133
Sep 12 02:06:00 HunterNAS kernel: ata7.00: 468862128 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Sep 12 02:06:00 HunterNAS kernel: ata7.00: configured for UDMA/133
Sep 12 02:06:00 HunterNAS kernel: ata7: EH complete
Sep 12 02:06:00 HunterNAS kernel: scsi 8:0:0:0: Direct-Access     ATA      OCZ-AGILITY3     2.22 PQ: 0 ANSI: 5
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: [sdn] 468862128 512-byte logical blocks: (240 GB/224 GiB)
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: [sdn] 4096-byte physical blocks
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: [sdn] Write Protect is off
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: [sdn] Mode Sense: 00 3a 00 00
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: Attached scsi generic sg4 type 0
Sep 12 02:06:00 HunterNAS kernel: sdn: sdn1
Sep 12 02:06:00 HunterNAS kernel: sd 8:0:0:0: [sdn] Attached SCSI disk
Sep 12 02:06:15 HunterNAS shfs/user: shfs_fsync: fsync: (5) Input/output error
Sep 12 02:06:15 HunterNAS shfs/user: shfs_read: read: (5) Input/output error
Sep 12 02:06:15 HunterNAS shfs/user: shfs_read: read: (5) Input/output error

 

Then the system remounted the drive as read-only

Sep 12 02:06:25 HunterNAS shfs/user: shfs_write: write: (30) Read-only file system

 

Because of this, the docker.img file also has some issues

Sep 12 02:06:48 HunterNAS kernel: loop: Write error at byte offset 2240770048, length 4096.
Sep 12 02:06:48 HunterNAS kernel: blk_update_request: I/O error, dev loop0, sector 4376504
Sep 12 02:06:48 HunterNAS kernel: BTRFS: bdev /dev/loop0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0

But then the system actually detected corruption on the cache drive

Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 0
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 4096
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 8192
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 12288
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 16384
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 20480
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 24576
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 28672
Sep 12 03:07:06 HunterNAS kernel: BTRFS info (device sde1): no csum found for inode 347850 start 32768
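
If you'd rather not scroll through the whole syslog for these markers, something like the following can tally them from a saved copy. It's only a rough sketch: `summarize_syslog` is a made-up helper name, and the patterns are simply the message fragments quoted above (link down, rejected I/O, read-only, missing csum, and the running BTRFS write-error counter), so other kernel versions may word them differently.

```shell
# Hypothetical helper: count the markers discussed above in a saved syslog.
# Usage: summarize_syslog /path/to/syslog
summarize_syslog() {
    awk '
        /SATA link down/                   { link_down++ }
        /rejecting I\/O to offline device/ { rejected++ }
        /Read-only file system/            { read_only++ }
        /no csum found/                    { no_csum++ }
        # BTRFS logs a running write-error counter per device;
        # remember the highest value seen for each one.
        /BTRFS: bdev .* errs: wr / {
            for (i = 1; i < NF; i++) {
                if ($i == "bdev") dev = $(i + 1)
                if ($i == "wr")   { n = $(i + 1); sub(/,/, "", n) }
            }
            if (n + 0 > wr[dev] + 0) wr[dev] = n + 0
        }
        END {
            printf "link_down=%d rejected_io=%d read_only=%d no_csum=%d\n",
                   link_down, rejected, read_only, no_csum
            for (d in wr) printf "btrfs_write_errors[%s]=%d\n", d, wr[d]
        }
    ' "$1"
}
```

A non-zero `link_down` followed by rising write errors is the same pattern as in the excerpt above, where the drive drops off the bus and then comes back.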

All of these errors that you noticed

Sep 12 03:07:19 HunterNAS rsyncd[10804]: rsync: get_xattr_names: llistxattr("/mnt/cache/appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Metadata/Albums/0/a2afe4e8de3cdaf8c0c1646f5fa9458d559d25c.bundle/Contents/_combined/posters",1024) failed: Input/output error (5)

are actually generated by CA's appdata module, and are being logged into the syslog because the logfile that CA normally uses is located within the docker.img file, which is now trashed.

 

Additionally, your destination disk that you have set for CA's backup appears to be completely full

Sep 10 03:10:25 HunterNAS logger: 2016/09/10 03:10:22 [1340] rsync: rename "/[email protected] Support/Plex Media Server/Metadata/Movies/4/14739a31753e9114620e67be06e86e5c7ca83cc.bundle/Contents/com.plexapp.agents.themoviedb/posters/.3ac4f73ebc9a30ce12131466d6d0803e337e3d8b.Wd1Ft1" -> "PlexMediaServer/Library/Application Support/Plex Media Server/Metadata/Movies/4/14739a31753e9114620e67be06e86e5c7ca83cc.bundle/Contents/com.plexapp.agents.themoviedb/posters/3ac4f73ebc9a30ce12131466d6d0803e337e3d8b": No space left on device (28)

 

The BTRFS issue with the cache drive, I'm going to leave to other experts to handle, as I have no experience with btrfs on an array drive.

 

While I can't tell the actual destination for CA's backup (due to the anonymising feature of diagnostics), after you get the cache drive problem sorted out, you might want to ensure that, if CA's backup is going to separate dated destinations, you also have CA set to delete the backups after so many days.  It wouldn't be a bad idea either to go to the Miscellaneous tab of CA Backup and delete all errored-out backup sets, and to set up CA to only notify you about errors on backups (right now it's emailing you about starts and stops, so you might not have even checked when it emailed you about the errors).

 

Can't tell you anything about what to do with regards to the docker.img file as of yet (not until the corruption present on the cache drive itself gets sorted out)

 

Additionally, FCP is throwing out errors and winds up flagging every plugin as being incompatible, but that one's my problem...  (looks like at the time you ran FCP, CA's application feed had some issues and wasn't returning the complete list of apps)

 

 

 

I'm going to request a mod to split this out to a new topic as this is going to wind up (and is) being completely OT for this thread.  Hopefully they'll post the new URL for this...

 

Link to comment

Thanks for this great analysis.  Good things to look for in the future.

 

Some follow-ups:

 

1. RE: Loose Cable on Cache Drive/"Read Only" - I'm able to copy files through the cache drive to their destination (113 MB/s across gigabit Ethernet), and the cache drive accepts them and the mover moves them to their final destination.  So I'm confused as to why copying files through the cache to the array would work.  Perhaps just the docker image is corrupt?

 

2. RE: "But then the system actually detected corruption on the cache drive" - so it feels like the drive is bad?  Perhaps just corrupted in the spot where the docker.img file resides?  I'll run some diagnostics.

 

3. RE: "Additionally, your destination disk that you have set for CA's backup appears to be completely full" - This is true.  When I originally installed Appdata backup, I set the auto delete feature for a few days (daily backups).  When I went to look at it again, I saw it's been reset to "never delete" AND the link is unresponsive.  See the second image, HTML Inspect for the INPUT...

 

http://my.jetscreenshot.com/12412/20160914-jdne-36kb.jpg

http://my.jetscreenshot.com/12412/20160914-mnbx-177kb.jpg

Link to comment

With the log being such a mess, and with all those huge error blocks it's a real pain to read, I would reset the server, copy some files over, wait say an hour, and then post another diagnostics.

 

My eyes were getting blurry trying to find stuff in your log, and I may have made mistakes in the analysis.

Link to comment

Howdy Jeffrey!  Squid mentioned your diagnostics to me, in connection with something else, and I took a look at it.  We weren't talking *to* you, but were talking *about* you (actually, your syslog), so I thought you might be interested in my comments, primarily here.

 

I agree, a smaller dumpster is a better idea.  I had gone from a post a long time ago...

to expand the log file.  Not knowing any better, I followed suit and went larger (bigger is always better right?!?)...  So I'll reset to a smaller size, reboot and repost...

 

# resize tmpfs

mount -o remount,size=8m /var/log

 

Link to comment

I don't think this is a wise idea...

 

Besides storing 'syslog', the folder /var/log is used for various other temporary storage files. You run a great risk a number of functions won't work properly anymore (e.g. checking for updates).

 

I'm all about wisdom.  Where should it be kept?  I believe this is the default location?

Link to comment

Ok, reboot, smaller log file attached.  As always, thanks for your wisdom...

hunternas-syslog-20160919-1258.zip

Link to comment

You should use at least the default size which is 128MB (I am not talking about moving locations).
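
For anyone following along, here is a rough way to see how full /var/log is before and after resizing. Treat it as a sketch: `usage_percent` is a made-up helper name, and it assumes the POSIX `df -P` layout, where the fifth column is the capacity. The remount line is the same command quoted earlier in the thread, just with the 128MB default.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# capacity column of the first filesystem line, without the % sign.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Typical use on the server (left commented so nothing gets remounted by accident):
#   df -P /var/log | usage_percent
#   mount -o remount,size=128m /var/log    # back to the 128MB default
```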

 

Link to comment

RobJ/bonienl, I'm confused.  This article talks about an 8MB sized log file, yet bonienl has said 128MB should be the smallest.  Is this one of those "opinions"...  Like how you pronounce Rodeo?  Potatoe? - it appeared to me that the article was more than opinion?  Not trying to start an argument, just wanting to understand...

Link to comment

log FILE vs log file storage location. The (hopefully small) log FILE is only one of very many things that need to occupy the space that you were restricting to 8MB.
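
To make the FILE vs. storage location point concrete, a sketch like this lists the biggest files under a directory, so you can see what besides syslog is taking up the space. `largest_files` is a hypothetical name; it only relies on standard `find`, `wc`, `sort` and `head`, and sizes are in bytes.

```shell
# Hypothetical helper: print the N largest regular files under a directory
# as "<bytes> <path>", biggest first (N defaults to 5).
largest_files() {
    find "$1" -type f | while IFS= read -r f; do
        printf '%s %s\n' "$(wc -c < "$f")" "$f"
    done | sort -rn | head -n "${2:-5}"
}

# Example on the server:
#   largest_files /var/log 10
```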
Link to comment

I would never want to disagree with bonienl!  He probably knows stuff we don't.  When he said:

Besides storing 'syslog', the folder /var/log is used for various other temporary storage files. You run a great risk a number of functions won't work properly anymore (e.g. checking for updates).

He may know of other things being temporarily stored there, then removed later, which we never see.  I don't recall ever seeing anything but logs, and I thought that was all this folder was used for.

 

The reason I initially suggested 8MB is it's very hard to think of a way an 8MB /var/log folder could fill up, and that not be because something is very wrong.  It's just text files in there, and 8MB is a lot of space.  The right number would be where we know there is no way there could validly be that much stuff in the folder.  Any more means something is really wrong.  Once you have a couple of megabytes of extra garbage, how do you justify more space?  It's just more garbage, and a longer delay before it overflows and notifies you somehow.  And if you're going to make it that much bigger, where do you stop?  If 32MB, it's probably over 30MB of garbage to chop off.  If 128MB, it's probably over 126MB of garbage.  If 16GB, it's almost 16GB of garbage.  We are probably always going to only be interested in the first megabyte or less, of the sum total of everything in the folder.  The rest will almost certainly be never looked at.

 

Certainly, one could set it to 32MB.  But unless bonienl knows of larger stuff being temporarily stored there, all you really needed for your issue was a /var/log size of about 1MB, possibly 2MB.  Someone with a lot of Dockers and VM's, and a long uptime (so larger than normal log files), and all Mover logging turned on, and fast renewing DHCP, still would find it very hard to fill up 16MB.

 

But because bonienl has inside knowledge of how it's used, go with whatever he recommends.

Link to comment
