Helmonder, Posted February 11, 2017

After making some changes to my BTRFS volumes I ran into a BTRFS error and the volume got marked read-only. Now my unRAID system refuses to start. This is what I did and what happened: http://lime-technology.com/forum/index.php?topic=56470.msg538907#msg538907

At the moment only disk10, disk11, disk3, disk6, disk7 and disk9 get mounted. Disk8 does not get mounted, and an error appears in the log when the system tries to mount it:

    Feb 11 13:13:16 Tower emhttp: shcmd (47): mkdir -p /mnt/disk8
    Feb 11 13:13:16 Tower emhttp: shcmd (48): set -o pipefail ; mount -t btrfs -o noatime,nodiratime /dev/md8 /mnt/disk8 |& logger
    Feb 11 13:13:16 Tower kernel: BTRFS info (device md8): disk space caching is enabled
    Feb 11 13:13:16 Tower kernel: BTRFS info (device md8): has skinny extents
    Feb 11 13:13:16 Tower vsftpd[13363]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:13:22 Tower vsftpd[13396]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:13:29 Tower vsftpd[13427]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:13:37 Tower vsftpd[13461]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:13:47 Tower vsftpd[13499]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:13:53 Tower in.telnetd[13525]: connect from 192.168.1.36 (192.168.1.36)
    Feb 11 13:13:57 Tower vsftpd[13543]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:14:00 Tower login[13526]: ROOT LOGIN on '/dev/pts/0' from '192.168.1.36'
    Feb 11 13:14:07 Tower vsftpd[13596]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:14:17 Tower vsftpd[13638]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:14:27 Tower vsftpd[13688]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:14:37 Tower vsftpd[13734]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:14:47 Tower vsftpd[13777]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:14:57 Tower vsftpd[13821]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:15:07 Tower vsftpd[13865]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:15:17 Tower vsftpd[13909]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:15:27 Tower vsftpd[13952]: connect from 127.0.0.1 (127.0.0.1)
    Feb 11 13:15:28 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000035c
    Feb 11 13:15:28 Tower kernel: IP: [<ffffffff812dd4af>] flush_space+0x44/0x472
    Feb 11 13:15:28 Tower kernel: PGD 7cd245067
    Feb 11 13:15:28 Tower kernel: PUD 7ce2be067
    Feb 11 13:15:28 Tower kernel: PMD 0
    Feb 11 13:15:28 Tower kernel:
    Feb 11 13:15:28 Tower kernel: Oops: 0000 [#1] PREEMPT SMP
    Feb 11 13:15:28 Tower kernel: Modules linked in: md_mod nct6775 hwmon_vid bonding e1000e ptp pps_core x86_pkg_temp_thermal coretemp i2c_i801 i2c_smbus mpt3sas kvm_intel ahci raid_class i2c_core libahci scsi_transport_sas kvm ipmi_si video backlight [last unloaded: pps_core]
    Feb 11 13:15:28 Tower kernel: CPU: 2 PID: 13342 Comm: mount Not tainted 4.9.8-unRAID #1
    Feb 11 13:15:28 Tower kernel: Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012
    Feb 11 13:15:28 Tower kernel: task: ffff88080a7c5940 task.stack: ffffc9000b650000
    Feb 11 13:15:28 Tower kernel: RIP: 0010:[<ffffffff812dd4af>] [<ffffffff812dd4af>] flush_space+0x44/0x472
    Feb 11 13:15:28 Tower kernel: RSP: 0018:ffffc9000b6537d8 EFLAGS: 00010246
    Feb 11 13:15:28 Tower kernel: RAX: 0000000000020000 RBX: 0000000000000000 RCX: 0000000000020000
    Feb 11 13:15:28 Tower kernel: RDX: 0000000000020000 RSI: ffff880807fb8400 RDI: 0000000000000000
    Feb 11 13:15:28 Tower kernel: RBP: ffffc9000b653870 R08: 0000000000000001 R09: 0000000000000000
    Feb 11 13:15:28 Tower kernel: R10: ffff88080a2eb418 R11: 0000000000000000 R12: 00000000ffffffff
    Feb 11 13:15:28 Tower kernel: R13: ffff880807fb8400 R14: 0000000000000002 R15: ffff880807fb8400
    Feb 11 13:15:28 Tower kernel: FS: 00002b170e849e40(0000) GS:ffff88082fc80000(0000) knlGS:0000000000000000
    Feb 11 13:15:28 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Feb 11 13:15:28 Tower kernel: CR2: 000000000000035c CR3: 00000007cd18e000 CR4: 00000000000406e0
    Feb 11 13:15:28 Tower kernel: Stack:
    Feb 11 13:15:28 Tower kernel: 0000000000000000 0000000000000000 ffffc9000b653810 ffffffff812d4017
    Feb 11 13:15:28 Tower kernel: 00000000ffffffe4 ffff880807fb8400 ffff8807ee8fa000 0000000000020000
    Feb 11 13:15:28 Tower kernel: ffffffff812d7ff4 ffffc9000b653870 ffffffff812d8070 ffff8807ee8fa148
    Feb 11 13:15:28 Tower kernel: Call Trace:
    Feb 11 13:15:28 Tower kernel: [<ffffffff812d4017>] ? get_alloc_profile+0xd0/0x166
    Feb 11 13:15:28 Tower kernel: [<ffffffff812d7ff4>] ? btrfs_get_alloc_profile+0x2b/0x2d
    Feb 11 13:15:28 Tower kernel: [<ffffffff812d8070>] ? can_overcommit+0x7a/0x100
    Feb 11 13:15:28 Tower kernel: [<ffffffff812de190>] reserve_metadata_bytes+0x569/0x651
    Feb 11 13:15:28 Tower kernel: [<ffffffff813a807d>] ? __radix_tree_lookup+0x2b/0x86
    Feb 11 13:15:28 Tower kernel: [<ffffffff812de87f>] btrfs_block_rsv_refill+0x6b/0x91
    Feb 11 13:15:28 Tower kernel: [<ffffffff812f9c09>] btrfs_evict_inode+0x305/0x491
    Feb 11 13:15:28 Tower kernel: [<ffffffff81136cc5>] evict+0xb8/0x16d
    Feb 11 13:15:28 Tower kernel: [<ffffffff811373b6>] iput+0x163/0x170
    Feb 11 13:15:28 Tower kernel: [<ffffffff812fa827>] btrfs_orphan_cleanup+0x326/0x394
    Feb 11 13:15:28 Tower kernel: [<ffffffff813391ae>] btrfs_recover_relocation+0x3b6/0x3cc
    Feb 11 13:15:28 Tower kernel: [<ffffffff812e8d6b>] ? btrfs_cleanup_fs_roots+0x12e/0x140
    Feb 11 13:15:28 Tower kernel: [<ffffffff812eccc0>] open_ctree+0x1e1b/0x208e
    Feb 11 13:15:28 Tower kernel: [<ffffffff812c82ef>] btrfs_mount+0xb37/0xd1e
    Feb 11 13:15:28 Tower kernel: [<ffffffff810e2176>] ? pcpu_alloc+0x3d5/0x4c1
    Feb 11 13:15:28 Tower kernel: [<ffffffff811248fd>] mount_fs+0xf/0x84
    Feb 11 13:15:28 Tower kernel: [<ffffffff8113a78a>] ? alloc_vfsmnt+0x189/0x215
    Feb 11 13:15:28 Tower kernel: [<ffffffff811248fd>] ? mount_fs+0xf/0x84
    Feb 11 13:15:28 Tower kernel: [<ffffffff8113a87b>] vfs_kern_mount+0x65/0xf7
    Feb 11 13:15:28 Tower kernel: [<ffffffff812c7ae3>] btrfs_mount+0x32b/0xd1e
    Feb 11 13:15:28 Tower kernel: [<ffffffff813b4903>] ? find_next_zero_bit+0x17/0x1d
    Feb 11 13:15:28 Tower kernel: [<ffffffff810e2176>] ? pcpu_alloc+0x3d5/0x4c1
    Feb 11 13:15:28 Tower kernel: [<ffffffff811248fd>] mount_fs+0xf/0x84
    Feb 11 13:15:28 Tower kernel: [<ffffffff811248fd>] ? mount_fs+0xf/0x84
    Feb 11 13:15:28 Tower kernel: [<ffffffff8113a87b>] vfs_kern_mount+0x65/0xf7
    Feb 11 13:15:28 Tower kernel: [<ffffffff8113d196>] do_mount+0x744/0xa23
    Feb 11 13:15:28 Tower kernel: [<ffffffff810ddec4>] ? strndup_user+0x3a/0x6f
    Feb 11 13:15:28 Tower kernel: [<ffffffff8113d66b>] SyS_mount+0x72/0x9a
    Feb 11 13:15:28 Tower kernel: [<ffffffff8167d1b7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
    Feb 11 13:15:28 Tower kernel: Code: ec 70 41 83 f9 05 0f 87 3b 04 00 00 48 89 4d a0 48 89 d0 49 89 f7 48 89 fb 42 ff 24 cd 40 be 83 81 41 83 cc ff 41 83 f8 01 75 18 <8b> 8f 5c 03 00 00 31 d2 c1 e1 04 48 f7 f1 85 c0 41 0f 44 c0 44
    Feb 11 13:15:28 Tower kernel: RIP [<ffffffff812dd4af>] flush_space+0x44/0x472
    Feb 11 13:15:28 Tower kernel: RSP <ffffc9000b6537d8>
    Feb 11 13:15:28 Tower kernel: CR2: 000000000000035c
    Feb 11 13:15:28 Tower kernel: ---[ end trace 279c8d91daf3797c ]---
    Feb 11 13:15:28 Tower emhttp: err: shcmd: shcmd (48): exit status: -119
    Feb 11 13:15:28 Tower emhttp: mount error: No file system (-119)
    Feb 11 13:15:28 Tower emhttp: shcmd (49): umount /mnt/disk8 |& logger
    Feb 11 13:15:28 Tower root: umount: /mnt/disk8: not mounted
    Feb 11 13:15:28 Tower emhttp: shcmd (50): rmdir /mnt/disk8
    Feb 11 13:15:28 Tower emhttp: shcmd (51): mkdir -p /mnt/disk9
    Feb 11 13:15:28 Tower emhttp: shcmd (52): set -o pipefail ; mount -t btrfs -o noatime,nodiratime /dev/md9 /mnt/disk9 |& logger
    Feb 11 13:15:28 Tower kernel: BTRFS info (device md9): disk space caching is enabled
    Feb 11 13:15:28 Tower kernel: BTRFS info (device md9): has skinny extents
    Feb 11 13:15:28 Tower kernel: BTRFS info (device md9): bdev /dev/md9 errs: wr 0, rd 0, flush 0, corrupt 3456, gen 0
    Feb 11 13:15:41 Tower emhttp: shcmd (53): btrfs filesystem resize max /mnt/disk9 |& logger
    Feb 11 13:15:41 Tower root: Resize '/mnt/disk9' of 'max'
    Feb 11 13:15:41 Tower kernel: BTRFS info (device md9): new size for /dev/md9 is 6001175072768
    Feb 11 13:15:41 Tower emhttp: shcmd (54): mkdir -p /mnt/disk10
    Feb 11 13:15:41 Tower emhttp: shcmd (55): set -o pipefail ; mount -t btrfs -o noatime,nodiratime /dev/md10 /mnt/disk10 |& logger
    Feb 11 13:15:41 Tower kernel: BTRFS info (device md10): disk space caching is enabled
    Feb 11 13:15:41 Tower kernel: BTRFS info (device md10): has skinny extents
    Feb 11 13:15:50 Tower emhttp: shcmd (56): btrfs filesystem resize max /mnt/disk10 |& logger
    Feb 11 13:15:50 Tower root: Resize '/mnt/disk10' of 'max'
    Feb 11 13:15:50 Tower kernel: BTRFS info (device md10): new size for /dev/md10 is 6001175072768
    Feb 11 13:15:50 Tower emhttp: shcmd (57): mkdir -p /mnt/disk11
    Feb 11 13:15:50 Tower emhttp: shcmd (58): set -o pipefail ; mount -t btrfs -o noatime,nodiratime /dev/md11 /mnt/disk11 |& logger
    Feb 11 13:15:50 Tower kernel: BTRFS info (device md11): disk space caching is enabled
    Feb 11 13:15:50 Tower kernel: BTRFS info (device md11): has skinny extents
    Feb 11 13:16:02 Tower emhttp: shcmd (59): btrfs filesystem resize max /mnt/disk11 |& logger
    Feb 11 13:16:02 Tower root: Resize '/mnt/disk11' of 'max'
    Feb 11 13:16:02 Tower kernel: BTRFS info (device md11): new size for /dev/md11 is 8001563168768
    Feb 11 13:16:02 Tower emhttp: shcmd (60): mkdir -p /mnt/cache
    Feb 11 13:16:02 Tower emhttp: mount error: No file system (no btrfs UUID)
    Feb 11 13:16:02 Tower emhttp: shcmd (61): umount /mnt/cache |& logger
    Feb 11 13:16:02 Tower root: umount: /mnt/cache: not mounted
    Feb 11 13:16:02 Tower emhttp: shcmd (62): rmdir /mnt/cache
    Feb 11 13:16:02 Tower emhttp: shcmd (63): sync

Kind of worried now. The server has been at this level for more than 10 minutes and does not appear to get any further. I can telnet into the system, but the webGUI does not load and there are no shares.
On advice I am running a memtest; it has been running for 15 minutes now with no errors. I will keep it running for longer but would appreciate help with additional steps to take.

The only thing I can think of is pulling the physical drive and rebooting the system. unRAID will hopefully emulate the drive; I would then add the physical drive back in and copy the data from the emulated drive to the "new" drive. I expect that rebuilding the drive onto itself will not work, as it will most likely just recreate the issue I am having now.

Hoping someone has a better idea... I need some kind of BTRFS "remove read-only" command...
JorgeB, Posted February 11, 2017

If array auto-start is enabled, disable it, then try to mount disk8 read-only, e.g.:

    mkdir /x
    mount -o recovery,ro /dev/sdX1 /x

If successful, copy everything to another disk/server and then format that disk. After getting the data (or if you have backups) you can try repairing the filesystem with btrfs check --repair.
Helmonder, Posted February 11, 2017

I started the system again. How do I disable the auto-start? I have tried mounting with this command:

    mount -o recovery,ro /dev/md8 /x

The system appears to hang now. Did I give that command correctly?
JorgeB, Posted February 11, 2017

Disable autostart by editing disk.cfg on your flash drive (flash/config) and changing startArray="yes" to "no".

To mount the disk, first create a temp mountpoint:

    mkdir /x

Then, since the array won't be started and you can't use the md device, use sdX:

    mount -o recovery,ro /dev/sdX1 /x

If that fails you can try btrfs recovery.
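The two steps above can be sketched as a small shell script. This is only a sketch under assumptions: the function names are made up for illustration, the path /boot/config/disk.cfg assumes the flash drive is mounted at /boot, and /dev/sdX1 is a placeholder that has to be replaced with the real partition. The btrfs documentation lists the recovery mount option as deprecated in favour of usebackuproot on newer kernels, so that spelling may be needed there.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of unRAID.

# Set startArray="no" in the given disk.cfg, keeping a .bak copy first.
disable_autostart() {
    cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1
    # Rewrite from the backup so the edit does not depend on sed -i.
    sed 's/^startArray="yes"/startArray="no"/' "$cfg.bak" > "$cfg"
}

# Mount a partition read-only on a temporary mount point (needs root).
mount_readonly() {
    dev="$1"
    mnt="${2:-/x}"
    mkdir -p "$mnt" && mount -o recovery,ro "$dev" "$mnt"
}

# Example calls (commented out; path and device are assumptions):
# disable_autostart /boot/config/disk.cfg
# mount_readonly /dev/sdX1 /x
```

Keeping the .bak copy makes it easy to restore the original setting once the disk has been dealt with.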
Helmonder, Posted February 11, 2017

Thanks. I found the config and changed the autostart setting. Shutdown from the command line still does not work, so I will have to force a hard reboot again. Rebooting now.
Helmonder, Posted February 11, 2017

How do I know which sdX1 to use?
Helmonder, Posted February 11, 2017

Just realised the web page would be up :-) I found the drive name and can now mount /x.
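For readers without the web page at hand, the device name can also be narrowed down from the console. A minimal sketch, assuming lsblk is available on the system; btrfs_partitions is a made-up helper name, and with several btrfs disks in the array the size or serial number still has to be matched by hand.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME FSTYPE" pairs on stdin and print the
# device path of every entry whose filesystem type is btrfs.
btrfs_partitions() {
    awk '$2 == "btrfs" { print "/dev/" $1 }'
}

# Example calls (commented out): -r raw output, -n no header, -o columns.
# lsblk -rno NAME,FSTYPE | btrfs_partitions
# lsblk -o NAME,SIZE,FSTYPE,MODEL,SERIAL   # to match size/serial by eye
```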
Helmonder, Posted February 11, 2017

Now I have /x available with disk8, but where do I copy it to? Since the array is not up I cannot access anything under /mnt.
JorgeB, Posted February 11, 2017

Give me a few minutes, I'm having lunch.
Helmonder, Posted February 11, 2017

Absolutely... I thoroughly appreciate the help. I am out of the house for a bit. Thinking of just setting disk8 to disabled and then starting the array... then I could copy using mc.
JorgeB, Posted February 11, 2017

OK, try this: in the same disk.cfg you edited before, change diskFsType.8="auto" or "btrfs" to "xfs". This should allow you to start the array without crashing. Then use mc or any other util to copy from /x to /mnt/diskX or /mnt/user/sharename.
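The disk.cfg edit described above could look roughly like this. It is a sketch under assumptions: set_disk_fstype is a made-up helper, /boot/config/disk.cfg assumes the flash drive is mounted at /boot, and /mnt/diskX stays a placeholder for whichever array disk has room.

```shell
#!/bin/sh
# Hypothetical helper: set diskFsType.<slot> to the given type in a disk.cfg,
# keeping a .bak copy of the original file.
set_disk_fstype() {
    cfg="$1"; slot="$2"; fstype="$3"
    cp "$cfg" "$cfg.bak" || return 1
    # Anchored at line start so slot 8 does not also match slot 18.
    sed "s/^diskFsType\.$slot=.*/diskFsType.$slot=\"$fstype\"/" "$cfg.bak" > "$cfg"
}

# Example calls (commented out; path and target are assumptions):
# set_disk_fstype /boot/config/disk.cfg 8 xfs
# cp -a /x/. /mnt/diskX/    # once the array is started, copy from the read-only mount
```

The type change is only there to stop the array from attempting the btrfs mount on disk8; the disk will show as unmountable and should presumably not be formatted until the copy has finished.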
Helmonder, Posted February 11, 2017

It said btrfs; I have changed it to xfs and I can start the array. Both disk8 and one of my cache drives appear to be unmountable. I am going to focus on disk8 first.
JorgeB, Posted February 11, 2017

Disk8 is expected to be unmountable. The cache is strange, but yes, deal with disk8 first, then post your diags.
Helmonder, Posted February 11, 2017

Copies are running. I do notice that the transfers are really fast: I have about 80MB/s sustained, and there is no cache drive. I checked in the console though, and files are getting written. I'll let the copies finish and then look further. I have already attached diagnostics.

tower-diagnostics-20170211-1648.zip
JorgeB, Posted February 11, 2017

All cache disks are being detected as new, which is strange if the pool was working correctly. Maybe disk8 being unmountable is causing some confusion, so let's wait until disk8 is mountable to see if the issue persists.

Completely unrelated, but before I forget, this is not good for parity checks with LSI controllers:

    Feb 11 16:38:29 Tower kernel: mdcmd (31): set md_num_stripes 4264
    Feb 11 16:38:29 Tower kernel: mdcmd (32): set md_sync_window 1920
    Feb 11 16:38:29 Tower kernel: mdcmd (33): set md_sync_thresh 192

md_sync_thresh needs to be much higher, close to md_sync_window. Change it to 1872; parity check speed should improve considerably.
Helmonder, Posted February 11, 2017

As soon as all the data is copied over I will format disk8 as xfs (getting a bit nervous about btrfs ;-). Disk8 is a full 3TB, so this will take some time.
JorgeB, Posted February 11, 2017

"(getting a bit nervous about btrfs ;-)"

Can't blame you. I like btrfs and some of its features, but it can be complicated when there's trouble. It's possible that btrfs check --repair would fix your problem, but it's only recommended as a last resort because it could also make things worse. Copying the data off first is much more work, but safer since it's non-destructive.
Helmonder, Posted February 11, 2017

Yup.. I have been reading up since this morning and I got the same impression. And I do have backups, but still..
Helmonder, Posted February 11, 2017

Copies are running. I see three lines appearing on my console display that I do not see in the syslog:

    ERROR: system chunk array too small 34 < 97
    ERROR: superblock checksum matches but it has invalid members
    ERROR: cannot scan /dev/sdf1: Input/output error

sdf is my primary parity drive.

The log is flooded with the following line:

    Feb 11 21:24:24 Tower shfs/user: err: shfs_mkdir: assign_disk: system (123) No medium found
    Feb 11 21:24:28 Tower shfs/user: err: shfs_mkdir: assign_disk: system (123) No medium found
JorgeB, Posted February 11, 2017

Parity has no filesystem, but when there's an odd number of data disks with the same filesystem it will appear to have one. Those errors are harmless, and the "No medium found" errors look harmless too.
Helmonder, Posted February 12, 2017

Alrighty then... disk8 has been fully copied to other parts of the array (I did not have enough space available on any one drive, so I had to copy the data in parts).

Since disk8 was mounted read-only at /x I suspect that formatting it now will not work, so I am rebooting. When the array is back up my plan is to format disk8.

Next thing is: how do I get my cache drive back, which seems to have mysteriously died on me in an unrelated event?
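Copying in parts, as described above, comes down to checking whether each piece fits on the target before starting. A rough sketch, assuming standard du and df are present; fits_on is a made-up helper name and the paths in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed if everything under $1 would fit in the free
# space of the filesystem holding $2 (both measured in 1 KiB blocks).
fits_on() {
    need=$(du -sk "$1" | awk '{ print $1 }')
    free=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$need" -le "$free" ]
}

# Example (commented out; /x and /mnt/diskX are placeholders):
# for d in /x/*; do
#     fits_on "$d" /mnt/diskX && cp -a "$d" /mnt/diskX/
# done
```

The check is only approximate, since free space is read once per item and filesystem overhead is ignored, so some headroom is advisable.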
Helmonder, Posted February 12, 2017

System is back up and disk8 is now getting formatted with XFS.

And... my cache drive pool is back!! After the reboot everything appears to function. The whole disk8 btrfs problem must have messed something up in the btrfs logic.
Helmonder, Posted February 12, 2017

Just changed md_sync_thresh to 1872 as suggested, thanks!
JorgeB, Posted February 12, 2017

Good to hear the cache pool is back. That was my hope, as there was no other reason I could see for the problem.
Helmonder, Posted February 12, 2017

Johny.. Seriously.. You have been a tremendous help in this whole ordeal. Thanks, thanks, thanks! Can I send you a bottle of Johny black?