crash - hard reset - weirdness ensues...



unRAID locked up completely while I was transferring a large amount of data from an offline source to one of my array disks. I couldn't telnet in or access the GUI, so I had to do a hard reset.

 

Once unRAID came back up and I started the array, it began rebuilding parity, which made sense to me. But some things seem to be broken.

 

I have two Docker containers that no longer seem to work (delugevpn and sabnzbdvpn, both by binhex). All of my Docker containers appear to run, but for those two I cannot access the web UI. The page just sits there, waiting to load...

 

I have methodically tried multiple things to get those two containers working. Even after creating a new docker.img, deleting the containers and their images, and starting over with new, clean config/appdata directories, I still can't get them to work.

 

Unfortunately, that's not my only problem. I just discovered that screen no longer runs. I installed it through the Nerd Pack, and the Nerd Pack reports that it is still installed, but from the terminal I just get "command not found".

 

Help! Is my system broken? What else might not be working correctly? What should I do?


System Overview	
unRAID system:	unRAID server Plus, version 6.1.6
Model:	Custom
Motherboard:	ASUSTeK Computer INC. - M3A78-EM
Processor:	AMD Athlon(tm) 7750 Dual-Core @ 2.7 GHz
HVM:	Enabled
IOMMU:	Disabled
Cache:	L1-Cache = 256 kB (max. capacity 256 kB)
L2-Cache = 1024 kB (max. capacity 1024 kB)
L3-Cache = 2048 kB (max. capacity 2048 kB)
Memory:	4096 MB (max. installable capacity 8 GB)
BANK0 = 2048 MB, 667 MHz
BANK1 = 2048 MB, 667 MHz
Network:	eth0: 1000Mb/s - Full Duplex

 

Plugins: unbalance, unassigned devices, preclear disks, open files, dynamix (system statistics, system information, local master, cache directories, active streams), community applications, nerd tools (screen -- see OP).

 

The errors being reported seem to be related to my cache drives:

Feb  8 17:24:23 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1
Feb  8 17:24:23 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1
Feb  8 17:24:25 undrobo kernel: btrfs_dev_stat_print_on_error: 5386 callbacks suppressed
Feb  8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4832, rd 572, flush 1, corrupt 0, gen 0
Feb  8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4833, rd 572, flush 1, corrupt 0, gen 0
Feb  8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4833, rd 572, flush 2, corrupt 0, gen 0
Feb  8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4834, rd 572, flush 2, corrupt 0, gen 0
Feb  8 17:24:25 undrobo kernel: BTRFS: bdev /dev/sdm1 errs: wr 4835, rd 572, flush 2, corrupt 0, gen 0
Feb  8 17:24:25 undrobo kernel: BTRFS: lost page write due to I/O error on /dev/sdm1
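For anyone digging through a log this size, a minimal Python sketch that pulls the latest per-device error counters out of the "BTRFS: bdev ... errs:" lines may help. It assumes the exact message format shown in the excerpt above, and the helper name latest_btrfs_counters is hypothetical. Because the counts in the excerpt only ever increase, the last matching line for each device is treated as its current total.

```python
import re

# Matches kernel lines in the format seen in the excerpt, e.g.
# "BTRFS: bdev /dev/sdm1 errs: wr 4832, rd 572, flush 1, corrupt 0, gen 0"
BDEV_ERRS = re.compile(
    r"BTRFS: bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_btrfs_counters(lines):
    """Return {device: {counter: value}} using the last matching line per device.

    Hypothetical helper: later lines overwrite earlier ones, so the result
    reflects the most recent totals the kernel printed for each device.
    """
    counters = {}
    for line in lines:
        match = BDEV_ERRS.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            counters[dev] = {name: int(value) for name, value in fields.items()}
    return counters
```

Usage would be something like opening syslog.txt with errors="replace" and passing the file object straight in, since it yields one line at a time.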

 

I have attached only part of syslog.txt; the full file goes on for another 26 MB, repeating the same errors. Let me know if you need to see anything else.

syslog-part.txt.zip
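When a syslog is mostly the same few errors repeated, summarizing it by message "shape" can make it much easier to share. A minimal Python sketch, assuming the classic "Mon DD HH:MM:SS host" prefix seen in the excerpt above: it strips that prefix, replaces every run of digits with N so lines that differ only in counters collapse together, and counts them. The function name summarize is hypothetical.

```python
import re
from collections import Counter

# Assumes the syslog prefix format from the excerpt:
# "Feb  8 17:24:23 undrobo kernel: ..."
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
DIGITS = re.compile(r"\d+")


def summarize(lines, top=10):
    """Return the `top` most common message templates with their counts.

    Timestamps and hostnames are stripped, and numbers are masked as "N",
    so repeated errors with changing counters are grouped into one entry.
    """
    templates = Counter()
    for line in lines:
        message = PREFIX.sub("", line.rstrip("\n"))
        templates[DIGITS.sub("N", message)] += 1
    return templates.most_common(top)
```

Note that masking digits also merges device names such as sdm1 and sdm2 into sdmN, so this is only for getting an overview.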


Thanks, Trurl. I had another lockup, so I had to reset once again. After that, unRAID didn't even recognize one of my cache drives (/dev/sdm). I rebooted one more time, and that time /dev/sdm was recognized, so I began to suspect hardware. I copied everything off the cache to one of my array disks, changed the data cables on both cache drives, and ran a scrub on them, but I'm not very familiar with btrfs. One section of the scrub report showed tens of thousands of errors, while another seemed to report none. Since I had copied all the data off the drives, I decided to preclear them before adding them back to the cache.

 

Do you think the data I copied off the cache will be OK, since the filesystem didn't report any errors? Should I rely on that data, or toss it? The only data I really want to keep is /mnt/cache/appdata/.

 

If the preclear (with pre-reads and post-reads) reports no issues, should I trust these drives and the rest of my hardware?


I'm seeing similar problems with freshly pre-cleared and reformatted cache drives. I suspect there may be problems with one or both cache drives that aren't showing up during preclear, so I decided to pull the cache drives and put docker.img and appdata on a disk in my data array instead.

 

STILL having problems.

 

I installed a brand-new docker.img, re-downloaded delugevpn from binhex, and started with a new, empty appdata config directory. The delugevpn web UI still won't load. Some of my other Docker containers do seem to be working, but I am using delugevpn as my litmus test. I am concerned that because it is not working, I may have other, as yet undetected problems.
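A web UI that "just sits there" can mean different things: nothing is listening on the port, the TCP connection is accepted but the application never answers, or the connection is closed right away. A minimal Python sketch for telling these apart from another machine on the LAN is below. The helper name probe is hypothetical, and the host and port in the usage line are placeholders rather than values from this thread (8112 is Deluge's usual web UI port, but check the port mapping in your container settings).

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a web UI endpoint.

    Returns "refused" (nothing listening), "timeout" (connect never
    completed), "no-response" (connected, but no reply to an HTTP request),
    "closed" (peer closed without replying), or "ok" (got at least one byte).
    """
    try:
        # create_connection also applies `timeout` to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    with sock:
        request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode()
        try:
            sock.sendall(request)
            data = sock.recv(1)
        except socket.timeout:
            return "no-response"
        except ConnectionResetError:
            return "closed"
    return "ok" if data else "closed"


# Placeholder address and port; substitute your server's IP and mapped port.
# print(probe("192.168.1.50", 8112))
```

"no-response" would match the symptom described here (the page waits forever), which points at the application inside the container rather than at Docker's port mapping.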

 


Sorry I missed that. What BIOS version are you running? Do you have PATA or IDE disabled in your BIOS? How many drives do you have connected to your motherboard? Is it possible to switch the port your cache drive is connected to? Have you considered redoing your unRAID install and running it bare, meaning no plugins or Docker containers, for a period of time to see if it's stable? Have you run memtest?


BIOS Information	
Vendor:	American Megatrends Inc.
Version:	2003
Release Date:	10/12/2009
Address:	0xF0000
Runtime Size:	64 kB
ROM Size:	1024 kB
Characteristics:	ISA is supported
PCI is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Targeted content distribution is supported
BIOS Revision:	8.14

 

I have 5 SATA ports on my motherboard and am using 4 of them. I also have an 8-port PCIe SATA card with all 8 ports in use; my two cache drives were attached to it.

 

Sure, I can move drives to any ports. I can also reinstall unRAID (yuck), run it without plugins and Docker containers, and try memtest.

 

What do you suggest I do, and in what order?


Let's get some more details first.

 

Is there a video card in your system? If so, what model?

What is the model of the 8-port PCIe SATA card you are using, and which slot is it in?

 

The latest BIOS I can see appears to be 2701 from 2010. It might not be a bad idea to upgrade to that first and see whether any of the stability issues go away.

 

Also, maybe try starting out with just one cache drive and see how that goes.


I updated the BIOS. And yes, cable connections are one of those things I check automatically. My array consists of 10 disks, including parity. All the ports on the IOCrest card are populated.

 

I just booted with my cache disks attached. Cache disk 1 is showing 34,177,702,359,547,252 writes and 0 errors...


OK, so I ran a short memtest: two full passes, no errors. I removed every plugin except preclear and moved the cache drives from the PCIe SATA card to the SATA connectors on the motherboard. After rebooting, I'm still getting strange errors related to the cache disks:

 

BTRFS: error (device sdc1) in write_all_supers: 3498: errno=-5 IO failure (errors while submitting device barriers.)

 

And then the computer hangs again...

 

So, is my disk bad? Could a bad disk really bring the machine to a screaming halt?
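The errno=-5 in that BTRFS message is the kernel's negated errno value, and 5 is EIO, a generic input/output error, on Linux. EIO only says that I/O to the device failed; on its own it does not tell you whether the disk, the cable, or the controller is at fault. A tiny Python sketch for decoding such values (the helper name describe_kernel_errno is hypothetical):

```python
import errno
import os


def describe_kernel_errno(value):
    """Kernel messages such as "errno=-5" report negated errno values.

    Return the symbolic name and the C library's message for that code.
    """
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# On Linux this prints ('EIO', 'Input/output error')
print(describe_kernel_errno(-5))
```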

 

I've disconnected the cache disks for now, and everything seems fine. BUT, no matter what I do, I cannot get binhex's containers to work again. I've tried everything I can think of, and I still can't get the web UI to load... Any thoughts?

