Lost GUI on Reboot - Able to Ping/SSH/Log in Locally - 6.2.0b21

SuperW2 · May 8, 2016

On 6.2.0-beta21... Rebooted the server and can no longer access the GUI.

Can ping the server from a Windows PC, can SSH, and am able to log in directly on the server, but no Web GUI... Have 2 NICs, both plugged in... tried swapping cables etc. (which I think you can see at the end of the syslog).

Getting some "<My IP> linkdown" messages above my login prompt on the server console, which I've never really seen before.

Other than some new HD adds (one old and one new), no changes to any settings or otherwise to my server recently.

Anything in my Syslog jump out to anyone?

-SW2

syslog.txt

Squid · May 8, 2016

Tools - Diagnostics would be better than a syslog

Squid · May 8, 2016

Actually on an unrelated note to your problem, looks like you have some issues with disk #6. Now I'd really love to see your diagnostics so I can see how the Fix Common Problems plugin responds to it. (or install it yourself)

SuperW2 · May 8, 2016

How do I get that if I can't get to a GUI?

Squid · May 8, 2016

from command prompt

diagnostics

ignore any errors. File will be saved somewhere (can't remember where off the top of my head) on the flash drive
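
For anyone doing this from the console, here is a minimal sketch of the step above. The `diagnostics` command is the one named in this post. The assumptions are that the flash drive is mounted at /boot (the syslog paths later in this thread suggest so) and that the archive name contains "diagnostics", like the zips attached here. The function name is made up for illustration.

```shell
# Hypothetical helper: print the most recently modified diagnostics zip
# found anywhere under the given flash mount point (default: /boot).
newest_diagnostics_zip() {
    local flash_root="${1:-/boot}"
    # ls -t sorts by modification time, newest first; head keeps the newest.
    find "$flash_root" -type f -name '*diagnostics*.zip' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Typical use on the server console:
#   diagnostics                 # ignore any errors it prints
#   newest_diagnostics_zip      # shows where the zip landed on the flash
```

Searching the whole flash avoids having to remember the exact folder the zip is written to.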

SuperW2 · May 8, 2016

OK, here is the Diag Zip... I pulled Disk 6 out hoping that the server would boot without it but no change... and my powerdown/shutdown scripts are doing nothing, so I've had to hard power the thing each time

media-diagnostics-20160508-1622.zip

Squid · May 9, 2016

Dunno... everything looks good to me... No obvious errors except for disk6. Hopefully someone better versed at UI not being able to be accessed problems will pipe in

SuperW2 · May 9, 2016

Thanks for the attempt...

I'm kind of desperate... I tried to manually kill emhttp and rerun it from the command line with "/usr/local/sbin/emhttp &" (searching the forum and trying anything that might help)... Getting a "segfault at 530 ip <long string> sp <another long string> error 4 in emhttp[<one more slightly less long string>]" message.

Squid · May 9, 2016

emhttp isn't designed to be restarted anymore

SuperW2 · May 9, 2016

Grrr.... Thanks... I was able to boot on a new USB stick (trial version)... obviously not going to mount any of the drives or whatever, but I can see that it's booted, am able to see the webconsole, etc. At least tells me that network is OK, and hopefully my Server HW... Not sure where to go from here.

-W

Squid · May 9, 2016

ok. Then your next step is to use the original stick and restart it in "Safe" mode

If that works, then delete all the .plg files from config/plugins on the flash and start normally

Then post what happens before we hit the next stage (I'm going to bed soon)

If that's still not working, then I know how to transfer the configs over to a new stick, but want someone more experienced (itimpi / RobJ / johnny.black) because we also have a disk disabled at the same time, and I don't want to make any mistakes that might result in data loss
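
A cautious variant of the "delete all the .plg files" step is to move them aside instead, so they can be put back one at a time later. This is only a sketch: the function name and the backup folder name are made up, and /boot as the flash mount point is an assumption.

```shell
# Hypothetical helper: move every .plg file out of config/plugins into a
# dated backup folder on the flash instead of deleting it outright.
# Arg 1: flash mount point (assumed to be /boot on a running server).
stash_plugins() {
    local flash_root="${1:-/boot}"
    local plugin_dir="$flash_root/config/plugins"
    local backup_dir="$flash_root/config/plugins-disabled-$(date +%Y%m%d-%H%M%S)"
    local f moved=0

    [ -d "$plugin_dir" ] || { echo "no plugin folder at $plugin_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    for f in "$plugin_dir"/*.plg; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv -- "$f" "$backup_dir"/ && moved=$((moved + 1))
    done
    echo "moved $moved plugin file(s) to $backup_dir"
}

# Typical use on the server console, followed by a reboot:
#   stash_plugins /boot
```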

SuperW2 · May 9, 2016

I already tried Safe Mode on the original stick and it did exactly the same thing... I'm happy to try removing the PLGs... at this point I'll try anything.

Squid · May 9, 2016

safe mode effectively disables the plgs

One thing I've seen before is network.cfg gets buggered somehow. Delete config/network.cfg from the flash drive and try it.

failing that,

delete everything in the extra folder on the flash if it's there, and everything in the packages folder if it's there

edit the file config/docker.cfg and change DOCKER_ENABLED to be "no"

edit the file config/disk.cfg and change startArray to be "no"

If that works, try starting the array.

Failing all that, I'd go with redoing the flash drive or transferring the config to a new one, but I want robj / itimpi / someone else to handle that due to the disabled drive.
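
The config edits listed above could be scripted roughly like this. It is a sketch under assumptions: the function names are made up, /boot is assumed to be the flash mount point, and the settings are assumed to be stored as KEY="value" lines. Every file it touches gets a .bak copy first, and network.cfg is moved aside rather than deleted.

```shell
# Hypothetical helper: rewrite KEY="anything" to KEY="no" in one config file.
# Assumes KEY="value" lines, which is an assumption about the file format.
set_cfg_no() {
    local file="$1" key="$2"
    [ -f "$file" ] || { echo "skip: $file not found" >&2; return 0; }
    cp -- "$file" "$file.bak" || return 1
    sed "s/^$key=.*/$key=\"no\"/" "$file.bak" > "$file"
}

# Hypothetical helper: apply the three changes from the post above.
# Arg 1: flash mount point (assumed to be /boot on a running server).
reset_boot_config() {
    local cfg="${1:-/boot}/config"
    # Move network.cfg aside so the server falls back to its defaults on next boot.
    [ -f "$cfg/network.cfg" ] && mv -- "$cfg/network.cfg" "$cfg/network.cfg.bak"
    set_cfg_no "$cfg/docker.cfg" DOCKER_ENABLED
    set_cfg_no "$cfg/disk.cfg" startArray
}

# Typical use on the server console, followed by a reboot:
#   reset_boot_config /boot
```

The same edits can be made by pulling the stick and changing the files on another PC; the script only saves typing.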

SuperW2 · May 9, 2016

OK, so good news....I manually renamed all the PLG files (Except Dynamix.Plg) to OLD and RM'ed the PLG's. Did the same for Network.Cfg, nothing existed in Extra Folder, I set both Docker and Start Array to "No" by VI editing those files (I'm a Windows guy so this Linux command line stuff is complex!).

After all of that stuff, I was able to reboot (hard power off/on since my powerdown command line stuff still isn't working), and am able to get back into the GUI... the IP Address obviously is now back to a DHCP one instead of the static it was at before, but that's easy to fix.

Now I need to figure out what's going on with that Disk6 (currently pulled out and in my Windows box running SeaTools to see if I can get a fail code or whatever). Then once I get that sorted, I guess I can go back to re-adding the plugins one by one and then the docker stuff to see if I can figure out what was hanging this thing.

-Sw2

Squid · May 9, 2016

Just for future reference, all of those files I had you edit you could have done by pulling the stick and editing them with Notepad.

FYI, instead of vi, if you're doing it locally use nano. Far easier (I can't even figure out how to edit in vi without looking online).

SuperW2 · May 9, 2016

Thanks, I think something is still wonky... I stuffed Disk6 back into the array, tried to start my Array (and Disk6 was Blue balled), and it just hangs on mounting disks. I am watching the server console and see a final "XFS (md18): Ending clean mount" and then nothing after that and it's just hanging there (GUI is basically unresponsive). Can't do anything else with the console at that point and no disk activity lights on any drive in my array. I'll try to force another shutdown and do another Diag and attach but about ready to give up for the night!

At least I installed the powerdown plugin first, so it "seems" like it might actually do a shut down this time, but I'm not holding my breath!

*EDIT* Nope... not powering down... <sigh>, but able to map to my \\Media\Flash drive to get the new Diags Zip that the Powerdown thing created before it tried to shut down and didn't work. I've attached here.

*EDIT2* Interestingly, I also see a TimeSync Error : ntpd[1800]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

I noticed my BIOS time was off by 7-8 hours (forward), when I was in there checking settings and updating BIOS. Wonder if I need a new CMOS Battery.

-W

media-diagnostics-20160508-2129.zip

SuperW2 · May 9, 2016

Any other ideas or analysis of the diags for what to try next? Confirmed after leaving it running overnight (9+ hours) that the system hangs at mounting the disks and the GUI is then unresponsive, but I still have console access/SSH, etc. (basically the same issue as at the start, except that I was able to get to a GUI after setting Array Start/Docker to No and disabling all of the plugins).

HELP please! I have 50+TB of stuff that I'm not excited about the possibility of losing.

*Edit* Attached the diag zip from this AM after it hung overnight trying to mount the disks... unsure if there's any new info. Had to hard power cycle the server again, as the powerdown script, while it now attempts to run, still won't shut down the server.

*Edit2* I noticed there is a 7 hour shift in time (this time 7 hours back) in the log, even though I know it was just physically powered off and back on:

May 9 00:09:42 media avahi-dnsconfd[2270]: Successfully connected to Avahi daemon.

May 9 00:09:42 media emhttp: autostart disabled

May 9 00:09:43 media avahi-daemon[2261]: Server startup complete. Host name is media.local. Local service cookie is 4271840896.

May 9 00:09:44 media avahi-daemon[2261]: Service "media" (/services/ssh.service) successfully established.

May 9 00:09:44 media avahi-daemon[2261]: Service "media" (/services/smb.service) successfully established.

May 9 00:09:44 media avahi-daemon[2261]: Service "media" (/services/sftp-ssh.service) successfully established.

May 9 07:09:59 media emhttp: shcmd (18): rmmod md-mod |& logger

May 9 07:09:59 media kernel: md: unRAID driver removed

May 9 07:09:59 media emhttp: shcmd (19): modprobe md-mod super=/boot/config/super.dat |& logger

May 9 07:09:59 media kernel: md: unRAID driver 2.6.1 installed

May 9 07:09:59 media emhttp: Pro key detected, GUID: 0781-5506-0000-173EA180319E FILE: /boot/config/Pro.key

May 9 07:09:59 media emhttp: Device inventory:

May 9 07:09:59 media emhttp: shcmd (20): udevadm settle

May 9 07:09:59 media emhttp: SanDisk_Cruzer_0000173EA180319E-0:0 (sda) 4013860
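
A shift like the one in this excerpt can be located mechanically. The sketch below is a made-up helper, not anything that ships with unRAID; it compares the timestamps of consecutive syslog lines and prints any jump larger than a threshold. It assumes the usual "Mon D HH:MM:SS host ..." syslog layout shown above.

```shell
# Hypothetical helper: report places in a syslog where the timestamp jumps by
# more than a threshold between consecutive lines.
# Arg 1: syslog file, Arg 2: threshold in seconds (default 3600).
# Only the day-of-month and HH:MM:SS fields are used, so a jump across a
# month boundary would be misreported.
find_time_jumps() {
    awk -v limit="${2:-3600}" '
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($3, t, ":")
            now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) {
                delta = now - prev
                if (delta > limit || delta < -limit)
                    printf "line %d: %d s jump (%s -> %s)\n", NR, delta, prev_stamp, $3
            }
            prev = now; prev_stamp = $3; seen = 1
        }
    ' "$1"
}

# Typical use:
#   find_time_jumps /var/log/syslog
```

Run against the lines quoted here, it would flag the step from 00:09:44 to 07:09:59, a jump of 25215 seconds (just over 7 hours).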

-SW2

media-diagnostics-20160509-0703.zip

SuperW2 · May 9, 2016

Help? Anybody? I'm stuck here and worried!

JorgeB · May 9, 2016

You have fs corruption on disk6.

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS

SuperW2 · May 9, 2016

Will try a xfs_repair now and see what happens. Thanks!

*Edit* Had to do a -L, but it ran for just a few mins; I then reran with a -V and afterwards was able to start the array... it immediately started a parity check and I have the yellow triangle next to Disk 6, so unsure exactly what is next, but parity will take 19-20 hours or so (at least that's the typical time).

*Edit2* Actually the GUI is completely hung again about 60 seconds into its parity check/rebuild, whatever it is. Can't stop it, can't get back to the GUI at all... still "dead in the water".
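
For reference, the usual order of xfs_repair attempts looks roughly like the sketch below. It only prints the commands so they can be reviewed before anything is run. The assumptions: the array is started in Maintenance mode, and array disk N is exposed as /dev/mdN, which may differ between unRAID releases. The function name is made up. The flags themselves (-n for check only, -v for verbose, -L to zero the log) are standard xfs_repair options.

```shell
# Hypothetical helper: print (not run) the usual xfs_repair escalation for one
# array disk. Assumes Maintenance mode and a /dev/mdN device name.
xfs_repair_plan() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: xfs_repair_plan <disk number>" >&2; return 1 ;;
    esac
    local dev="/dev/md$1"
    echo "xfs_repair -n $dev    # 1. check only, changes nothing"
    echo "xfs_repair -v $dev    # 2. actual repair, verbose output"
    echo "xfs_repair -L $dev    # 3. last resort: zero the log, recent changes may be lost"
}

# Typical use for Disk 6:
#   xfs_repair_plan 6
```

Step 3 is normally only used when xfs_repair itself says the log cannot be replayed, which matches what happened here.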

SuperW2 · May 9, 2016

In the Syslog, I'm seeing this over and over and over (several times every second).... GUI still hung,

May 9 10:45:24 media kernel: swapper/0: page allocation failure: order:0, mode:0x2080020

May 9 10:45:24 media kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.6-unRAID #1

May 9 10:45:24 media kernel: Hardware name: Supermicro X10SAE/X10SAE, BIOS 3.0 05/20/2015

May 9 10:45:24 media kernel: 0000000000000000 ffff88041dc03c28 ffffffff813688da 0000000000000000

May 9 10:45:24 media kernel: 0000000000000000 ffff88041dc03cc0 ffffffff810bc9b0 ffffffff818b0e38

May 9 10:45:24 media kernel: ffff88041dff9b00 ffffffffffffffff ffffffff008b0680 0000000000000000

May 9 10:45:24 media kernel: Call Trace:

May 9 10:45:24 media kernel: <IRQ> [<ffffffff813688da>] dump_stack+0x61/0x7e

May 9 10:45:24 media kernel: [<ffffffff810bc9b0>] warn_alloc_failed+0x10f/0x127

May 9 10:45:24 media kernel: [<ffffffff810bf9c7>] __alloc_pages_nodemask+0x870/0x8ca

May 9 10:45:24 media kernel: [<ffffffff814333a9>] ? device_has_rmrr+0x5a/0x63

May 9 10:45:24 media kernel: [<ffffffff810bfabd>] __alloc_page_frag+0x9c/0x15f

May 9 10:45:24 media kernel: [<ffffffff8152e310>] __napi_alloc_skb+0x61/0xc1

May 9 10:45:24 media kernel: [<ffffffffa053e92a>] igb_poll+0x441/0xc06 [igb]

May 9 10:45:24 media kernel: [<ffffffff815390ac>] net_rx_action+0xd8/0x226

May 9 10:45:24 media kernel: [<ffffffff8104d4c0>] __do_softirq+0xc3/0x1b6

May 9 10:45:24 media kernel: [<ffffffff8104d73d>] irq_exit+0x3d/0x82

May 9 10:45:24 media kernel: [<ffffffff8100db9a>] do_IRQ+0xaa/0xc2

May 9 10:45:24 media kernel: [<ffffffff8161ab42>] common_interrupt+0x82/0x82

May 9 10:45:24 media kernel: <EOI> [<ffffffff815041b7>] ? cpuidle_enter_state+0xf0/0x148

May 9 10:45:24 media kernel: [<ffffffff81504170>] ? cpuidle_enter_state+0xa9/0x148

May 9 10:45:24 media kernel: [<ffffffff81504231>] cpuidle_enter+0x12/0x14

May 9 10:45:24 media kernel: [<ffffffff81076247>] call_cpuidle+0x4e/0x50

May 9 10:45:24 media kernel: [<ffffffff810763cf>] cpu_startup_entry+0x186/0x1fd

May 9 10:45:24 media kernel: [<ffffffff8160fbdd>] rest_init+0x84/0x87

May 9 10:45:24 media kernel: [<ffffffff818eaec0>] start_kernel+0x3f7/0x404

May 9 10:45:24 media kernel: [<ffffffff818ea120>] ? early_idt_handler_array+0x120/0x120

May 9 10:45:24 media kernel: [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c

May 9 10:45:24 media kernel: [<ffffffff818ea421>] x86_64_start_kernel+0xe6/0xf3

May 9 10:45:24 media kernel: Mem-Info:

May 9 10:45:24 media kernel: active_anon:468687 inactive_anon:4711 isolated_anon:0

May 9 10:45:24 media kernel: active_file:443016 inactive_file:3009187 isolated_file:32

May 9 10:45:24 media kernel: unevictable:0 dirty:64349 writeback:152019 unstable:0

May 9 10:45:24 media kernel: slab_reclaimable:51705 slab_unreclaimable:30682

May 9 10:45:24 media kernel: mapped:51722 shmem:85744 pagetables:5236 bounce:0

May 9 10:45:24 media kernel: free:17874 free_pcp:104 free_cma:0

May 9 10:45:24 media kernel: Node 0 DMA free:15580kB min:12kB low:12kB high:16kB active_anon:304kB inactive_anon:16kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:32kB shmem:320kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

May 9 10:45:24 media kernel: lowmem_reserve[]: 0 3512 16022 16022

May 9 10:45:24 media kernel: Node 0 DMA32 free:51276kB min:3524kB low:4404kB high:5284kB active_anon:572120kB inactive_anon:3316kB active_file:391584kB inactive_file:2440188kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:3607096kB managed:3597428kB mlocked:0kB dirty:61208kB writeback:129236kB mapped:48616kB shmem:74916kB slab_reclaimable:44168kB slab_unreclaimable:26384kB kernel_stack:3376kB pagetables:5800kB unstable:0kB bounce:0kB free_pcp:144kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:44 all_unreclaimable? no

May 9 10:45:24 media kernel: lowmem_reserve[]: 0 0 12510 12510

May 9 10:45:24 media kernel: Node 0 Normal free:4640kB min:12564kB low:15704kB high:18844kB active_anon:1302324kB inactive_anon:15512kB active_file:1380480kB inactive_file:9596560kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13074432kB managed:12810880kB mlocked:0kB dirty:196188kB writeback:478840kB mapped:158240kB shmem:267740kB slab_reclaimable:162652kB slab_unreclaimable:96344kB kernel_stack:11968kB pagetables:15144kB unstable:0kB bounce:0kB free_pcp:272kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

May 9 10:45:24 media kernel: lowmem_reserve[]: 0 0 0 0

May 9 10:45:24 media kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 3*64kB (UM) 2*128kB (UM) 1*256kB (U) 1*512kB (M) 2*1024kB (UM) 2*2048kB (UM) 2*4096kB (M) = 15580kB

May 9 10:45:24 media kernel: Node 0 DMA32: 499*4kB (ME) 306*8kB (UME) 807*16kB (UME) 358*32kB (UME) 93*64kB (UME) 23*128kB (UME) 7*256kB (ME) 1*512kB (E) 7*1024kB (M) 2*2048kB (M) 0*4096kB = 51276kB

May 9 10:45:24 media kernel: Node 0 Normal: 324*4kB (M) 140*8kB (UME) 89*16kB (UME) 35*32kB (M) 3*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5152kB

May 9 10:45:24 media kernel: 3537970 total pagecache pages

May 9 10:45:24 media kernel: 0 pages in swap cache

May 9 10:45:24 media kernel: Swap cache stats: add 0, delete 0, find 0/0

May 9 10:45:24 media kernel: Free swap = 0kB

May 9 10:45:24 media kernel: Total swap = 0kB

May 9 10:45:24 media kernel: 4174378 pages RAM

May 9 10:45:24 media kernel: 0 pages HighMem/MovableOnly

May 9 10:45:24 media kernel: 68326 pages reserved
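
To put a number on "over and over", a small sketch like this counts the failures per reporting process. The function name is made up, and it assumes the "kernel: <process>: page allocation failure" wording seen in the first line of the trace above.

```shell
# Hypothetical helper: count "page allocation failure" lines in a syslog,
# grouped by the process name the kernel reported (e.g. swapper/0).
# Arg 1: syslog file.
count_alloc_failures() {
    grep 'page allocation failure' "$1" |
        sed 's/.*kernel: \(.*\): page allocation failure.*/\1/' |
        sort | uniq -c | sort -rn
}

# Typical use:
#   count_alloc_failures /var/log/syslog
```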

SuperW2 · May 9, 2016

Well, I started the Array and it's doing a data rebuild now on disk6 and it has been running about an hour and about 9% done... we'll see what happens, but at least a step in right direction.

RobJ · May 10, 2016

Some observations -

* Mounting Disk 6 brought to light corruption in the XFS file system on Disk 6, so serious that Call Traces occur every time. What's particularly noteworthy is this happened even when there was no drive assigned to Disk 6, so the corruption is in the emulated drive. You are going to have to attempt an XFS repair on Disk 6, in Maintenance Mode. The first Call Trace each time indicates mount is not 'Tainted', but the subsequent ones indicate it is now 'Tainted'. I would not try to operate once that has happened. You should probably try to fix the disk in Maintenance mode, then shut down, then try to start the array and check for Call Traces when mounting Disk 6. If still there, you may need to copy everything you can off Disk 6, then reformat it, and copy back (using a method similar to this, substituting XFS for BTRFS).

* In first syslog, both networks dropped within 8 seconds of each other. One of them came back up 6 seconds later, the other never did. That seems very unlikely on this machine, so may indicate an issue with the switch or router that both are connected to.

* Clock is a little funny. Apparently when you updated the BIOS, it set the hardware clock back some hours. At about May 8 21:41:15 (I think), NTP kicked in and advanced it 7 hours. A proper shutdown should save it to the hardware clock. If still not right, you may want to set it yourself in the BIOS, then once it's close, NTP will keep it accurate.

SuperW2 · May 11, 2016

Thanks for the great Reply RobJ... A couple updates and comments to your comments...

I did an XFS_Repair on Disk6 and had to use the "-L" option since I couldn't mount the disk... I don't remember specifically what it said about call traces. I did get a couple of the messages below, but nothing after I did the XFS_Repair. This was the last message like this that I received:

May  9 16:09:33 media kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May  9 16:09:33 media kernel: ata6.00: configured for UDMA/133
May  9 16:09:33 media kernel: ata6: EH complete
May  9 16:10:31 media kernel: ata6: limiting SATA link speed to 3.0 Gbps
May  9 16:10:31 media kernel: ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
May  9 16:10:31 media kernel: ata6.00: irq_stat 0x08000000, interface fatal error
May  9 16:10:31 media kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
May  9 16:10:31 media kernel: ata6.00: failed command: READ DMA EXT
May  9 16:10:31 media kernel: ata6.00: cmd 25/00:40:78:54:62/00:05:0b:00:00/e0 tag 15 dma 688128 in
May  9 16:10:31 media kernel:         res 50/00:00:77:54:62/00:00:0b:00:00/e0 Emask 0x10 (ATA bus error)
May  9 16:10:31 media kernel: ata6.00: status: { DRDY }
May  9 16:10:31 media kernel: ata6: hard resetting link

Once I did the XFS_repair, I was able to start the array and it immediately started a data rebuild of Disk6 (had yellow/orange triangle). That took the rest of the day/night/morning to complete (16ish hours) but hadn't seen any ATA Errors since it completed and no errors reported in console or Syslog that I can see. I am in the process of moving my Data off of Disk6 "just in case"...but that's going to take a while since it was almost topped off with 4TB of data (and again no errors in log after 6 or 7 hours of data transfer).

The Network drops in the logs was likely me, physically putzing... I swapped both physical cables on the server with new ones (just as a precaution trying to rule out anything), while it was running... not the smartest I know but I'd bet that is what was recorded then.

As far as the clock, I changed the CMOS Battery yesterday (although the old one was registering good voltage in BIOS), reset BIOS to defaults, etc. Again just trying to eliminate any variable of something that could have been causing issues. So far since changing have not noticed anything odd with the clock settings.

I was seeing some oddity in the Syslinux configuration on my Flash disk and no "unRAID OS GUI Mode" appearing on server startup, even though it was listed in the config on the console. I clicked the "Default" button and it did appear to change a few lines (I didn't do a before/after comparison), and I haven't rebooted since, so I will do that once I finish moving the data off Disk 6 to see if that addressed the issue.

So right now, I have an apparently "fully operational" running array with a rebuilt data on Disk 6, trying to empty it just in case and crossing fingers I can get it all off in case Disk6 is going out.

Thanks

SW2

RobJ · May 11, 2016

The drive associated with ata6 looks like Disk 11, sdg, serial ending in P74. Most likely possibility is a bad SATA cable, less likely but possibly a poor power connection.
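
One way to double-check which drive sits behind an ataN port is to look at the sysfs path of each sdX device. This is a sketch that assumes the controller uses the kernel's libata driver, where the resolved path contains an "ataN" component; drives on other kinds of controllers will not show up this way. The function name is made up.

```shell
# Hypothetical helper: list which sdX block devices sit behind a given ataN
# port by looking for "/ataN/" in each device's resolved sysfs path.
# Arg 1: port number (6 for ata6), Arg 2: sysfs root (default /sys).
disks_on_ata_port() {
    local port="$1" sysfs="${2:-/sys}" dev path
    for dev in "$sysfs"/block/sd*; do
        [ -e "$dev" ] || continue            # the glob matched nothing
        path=$(readlink -f "$dev")
        case "$path" in
            */ata"$port"/*) echo "${dev##*/}" ;;
        esac
    done
}

# Typical use:
#   disks_on_ata_port 6
```

The serial number of whatever device it prints can then be compared against the drive list in the GUI.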

Disk 6 looks physically fine, so it's not 'going out'. The problems are software issues (file system corruption), not hardware issues.

I wish we had a good way to test an XFS drive, that would tell us the file system is 100% fine. All you can do is run the xfs_repair one more time, and if it doesn't indicate any issues, assume it's OK and hope that's true. If xfs_repair doesn't find anything, it quits without saying anything.
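
Rather than reading the output, the "run it one more time" check can lean on the exit status: in no-modify mode (-n), xfs_repair exits 0 when it detects no corruption and 1 when it does. A minimal sketch, with a made-up function name and an XFS_REPAIR override variable that exists only in this example:

```shell
# Hypothetical helper: run xfs_repair in no-modify mode and report the result.
# With -n, xfs_repair exits 0 when it finds no corruption and 1 when it does.
# XFS_REPAIR can be set to point at a different binary.
# Arg 1: device to check (the array should be in Maintenance mode).
xfs_looks_clean() {
    local dev="$1"
    if "${XFS_REPAIR:-xfs_repair}" -n "$dev"; then
        echo "$dev: no corruption reported"
    else
        echo "$dev: xfs_repair -n reported problems (or could not run)"
        return 1
    fi
}

# Typical use, assuming Disk 6 is exposed as /dev/md6:
#   xfs_looks_clean /dev/md6
```

A clean result still only means xfs_repair found nothing, which is the same caveat as above.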
