unRaid issue- kinda(?) losing connection


I've attached my syslog to help with this. I'm at a loss as to what the issue could be. I've seen multiple posts (well over half a dozen) in the last month alone, so I'm thinking it's more than just a localized issue. I've had no problems with unRaid until the upgrade to 4.7 and

 

Specs: Gigabyte 780 MoBo (HPA is disabled, using onboard video), Sempron 140 chip, 2 GB RAM (no memory errors), Corsair 650 PSU, 6 drives (1 Hitachi 2 TB as parity, 1 Seagate 2 TB, 1 Seagate 1.5 TB, 3 WD Green 1 TB).

 

- No smart errors on the drives

- No memory errors

- No Parity check errors

 

I'm able to stream content off the server. However, I noticed yesterday that if I pause it, I get an error when I try to hit play again, stating the file could not be found. I cannot access /tower when I try after this error. My HTPC and unRaid server are wired into the D-Link DIR655 router. I also have a laptop that is solely wireless via the same router. The 'tower' remains on my list of Network computers in W7, on both the HTPC and the laptop. The IP address is still showing as LAN connected in the router. I can ping the IP address from either the HTPC or the laptop, so I've ruled out a router, switch or network cable issue. The network card on the server responds to the pings and the light is lit steady, with occasional blinks to indicate traffic.
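A successful ping only shows that the server's network stack answers; it says nothing about whether the web interface, unmenu, telnet or the shares are still accepting connections. A minimal sketch for checking that from the HTPC or laptop follows. The port numbers are assumptions (80 for the unRaid web interface, 8080 as the usual unmenu default, 23 for telnet, 445 for SMB), so adjust them to the actual setup.

```python
import socket

# Assumed service ports; these are NOT taken from the post and may differ.
PORTS = {"web interface": 80, "unmenu": 8080, "telnet": 23, "smb": 445}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host, ports=PORTS):
    """Map each service name to True (accepting connections) or False."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_services("192.168.0.104")` (the address the server requests in the syslog excerpt in this thread) while the interface is unreachable would show whether every service is down or only some of them, which helps separate a network problem from a hung process.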

 

If the server was in use, and I ping it a few times or wait a bit, I can usually access the unRaid menu, and eventually the unmenu one as well. You can see in the syslog that unmenu appears to shut down and restart itself as well. I've had it go down while streaming content without interruption of the movie (confirmed down since I couldn't access the unRaid interface from the laptop).

 

The last two times, when I woke the server using Wake on LAN, I could see the Tower in the network connections and on the router's LAN connections, and could ping it with no problem. However, I get no access to the menu via the regular interface or unmenu. I have to 'reset' the box and then get a parity check once it starts up again. I use onboard video and unfortunately, when it goes to sleep, I lose the DVI connection, so I have no video on the attached monitor to get console access. (I'm switching to a VGA connection to hopefully avoid that issue.)

 

I've noticed two things between last night and this morning:

 

1. It seems that this almost always happens when the unused drives begin to spin down. I've had the system up and running without any glitches for almost 18 months now. The syslog shows a constant 'link beat up/down' that seems to occur around the same time. I thought that was a NIC issue, yet I don't see any problems with the network (as I said, movies have continued to stream flawlessly until I pause them; then they won't start playing again).

 

2. I rechecked the connections, drives, etc. this morning, and all looks good, but I noticed the light on my Firefly flash drive with unRaid on it is NOT lit up anymore. Could all this simply have been a failing flash drive? (Can I unplug the unRaid flash drive while the system is running, or will that crash it or do something terrible?! lol I don't want to hit the reset or power buttons if I don't have to, as I don't want to deal with a parity check again.) I believe it ONLY lights up on activity; can anyone confirm? If so, then there is no issue with it.
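The timing question in item 1 (whether the link drops really coincide with drive spin-downs) can be checked against a saved syslog rather than by eye. Below is a rough sketch, assuming the standard syslog line layout seen in this thread and message texts like `link down`, `Link beat lost` and `spindown`. The function names and the 5-second window are illustrative choices, and the year has to be supplied because syslog timestamps do not carry one.

```python
import re
from datetime import datetime, timedelta

# "Apr 24 13:11:50 Tower kernel: ..." -> timestamp, hostname, message
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ (.*)$")


def parse_events(lines, year=2011):
    """Yield (timestamp, kind) for link-loss and spindown lines."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, msg = m.groups()
        stamp = " ".join(stamp.split())  # normalise "Apr  4" to "Apr 4"
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        # The kernel driver and ifplugd may both log the same physical drop.
        if "link down" in msg or "Link beat lost" in msg:
            yield ts, "link_down"
        elif "spindown" in msg:
            yield ts, "spindown"


def link_drops_near_spindown(events, window=timedelta(seconds=5)):
    """Return (timestamp, near_spindown) for every link-loss event."""
    events = list(events)
    spins = [ts for ts, kind in events if kind == "spindown"]
    return [
        (ts, any(abs(ts - s) <= window for s in spins))
        for ts, kind in events
        if kind == "link_down"
    ]
```

Feeding it the lines of the attached syslog (for example `link_drops_near_spindown(parse_events(open("syslog.txt")))`, with the file name being a placeholder) gives a list showing which link drops had a spin-down within the window. A drop right after a resume from sleep may have a different cause than one during normal operation, so the result is only a starting point.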

 

I'll post more as I have it. Any help would be appreciated. I just suspect it may be something more than flash drive or NIC issues, as there have been multiple similar reports in the last few weeks, especially with the 'link beat' message and errors connecting to the interface.

 

 

 

 

syslog-2011-04-24.zip

Link to comment

EDIT - Want to add, right after posting the below, once again, can't access via web interface or telnet, lasted about a minute, then got back in.  Stopped the array and initiated a clean shutdown.

 

Okay, the light on the flash drive is still off, but suddenly I can get into the interface again (though I had to telnet in to get unmenu up). I saw that it started up again, then I had a hiccup with unmenu not working, and then it started back up on its own. It gave me an exit status of 141 for unmenu shutting down. Here's the tail end of the latest syslog in addition to what's above (minus all the startup stuff from waking it back from sleep):

Apr 24 13:11:50 Tower kernel: sd 2:0:0:0: [sdb] Starting disk

Apr 24 13:11:50 Tower kernel: sd 3:0:0:0: [sdc] Starting disk

Apr 24 13:11:50 Tower kernel: sd 4:0:0:0: [sdd] Starting disk

Apr 24 13:11:50 Tower kernel: sd 5:0:0:0: [sde] Starting disk

Apr 24 13:11:50 Tower kernel: sd 6:0:0:0: [sdf] Starting disk

Apr 24 13:11:50 Tower dhcpcd[1332]: sending DHCP_REQUEST for 192.168.0.104 to 192.168.0.1

Apr 24 13:11:50 Tower kernel: Restarting tasks ... done.

Apr 24 13:11:50 Tower kernel: mdcmd (69): spindown 1

Apr 24 13:11:50 Tower kernel: r8169: eth0: link down

Apr 24 13:11:50 Tower kernel: mdcmd (70): spindown 2

Apr 24 13:11:51 Tower ifplugd(eth0)[1302]: Link beat lost.

Apr 24 13:11:51 Tower kernel: mdcmd (71): spindown 3

Apr 24 13:11:51 Tower kernel: mdcmd (72): spindown 5

Apr 24 13:11:52 Tower kernel: r8169: eth0: link up

Apr 24 13:11:53 Tower ifplugd(eth0)[1302]: Link beat detected.

Apr 24 13:11:54 Tower dhcpcd[1332]: dhcpIPaddrLeaseTime=604800 in DHCP server response.

Apr 24 13:11:54 Tower dhcpcd[1332]: dhcpT1value is missing in DHCP server response. Assuming 302400 sec

Apr 24 13:11:54 Tower dhcpcd[1332]: dhcpT2value is missing in DHCP server response. Assuming 529200 sec

Apr 24 13:11:54 Tower dhcpcd[1332]: DHCP_ACK received from (192.168.0.1)

Apr 24 13:18:00 Tower login[1564]: invalid password for `UNKNOWN' on `tty1'

Apr 24 13:28:25 Tower ntpd[1339]: time reset -0.243516 s

Apr 24 13:28:56 Tower ntpd[1339]: synchronized to 67.18.187.111, stratum 2

Apr 24 13:41:54 Tower kernel: mdcmd (73): spindown 0

Apr 24 13:41:55 Tower kernel: mdcmd (74): spindown 4

Apr 24 13:50:22 Tower in.telnetd[30985]: connect from 192.168.0.105 (192.168.0.105)

Apr 24 13:50:37 Tower login[30986]: ROOT LOGIN on `pts/1' from `192.168.0.105'

Apr 24 13:51:18 Tower in.telnetd[31024]: connect from 192.168.0.105 (192.168.0.105)

Apr 24 13:51:34 Tower login[31025]: ROOT LOGIN on `pts/1' from `192.168.0.105'

Apr 24 13:51:44 Tower unmenu-status: Starting unmenu web-server

Apr 24 13:53:43 Tower unmenu-status: Exiting unmenu web-server, exit status code = 141

Apr 24 13:53:43 Tower unmenu-status: Starting unmenu web-server

Apr 24 13:54:02 Tower kernel: mdcmd (75): spindown 0

Apr 24 13:54:04 Tower kernel: mdcmd (76): spindown 0

Apr 24 13:54:12 Tower kernel: mdcmd (77): spindown 0
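On the `exit status code = 141` line above: assuming unmenu-status reports a shell-style exit status, values above 128 conventionally mean the process was killed by signal number (status minus 128). A small sketch that decodes this using Python's standard `signal` module; the helper name is made up for illustration.

```python
import signal


def describe_exit_status(code):
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with status {code}"
```

On Linux, 141 decodes to signal 13, SIGPIPE, which is typically raised when a process writes to a pipe or socket whose other end has already closed. That would fit a browser connection being dropped mid-response, though it does not by itself say why the connection dropped.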

 

Link to comment

Rebooted the server from a clean powerdown and disabled IDE, serial and parallel in the BIOS to avoid conflicts. Switched the monitor to a VGA connection to hopefully avoid loss of signal when resuming from sleep. I've included a syslog from the reboot with just a login to show the boot-up process cleanly.

 

Had issues with the web interface again, and confirmed I could still access the server directly from the connected monitor and keyboard. No new entries in the syslog beyond what's below. ??? I'm at a loss.

 

syslog-2011-04-24_1.txt

Link to comment

I'm kinda seeing something like this myself in 4.7, but it's generally when the system is under stress. 

 

Right now I have 12 2TB data drives, of which 11 are under 60-70GB of free space and 1 is under 90GB, which may be a factor.

 

I have a couple dozen open files on the unRAID server from file sharing. If I then (from Windows 7) mount a Blu-ray ISO from the array with Daemon Tools or Virtual CloneDrive, start a playback on one workstation, then start a copy operation of a bunch of files from the array on another workstation, then try something like simultaneously copying a 40+GB Blu-ray ISO *to* the array using TeraCopy, I'll get a brief network drop that invalidates any open network connections. It's only for an instant, but it looks like some kind of peer reset. Once all the connections drop, the shares usually recover immediately and I can see them again. At that point I need to unmount any ISOs and remount.

 

This is easily reproducible and the system is 24/7 stable otherwise; my mobo is a SuperMicro with onboard Intel NIC.

 

I think it may have something to do with SAMBA on the unRAID side, and/or something involving trying to find a 40+GB block in the low space.

 

Nothing appears in the syslog. 
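Since the server side logs nothing, one option is to timestamp the drops from a client while reproducing the load. This is only a sketch under assumptions: it polls a share path once per interval by listing it, so drops shorter than the interval can be missed, and the path passed in (for example `r"\\tower\Movies"`, a hypothetical share name) is whatever the client normally uses.

```python
import os
import time


def share_reachable(path):
    """Return True if the directory at `path` can be listed right now."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def outage_intervals(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage intervals.

    An outage starts at the first failed sample and ends at the next
    successful one; an outage still open at the end gets end=None.
    """
    intervals, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals


def watch(path, duration=60.0, interval=1.0):
    """Poll `path` for `duration` seconds and return any outage intervals."""
    samples = []
    deadline = time.time() + duration
    while time.time() < deadline:
        samples.append((time.time(), share_reachable(path)))
        time.sleep(interval)
    return outage_intervals(samples)
```

Running `watch(...)` on one workstation while the copy and playback load runs on the others yields client-side timestamps that can then be lined up against the syslog, even if the syslog shows nothing at those times.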

 

Obviously, I'll be adding some more space soon.  :D

 

Link to comment

Archived

This topic is now archived and is closed to further replies.