
unRAID network failure after Samba transfer from Windows 7 Guest (KVM)



So I've been having a bit of trouble with my unRAID server: it has developed a habit of what I would call hanging/freezing.

 

Machine is:

i7 3770s

Asus P8Q77-M/CSM

16GB ram

AOC-SASLP-MV8

Assorted drives

160GB Intel SSD

unRAID Pro 6.1.3

 

I don't have much for plugins, I've got a Windows 7 VM for my SageTV/DVR, Logitech Media Server, Crashplan, and Deluge (usually stopped).  I have one non-array drive passed through to my Windows 7 VM as a recording drive.

 

Most recently, and the logs are attached, I returned home from work to find my unRAID server inaccessible from my PC.  No Samba, no web UI, no VMs.  I logged in via AMT to a console session (not putty/ssh/telnet, those don't work), and the system was responsive. Unfortunately I did not think to check if the user shares were usable locally.  I did check and I was unable to ping out to anything from the unRAID machine. I ran the diagnostics script, captured the zip file, and ran powerdown, which appeared to shut the system down cleanly. It is now back up and running.

 

Unfortunately this has been happening on and off for the past couple weeks.  I posted in another thread, but since then I had thought it was due to my cache drive filling up (that's another story, which I resolved, more space was used than reported).  The last few times this has happened the machine has been under some amount of load.  This time I believe it was running a Parity check.  Previously I had been running an rsync compare on an old drive and the contents copied to a new one (I'm trying to convert to XFS, but that is on hold until I resolve this).  Another time I was copying the recordings off the non-array drive (via the VM over the network).

 

I will note that this all seems to have started since upgrading to 6.1.3. I believe I had actually skipped a number of versions; from the "previous" folder on my flash drive it looks like I was on 6.0.1 before.

 

Any help, or suggestions/requests for more information would be appreciated.

unraid-diagnostics-20151016-1623.zip

Link to comment
  I did check and I was unable to ping out to anything from the unRAID machine.
I didn't actually look at your diagnostics, but that single line stuck out like a sore thumb. Do you have a NIC you could temporarily use to see if that changes the symptoms? I'd disable the onboard NIC and just use the card for a period of time and see what happens.
Link to comment

So this gets curiouser and curiouser.  I did throw my "old" Intel Pro1000 (PCI) in, connected the ethernet to that and that didn't help.  A little more experimenting and this is what I found:

 

Firstly, file transfers via Samba/Windows file sharing (remember, this is from a passthrough disk in the Windows VM) are abysmally slow, a few hundred KB/sec.

Secondly, such transfers clobber the server within about 30 seconds or so, it doesn't take long.

 

Here's where it gets really weird.  I can run iperf in the Windows VM, and I get respectable performance, >500Mbps.  I can run that for minutes with no issue with the server.  Additionally I can play recordings from that drive via my SageTV extenders (which don't use samba/Windows file sharing) all day long, no issues.  But if I try to play the same file from another Windows machine via the share, it clobbers the unRAID server within a few seconds again.
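
To put a number on the share transfers and compare it with the iperf figure, something like the rough sketch below could time a sequential read of a file through the share. The function name and the path in the usage note are hypothetical, and a second read of the same file may be served from cache, so treat the result as a ballpark only.

```python
import time


def measure_read_throughput(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially; return (bytes_read, seconds, megabits_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            # Stop early if a byte limit was given and has been reached.
            if max_bytes is not None and total >= max_bytes:
                break
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Megabits per second, so the number is comparable with iperf output.
    mbps = (total * 8 / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mbps
```

For example, run from another Windows machine against a recording on the share (hypothetical path): `measure_read_throughput(r"\\sagevm\recordings\show.ts", max_bytes=200 * 1024 * 1024)`. Limiting the read with `max_bytes` keeps the test short, which matters if the transfer itself is what takes the server down.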

 

Additionally, I created a new virtual disk for the Windows VM and reinstalled Windows from scratch with the latest virtio drivers, and I see the same behavior (though admittedly I didn't let it run long enough to actually take out the server's network, but the Samba performance was just as abysmal).

 

I'm not seeing anything in the logs that jumps out at me.  I've attached the diagnostics from a few of the network failure events today.

unraid-diagnostics-20151018-1631.zip

Link to comment

I've been having the same problem lately. I come home from work to find the machine completely unresponsive remotely, can't connect to webui or ssh. I'd chalked it up to failing hardware on an old hdd (because after one of the lockups the webui reported it as missing), but I continue to have the same issue even after replacing and removing that drive (which has no faults reported by smart and matches all file checksums with the reconstructed replacement).

 

The only thing I run besides vanilla unraid are deluge and crashplan dockers. Haven't even been running the crashplan one since this issue started. The last couple times, I left a ssh connection open from my PC with "tail -f /var/log/syslog" to try capturing what happened but nothing interesting showed up there. One time the last entry was the mover script ending (after doing nothing) and the next time the last entry was a spindown.
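
Since an ssh session dies along with the network, one alternative is to log from the server itself. Below is a minimal sketch, assuming Linux iputils `ping` is available on the box; the function names are made up for illustration. It appends a timestamped line to a local file each time reachability of a host changes, so there is a record of when the link dropped even if nothing shows up in syslog.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, interface=None, timeout_s=2):
    """Return True if a single ICMP echo to `host` succeeds (iputils ping)."""
    cmd = ["ping", "-c", "1", "-W", str(timeout_s)]
    if interface:
        cmd += ["-I", interface]  # send from a specific interface, e.g. br0
    cmd.append(host)
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def watch(host, log_path, interval_s=10, iterations=None,
          probe=ping_once, sleep=time.sleep):
    """Append a line to `log_path` whenever reachability of `host` changes."""
    last = None
    count = 0
    while iterations is None or count < iterations:
        ok = probe(host)
        if ok != last:  # only log transitions, not every probe
            state = "up" if ok else "DOWN"
            with open(log_path, "a") as log:
                log.write(f"{datetime.now().isoformat()} {host} {state}\n")
            last = ok
        count += 1
        if iterations is None or count < iterations:
            sleep(interval_s)
```

Usage would be something like `watch("192.168.1.1", "/boot/logs/netwatch.log")`, where the gateway address and the log location are assumptions; the point is to write somewhere that survives a reboot and does not depend on the network.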

 

I guess the next step is to hook a monitor up to it, as I don't believe AMT is an option for me.

 

This seems to have started after a recent upgrade in unraid versions. I'm not sure if it started with 6.1.2 or 6.1.3, I'm pretty sure I skipped some versions somewhere. But currently I'm experiencing this with 6.1.3. May try a downgrade if I can't capture anything with a monitor connected to the machine.

Link to comment
  • 4 weeks later...

So after a long while of everything going smoothly, I had my unraid server disappear again.

 

One thing I've noticed is that when the server disappears, it's the br0 interface that becomes unreachable; the eth0 interface is still accessible.

 

I did a little searching and ran across this about bridging:

"Note: If, after trying to use the bridge interface, you find your network link becomes dead and refuses to work again, it might be that the router/switch upstream is blocking "unauthorized switches" in the network (for example, by detecting BPDU packets). You'll have to change its configuration to explicitly allow the host machine/network port as a "switch". "

https://wiki.debian.org/BridgeNetworkConnections

 

A bit more searching and I ran across this on the Sonos forums:

https://en.community.sonos.com/troubleshooting-228999/problems-with-certain-asus-routers-5893797

 

People there reported problems with the Asus RT-AC68U, which is my router.  The solution they offer is to put a switch in the middle, which is already the normal setup for my network: everything is connected to my DGS-1024D switch.  However, I've done some searching and can't find anything to confirm that this switch supports STP (802.1D) or RSTP.  I guess I'm wondering, should I be looking at a new switch?
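
On the server side, the bridge's own state can be read from sysfs at the moment it drops off the network. The sketch below is only an illustration, assuming the standard Linux layout under `/sys/class/net` (the function name is hypothetical). It reports the bridge's `operstate`, its `stp_state` (0 means STP is off on the bridge) and the state of each member port.

```python
from pathlib import Path


def bridge_snapshot(bridge="br0", sysfs_root="/sys/class/net"):
    """Collect link and STP state for a Linux bridge from sysfs."""
    root = Path(sysfs_root)
    base = root / bridge

    def read(path):
        # Missing files (e.g. the interface is not a bridge) yield None.
        try:
            return path.read_text().strip()
        except OSError:
            return None

    brif = base / "brif"  # one entry per interface enslaved to the bridge
    ports = sorted(p.name for p in brif.iterdir()) if brif.is_dir() else []
    return {
        "bridge": bridge,
        "operstate": read(base / "operstate"),
        "stp_state": read(base / "bridge" / "stp_state"),
        "ports": {port: read(root / port / "operstate") for port in ports},
    }
```

Calling `bridge_snapshot()` from the console once while things are working and again after the server has disappeared would show whether anything changed on the unRAID side, or whether the bridge still looks healthy and the problem sits with the switch/router.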

 

Link to comment
  • 3 months later...

So, this is starting to get troublesome again, I've had my unRAID server disappear from my network several times in the past week, one time causing the failure of a recording (I run my DVR in a VM, and when the network disappears, so does access to my HDHRs...).

 

Every time this happens the "solution" is to plug my unRAID server into a different port on my switch.  Doing this (and nothing else) and the unRAID box is immediately accessible on the network.

 

What I have done:

I replaced my old switch with a D-Link DGS-1100-24, which per D-Link supports STP/RSTP.

I upgraded the firmware on my Asus RT-AC68P to the latest Merlin, and ensured that STP is enabled.

 

However, I have confirmed (by trying to copy a large file from the VM) that within a minute of starting the copy the whole unRAID server disappears from the network.  Again, changing ports on the switch restored functionality.  I see nothing in the logs.

 

Any help would be appreciated.  It used to be a minor annoyance that I could work around by just not performing heavy access on the VM, but now it's affecting my everyday usage.

 

Logs attached.

 

-edit

 

I just found this thread that recommended updating/reverting to virtio-win-0.1.102.  I "updated" my Windows 7 VM's virtio drivers to the 102 version and it appears to have fixed my issue.  At least I can copy a multi-GB file off that VM successfully, without my unRAID server being booted off the network.

 

Hopefully that solves the issue for good.

unraid-diagnostics-20160313-1506.zip

Link to comment
