Crashes since beta9


Dimtar


Hi all.

 

I posted this in the announcement thread but was told to post here instead, fair enough. Ever since I upgraded to beta9 my server has crashed roughly once every two days; I'm currently up to 3 crashes. Samba, SSH and the web interface all become unresponsive, and the only way to recover is a hard reboot.

 

I was asked to tail the syslog but no luck so far. Anyone else?


Are you using any virtual machines?  Are you booting into Xen mode?  What kind of SATA controller are you using?  Last question: do you have VTd enabled in your BIOS?

 

No virtual machines.

Not booting in xen mode.

I'm not sure about the SATA controller, but I use the onboard LSI controller on my motherboard, a SUPERMICRO X10SL7-F. I also use a HIGHPOINT 2720, and lastly I have a HIGHPOINT 640L in the system with no drives currently connected.

VTd is enabled.


Disable VTd and report back if problems persist.


 

Can do, I'll make the change before I go to bed. Is this based on a theory, or is it just standard troubleshooting? Another user is having the same issue; hopefully he or she posts here as well.

VT-D now disabled, screenshot attached.

VT-D_Off.jpg
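If you want to double-check the VT-d state from the running system rather than from the BIOS screen, one common approach is to search the kernel log for DMAR (the ACPI table that describes Intel's VT-d hardware) or IOMMU messages. This is only a sketch: the function name `vtd_messages` is made up for illustration, and the exact kernel message wording varies between kernel versions, so treat an empty result as a hint rather than proof that VT-d is off.

```shell
#!/bin/bash
# Hypothetical helper: filter kernel log text (read from stdin) for lines
# mentioning DMAR or IOMMU. These usually only appear when the firmware
# exposes Intel VT-d; wording differs between kernel versions.
vtd_messages() {
  grep -i -E 'DMAR|IOMMU' || echo "no DMAR/IOMMU messages found"
}
```

Usage on the server:

dmesg | vtd_messages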


Some users have reported that this resolves issues related to crashing with specific SATA controllers.  Just want to see if that's what we have here.

 

I've experienced these crashes as well, and I'm running everything on an AVS-10/4 with the AOC-SAS2LP-MV8 that you guys provided. No VMs, no plugins, etc., just stock unRAID beta 9. This never happened before I installed this beta.

 

The next time it happens I'll post my log.


I had a crash last night; not sure if it's related. I was watching a movie when it froze. I thought it was XBMC, so I used Task Manager to stop XBMC. I restarted it and it would not start up. I tried rebooting the computer connected to my TV, but that did not fix it. So I walked over to my desktop to open the unRAID main menu, but it would not load. I checked whether the shares were showing, and they were not. I tried to connect to my Docker containers and they would not load. I also tried to connect with PuTTY and it would not connect. I pinged my tower and it responded, and my router still showed the server as connected. I had to do a hard shutdown to reboot. It came back up and, of course, a parity check started. No log was saved, I'm guessing because of the hard shutdown.

 

I'm not running any VMs, just Docker containers. I'm booting into stock unRAID, not Xen. But I used to have VMs, so VT-d is probably still enabled. I do have the apcupsd plugin installed. Last night was the first time it happened, but I think it happened right when my UPS performed a self-test. I've noticed my UPS does a self-test around 11:00 pm; I'm not sure whether it's daily or weekly.


What would also help is to have a telnet window open with this command running:

 

tail -f /var/log/syslog

 

When you first type this it will output the last few lines of the system log. Then each time a new syslog entry is generated it will be displayed. If your server crashes you can select the contents of the telnet window and paste them into a post or email.
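Scrollback limits and forgetting to turn on session logging can both lose the interesting lines, so it can help to have the output written straight to a file on the PC doing the watching. A minimal sketch, assuming you connect over SSH as root and the server's hostname is `tower` (both are assumptions; adjust to your setup). The function name `capture_syslog` is hypothetical: it runs whatever command streams the log and appends every line to a local file with `tee`, so the text survives a hard reset of the server.

```shell
#!/bin/bash
# Hypothetical helper: run a log-streaming command and append its output
# to a local file while still printing it to the terminal.
#   $1     file on the *client* PC to append to
#   $2...  command that streams log lines (for example ssh + tail)
capture_syslog() {
  local outfile="$1"
  shift
  # tee -a appends instead of overwriting, so reconnecting after a
  # dropped session keeps the earlier capture in the same file.
  "$@" | tee -a "$outfile"
}
```

Usage, run on the client PC rather than on the server (tail -F keeps following the file even if it gets rotated):

capture_syslog tower-syslog.txt ssh root@tower 'tail -n 200 -F /var/log/syslog'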


FYI, for those testing this issue, if you have any plugins enabled such as apcupsd or anything else, please disable and retest.  Containers and VMs are OK, but plugins could be a culprit we need to rule out.

 

I am not running any plugins, and it just crashed again (about 9 hours after disabling VT-d; that's a new record).

 

I'll set up the tail again.


Since the need for a tail command has come up in this thread, I was wondering why unRAID doesn't have a crash-log feature.

I know crash logs from XBMC, where they made debugging hangs and crashes a lot easier.

Running tail requires another PC to stay on until the crash occurs, which can sometimes take days.

Is this possible to implement, given the way unRAID runs? I didn't want to write a feature request until I knew whether it's technically possible.
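On the crash-log question: unRAID runs from RAM, so /var/log/syslog does not survive a hard reset. One workaround, sketched here under the assumption that the flash drive is mounted at /boot, is to copy the syslog to the flash drive at a fixed interval. This is not an existing unRAID feature, and both function names are hypothetical. A hard lockup can still lose whatever was logged after the last copy, and frequent writes add wear to the flash drive, so a fairly long interval is a reasonable trade-off.

```shell
#!/bin/bash
# Hypothetical sketch: keep a recent copy of the RAM-based syslog on the
# flash drive so it survives a hard reset. Paths are assumptions.

# Copy SRC to DEST via a temporary file, then rename, so DEST is never
# left half-written if the copy is interrupted; sync flushes to disk.
snapshot_syslog() {
  local src="$1" dest="$2"
  cp "$src" "$dest.tmp" && mv -f "$dest.tmp" "$dest" && sync
}

# Repeat snapshot_syslog every INTERVAL seconds (default 300) until killed.
snapshot_loop() {
  local src="$1" dest="$2" interval="${3:-300}"
  while true; do
    snapshot_syslog "$src" "$dest"
    sleep "$interval"
  done
}
```

After pasting both functions into an SSH session, this starts the loop in the background and detaches it from the session:

snapshot_loop /var/log/syslog /boot/syslog-latest.txt 300 & disown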

 


I'm running ESXi, and managing it remotely right now so I can't go bare metal. 

 

I've been able to cause the crash multiple times by running the new bitrot script by Jon Bartlett, but the crash behaviour isn't consistent.  Running bitrot against my TV or Movie shares (the only 2 I tried) caused the crash multiple times.

 

This is the only time I got this kind of info in my syslog, and I was able to log in multiple times but trying to access any user shares would hang my ssh session - the gui was also unresponsive.  Issuing powerdown hung the system completely, requiring it to be reset.  Most other times my ssh sessions would just hang, and connecting to the ESXi console worked, but it was unresponsive.

 

I was able to capture the errors in my syslog from the one "crash" where I could still log in after the problem arose; the log is attached.

 

I've reverted to 6beta6 and am running bitrot against my TV share, and it's on file 2500 of 21000, whereas it would always fail before reaching 350 files.

 

I realize that ESXi isn't supported, but this is all I can currently add to the discussion until I've got physical access to the machine.

 

I'm going to stop the script now, boot into safe mode, and try running bitrot again on 6b9.

 

**Update**

 

I restarted in safe mode and it crashed again, but then I realized everything was still loading from my go file. I have modified my go file so that it only contains the line calling emhttp, and I will wait until a parity check finishes before testing again.
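For anyone else testing this way: as far as I know, a go file cut down to just the emhttp line looks like the following. Treat it as an approximation of the stock file and compare it with the copy on your own flash drive at /boot/config/go before relying on it.

```shell
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
```

Anything else that was in your go file (plugin installs, custom scripts) is left out, which is the point of the test.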

 

For what it's worth, I was able to read the entire TV and Movies folders and write to them on 6beta9, as I created hashes for all the files in those directories using Corz from Windows. That process went smoothly.

syslog.zip


 

Just had another crash. Had to do a hard reset. Attached are the contents of the window I had that command running in.

 

Again, I have nothing running on this server (plugins, etc). I do have a cache pool setup though.

 

Would there be any issues going back to beta 6 from beta 9? If not, does anyone have a link to the beta 6 download?

output.txt


This is helpful and I'm sure much appreciated. I've checked it and can confirm it's a perfect copy. Because there has been some concern about alternate download locations for official downloads, and to be safe, I've listed below the MD5s of recent unRAID versions. Checking a download against these guards against tampering or a corrupted download.

 

02399A4E212EE386BAE5BFE117423A66  unRAIDServer-6.0-beta9-x86_64.zip

E49F58080E502C950C79E4FBCCCB4113  unRAIDServer-6.0-beta8-x86_64.zip

7CA14605D1F57B296F2D390B98FC93EA  unRAIDServer-6.0-beta7-x86_64.zip

30E2DC2C7509F03DF321B78F5A35E4AA  unRAIDServer-6.0-beta6-x86_64.zip

CDCE70BDED2C75803ED7C788DCA54951  unRAIDServer-6.0-beta5a-x86_64.zip

558FC29A7230BAEE8BC4C222DD872E6E  unRAIDServer-6.0-beta5-x86_64.zip

BD2E49B518818ABE3F44C2A40B63921B  unRAIDServer-6.0-beta4-x86_64.zip

 

04F74F27FE685CEFA28FA17D41BE13FC  unRAIDServer-5.0.5-i386.zip

 

 

These are normally found in the Release Notes, MD5 section, but the wiki is currently locked and uneditable, so it's a little out of date.
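To check a download against the list above on Linux (including the unRAID console), `md5sum` does the work; the only wrinkle is that the list uses upper-case hex while `md5sum` prints lower case. A minimal sketch with a hypothetical helper name, `verify_md5`:

```shell
#!/bin/bash
# Hypothetical helper: compare a file's MD5 with an expected value.
# Returns 0 on a match, 1 on a mismatch, 2 if the file can't be read.
verify_md5() {
  local file="$1" expected actual
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  # Reading from stdin avoids md5sum escaping unusual file names.
  actual=$(md5sum < "$file" | cut -d' ' -f1)
  # The published list is upper-case hex; md5sum prints lower case.
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)" >&2
    return 1
  fi
}
```

Usage, with the beta6 value from the list:

verify_md5 unRAIDServer-6.0-beta6-x86_64.zip 30E2DC2C7509F03DF321B78F5A35E4AA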


I posted this in the Beta 9 announcement post this morning:

"Has anyone else had the experience of their unRAID server becoming completely unresponsive after upgrading to Beta9? 99% of the time, it's been working fine since the upgrade. In fact, maybe a very tiny bit better than it used to.

But the unresponsiveness is very annoying, because when it happens, I don't know whether it's the usual sluggishness that sometimes happens when the server hasn't been used for several hours and the drives need to spin up, or something worse. If the complete unresponsiveness is going to happen, it's much more likely to happen in the morning after my drives have been spun down for a while.

 

I have the mover set to run at the default 3:40am time. When this unresponsiveness happens in the morning, I have to hard boot the server to get it back up again, and in the GUI I can see that the mover did not run overnight. So it's not just the webGUI that has stopped functioning, or a Samba problem (I can't get to the shares from Windows 7 either); the Linux side that runs the mover has also stopped functioning.

 

Also, and I don't know if this is related or not, the power down function no longer works since upgrading to Beta9. Someone else reported this behavior in another thread on the board, and I posted there that it happens to me too. The Reboot function still appears to work, but not power down. I end up having to do a hard shutdown when I want the server off.

 

I don't shut down the server very often; I usually just reboot it when the need arises. So honestly I'm not sure whether this is a Beta9 thing or not. I know it used to work on unRAID 5.05, but I couldn't say at what point it stopped working, as I've used a couple of the 6 betas."

____________________________________________________________________________________________

Well, about 30 minutes ago, for the first time ever, my unRAID system locked up hard while I was actually using it. So the drives weren't spun down or anything; I had just finished copying about 500 MB of stuff to it. I attempted to copy a single small 29K file over to it, and no go. Further investigation revealed it was locked up tight. I've been using unRAID since 2008 and that has never happened before. Granted, a lot of the components in my server are circa 2008, including the motherboard, memory, processor, etc. So I guess it could be a hardware fault, but I have my doubts since others are experiencing similar things.

 

No VT-d option in my board's bios.  Probably too old.  Not using any plugins, or playing around with any of the new-fangled virtual stuff.  The only fancy thing I'm doing is I have 2 SSD drives in a mirror for my cache. 


 

That syslog is almost complete... a definite crash... you weren't able to capture more of it, I guess? Are you using PuTTY? If so, maybe set its scrollback buffer larger. Another possibility is to click the 'Log' button on the left side of the webGui menu bar. This opens a browser window that does the same thing: it displays messages as they are generated.

 

To anyone having this issue: please set up a telnet session with that tail command running, or have the browser window open as described above.  Then try to make the problem happen  ;)


Yep, was using putty. I meant to log the session but forgot. I have it running again with everything being logged. I'll let it run until Thursday night and then I'm gonna have to switch to using beta 6 because I need the server stable while I'm out of town for a week.

 

 


So here's where I'm at with my remotely controlling unRAID and ESXi.

 

I had a crash again with unRAID 6b9 where the syslog logs a bunch of errors, and I cannot do much with my system until I reboot: browsing user shares hangs whatever process is accessing them. I can reliably make this happen by running the bitrot script that creates hashes in file attributes; it's available here: http://lime-technology.com/forum/index.php?topic=35226.0

 

After crashing, I reverted to 6beta6 and deleted any created hashes on my TV folder, I then ran the bitrot script against my TV folder, and movies folder (both of which failed on unraid 6beta9).  The script ran successfully. 

 

While unRAID isn't fully crashing, I cannot access any of the shares via SMB, or the console.

 

Attached is a log of 6beta9 with a clean go file and no plugins running (safe mode).

 

Hopefully this weekend I will have a chance to boot unRAID outside of esxi, run bitrot against another populated folder with a lot of files and recreate the issue.

 

syslog.txt.zip


I'm now running unRAID 6beta9 bare metal with a default go script and in safe mode.

 

I was able to trigger another "crash". I can still connect to unRAID, but trying to list directories anywhere under /mnt/user or /mnt/user0 hangs my SSH sessions. On a remote machine (Windows 8.1 command line) it hangs whatever is trying to access the share until it fails. A previously opened share (\\unraidIP\share) makes the window constantly try to refresh without ever succeeding, whereas trying to browse a share in a new Explorer window never opens the window at all.
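When a plain `ls` on /mnt/user hangs, it takes the SSH session down with it. One way to test a share without losing the session is to run the listing in the background and only wait a limited time. A minimal sketch; the helper name `probe_path` is hypothetical. It avoids `timeout` on purpose: a process stuck in uninterruptible I/O wait does not respond to signals, so `timeout` could end up waiting on it as well.

```shell
#!/bin/bash
# Hypothetical helper: list PATH in the background and report whether it
# finished within SECS seconds. Returns ls's own exit status if it
# finished, or 124 if it is still running (a likely hang).
probe_path() {
  local path="$1" secs="${2:-10}" pid i rc=0
  ls "$path" > /dev/null 2>&1 &
  pid=$!
  for ((i = 0; i <= secs; i++)); do
    # kill -0 sends no signal; it only checks the process still exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      wait "$pid" || rc=$?   # finished: pass on ls's exit status
      return "$rc"
    fi
    if ((i < secs)); then sleep 1; fi
  done
  echo "HUNG: 'ls $path' still running after ${secs}s (pid $pid)" >&2
  return 124
}
```

Usage, from an SSH session that is still responsive:

probe_path /mnt/user/TV 15

A stuck `ls` is left running in the background rather than killed, since it could not be killed anyway while blocked.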

 

I'll leave my SSH sessions open for an hour to see what happens, but I'm pretty sure the sessions are just hung and won't recover or cause a complete crash.  I've copied the current syslog to /boot (via ssh), and have tail -f /var/log/syslog running in an ssh session.

 

I was able to make this happen by deleting the bitrot hashes on my TV share (which were created successfully on 6beta6) and trying to recreate them.

 

The more I do on the system, the higher the load average climbs: it has gone from about 4 to about 27 since I started troubleshooting.
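A rising load average while nothing is obviously busy would fit with processes piling up in uninterruptible sleep (state D): on Linux, tasks blocked that way count toward the load average even though they use no CPU, and every new attempt to list a hung share could add one more. That is a plausible reading of the numbers, not a confirmed diagnosis. A minimal sketch for capturing both alongside the syslog; the function name `show_blocked` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the load averages and any processes in
# state D (uninterruptible sleep, typically stuck waiting on I/O).
show_blocked() {
  # Fields 1-3 of /proc/loadavg are the 1-, 5- and 15-minute averages.
  echo "load average: $(cut -d' ' -f1-3 /proc/loadavg)"
  # The trailing "=" suppresses ps headers; awk keeps rows in state D.
  ps -eo state=,pid=,comm= | awk '$1 == "D" { print "blocked:", $2, $3 }'
}
```

If the bitrot run or an `ls` on /mnt/user shows up in the "blocked:" lines, that output is worth posting along with the syslog.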

 

Unfortunately I can't provide my syslog right now, as the shares are offline. I'll update in an hour or so when I give up waiting for a complete system hang. From previous experience, the system will not power down properly.

 

 
