
Fix Common Problems reports "Out Of Memory" issues - causing server crash?


FreeMan


My server has completely locked up 3 times in the last 6 days. Locked up as in

* no access to shares

* I cannot access the WebGUI

* I cannot ssh in

* I cannot get a physically connected monitor & keyboard to respond

  - I don't usually keep the keyboard connected, and it won't acknowledge when I plug the keyboard in, no matter which USB port I use (i.e. caps-lock & num-lock lights don't light up, no response to any keystrokes, including Ctrl-Alt-Del).

 

I've had to hold down the power button until it has powered off - there has been no response to a brief press of the power button.

 

Since it's hard crashing, I don't have an opportunity to save the syslog or grab full diagnostics before it happens. I noticed on another thread that someone had used tail -f /var/log/syslog redirected to a persistent location, so I did the same. I've attached a zipped copy of the result.
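
(For reference, this is roughly the command I ran; the destination here is just an example of a persistent path on the array - adjust to taste:)

    # example path on a user share; pick any location that survives a reboot
    tail -f /var/log/syslog > /mnt/user/appdata/syslog-live.txt &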

 

The last thing in the log is an out of memory error. I'd noticed that there were quite a number of those scattered throughout the log. Is this the issue that's causing the crash, and if so, can anyone identify what's behind it?

 

My signature is up to date with my hardware and the dockers I'm running. I have no VMs running, and do not have VMs enabled. The only thing I can think of that I've been doing over the last week that's substantially different from any previous day-to-day running is building hashes via Dynamix File Integrity for 3 disks at a time. At the point of this crash, I had just started the build on disks 10, 11 & 12 (3, 4 & 4TB), but I'm pretty sure they have fewer files on them combined than my 2TB drive 7 does (lots of small pictures & MP3 files), and I'm not sure if that's related.
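
(If the file counts matter, I figure I could compare them with something like this - unRAID mounts the data disks at /mnt/diskN:)

    # rough per-disk file counts, read-only, just for comparison
    find /mnt/disk7 -type f | wc -l
    find /mnt/disk10 /mnt/disk11 /mnt/disk12 -type f | wc -l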

syslog2017.01.27.txt.zip

Link to comment

I woke up this morning to a report from Fix Common Problems that my server is, indeed, running out of memory with instructions to post diagnostics, so here's a fresh report from this morning.

 

Thinking about it, I did install the Dolphin docker just a week or two ago, but there was a period of time after its installation during which things ran well.

 

Also, after I restarted the server, a parity check started. It has now been running for 17:55 at an average speed of 36.9 MB/sec. Usually it's made it through my entire 4TB parity disk in under 17 hours, so it's running very slowly indeed.
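
As a rough sanity check on that pace (assuming bc is available on the box):

    # 4 TB at ~36.9 MB/sec works out to roughly 30 hours
    echo "scale=1; 4000000 / 36.9 / 3600" | bc    # ~30.1 hours, vs. the usual < 17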

 

According to the Pushover notices I've received on my phone:

* 28 Jan 2017 16:55 - Parity check started

* 28 Jan 2017 22:10 - Parity check finished (errors)
  - Duration: unavailable (no parity-check entries logged)

* 29 Jan 2017 00:20 - Array health report [PASS]
  - Array has 0 disks
  - Parity has not been checked yet

* 29 Jan 2017 03:24 - Parity check started

* 29 Jan 2017 04:30 - FCP - Errors have been found with your server
  - Out Of Memory errors detected on your server

 

I did not manually restart the parity check at 03:24 this morning; I was sleeping quite heavily at the time.

 

Anybody have any thoughts?

nas-diagnostics-20170129-1047.zip

Link to comment

Has this all started since you began preclearing a disk? There have been reports of OOM errors happening when multiple preclears are run simultaneously.

 

Even if you're not running multiple copies, with your docker containers (CrashPlan in particular) you may be running dangerously low on memory.

 

It also wouldn't be a bad idea to remove Dynamix System Stats, as your syslog has a ton of errors from it.
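
If you want a rough picture of who's eating the RAM and who's flooding the log, something like this from a console should do it (the awk field number is a guess at the syslog layout, so adjust if needed):

    docker stats --no-stream        # per-container memory use
    free -m                         # overall used/free RAM
    awk '{print $5}' /var/log/syslog | sort | uniq -c | sort -rn | head   # which tag logs the most lines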

Link to comment

No preclears have run in months. My last new drive was installed in the array on 26 Oct 2016 (and was precleared some time prior to that). No new drives purchased since then.

 

The closest I've come is powering down the server in early Jan to reinstall all the drives - I got three 5x3 drive cages and slid all the drives into those. The server was running with no issues for at least a couple of weeks after that before it started having a hissy fit about a week ago.

Link to comment

Uninstall preclear, then reinstall it...

 

Uninstall system stats then reinstall it.

 

Stop Crashplan from autostarting.

 

Reboot.

 

If that still doesn't help, then put Fix Common Problems into troubleshooting mode and wait for more issues to happen, then post diagnostics again.

 

 

Link to comment

Preclear & system stats uninstalled

Preclear & system stats reinstalled

Server rebooted

Dockers all set to manual start anyway, so I will not bring CrashPlan up.

Manually kicked off a parity check. I'll let it run until done & report back.

Heading to the preclear support post next to let gfjardim know.

 

If I don't get any additional OOM errors over the next few days, I'll assume that CrashPlan is hogging too much memory (can't imagine why, 300k+ files and 1.9+TB of data being backed up...).
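
To hopefully catch it in the act before the next hard lock-up, I'm thinking of leaving something like this running from a console (the log path is just an example of a persistent share):

    # log memory and per-container usage every 5 minutes to a file that survives a crash
    while true; do
        { date; free -m; docker stats --no-stream; } >> /mnt/user/appdata/memwatch.log
        sleep 300
    done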

 

I would presume that additional physical memory would be recommended in this situation. The MoBo has 4 DIMM slots and I've got 2 filled with 4GB each, so there's room to expand. How much would you guys recommend?

Link to comment


You are likely running out of low memory. Adding memory will not help if that is the case. I am not fully sure how to ascertain what memory is low.

Link to comment


Do I need to run EMM386.EXE?  :)

 

Sorry, I'm not familiar with the concept of "low memory" in a Linux system. Care to offer some details or a link to where I could read up on that?

Link to comment

I don't think low memory is so much an issue with a 64-bit OS.

 

AFAIK, there are really two causes of OOM errors:

 

A program attempts to allocate memory and the memory is simply not available, so the OS kills off other processes to regain the memory needed to fulfill the request,

 

or,

 

the memory is available, but it is so fragmented that the OS can't fulfill the request, so it has to kill off other processes to regain a large enough unfragmented block.
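
If you want to see which of the two you're hitting the next time it acts up, a couple of read-only checks from the console (I believe stock unRAID includes both):

    free -lm                # overall usage, plus the low/high memory split
    cat /proc/buddyinfo     # free blocks by size; plenty of RAM but only small blocks points at fragmentation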

 

 

Link to comment

OK, it's definitely CrashPlan using up the memory and causing the crash.

 

It's been running just fine since my last post. After my first-of-the-month parity check completed, I figured I'd fire up CrashPlan to see what happens.

 

Just prior to starting CP, I executed tail -f /var/log/syslog > //nas/appdata/syslog.2017.02.01.txt, the result of which I've attached. It clearly shows an OOM at the end. Just for giggles, I'm also attaching the syslog from prior to the start of the tail - I don't know if anyone might spot anything in there that could be a red flag. Also, there's a syslog.1 (at 5MB) that's been filling up since the machine last rebooted.
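
(For anyone skimming the attachment, the OOM hits can be pulled out with something like this - the exact kernel wording may vary a little between releases:)

    grep -iE "out of memory|oom-killer|killed process" syslog.2017.02.01.txt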

 

I really don't plan on running any VMs on the box any time soon. I've got plenty of physical machines I can use for various things if I need to do testing or playing, and I've got a couple of KVM switches handy, so that's not a major consideration. I'll probably add another couple of dockers as time goes on, and my backup set covered by CrashPlan will only grow.

 

So... the question stands, then, based on my setup, is an additional 8GB (2x4) sufficient?

-- Will I gain any extra benefit (now or in the future) from installing more memory - perhaps an additional 16GB (2x8)?

I currently have DDR3 PC1600 memory in the box. I know that if I buy faster memory it will only run at the 1600 speed, but would it be a reasonable "future proofing" idea to get some faster RAM now, so that if I wanted to upgrade in the future I could add another pair at the faster speed?
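
(For reference, I figure I can confirm what's in the slots now with something along these lines, assuming dmidecode is included in unRAID:)

    dmidecode --type memory | grep -E "Size|Speed|Locator"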

 

Any other thoughts and suggestions are more than welcome!

 

Thanks to everyone for all the help and input already!

syslog.2017.02.01.txt

syslog.txt

Link to comment