
Help, constantly seeing 'Machine check events logged' in syslog



Hi.

 

I recently built my first unRAID server from these parts https://uk.pcpartpicker.com/user/moussekateer/saved/3DGU, and everything has been fine except for one issue. I keep seeing 'Blackbox kernel: mce: [Hardware Error]: Machine check events logged' in the syslog. Sometimes I go a few days without seeing anything, and sometimes an error pops up every few minutes or even seconds.

 

My array currently consists of two 3TB drives (no parity drive yet), both of which I precleared with 3 cycles without any issues. I believe the errors sometimes correlate with heavy writing to the drives, but I cannot reproduce them on demand, so it may just be a coincidence. I have made sure all the cables and parts inside the server are connected and seated properly, so I don't believe it's a connection issue. I have also updated my BIOS to the latest version. I ran mcelog from the flash drive to investigate (before I eventually rebooted the server, when I had 20 or so of these errors), and I believe it's an internal hardware issue with the CPU? Please find the output attached, along with my syslog.

 

I am running unRAID version: 5.0.5

 

Thank you for your help in advance.

syslog-2014-03-06.txt

mcelog_output.txt

Link to comment


It's an issue with the version of Linux in 5.0.5. Running v6 makes those go away. I used to get them too. I don't believe they mean anything.

Link to comment


I've read that they indicate a hardware fault in one of the CPU caches, and that these messages are the CPU reporting that it has successfully recovered from the fault with a parity check. If so, wouldn't that indicate a long-term problem, meaning I should RMA the CPU?

 

I'd like to stay on v5.* until the plugins are updated if you say this is harmless, but I will try running v6 for a few days to see if any more errors pop up.
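
To compare how often the message shows up on v5 versus v6, something like the following could tally it per hour from a saved syslog. This is only a rough sketch: it assumes the traditional syslog timestamp format (e.g. `Mar  6 14:02:31`) at the start of each line, and the function name `count_mce_per_hour` is made up for illustration.

```python
import re
from collections import Counter

# Text of the kernel message quoted earlier in this thread.
MCE_MARKER = "Machine check events logged"

# Assumed traditional syslog timestamp, e.g. "Mar  6 14:02:31 hostname ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s")


def count_mce_per_hour(lines):
    """Count MCE log lines, bucketed by month, day and hour."""
    counts = Counter()
    for line in lines:
        if MCE_MARKER not in line:
            continue
        match = STAMP.match(line)
        if match is None:
            continue  # line does not start with the assumed timestamp format
        month, day, hour = match.groups()
        counts[f"{month} {int(day):02d} {hour}:00"] += 1
    return counts
```

Passing it an open file (for example the attached syslog-2014-03-06.txt) returns a Counter keyed by hour, which makes it easier to see whether the bursts line up with periods of heavy writing.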

Link to comment

Archived

This topic is now archived and is closed to further replies.
