Spies (posted February 18, 2017): The very nature of ECC is that it's error checking, so if a stick develops errors, is there any way to tell that corrections are taking place? How badly does an ECC stick need to fail before ECC is unable to do its job?
garycase (posted February 18, 2017): Yes, an ECC module can fail. It can correct a single bit error and detect multiple failures ... which an OS that supports ECC will report (as long as the errors don't cause a crash).
Spies (posted February 18, 2017): Would unraid be able to report those errors?
JorgeB (posted February 18, 2017): Supermicro boards report correctable ECC errors in the BIOS event log. I don't know about other brands, but I assume they are similar. If there's an uncorrectable error, I believe the server should halt.
garycase (posted February 19, 2017): As Johnnie noted, SuperMicro boards will show any corrected errors in the event log. I'm not certain, but I don't think that UnRAID reports these anywhere, so you'd only know about them if you check the event log periodically.
ashman70 (posted February 19, 2017): If you add the IPMI plugin, you may be able to access an event log that will show these errors. I was able to do so with my Supermicro chassis, although it has an Intel server motherboard. It recorded ECC failures on a specific piece of RAM and identified which slot it was in, so I could remove it.
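Besides the BIOS/IPMI event log described above, a Linux kernel with an EDAC driver loaded for the memory controller exposes running error counters under /sys/devices/system/edac/mc/. The sketch below reads the per-controller ce_count (corrected) and ue_count (uncorrected) files. Whether a particular Unraid build loads an EDAC driver for a given board is an assumption here and has not been checked; the function name read_edac_counts is made up for illustration. If the directory is missing, the function simply returns nothing.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    Assumes the Linux EDAC layout (mc0, mc1, ... each holding ce_count and
    ue_count). Returns an empty dict when no EDAC driver is loaded.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing between checks is the kind of signal that would otherwise only show up in the event log.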
johii (posted February 19, 2017): If one of your RAM modules starts throwing more than one error once in a while, you probably have a defective module that should be replaced, and RMA'ed if it's within its warranty. I haven't seen any so far in my relatively new Supermicro X11 64GB ECC build, but I will let you know if/when I start noticing ECC errors in the event log.
Spies (posted February 25, 2017): So I've installed the ECC RAM into my Microserver now. I see that single-bit error correction is now reported when I type 'dmidecode --type memory'. I thought I should have multi-bit ECC as well?
garycase (posted February 25, 2017): ECC memory can only correct a single bit error.
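For anyone wanting to check the same thing, here is a minimal sketch that pulls the relevant fields out of 'dmidecode --type memory' output. The field names (Error Correction Type, Total Width, Data Width) are the ones dmidecode prints; the SAMPLE text and the ecc_summary helper are hypothetical, written only for illustration, and the value shown is whatever the firmware declares, so treat it as a hint rather than proof.

```python
import re

# Hypothetical excerpt in the shape dmidecode prints; not from a real machine.
SAMPLE = """\
Physical Memory Array
\tLocation: System Board Or Motherboard
\tUse: System Memory
\tError Correction Type: Single-bit ECC
\tNumber Of Devices: 2

Memory Device
\tTotal Width: 72 bits
\tData Width: 64 bits
\tSize: 8192 MB
"""


def ecc_summary(dmidecode_text):
    """Extract the declared ECC type and the check bits per populated module."""
    ec_types = [
        t.strip()
        for t in re.findall(
            r"^[ \t]*Error Correction Type:[ \t]*(.+)$", dmidecode_text, re.M
        )
    ]
    total = re.findall(r"^[ \t]*Total Width:[ \t]*(\d+) bits", dmidecode_text, re.M)
    data = re.findall(r"^[ \t]*Data Width:[ \t]*(\d+) bits", dmidecode_text, re.M)
    # Check bits = total width minus data width (8 on a 72-bit ECC module).
    check_bits = [int(t) - int(d) for t, d in zip(total, data)]
    return {"error_correction": ec_types, "check_bits": check_bits}


if __name__ == "__main__":
    print(ecc_summary(SAMPLE))
```

On a live system the text would come from running dmidecode as root, for example with subprocess.run(["dmidecode", "--type", "memory"], capture_output=True, text=True).stdout.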
S80_UK (posted February 25, 2017): In normal PC applications, where ECC DIMM modules are 72 bits wide, 64 of those bits are data and the remaining 8 are check bits. The normal usage in a PC allows single-bit errors to be corrected at the memory controller (in the processor) and double-bit errors to be detected but not corrected. Errors of more than two bits may or may not be detected, but cannot be corrected. Error correction schemes can be implemented to offer much greater protection than this, but they would require more check bits to be stored with the data. The correction codes and numbers of check bits used with regular processors and memory systems are chosen based on costs, complexity and relative likelihood of different types of errors. I worked on error correcting computer memories back in the late 1970s - the maths behind Hamming codes and similar methods was beyond me, but the hardware to make these things work is fascinating (in a geeky kind of way).
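To make the 72/64 arithmetic concrete, here is a small sketch of a textbook extended Hamming code: 64 data bits need 7 Hamming check bits, and one extra overall parity bit brings the word to 72 bits, which gives single-error correction and double-error detection. This is an illustration only. Real memory controllers use their own code constructions, and the encode/decode functions below are made-up names, not anything a controller or OS exposes.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits as an extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..n are Hamming positions,
    with check bits at the power-of-two positions.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: a data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity  # makes the parity over this check group even
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def decode(code):
    """Return (status, data); status is 'ok', 'corrected' or 'detected'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        # Odd number of flips: assume exactly one and flip it back.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome (two flips), or an impossible
        # position: the error is detected but cannot be repaired.
        return "detected", None
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


if __name__ == "__main__":
    word = encode([1, 0] * 32)  # 64 data bits
    print(len(word))            # codeword length in bits
    word[10] ^= 1               # flip one bit
    print(decode(word)[0])
    word[20] ^= 1               # flip a second bit
    print(decode(word)[0])
```

With three or more flipped bits the decoder can mistake the damage for a single-bit error and "correct" the wrong bit, which is why errors beyond two bits may or may not be caught.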