Re: Unable to load web interface or Hit Shares - CyberMew

CyberMew · July 11, 2016

I'm having this issue often as well. Once, I had a few corrupted file entries that (I think) caused it to hang and produce tons of errors on the console. I've fixed those, but I'm not sure if there's anything else to be concerned about. I've attached my logs from before the powerdown and after a forced restart. Should I upgrade the filesystem as well in order to fix these issues?

tower-diagnostics-20160711-2338.zip

tower-diagnostics-20160712-0002.zip

RobJ · July 12, 2016

You're having a hardware issue with your system, causing 'machine check events', each preceded by messages about momentary CPU overheating (all 4 cores). I would install mcelog from the NerdPack and see what it says the next time they are reported. And of course you may want to examine the CPU cooling.

CyberMew · July 13, 2016

I might not have applied sufficient thermal paste on the CPU, hence the higher temps especially when there are a couple users trying to stream/transcode from Plex. I will try to replace the thermal paste again this weekend!

I've installed NerdPack, but how would I go about using mcelog?

Would the high CPU temps really be causing the unRAID web interface to lock up and render the disks inactive? There is no disk activity, nor can we access the webui/shares via SMB when it happens. I can't do a powerdown when this happens either.

Just a side question, not sure if it's in the logs (and if I've mentioned this before), but my parity drive and disk4 often get errors. Are they ok or should I replace them?

CyberMew · July 13, 2016

Found the answer to my side question (finally!! the errors were making me uncomfortable)

Both drives were connected to https://www.amazon.com/gp/product/B00AZ9T3OU/, which I realise is problematic after looking at someone else's post http://lime-technology.com/forum/index.php?topic=50332.0, which linked me to http://lime-technology.com/forum/index.php?topic=40683.45

Thank the heavens! Going to try the solutions in that thread before it's time to switch out to another card!

RobJ · July 13, 2016

"I've installed NerdPack, but how would I go about using mcelog?"

mcelog is a module that's triggered when a Machine Check Event occurs, and is able to gather and log a fair amount of info about the MCE. Without it, all we know from the syslog is that an MCE occurred, but not the source of it (CPU, RAM, etc) or what the error was. If an MCE occurs again, there will be more info logged for us, to use in figuring out what needs to be fixed or replaced.

"Would the high CPU temps really be causing the unRAID web interface to lock up and render the disks inactive? There is no disk activity, nor can we access the webui/shares via SMB when it happens. I can't do a powerdown when this happens either."

I don't know. But it pays to deal with what we *can* deal with first, then see what else turns up, hoping that it will then be clearer to us with the obvious issues gone.

CyberMew · July 14, 2016

Oh that's great, I thought I had to do something in order to get it to appear in the logs. Glad that it's automatic.

I just got home, and it seems to be happening again: I can't access Plex, there's no HDD light activity, etc. This time, however, I could access the webgui (a surprise!), so I tried to stop the array, but it got stuck at the first step, Stopping Docker. No surprises there. I then tried a powerdown, which didn't work, as usual. I will edit this post to attach the logs once I'm able to get them out.

edit: attached logs.

Also, I've restarted with the boot command-line fix provided in the Marvell topic, but it looks like it isn't working.

Jul 15 00:19:36 Tower emhttp: shcmd (11): /usr/local/sbin/set_ncq sdh 1 &> /dev/null
Jul 15 00:53:37 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 00:53:37 Tower kernel: ata9.00: failed command: SMART
Jul 15 00:53:37 Tower kernel: ata9.00: cmd b0/d1:01:01:4f:c2/00:00:00:00:00/00 tag 28 pio 512 in
Jul 15 00:53:37 Tower kernel: ata9.00: status: { DRDY }
Jul 15 00:53:37 Tower kernel: ata9: hard resetting link
Jul 15 00:53:38 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 15 00:53:38 Tower kernel: ata9.00: configured for UDMA/133
Jul 15 00:53:38 Tower kernel: ata9: EH complete
Jul 15 00:54:22 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 00:54:22 Tower kernel: ata9.00: failed command: IDENTIFY DEVICE
Jul 15 00:54:22 Tower kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 4 pio 512 in
Jul 15 00:54:22 Tower kernel: ata9.00: status: { DRDY }
Jul 15 00:54:22 Tower kernel: ata9: hard resetting link
Jul 15 00:54:23 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 15 00:54:23 Tower kernel: ata9.00: configured for UDMA/133
Jul 15 00:54:23 Tower kernel: ata9: EH complete

Not sure if it's affecting anything related to this main problem, but I'll also get another non-Marvell SATA card if possible to try and eliminate possible problems.

tower-diagnostics-20160715-0007.zip
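
For anyone wanting to check how often a port is resetting, the ata9 excerpt in the post above can be tallied with a small script. This is only a rough sketch, not an unRAID tool: the function name `summarize_ata_errors` and the regular expression are made up for illustration, and it assumes syslog lines shaped like the ones quoted in this thread.

```python
import re
from collections import Counter, defaultdict

# Lines copied from the excerpt posted above (subset relevant to the tally).
SAMPLE = """\
Jul 15 00:53:37 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 00:53:37 Tower kernel: ata9.00: failed command: SMART
Jul 15 00:53:37 Tower kernel: ata9: hard resetting link
Jul 15 00:54:22 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 00:54:22 Tower kernel: ata9.00: failed command: IDENTIFY DEVICE
Jul 15 00:54:22 Tower kernel: ata9: hard resetting link
"""

# Assumed line shape: "... kernel: ataN[.MM]: message"
LINE_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def summarize_ata_errors(text):
    """Count exceptions, failed commands, and hard resets per ATA port."""
    summary = defaultdict(Counter)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not a kernel ATA line
        port, message = match.groups()
        if message.startswith("exception"):
            summary[port]["exceptions"] += 1
        elif message.startswith("failed command:"):
            command = message.split(":", 1)[1].strip()
            summary[port]["failed: " + command] += 1
        elif message == "hard resetting link":
            summary[port]["hard resets"] += 1
    return {port: dict(counts) for port, counts in summary.items()}


if __name__ == "__main__":
    for port, counts in summarize_ata_errors(SAMPLE).items():
        print(port, counts)
```

To run it on a full log, pass the text of a saved syslog file to `summarize_ata_errors` instead of `SAMPLE`. A port that keeps showing exceptions and hard resets after a cable or controller change is the one worth a closer look.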
