SmallwoodDR82 Posted December 24, 2014

First, let me say thank you in advance. I've been chasing this issue for a while and I'm completely at a loss. I got lucky today and was able to pull the syslog before the crash was severe enough to kill telnet.

Hardware:
Case: Norco 4224
Mb: Supermicro X9SCL-F-O
CPU: Xeon 3.4GHz E3-1240v2
RAM: Kingston 32GB
Controller: IBM M1015 (IT mode)
Expander card: Intel RES2SV240

Running ESXi 5.5 with about 7 guests (Windows, Ubuntu, etc.). Running unRAID 6.0beta12 Pro via Plop.

I was having this issue with 5.0.5, and the only plugin I had was Plex, so I decided to move to unRAID 6.0beta12 and run Plex as a Docker container. I did the move yesterday (12/23/2014) and everything went smoothly. Less than 24 hours later, the same CPU stall error showed up.

A little history: I had this issue on ESXi 5.1, so I upgraded to 5.5 in hopes it would solve it, and I'm still having the same issue. So far this problem has survived ESXi 5.1, ESXi 5.5, unRAID 5.0.5, and unRAID 6.0beta12. I'm kind of leaning toward hardware; however, none of my other VMs have issues at all. Granted, they don't really have any passthrough.

Has anyone seen this, or have any ideas? Sometimes I can go over a month, other times less than 24 hours. Typically I assign all 4 CPUs to unRAID. Just today I went down to 2; however, I'd like to use 4 because of Plex.
Dec 24 14:33:30 S-M-C kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies g=2121076 c=2121075 q=47469)
Dec 24 14:33:30 S-M-C kernel: Task dump for CPU 3:
Dec 24 14:33:30 S-M-C kernel: shfs R running task 0 14923 1 0x00000008
Dec 24 14:33:30 S-M-C kernel: 0000000000000000 ffff88013fd83de8 ffffffff8105cc09 0000000000000003
Dec 24 14:33:30 S-M-C kernel: 0000000000000003 ffff88013fd83e00 ffffffff8105f2c4 ffffffff81822d00
Dec 24 14:33:30 S-M-C kernel: ffff88013fd83e30 ffffffff810766a5 ffffffff81822d00 ffff88013fd8e0c0
Dec 24 14:33:30 S-M-C kernel: Call Trace:
Dec 24 14:33:30 S-M-C kernel: <IRQ> [<ffffffff8105cc09>] sched_show_task+0xbe/0xc3
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8105f2c4>] dump_cpu_task+0x34/0x38
Dec 24 14:33:30 S-M-C kernel: [<ffffffff810766a5>] rcu_dump_cpu_stacks+0x6a/0x8c
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81078ead>] rcu_check_callbacks+0x1e1/0x4ff
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81086659>] ? tick_sched_handle+0x34/0x34
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8107ac1a>] update_process_times+0x38/0x60
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81086657>] tick_sched_handle+0x32/0x34
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8108668e>] tick_sched_timer+0x35/0x53
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8107b149>] __run_hrtimer.isra.29+0x57/0xb0
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8107b634>] hrtimer_interrupt+0xd9/0x1c0
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8102ea78>] local_apic_timer_interrupt+0x4f/0x52
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8102ee4a>] smp_apic_timer_interrupt+0x3a/0x4b
Dec 24 14:33:30 S-M-C kernel: [<ffffffff815ead9d>] apic_timer_interrupt+0x6d/0x80
Dec 24 14:33:30 S-M-C kernel: <EOI> [<ffffffff81154fd0>] ? unfix_nodes+0x13f/0x14b
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81147aff>] ? __discard_prealloc+0x71/0xb1
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81147ba2>] reiserfs_discard_all_prealloc+0x43/0x4c
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81163ed6>] do_journal_end+0x4e1/0xc57
Dec 24 14:33:30 S-M-C kernel: [<ffffffff81164ba6>] journal_end+0xad/0xb4
Dec 24 14:33:30 S-M-C kernel: [<ffffffff8114b8d9>] reiserfs_unlink+0x1bf/0x21f
Dec 24 14:33:30 S-M-C kernel: [<ffffffff810fc287>] ? link_path_walk+0x67/0x70c
Dec 24 14:33:30 S-M-C kernel: [<ffffffff810ff1ed>] vfs_unlink+0xa7/0x120
Dec 24 14:33:30 S-M-C kernel: [<ffffffff810ff351>] do_unlinkat+0xeb/0x1ee
Dec 24 14:33:30 S-M-C kernel: [<ffffffff810f7750>] ? SyS_newlstat+0x25/0x2e
Dec 24 14:33:30 S-M-C kernel: [<ffffffff810fffe8>] SyS_unlink+0x11/0x13
Dec 24 14:33:30 S-M-C kernel: [<ffffffff815e9fa9>] system_call_fastpath+0x16/0x1b

Syslog attached. Thanks all and happy holidays!

syslog.zip
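For anyone chasing similar symptoms: note that the stall fires inside the ReiserFS unlink path (reiserfs_discard_all_prealloc / do_journal_end in the trace above). A quick way to check how often the stall warning has been landing in your log is to grep for it; this is a minimal sketch, and the default log path is an assumption (pass your own log file as the first argument):

```shell
#!/bin/sh
# Minimal sketch: count RCU stall warnings in a syslog file and show
# the most recent occurrences. Default path is an assumption; adjust
# for your system or pass the log file as the first argument.
LOG="${1:-/var/log/syslog}"
COUNT=$(grep -c 'self-detected stall on CPU' "$LOG")
echo "RCU stall warnings found: $COUNT"
grep 'self-detected stall on CPU' "$LOG" | tail -n 3
```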
SmallwoodDR82 Posted December 25, 2014 (Author)

Looks like there is a fix/workaround. It's another ReiserFS issue: http://lime-technology.com/forum/index.php?topic=35788.0

I have a lot of moving and formatting in my future...
SmallwoodDR82 Posted January 9, 2015 (Author)

Update, 1/9/2015: After a few painfully slow weeks of transferring data around and formatting drives, I am now 100% XFS on my array disks. I am 48 hours in without a single CPU stall. I will keep this thread updated; I'm just not ready to mark it as solved yet. Thanks!
SmallwoodDR82 Posted January 26, 2015 (Author)

Update, 1/25/2015: 19 days without an error or crash!
SmallwoodDR82 Posted February 10, 2015 (Author)

Update, 2/9/2015: 34 days, 2 hours, and 1 minute without an error or crash! (Not that I'm counting...)