rcu_sched_state detected stall on CPU 3


Updated from rc2; the log ends with rcu_sched_state detected stall on CPU 3.

 

No web interface.

I can telnet in.

The flash drive is still shared out.

 

After searching, I found someone else with an AMD 880 board, a SASLP, and an AMD X4 who was having the same problem, but he was disabling cores for some reason.

 

I tried changing AMD's equivalent of SpeedStep (Cool'n'Quiet, C&Q), with no change.

 

Reverted to rc3, which works fine.

 

syslog.txt

Link to comment

Post a syslog please.

Link to comment

I have seen this three times with B5rc4.  It happens on my system when the Areca controller chokes on a SCSI command.

 

Jun  8 15:11:34 Tower9 emhttp: shcmd (1301): /usr/sbin/hdparm -y /dev/sdd &> /dev/null
Jun  8 15:11:34 Tower9 emhttp: _shcmd: shcmd (1301): exit status: 52
Jun  8 15:11:35 Tower9 kernel: arcmsr0: abort device command of scsi id = 4 lun = 0
Jun  8 15:11:38 Tower9 kernel: arcmsr0: pCCB ='0xf72bd700' isr got aborted command
Jun  8 15:11:39 Tower9 kernel: arcmsr: executing bus reset eh.....num_resets = 0, num_aborts = 1
Jun  8 15:11:39 Tower9 kernel: arcmsr0: executing hw bus reset .....
Jun  8 15:12:58 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies)
Jun  8 15:15:59 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=24030 jiffies)
Jun  8 15:18:59 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=42060 jiffies)
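The stall reports above can be pulled out of a syslog and compared with their timestamps. The following is a minimal Python sketch, not an unRAID tool: the function names `parse_stalls` and `estimate_hz` are made up for illustration, and the regular expression assumes the exact message format shown in this post.

```python
import re
from datetime import datetime

# Matches lines such as:
# Jun  8 15:12:58 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies)
STALL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+INFO:\s+"
    r"rcu_sched_state detected stall on CPU (?P<cpu>\d+) \(t=(?P<jiffies>\d+) jiffies\)"
)


def parse_stalls(lines):
    """Return a list of (timestamp, cpu, jiffies) for every RCU stall line."""
    stalls = []
    for line in lines:
        m = STALL_RE.match(line)
        if m:
            # Syslog timestamps carry no year; strptime fills in a default one.
            ts = datetime.strptime(m.group("ts"), "%b %d %H:%M:%S")
            stalls.append((ts, int(m.group("cpu")), int(m.group("jiffies"))))
    return stalls


def estimate_hz(stalls):
    """Estimate ticks per second from the first and last stall, or None."""
    if len(stalls) < 2:
        return None
    (t0, _, j0), (t1, _, j1) = stalls[0], stalls[-1]
    seconds = (t1 - t0).total_seconds()
    return (j1 - j0) / seconds if seconds > 0 else None


SAMPLE = [
    "Jun  8 15:11:39 Tower9 kernel: arcmsr0: executing hw bus reset .....",
    "Jun  8 15:12:58 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies)",
    "Jun  8 15:15:59 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=24030 jiffies)",
    "Jun  8 15:18:59 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=42060 jiffies)",
]

if __name__ == "__main__":
    stalls = parse_stalls(SAMPLE)
    for ts, cpu, jiffies in stalls:
        print(ts.strftime("%H:%M:%S"), "CPU", cpu, "t =", jiffies)
    print("estimated ticks per second:", estimate_hz(stalls))
```

On these three lines the estimate comes out at roughly 100 ticks per second, which would make t=6000 about one minute of stall. That also fits the gap between the bus reset at 15:11:39 and the first report at 15:12:58, though the match is only approximate.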

 

Some research suggests that disabling NCQ and setting the queue depth to 1 on the Areca will prevent the error.  I have done so, but I have no way to make the problem happen on demand, so I cannot test whether it helps.
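The change described above was made in the Areca controller's own settings. For comparison only, Linux also exposes a per-device `queue_depth` attribute in sysfs for SCSI-type disks, and the sketch below reads and writes it. This is a different mechanism from the controller setting, and it is an assumption that the arcmsr driver lets the value be changed this way. The helper names are hypothetical, `sdd` is simply the device from the log above, writing needs root, and the value does not persist across reboots.

```python
from pathlib import Path


def queue_depth_path(device, sysfs_root="/sys"):
    """Build the sysfs path for a block device's queue depth, e.g. sdd."""
    return Path(sysfs_root) / "block" / device / "device" / "queue_depth"


def read_queue_depth(device, sysfs_root="/sys"):
    """Return the current queue depth as an integer."""
    return int(queue_depth_path(device, sysfs_root).read_text().strip())


def set_queue_depth(device, depth=1, sysfs_root="/sys"):
    """Write a new queue depth; a depth of 1 means one command at a time."""
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    queue_depth_path(device, sysfs_root).write_text(f"{depth}\n")


if __name__ == "__main__":
    # Hypothetical usage on a live system (run as root):
    dev = "sdd"
    print("before:", read_queue_depth(dev))
    set_queue_depth(dev, 1)
    print("after:", read_queue_depth(dev))
```

The `sysfs_root` parameter exists only so the functions can be tried against a scratch directory before touching a real array.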

Link to comment
  • 3 weeks later...
  • 2 weeks later...
  • 1 month later...

5 and 6(both versions)

 

Hmm, I am running a Sempron 145 CPU with rc5 and have never seen this error.  Is there anything I can try to induce it?

 

I believe this error occurs only with multicore processors.  So if you want to induce this error, you will need to replace your processor.  ::)

Link to comment

Just as well I have not tried to turn on the other core, then.

Link to comment

I googled the error message and found a number of people reporting this fault with newer Linux kernels, in Arch Linux, Ubuntu, ZFS, Fedora, SUSE etc.  Like the LSI problem, someone has traced it back to a particular commit.  It seems that there is an incompatibility between some specific hardware configuration(s) and this commit in the Linux kernel sources.

 

Some believe that the problem is associated with network interfaces (possibly with two interfaces enabled at the same time; some have turned their wireless port off and the problem has gone away).  Possible triggers are WPA2 Enterprise, lease renewal, IPv6 ...

 

A patch which has effected a cure for some: here

Link to comment
  • 4 weeks later...

Interesting. I can't get anything past rc3 to work on my hardware.

 

I've tried a Dell 745, a 755, and the AMD machine.

Everything works fine until I try to mount the drives.

 

It seems like with the Dell I get a little more info out of it before it pukes.

 

Sep  6 22:18:16 Fileserver02 kernel: Pid: 1140, comm: emhttp Not tainted 3.4.4-unRAID #2 (Errors)
Sep  6 22:18:16 Fileserver02 kernel: Call Trace: (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c1055180>] print_cpu_stall+0x59/0xd1 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c1055233>] __rcu_pending+0x3b/0x125 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c1055393>] rcu_check_callbacks+0x76/0xa1 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c102b11b>] update_process_times+0x2d/0x58 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c1048dfe>] tick_periodic+0x63/0x65 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c1048e19>] tick_handle_periodic+0x19/0x6c (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10173f2>] smp_apic_timer_interrupt+0x67/0x7a (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c131f3fa>] apic_timer_interrupt+0x2a/0x30 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c105e2c4>] ? find_get_page+0x48/0x6a (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a1bb6>] __find_get_block_slow+0x42/0x117 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c131f3fa>] ? apic_timer_interrupt+0x2a/0x30 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a2142>] __find_get_block+0x85/0x13c (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a1f64>] ? grow_dev_page+0x6b/0x10b (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a2298>] __getblk_slow+0x9f/0x146 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a2366>] __getblk+0x27/0x30 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a43f3>] __bread+0xc/0x71 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10e1300>] reiserfs_resize+0x54/0x444 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10d39f8>] ? reiserfs_remount+0x6c/0x373 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10d3a6f>] reiserfs_remount+0xe3/0x373 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c105f0b8>] ? filemap_write_and_wait+0x22/0x2d (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10a659a>] ? __sync_blockdev+0x24/0x26 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10d398c>] ? finish_unfinished+0x39f/0x39f (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c10864de>] do_remount_sb+0x9c/0x10e (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c109854e>] do_remount+0xe9/0x142 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c109a0c2>] do_mount+0x10b/0x1c9 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c109a1e1>] sys_mount+0x61/0x94 (Errors)
Sep  6 22:18:16 Fileserver02 kernel:  [<c131f01d>] syscall_call+0x7/0xb (Errors)
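To read a trace like this one, it helps to drop the frames marked with `?`, which the kernel found on the stack but could not confirm as part of the call chain, and list the rest from the syscall inward. The Python sketch below does that. The name `reliable_frames` is made up for illustration, and the pattern assumes the 32-bit `[<address>] function+0xoffset/0xsize` format shown in this post.

```python
import re

# Matches frames such as "[<c10e1300>] reiserfs_resize+0x54/0x444",
# with an optional "? " marker in front of the function name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(lines):
    """Return function names of frames not marked '?', innermost first."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unsure"):
            frames.append(m.group("func"))
    return frames


SAMPLE = [
    "Sep  6 22:18:16 Fileserver02 kernel: Call Trace: (Errors)",
    "Sep  6 22:18:16 Fileserver02 kernel:  [<c10a43f3>] __bread+0xc/0x71 (Errors)",
    "Sep  6 22:18:16 Fileserver02 kernel:  [<c10e1300>] reiserfs_resize+0x54/0x444 (Errors)",
    "Sep  6 22:18:16 Fileserver02 kernel:  [<c10d39f8>] ? reiserfs_remount+0x6c/0x373 (Errors)",
    "Sep  6 22:18:16 Fileserver02 kernel:  [<c10d3a6f>] reiserfs_remount+0xe3/0x373 (Errors)",
    "Sep  6 22:18:16 Fileserver02 kernel:  [<c10864de>] do_remount_sb+0x9c/0x10e (Errors)",
    "Sep  6 22:18:16 Fileserver02 kernel:  [<c109a1e1>] sys_mount+0x61/0x94 (Errors)",
]

if __name__ == "__main__":
    # Print outermost call first so the path reads top-down.
    for depth, func in enumerate(reversed(reliable_frames(SAMPLE))):
        print("  " * depth + func)
```

Read this way, the trace above suggests the stall was reported while emhttp was inside a mount call that remounts a ReiserFS volume and reaches reiserfs_resize, which would be consistent with the hang appearing when the drives are mounted.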

Link to comment

