generalz, posted June 8, 2012: Updated from rc2, and the log ends with "rcu_sched_state detected stall on CPU 3". There is no web interface, but I can telnet in, and the flash drive is still shared out. After searching, I found someone else with an AMD 880 board, a SASLP and an AMD X4 having the same problem, but he was disabling cores for some reason. I tried changing Cool'n'Quiet (C&Q, AMD's equivalent of SpeedStep) with no change. Reverted back to rc3, and it works fine. Attachment: syslog.txt
chickensoup, posted June 8, 2012, replying to generalz: Post a syslog, please.
bubbaQ, posted June 8, 2012: I have seen this three times with B5rc4. It happens on my system when the Areca controller chokes on a SCSI command:
Jun 8 15:11:34 Tower9 emhttp: shcmd (1301): /usr/sbin/hdparm -y /dev/sdd &> /dev/null
Jun 8 15:11:34 Tower9 emhttp: _shcmd: shcmd (1301): exit status: 52
Jun 8 15:11:35 Tower9 kernel: arcmsr0: abort device command of scsi id = 4 lun = 0
Jun 8 15:11:38 Tower9 kernel: arcmsr0: pCCB ='0xf72bd700' isr got aborted command
Jun 8 15:11:39 Tower9 kernel: arcmsr: executing bus reset eh.....num_resets = 0, num_aborts = 1
Jun 8 15:11:39 Tower9 kernel: arcmsr0: executing hw bus reset .....
Jun 8 15:12:58 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies)
Jun 8 15:15:59 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=24030 jiffies)
Jun 8 15:18:59 Tower9 kernel: INFO: rcu_sched_state detected stall on CPU 1 (t=42060 jiffies)
Some research suggests that disabling NCQ and setting the queue depth to 1 on the Areca will prevent the error. I have done so, but I have no way to make the problem happen on demand, so I can't test it.
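To compare stalls across the syslogs posted in this thread, a small script can pull out every RCU stall warning along with its CPU and jiffy count. This is a minimal sketch, not an unRAID tool. The conversion to seconds assumes the kernel was built with HZ=100 (CONFIG_HZ); that is an assumption, although it fits bubbaQ's log, where the roughly three minutes between the first two warnings matches the 18030-jiffy difference at 100 ticks per second. The names find_stalls and Stall are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches kernel lines such as:
#   INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies)
STALL_RE = re.compile(
    r"rcu_sched_state detected stall on CPU (?P<cpu>\d+) \(t=(?P<jiffies>\d+) jiffies\)"
)


class Stall(NamedTuple):
    cpu: int        # CPU the warning names
    jiffies: int    # value of the t= field
    seconds: float  # jiffies / hz (only as good as the hz assumption)


def find_stalls(log_text: str, hz: int = 100) -> List[Stall]:
    """Return every RCU stall warning found in a syslog.

    ASSUMPTION: hz is the tick rate (CONFIG_HZ) of the kernel that wrote
    the log; check your own kernel config before trusting the seconds.
    """
    stalls = []
    for line in log_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            j = int(m.group("jiffies"))
            stalls.append(Stall(int(m.group("cpu")), j, j / hz))
    return stalls


if __name__ == "__main__":
    with open("syslog.txt", encoding="utf-8", errors="replace") as fh:
        for s in find_stalls(fh.read()):
            print(f"CPU {s.cpu}: {s.jiffies} jiffies (~{s.seconds:.1f} s at the assumed HZ)")
```

Run against bubbaQ's excerpt, this should report three stalls, all on CPU 1, of about 60, 240 and 420 seconds, so the same CPU stayed stuck after the Areca bus reset rather than recovering.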
generalz (thread author), posted June 8, 2012: Hmm, I don't think I have those options on the MV8. I could go back to the Bri10, but I'm a little leery of going back to that yet.
generalz (thread author), posted June 24, 2012: Hmm, I put Bri10 back in: same issue. I also put in an Intel NIC and turned off the Realtek. I think it has something to do with the AMD 880 chipset.
TheWombat, posted July 8, 2012: I have similar issues whether or not I disable cores. AMD Phenom II X4, 880 motherboard. I have posted system logs in a prior thread. Alex
generalz (thread author), posted July 8, 2012: Yeah, I'm not sure what the problem is. It seems like nobody with an 880 board is going to get it working for now.
Joe L., posted July 8, 2012, replying to TheWombat: It appears there are some things you can try: http://www.mjmwired.net/kernel/Documentation/RCU/stallwarn.txt
mejutty, posted August 12, 2012: Which rc is this an issue with? I have an a88gmv running rc5.
generalz (thread author), posted August 12, 2012: rc5 and rc6 (both versions).
mejutty, posted August 13, 2012: Hmm, I am running a Sempron 145 CPU with rc5 and have never seen this error. Is there anything I can try to induce it?
Frank1940, posted August 13, 2012, replying to mejutty: I believe this error occurs only with multicore processors, so if you want to induce it, you will need to replace your processor.
mejutty, posted August 13, 2012: Just as well I haven't tried to turn on the other core, then, isn't it?
PeterB, posted August 13, 2012: I googled the error message and found a number of people reporting this fault with newer Linux kernels, on Arch Linux, Ubuntu, ZFS, Fedora, SUSE, etc. As with the LSI problem, someone has traced it back to a particular commit. It seems there is an incompatibility between some specific hardware configuration(s) and this commit in the Linux kernel sources. Some believe the problem is associated with network interfaces, possibly with two interfaces enabled at the same time (some have turned their wireless port off and the problem has gone away). Possible triggers are WPA2 Enterprise, lease renewal, IPv6... A patch that has cured the problem for some: here
generalz (thread author), posted August 13, 2012: Huh, I have no idea how to apply that. But I did turn off the onboard NIC and install an Intel one: no change. I might be able to take the LSI cards out and try just the onboard ports and a SiI 3112.
ixnu, posted August 14, 2012, replying to generalz: I have an 880 (ASRock 880GM-LE) with a Sempron 140, and I'd be interested in how this goes. A SiI 3132 busted my rig pretty badly.
generalz (thread author), posted September 7, 2012, replying to PeterB: Interesting. I can't get anything past rc3 to work on my hardware. I've tried a Dell 745, a Dell 755 and the AMD machine. Everything works fine until I try to mount the drives. It seems like with the Dell I get a little more info out of it before it crashes:
Sep 6 22:18:16 Fileserver02 kernel: Pid: 1140, comm: emhttp Not tainted 3.4.4-unRAID #2
Sep 6 22:18:16 Fileserver02 kernel: Call Trace:
Sep 6 22:18:16 Fileserver02 kernel: [<c1055180>] print_cpu_stall+0x59/0xd1
Sep 6 22:18:16 Fileserver02 kernel: [<c1055233>] __rcu_pending+0x3b/0x125
Sep 6 22:18:16 Fileserver02 kernel: [<c1055393>] rcu_check_callbacks+0x76/0xa1
Sep 6 22:18:16 Fileserver02 kernel: [<c102b11b>] update_process_times+0x2d/0x58
Sep 6 22:18:16 Fileserver02 kernel: [<c1048dfe>] tick_periodic+0x63/0x65
Sep 6 22:18:16 Fileserver02 kernel: [<c1048e19>] tick_handle_periodic+0x19/0x6c
Sep 6 22:18:16 Fileserver02 kernel: [<c10173f2>] smp_apic_timer_interrupt+0x67/0x7a
Sep 6 22:18:16 Fileserver02 kernel: [<c131f3fa>] apic_timer_interrupt+0x2a/0x30
Sep 6 22:18:16 Fileserver02 kernel: [<c105e2c4>] ? find_get_page+0x48/0x6a
Sep 6 22:18:16 Fileserver02 kernel: [<c10a1bb6>] __find_get_block_slow+0x42/0x117
Sep 6 22:18:16 Fileserver02 kernel: [<c131f3fa>] ? apic_timer_interrupt+0x2a/0x30
Sep 6 22:18:16 Fileserver02 kernel: [<c10a2142>] __find_get_block+0x85/0x13c
Sep 6 22:18:16 Fileserver02 kernel: [<c10a1f64>] ? grow_dev_page+0x6b/0x10b
Sep 6 22:18:16 Fileserver02 kernel: [<c10a2298>] __getblk_slow+0x9f/0x146
Sep 6 22:18:16 Fileserver02 kernel: [<c10a2366>] __getblk+0x27/0x30
Sep 6 22:18:16 Fileserver02 kernel: [<c10a43f3>] __bread+0xc/0x71
Sep 6 22:18:16 Fileserver02 kernel: [<c10e1300>] reiserfs_resize+0x54/0x444
Sep 6 22:18:16 Fileserver02 kernel: [<c10d39f8>] ? reiserfs_remount+0x6c/0x373
Sep 6 22:18:16 Fileserver02 kernel: [<c10d3a6f>] reiserfs_remount+0xe3/0x373
Sep 6 22:18:16 Fileserver02 kernel: [<c105f0b8>] ? filemap_write_and_wait+0x22/0x2d
Sep 6 22:18:16 Fileserver02 kernel: [<c10a659a>] ? __sync_blockdev+0x24/0x26
Sep 6 22:18:16 Fileserver02 kernel: [<c10d398c>] ? finish_unfinished+0x39f/0x39f
Sep 6 22:18:16 Fileserver02 kernel: [<c10864de>] do_remount_sb+0x9c/0x10e
Sep 6 22:18:16 Fileserver02 kernel: [<c109854e>] do_remount+0xe9/0x142
Sep 6 22:18:16 Fileserver02 kernel: [<c109a0c2>] do_mount+0x10b/0x1c9
Sep 6 22:18:16 Fileserver02 kernel: [<c109a1e1>] sys_mount+0x61/0x94
Sep 6 22:18:16 Fileserver02 kernel: [<c131f01d>] syscall_call+0x7/0xb
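A call trace like the one above is easier to read once the frames are pulled out and ordered from the outermost caller inward. On x86, a "?" before a symbol marks an address the kernel found on the stack but could not confirm as part of the actual call chain, so the sketch below keeps those frames but drops them from the reconstructed chain. This is a minimal parsing sketch that assumes the "[<address>] symbol+0xoffset/0xsize" layout shown in this log; parse_call_trace, call_chain and Frame are hypothetical names, not part of any kernel tool.

```python
import re
from typing import List, NamedTuple

# One frame of an x86 kernel "Call Trace:", e.g.
#   [<c10a2366>] __getblk+0x27/0x30
#   [<c105e2c4>] ? find_get_page+0x48/0x6a
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<q>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    addr: int
    func: str
    offset: int     # offset of the return address into the function
    size: int       # total size of the function
    reliable: bool  # False when the kernel printed "?" before the symbol


def parse_call_trace(log_text: str) -> List[Frame]:
    """Extract frames in the order the kernel printed them (innermost first)."""
    return [
        Frame(
            addr=int(m.group("addr"), 16),
            func=m.group("func"),
            offset=int(m.group("off"), 16),
            size=int(m.group("size"), 16),
            reliable=m.group("q") is None,
        )
        for m in FRAME_RE.finditer(log_text)
    ]


def call_chain(frames: List[Frame]) -> List[str]:
    """Reliable frames only, outermost caller first."""
    return [f.func for f in reversed(frames) if f.reliable]
```

Applied to generalz's trace, the reliable frames run syscall_call, sys_mount, do_mount, do_remount, do_remount_sb, reiserfs_remount, reiserfs_resize, __bread and on into the block-cache lookup where the timer interrupt caught it. That is consistent with the stall appearing only when the array's ReiserFS disks are being mounted (remounted), which matches the report that maintenance mode starts fine.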
generalz (thread author), posted September 7, 2012: Huh, so if I check the maintenance box and click Start, it comes up fine. Maybe it's my user shares it doesn't like.
joeshmoe1, posted September 8, 2012, replying to generalz: That's interesting. I wouldn't think it's the shares, as I've had similar problems even after trying a completely fresh install with no shares added. http://lime-technology.com/forum/index.php?topic=21798.0
dgaschk, posted September 8, 2012: Please update the HW wiki with this information.
joeshmoe1, posted September 12, 2012, replying to dgaschk: I'm happy to update the wiki, but I think my hardware is fairly common and the issue doesn't seem widespread. I've sent Tom a PM, so I'm hoping he can provide some input.
This topic is now archived and is closed to further replies.