
How to resolve IRQ Conflicts in UnRaid



I just installed Docker and then created a container for Plex (which is running with default settings, not completely configured yet).  I'm seeing the following in my syslog and on the console.  Should I try to boot with the irqpoll option?  And how do you do that?

 

I'm sure this means that some piece of hardware is requesting an interrupt that something else has claimed.  IRQ 16 seems to be the problem, which is where the AOC-SAS2LP-MV8 HBA controller resides.

 

Tail of syslog (full syslog attached)

 

Apr 22 15:54:17 HunterNAS-6 kernel: irq 16: nobody cared (try booting with the "irqpoll" option) (Errors)
Apr 22 15:54:17 HunterNAS-6 kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.4-unRAID #1 (Errors)
Apr 22 15:54:17 HunterNAS-6 kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
Apr 22 15:54:17 HunterNAS-6 kernel: 0000000000000000 ffff88041f203e18 ffffffff815f7e84 0000000000040001
Apr 22 15:54:17 HunterNAS-6 kernel: ffff88040ca0f600 ffff88041f203e48 ffffffff81075853 000000010029112f
Apr 22 15:54:17 HunterNAS-6 kernel: ffff88040ca0f600 0000000000000000 0000000000000010 ffff88041f203e88
Apr 22 15:54:17 HunterNAS-6 kernel: Call Trace: (Errors)
Apr 22 15:54:17 HunterNAS-6 kernel: <IRQ>  [<ffffffff815f7e84>] dump_stack+0x4c/0x6e
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81075853>] __report_bad_irq+0x2b/0xbe
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81075c5e>] note_interrupt+0x19d/0x227
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81073dbc>] handle_irq_event_percpu+0xe0/0xf2
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81073e0a>] handle_irq_event+0x3c/0x5e
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff810764bb>] handle_fasteoi_irq+0x7a/0xdb
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8100d45a>] handle_irq+0x1a/0x24
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff810895ee>] ? __tick_nohz_idle_enter+0x27e/0x308
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8100cefc>] do_IRQ+0x49/0xcd
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff815fdf2d>] common_interrupt+0x6d/0x6d
Apr 22 15:54:17 HunterNAS-6 kernel: <EOI>  [<ffffffff814eac9c>] ? cpuidle_enter_state+0x49/0x9f
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff814eac95>] ? cpuidle_enter_state+0x42/0x9f
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff814ead91>] cpuidle_enter+0x12/0x14
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8106d148>] cpu_startup_entry+0x19a/0x272
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff815eb460>] rest_init+0x80/0x84
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818aded9>] start_kernel+0x412/0x41f
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad8bd>] ? set_init_arg+0x56/0x56
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad120>] ? early_idt_handlers+0x120/0x120
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad4c6>] x86_64_start_reservations+0x2a/0x2c
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad5b6>] x86_64_start_kernel+0xee/0xfd
Apr 22 15:54:17 HunterNAS-6 kernel: handlers:
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8149fec0>] usb_hcd_irq (Drive related)
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffffa00a25e6>] mvs_interrupt [mvsas] (Drive related)
Apr 22 15:54:17 HunterNAS-6 kernel: Disabling IRQ #16

 

I ran lsdev, and mvsas is on IRQ 16, but I can't see any conflicts in the output.

 

Device            DMA   IRQ  I/O Ports
------------------------------------------------
0000:00:02.0                     f000-f03f
0000:00:19.0                     f080-f09f
0000:00:1f.2             26     f060-f07f   f0a0-f0a3   f0b0-f0b7   f0c0-f0c3   f0d0-f0d7
0000:00:1f.3                      f040-f05f
0000:01:00.0            27      e000-e01f     e020-e023     e030-e037     e040-e043     e050-e057
0000:05:00.0            28      d000-d00f     d010-d013     d020-d027     d030-d033     d040-d047
ACPI                                  0400-0403   0404-0405   0408-040b   0410-0415   0420-042f   0450-0450
acpi                            9
ahci                                   d000-d00f       d010-d013       d020-d027       d030-d033       d040-d047       e000-e01f       e020-e023       e030-e037       e040-e043       e050-e057     f060-f07f     f0a0-f0a3     f0b0-f0b7     f0c0-f0c3     f0d0-f0d7
cascade                       4
dma                                   0080-008f
dma1                                 0000-001f
dma2                                 00c0-00df
EC                                      0062-0062     0066-0066
ehci_hcd:usb2            23
eth0                          25
fpu                                     00f0-00ff
i8042                        1 12
keyboard                            0060-0060   0064-0064
mvsas                       16
PCI                                     0000-0cf7 0cf8-0cff 0d00-ffff   d000-dfff   e000-efff
pic1                                    0020-0021
pic2                                    00a0-00a1
pnp                                     0200-020f   0290-029f   0454-0457   0458-047f   04d0-04d1   0500-057f   0680-069f   164e-164f   ffff-ffff     ffff-ffff
PNP0C04:00                         00f0-00ff
PNP0C09:00                         0062-0062   0066-0066
rtc0                              8     0070-0077
timer                            0
timer0                                 0040-0043
timer1                                 0050-0053
vga+                                   03c0-03df

 

cat /proc/interrupts does show the SAS controller and the USB on the same IRQ:  IO-APIC  16-fasteoi  ehci_hcd:usb1, mvsas.  Since Linux can't change the IRQ, I'm not sure how to resolve this.  My BIOS does not show any way to edit IRQs, and I've disabled everything that's not being used in the BIOS (including the USB 3.0 controller).  Thoughts?

 

           CPU0       CPU1       CPU2       CPU3
  0:         13          0          0          0   IO-APIC-edge      timer
  1:          3          0          0          0   IO-APIC-edge      i8042
  8:         33          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          3          0          0          0   IO-APIC-edge      i8042
 16:    2513722          0          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1, mvsas
 23:       9469          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb2
 25:     559950          0          0          0   PCI-MSI-edge      0000:00:1f.2
 26:      13377          0          0          0   PCI-MSI-edge      eth0
 27:     260473          0          0          0   PCI-MSI-edge      0000:01:00.0
 28:     565037          0          0          0   PCI-MSI-edge      0000:05:00.0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:     141721     127078     121843     119970   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RTR:          3          0          0          0   APIC ICR read retries
RES:      87963       5858       6259       4213   Rescheduling interrupts
CAL:         72        117         73        126   Function call interrupts
TLB:       1976        887        835        851   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          9          9          9          9   Machine check polls
HYP:          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
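
For anyone who wants to check for shared lines without eyeballing the table, here is a minimal Python sketch that lists every numeric IRQ with more than one handler attached. The function name `shared_irqs` is made up for illustration, and the parsing assumes the handler list is the last column, separated from the type column by at least two spaces, as in the output above. Other kernels may lay the table out differently.

```python
import re


def shared_irqs(interrupts_text):
    """Return {irq_number: [handler, ...]} for IRQs with more than one handler.

    Assumes the /proc/interrupts layout shown in this thread: a header row of
    CPU names, then one row per IRQ with per-CPU counts, a type column, and a
    comma-separated handler list as the final column.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        parts = rest.split(None, ncpus)
        if len(parts) <= ncpus:
            continue  # no type/handler columns on this row
        # The last run of text after a gap of 2+ spaces is the handler list.
        handler_column = re.split(r"\s{2,}", parts[ncpus].strip())[-1]
        handlers = [h.strip() for h in handler_column.split(",") if h.strip()]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for irq, handlers in sorted(shared_irqs(f.read()).items()):
            print(f"IRQ {irq} is shared by: {', '.join(handlers)}")
```

Run against the table above, this should report only IRQ 16, shared by ehci_hcd:usb1 and mvsas. Sharing by itself is normal on IO-APIC lines; it only narrows down which drivers could be involved.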

syslog-2015-04-22.txt

Link to comment
kernel: irq 16: nobody cared (try booting with the "irqpoll" option)

 

This has been rather rare, and generally tough to solve.  Something set up an interrupt call on IRQ 16, and it happened, but nobody answered the bell.  That's a bug!  Somewhere.  The problem is trying to find who's to blame.  It's low-level, almost always hardware related.  In this case, you did the homework, which shows that there are 2 handlers that are *supposed* to handle any IRQ 16's, a USB driver and the mvsas driver.  But the low-level code involved could be in the BIOS USB support, or one of the USB drivers, or in the SAS card BIOS/firmware, or in the mvsas driver module.  Online, I found a number of 'IRQ #, nobody cared' reports involving the USB driver, so that puts some suspicion on it.
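
If you want to pull the IRQ number and the registered handlers out of a long syslog instead of scrolling for them, something like the following Python sketch might do. The function name `bad_irq_reports` is hypothetical, and the patterns are based only on the log lines pasted above (a "nobody cared" line, then later a "handlers:" line followed by one line per handler). Other kernel versions may word these lines differently.

```python
import re


def bad_irq_reports(syslog_text):
    """Return a list of (irq, [handler names]) for each 'nobody cared' report."""
    reports = []
    current = None        # the report currently being filled in
    in_handlers = False   # True while reading lines after "handlers:"
    for line in syslog_text.splitlines():
        m = re.search(r"kernel: irq (\d+): nobody cared", line)
        if m:
            current = (int(m.group(1)), [])
            reports.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if "kernel: handlers:" in line:
            in_handlers = True
            continue
        if in_handlers:
            # Handler lines look like: kernel: [<address>] function_name ...
            h = re.search(r"kernel: \[<[0-9a-f]+>\] (\S+)", line)
            if h:
                current[1].append(h.group(1))
            else:
                # First non-handler line (e.g. "Disabling IRQ #16") ends the list.
                in_handlers = False
                current = None
    return reports
```

On the excerpt in the first post this should yield one report for IRQ 16 with the handlers usb_hcd_irq and mvs_interrupt. Trailing annotations such as "(Drive related)" in the paste are ignored because only the first word after the address is kept.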

 

Things to try -

* update the motherboard BIOS (unlikely to help, but you never know)

* update the SAS card firmware (might help, probably low chance though)

* try the "irqpoll" option, just add the word to the append line in your syslinux.cfg (never heard of this helping anyone yet! plus it will likely affect system performance)

* move USB connected devices to very different ports, because there are different USB drivers assigned to different pairs of ports (just might help, and easy to try)

* replace motherboard or SAS card (sorry, sometimes that's the only choice left!)

* wait for help from someone else who's been there, and fixed it or worked around it
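
On the irqpoll item above: "add the word to the append line" can be sketched in Python as below. This is only an illustration. The function name `add_kernel_option` is made up, and the sample config in the usage note is an assumed layout, not copied from your flash drive. The sketch touches every `append` line it finds, so if your syslinux.cfg has several boot entries you may prefer to edit just the default one by hand. Back up the file first.

```python
def add_kernel_option(cfg_text, option="irqpoll"):
    """Return cfg_text with `option` added to each 'append' line that lacks it."""
    out = []
    for line in cfg_text.splitlines():
        words = line.split()
        # Only lines whose first word is "append" carry kernel options.
        if words and words[0].lower() == "append" and option not in words[1:]:
            line = line.rstrip() + " " + option
        out.append(line)
    # Preserve a trailing newline if the original had one.
    return "\n".join(out) + ("\n" if cfg_text.endswith("\n") else "")
```

For example, assuming an entry that contains the line `  append initrd=/bzroot`, the function would return the same text with that line changed to `  append initrd=/bzroot irqpoll`. Running it a second time leaves the text unchanged. The option takes effect on the next boot.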

Link to comment

I thought this might be the case.  One interesting observation: everything's working fine.  I can access the USB w/o issue. I'm currently loading 1.8TB of movies on the system at 109MB/s.  Shouldn't there be something not working?

 

That's a good question.  I took a look at your syslog and found that your motherboard architecture is set up as 2 USB buses, with bus 1 assigned IRQ 16 and bus 2 assigned IRQ 23.  Your USB mouse, keyboard, and flash drive are all connected to bus 2, so you lucked out there.  Syslog says bus 1 has 6 ports and bus 2 has 8 ports, but some are just pinouts on the motherboard, some may not even have pins, and some may be the ports on the front of the case.  I think it's likely that some of the ports stopped working once IRQ 16 was disabled.

 

However, mvsas is handling 5 of your drives, Disks 2, 3, 4, 6, and 7.  I would normally think that the 5 drives would now be unresponsive.  Nowhere can I find an actual IRQ assigned to mvsas, yet the syslog shows an mvsas interrupt handler on IRQ 16, so it must be using that one.  Perhaps it is controlling the I/O differently, I don't know.  Since the syslog you attached stops immediately after the disabling of IRQ 16, it's not clear that you actually accessed any of those drives.  On boot, a parity sync began, which obviously read from them all, but the disabling happened just after 7.5 hours had passed, and I think by then the parity calc process had passed the 2GB mark, and was through with all 5 drives.  Your fast writing to the drives is probably going to User Shares, so it is all being written directly to the SSD Cache drive, not the 5 drives.  At 3:40am, the Mover will kick in and try to move the data from the Cache drive to the data drives, and then you will quickly find out if they are responding.  Or you can try reading directly from one of those disks now (not from the Shares, from the disk itself).  If that works, then mvsas is not using interrupt-driven I/O.
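
A rough way to do that direct read test is sketched below in Python. The function name `timed_read` and the example path are hypothetical; point it at a real file on one of the mvsas disks through its disk mount rather than through a User Share. Pick a file that has not been read or written recently, otherwise the data may come from the RAM cache and never touch the drive.

```python
import time


def timed_read(path, max_bytes=256 * 1024 * 1024, chunk=1024 * 1024):
    """Read up to max_bytes from path; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            data = f.read(min(chunk, max_bytes - total))
            if not data:
                break  # reached end of file
            total += len(data)
    return total, time.monotonic() - start


if __name__ == "__main__":
    import sys

    # Example (hypothetical path): python timed_read.py /mnt/disk2/Movies/some_file.mkv
    nbytes, seconds = timed_read(sys.argv[1])
    rate = nbytes / seconds / 1e6 if seconds > 0 else float("inf")
    print(f"read {nbytes} bytes in {seconds:.2f}s ({rate:.1f} MB/s)")
```

If the read hangs or fails with I/O errors, the drive is probably not responding. If it completes at normal speed, the controller is still servicing requests despite IRQ 16 being disabled.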

Link to comment
Since the syslog you attached stops immediately after the disabling of IRQ 16, it's not clear that you actually accessed any of those drives.

 

I moved nearly 2TB of movie files after that syslog was generated.  The share (Movies) was across two disks (Disk2 and Disk3).  With high-water in place, it filled half of Disk2, then finished up on Disk3.  So unRAID was using both disks to its fullest capacity...as far as I can tell.

Link to comment