Hotfix: Device 'vfio-pci' could not be initialized


jonp


Additional note:  this is a short-term fix and is NOT the same fix we are implementing in the longer term, so if you want a proper fix, you'll need to wait.

 

We recently discovered a bug that may have been preventing some people from passing through certain PCI devices to their VM guests.  If you saw something to the effect of this:

 

internal error: early end of file from monitor: possible problem:

2015-11-14T00:52:29.584664Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: error opening /dev/vfio/37: Operation not permitted

2015-11-14T00:52:29.584683Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: failed to get group 37

2015-11-14T00:52:29.584690Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device initialization failed

2015-11-14T00:52:29.584697Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized

 

This may have affected you.  The solution for new users is quite simple, and will be addressed in an upcoming release.  For existing users, however, an automatic patch is slightly more complicated.  As such, I wanted to put out this quicker fix that can be self-applied.

 

NOTE:  IF YOU HAVE ANY VMs DEFINED UNDER THE VMs TAB, THIS PROCEDURE WILL REMOVE THEM, BUT NOT DELETE THEIR CORRESPONDING VIRTUAL DISKS.  YOU CAN RE-ADD YOUR VMs AND POINT THEM TO THEIR PREVIOUS VDISKS.  DO NOT DO THIS PROCEDURE UNLESS YOU ARE HAVING A PROBLEM WITH PASSING THROUGH PCI DEVICES TO VMs AS INDICATED PREVIOUSLY.  ALTERNATIVELY, SEE THE ADVANCED USER PROCEDURE AFTER THIS ONE FOR A WAY TO AVOID LOSING YOUR EXISTING VM CONFIGURATIONS.

 

1. Go to Settings -> VM Manager in the webGui

2. Set Enable VMs to No and click Apply

3. Browse to the config -> plugins -> dynamix.kvm.manager folder on your flash device from a Windows or Mac device

4. Rename the domain.img file there to domain.old

5. Download the domain.zip file attached here

6. Extract it to the dynamix.kvm.manager folder on your flash device (a new domain.img file should now be there)

7. Go back to Settings -> VM Manager in the webGui

8. Set Enable VMs to Yes and click Apply

9. The fix is now applied and will persist across reboots
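Steps 3-6 can also be done from the unRAID console instead of a Windows/Mac file share. On a real server the flash device mounts at /boot, so the folder is /boot/config/plugins/dynamix.kvm.manager; the sketch below rehearses the rename-and-replace sequence against a scratch copy of that layout so it can be followed end to end.

```shell
# Rehearsal of steps 3-6 against a scratch copy of the flash layout.
# On a real unRAID server, replace the mktemp root with /boot.
flash=$(mktemp -d)/config/plugins/dynamix.kvm.manager
mkdir -p "$flash"
: > "$flash/domain.img"                     # stands in for the existing image
mv "$flash/domain.img" "$flash/domain.old"  # step 4: rename to domain.old
: > "$flash/domain.img"                     # step 6: the image from domain.zip
ls "$flash"                                 # both files should now be present
```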

 

ADVANCED USERS ALTERNATE METHOD (SSH ACCESS REQUIRED)

If you already have existing VMs and want to make life a little easier for yourself, all you really need to do is download the qemu.conf file attached here, put it somewhere on your system, and copy it to the /etc/libvirt folder, overwriting the file that is already there.  You must do this while libvirt is running, then restart libvirt after you have overwritten the file.
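As console commands, the whole procedure is just an overwrite and a restart. A sketch (the rc script path `/etc/rc.d/rc.libvirt` is an assumption; check how libvirt is managed on your unRAID version), with the overwrite step rehearsed against a scratch directory:

```shell
# On the real server (sketch, assumes the attached file was saved to /boot):
#   cp /boot/qemu.conf /etc/libvirt/qemu.conf   # overwrite while libvirt runs
#   /etc/rc.d/rc.libvirt restart                # then restart libvirt
# Rehearsal of the overwrite against a scratch /etc/libvirt:
etc=$(mktemp -d)
printf 'old qemu.conf\n' > "$etc/qemu.conf"       # the file already in place
printf 'patched qemu.conf\n' > "$etc/qemu.conf"   # overwritten with the new one
cat "$etc/qemu.conf"
```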

domain.zip

qemu.conf


Additional note:  this is a short-term fix and is NOT the same fix we are implementing in the longer term, so if you want a proper fix, you'll need to wait.

Is this patch still relevant for 6.1.4, or did the long term fix make it in? I didn't notice a reference to this particular issue in the changelog, but I may have missed it.


 

6.1.4 does NOT address the issue this hotfix addresses.  And again, if you're not having a problem, you should NOT apply this hotfix.  It is ONLY in the event you are having this issue.

Is this the same issue? It doesn't look the same as your example, and I haven't bothered to troubleshoot since you stated you were changing pass through mechanics in a future release.

internal error: early end of file from monitor: possible problem:
2015-11-18T01:36:10.907491Z qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: error, group 1 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2015-11-18T01:36:10.907511Z qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: failed to get group 1
2015-11-18T01:36:10.907518Z qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device initialization failed
2015-11-18T01:36:10.907526Z qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized



This hotfix is only for those of us who have more than 32 IOMMU groups and want to pass through a device that is in a group higher than 32.

As your error is with group 1, it is not the same problem  :)
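If you are unsure how many groups you have, the group numbers are simply the directory names under /sys/kernel/iommu_groups. A sketch of finding the highest one, demonstrated on a sample listing (on a live server, replace the `printf` with `ls /sys/kernel/iommu_groups`):

```shell
# Highest-numbered IOMMU group; if a passthrough device sits in a group
# above 32, this hotfix is the one that applies.
printf '%s\n' 0 1 2 35 36 37 | sort -n | tail -n 1
```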



Thanks. I'm currently quite happy running my VMs headless, but was experimenting since everyone said it was so easy. I'll wait for the passthrough rewrites to make it into the release version and try again then. Troubleshooting something that is in flux is an exercise in frustration if you don't need the function to begin with.

Well, your issue above means you need the PCIe ACS Override.

Has your stance on using the override changed? Last I remember seeing, it was a hack that could be risky, depending on the specific hardware involved. Specifically, I am not interested in changing something that may cause silent data corruption. I'm trying to pass through a GTX 460 plugged into the longer of the two slots of an X10SL7-F.

 

Please correct or point me to more reading if my recollection is flawed.

 

Also, feel free to move my posts to a new thread since my issue is not related to the hotfix thread.

 

Thanks for your time!


internal error: early end of file from monitor: possible problem:

2016-03-10T18:14:26.733069Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: error opening /dev/vfio/1: Device or resource busy

2016-03-10T18:14:26.733091Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: failed to get group 1

2016-03-10T18:14:26.733103Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device initialization failed

2016-03-10T18:14:26.733110Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized

 

 

 

How can I solve this? Help, please.



 

This is not your issue.

 

All devices in group 1 must be passed to the VM, or "stubbed" to not be used.
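For the "stubbed" option mentioned above, devices can be bound to pci-stub at boot by vendor:device ID via the append line in syslinux.cfg on the flash device. A sketch, where 10de:13c2 and 10de:0fbb are example IDs (a GTX 970 and its audio function); substitute the IDs that `lspci -n` reports for your own hardware:

```text
append pci-stub.ids=10de:13c2,10de:0fbb initrd=/bzroot
```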

Under System Devices you can find the IOMMU grouping for your hardware.
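The same grouping can be printed from the console. A sketch that lists each IOMMU group with its member devices, demonstrated here on a scratch tree that mirrors the layout of /sys/kernel/iommu_groups (on a real server, set `root=/sys/kernel/iommu_groups` instead):

```shell
# Print each IOMMU group and the PCI addresses it contains.
root=$(mktemp -d)   # scratch stand-in for /sys/kernel/iommu_groups
mkdir -p "$root/1/devices/0000:01:00.0" "$root/1/devices/0000:01:00.1"
for g in "$root"/*; do
  echo "group ${g##*/}: $(ls "$g/devices" | tr '\n' ' ')"
done
```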

 

http://lime-technology.com/wiki/index.php/UnRAID_6/VM_Management

Specifically

Help! Failed to set iommu for container: Operation not permitted

 

If you are getting the above message when trying to assign a graphics device to a VM, it is most likely that your device is in an IOMMU group along with another active/in-use device on your system. Please see this article written by Alex Williamson on IOMMU groups if you wish to better understand this issue and how it impacts you. Under Settings -> VM Manager you will find an option to toggle PCIe ACS Override, which will forcibly break out each device into its own IOMMU group (following a reboot of the system). This setting is experimental, so use with caution.

Another possibility here is that your system doesn't support interrupt remapping, which is critical for VFIO and GPU pass through. There is a workaround for this, but you will not be protected against MSI-based interrupt injection attacks by guests (more info about MSI injection attacks through VT-d). If you completely trust your VM guests and the drivers inside them, enabling this workaround should resolve the issue. The alternative is to purchase hardware that offers interrupt remapping support. To enable the workaround, you will need to modify your syslinux.cfg file, adding the bolded bit below:
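The quoted wiki passage is cut off before the actual change. For reference, the interrupt-remapping workaround it describes is, to my knowledge, the `vfio_iommu_type1.allow_unsafe_interrupts` kernel module option, which would be added to the kernel append line in syslinux.cfg roughly like this (a sketch; the surrounding label/initrd lines are the stock unRAID defaults and may differ on your system):

```text
label unRAID OS
  kernel /bzimage
  append vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
```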


Thank you. So how can I separate my video cards into their own IOMMU groups?

 

PCI Devices

 

00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 07)

00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (x16) (rev 07)

00:01.1 PCI bridge: Intel Corporation Sky Lake PCIe Controller (x8) (rev 07)

00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)

00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)

00:17.0 SATA controller: Intel Corporation Device a102 (rev 31)

00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #17 (rev f1)

00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #20 (rev f1)

00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)

00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)

00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)

00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)

00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)

00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)

02:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

02:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)

04:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)

05:00.0 USB controller: ASMedia Technology Inc. Device 1242

IOMMU Groups

 

/sys/kernel/iommu_groups/0/devices/0000:00:00.0

/sys/kernel/iommu_groups/1/devices/0000:00:01.0

/sys/kernel/iommu_groups/1/devices/0000:00:01.1

/sys/kernel/iommu_groups/1/devices/0000:01:00.0

/sys/kernel/iommu_groups/1/devices/0000:01:00.1

/sys/kernel/iommu_groups/1/devices/0000:02:00.0

/sys/kernel/iommu_groups/1/devices/0000:02:00.1

/sys/kernel/iommu_groups/2/devices/0000:00:14.0

/sys/kernel/iommu_groups/3/devices/0000:00:16.0

/sys/kernel/iommu_groups/4/devices/0000:00:17.0

/sys/kernel/iommu_groups/5/devices/0000:00:1b.0

/sys/kernel/iommu_groups/5/devices/0000:00:1b.3

/sys/kernel/iommu_groups/5/devices/0000:04:00.0

/sys/kernel/iommu_groups/6/devices/0000:00:1c.0

/sys/kernel/iommu_groups/6/devices/0000:05:00.0

/sys/kernel/iommu_groups/7/devices/0000:00:1d.0

/sys/kernel/iommu_groups/8/devices/0000:00:1f.0

/sys/kernel/iommu_groups/8/devices/0000:00:1f.2

/sys/kernel/iommu_groups/8/devices/0000:00:1f.3

/sys/kernel/iommu_groups/8/devices/0000:00:1f.4

/sys/kernel/iommu_groups/8/devices/0000:00:1f.6



The cards show up as two devices, the GPU and the HDMI/DP audio portion; pass both through to your VM. If you are using the GUI, don't start the VM yet: edit the VM to verify that both 01:00.0 and 01:00.1 are assigned, and if they are, start it up.

 

Then make sure the next VM you create doesn't have the same GPU/audio card passed through.
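For reference, in the VM's libvirt XML the two functions of one card appear as two `<hostdev>` entries like the sketch below, shown here for 01:00.0 and 01:00.1; the second VM's definition should reference the other card's addresses (bus 0x02) instead:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
  </source>
</hostdev>
```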


The only GPU that passes through is:

/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:00.1

With this one:

/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1

the monitor stays black.

PS: I have two ASUS Strix GeForce GTX 970s, and I have already installed the video driver in both VMs through 0000:02:00.0 / 0000:02:00.1.

Please help me solve this, thanks.



Does your CPU have an iGPU? If so, enable it and set it as primary in the BIOS. There is currently an issue with passing through the first NVIDIA GPU to a VM.


I just came across this error in 6.2b19. Is it safe to use the qemu.conf file in this post overwriting the beta version file?

It's already in the new beta. Or do you have even more groups than in this hotfix?

 

I have 37 groups. Any harm in trying the file anyhow to see if it resolves the issue?

This topic is now closed to further replies.