unRAID Server Release 6.2.0-beta20 Available



Docker updates worked perfectly but now I am having some issues getting my Win10 VM updated.

 

I went into the VM tab and attempted to "edit" the VM (Win10ProIsolCPUs) so that it would update to the new settings. I may have done something wrong during that process, because now the VM won't start and there are errors displayed on the VM tab. See screen capture.

 

I have copies of my old XML file and copies of the Win10 disk image. Should I attempt to fix this current "edit" or would it be better to use a template and import the existing Win10 disk image and then make changes to the generated XML if needed?

 

Edit:

 

I went into the terminal and virsh to see if I could start the VM from there. It reported that the VM started, and when I turned on my TV the VM was passing through audio and video as well as the USB controller and attached devices. See screenshots.

 

The VM tab is still showing errors, and the dashboard is no longer showing my working Docker containers or VMs.

 

Diagnostics attached

 

Couldn't add the diagnostics file to the previous post so here it is

 

Jude,

 

Please try booting into safe mode and report back if the errors persist.

 

Restarted in safe mode; the same errors are showing on the VM tab page (see screen cap).

Clicking on the Windows icon brings up the menu but it will not start the VM.

 

Using the virsh utility I was not able to start the VM:

 

root@Tower:~# virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # start Win10ProIsolCPUs
error: Failed to start domain Win10ProIsolCPUs
error: internal error: process exited while connecting to monitor: 2016-03-29T16:49:35.257351Z qemu-system-x86_64: -device vfio-pci,host=08:00.0,id=hostdev3,bus=pci.2,addr=0x7: vfio: error, group 13 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2016-03-29T16:49:35.257415Z qemu-system-x86_64: -device vfio-pci,host=08:00.0,id=hostdev3,bus=pci.2,addr=0x7: vfio: failed to get group 13
2016-03-29T16:49:35.257439Z qemu-system-x86_64: -device vfio-pci,host=08:00.0,id=hostdev3,bus=pci.2,addr=0x7: Device initialization failed

virsh # 
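For anyone else who hits the same "group is not viable" message, a quick generic check (not from the original post, and using the 08:00.0 address from the log above - substitute your own device) is to list every PCI device that shares the failing device's IOMMU group and see which driver each one is bound to; per the error text, all of them need to be on vfio-pci before passthrough will work:

# List every PCI device that shares an IOMMU group with 08:00.0
ls /sys/bus/pci/devices/0000:08:00.0/iommu_group/devices/

# Show the driver each of those devices is currently bound to (blank = no driver)
for dev in /sys/bus/pci/devices/0000:08:00.0/iommu_group/devices/*; do
    drv=$(readlink "$dev/driver" 2>/dev/null)
    echo "$(basename "$dev") -> ${drv##*/}"
done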

 

XML

 

<domain type='kvm' id='1'>
  <name>Win10ProIsolCPUs</name>
  <uuid>0c2749f8-96cd-1238-0965-6f9d33c9758f</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>5</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='6'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='5' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disk/vmdisk/Win10Pro/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISO Library Share/Windows.iso'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/VirtIO Drivers/virtio-win-0.1.109.iso'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:69:22:69'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-Win10ProIsolCPUs/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>

 

Diagnostics attached

 

 

Edit

 

Booted up again in normal mode. Errors are still on the VM tab, but I am able to use the Windows icon menu to start the VM.

Screenshot_2016-03-29_11_38_43.png

tower-diagnostics-20160329-1152.zip


Please excuse what may be an annoying, redundant question, but I can't remember if I or anyone else asked whether you had exhaustively tested the RAM. I'm thinking something like 24 hours of Memtest.

 

Although there have been reports of issues with certain USB 3.0 drivers, what I see above looks too low level to be USB related.  Looks more like RAM or timers or CPU or VM/driver race condition or the like.

 

Edit: It would be interesting to know if anyone else has the same motherboard and BIOS and CPU etc, and what issues they are having?

 

Thanks for the reply, Rob. I have done some 24/7 memtests with no issues. I haven't done one recently though - worth another set of runs, do we think?

 

Edit:

I have started a memtest, so let's see what that does.

 

Let's try this:

 

Reboot the system so you're at a clean boot.  Without the array started yet, log in via SSH or Telnet.  Type the following command:

 

killall /usr/sbin/irqbalance

 

After this is done, start the array and begin using the system like normal.  Report back if you still have issues afterwards.
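A quick way to confirm the daemon is actually gone after running that command (generic Linux checks, nothing unRAID-specific) - no output from either line means irqbalance is no longer running:

# Either of these should print nothing once irqbalance has been killed
ps aux | grep '[i]rqbalance'
pgrep -l irqbalance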

 

Thanks jonp

 

Well, my system passed 20 hours of the memtest, so I don't think it is a memory issue. I have just tried this and will report back.

 

As a note, I disabled autostart of my array to run this properly, ran the command as asked, and I have not had a single hardware error reported in my logs. I usually have at least one or two by the time the array has auto-started.

 

There is one more thing I thought of which may cause trouble. My NVMe drive was formatted as XFS before I put it into unRAID and was never reformatted, so it is still using my old format. It is detected as XFS correctly; it is mounted as my cache drive and has my VMs and so on. Since it's NVMe, I'm wondering if this could cause system lockups if the drive were to crash, as it is a direct PCIe device.
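If it helps to double-check that assumption, the filesystem signature on the cache device can be read directly; the device name below is hypothetical (your NVMe namespace/partition may differ), and /mnt/cache is the usual unRAID cache mount point:

# Show the filesystem type blkid detects on the NVMe cache partition
blkid /dev/nvme0n1p1

# If it really is XFS, this will print the filesystem geometry
xfs_info /mnt/cache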

 

Just one to throw out there


If you want to help get support for your 2.0 flash device, your best bet would be to e-mail the QEMU mailing list with the details.  In the meantime, if the 2.0 flash device is required, you'll need to go back to passing through the entire PCIe controller.

That's what I will do for now. Thanks for the info (I'll get in touch with the QEMU community, though).

 

The bolded part is what I changed.  Instead of "de-ch" I used just "de".  Let me know if that works for you.

 

That worked right away! :D

 

Thank you. If this could make it into the drop-down in 6.2.0, I (and a few Germans) would be super happy!

 

Thank you for confirming that this worked as expected.  We will be adding this to 6.2 (it was a very trivial thing to fix).  This was nothing more than a basic oversight when we first implemented keymap support for VNC.

 

Is there anything I could add to the go script to also change the keyboard layout for the boot modes?

 

We didn't have a ton of time to look into this for you, but from the research we did do, I need you to submit this as a feature request.  It's not as simple as just adding something to your go file or running a basic Linux command.  It looks like there could be a lot more involved, but again, we didn't have a ton of time to research it today.


Possible to add, clear, and format a disk larger than parity...

What happens if you fail the array by pulling 2 of the smaller disks? Or pull the larger disk and one small one? Does the array still start degraded?

 

I filled the disk with video files beyond the parity capacity, then started the array with the disk disabled. I could see all the data, but obviously any file I tried to play that had been copied past the parity limit didn't work.


As a note, I disabled autostart of my array to run this properly, ran the command as asked, and I have not had a single hardware error reported in my logs. I usually have at least one or two by the time the array has auto-started.

 

Perhaps you could try stopping and starting the array a bunch just to try and trigger the behavior.

 

There is one more thing I thought of which may cause trouble: my NVMe drive was formatted as XFS before I put it into unRAID and was never reformatted...

 

To be honest, I don't think your NVMe device has anything to do with it.  The log events certainly don't point to that.  irqbalance seems to be the culprit, but I'm not sure as to why.  The first step is to simply see if the theory holds water.  We're testing that now by leaving irqbalance off.


Same thing here: I had the NVMe disk outside of the array with XFS and just added it as the cache in 6.2. No re-formatting was done; I assumed everything was fine.

 

I do not know if it's related to your issue, but I also have some lockups, though only while running VMs with vDisks on the array, and only with 6.2.

There is a third person with NVMe and lockups; see HERE for details.

 

I was hoping it had something to do with the "Host CPU-Overhead" in beta18 & 19, but that one is fixed.

My issue still remains.

 

My "Server 2012R2" VM has a vDisk for backups. With beta20, I moved it back to the cache to see if its better.

Yesterday at 3:00am the whole array and webpage got unresponsive, thats when the backupjob started writing to  the vDisk on the array.

Today probably the same, but I was sleeping, nothing was working when I woke up.

In both cases I had to powercycle the whole system.

 

So, I just came home and started a VM that I only use for home office work, so it's on an SSD in the array. Within a minute or two, the system hung.

SSH still works, so I could create a diagnostics file (attached!).

At that time, four VMs were running. I was using some resources through VPN during the day and everything was fine.

- Linux (Asterisk) VM, running only on the array since the restart this morning

- Server 2012R2 VM, running with the system disk on the cache and backup on the array (no backup jobs during the day)

- Gaming (Win10) VM, running only on the cache (started around 16:41 UTC)

- Work (Win10) VM, running only on the array (started around 16:42 UTC)

 

I noticed the lockup between 16:50 and 16:55. Because I knew I had to power-cycle the system, I tried to shut down as many VMs as possible.

- The Work VM was never reachable through the network; the console I don't know; the web page did not work.

- I could still connect through RDP to the Server 2012R2 and shut it down.

- I shut down the Gaming VM.

- I "force" shut down the Linux VM.

 

I connected to the server by SSH to collect the diagnostics and saw that all the VMs that have a vDisk on the array were still running, but in the state "shutdown in progress" (virsh list), so only the VM that has no vDisk on the array was shut down normally.

 

TL;DR:

- Since 6.2 beta 18, my whole server becomes unusable as soon as any VM has some sort of write activity on a vDisk that is placed on the array.

- I had no issues with that in 6.1.

- Apart from all the changes in 6.2, I put my old cache (SATA SSD) into the array, and I am using my NVMe disk as the cache device.

- unRAID itself seems fine, but anything that uses the array (VMs, _some_ Dockers, SMB, Midnight Commander), and the array itself, is unresponsive.

 

I also had the issue with NO cache device (libvirt.img was also on the array), but I never completely removed the NVMe disk.

I guess to rule out any issues with NVMe, I need to remove the SATA SSD from the array, put it back as the cache, and remove the NVMe device completely... A lot of work. I have no diagnostics from 6.1, but if the issue remains after the removal of the NVMe disk, I would roll back and create some diagnostics. Unfortunately, the "previous" folder probably contains beta 19 and not 6.1 :)

 

Maybe you'll find something in the logs until I find the time to do more testing.

unraid-diagnostics-20160329-1903.zip


I think there is a problem when you run more than one vDisk for a VM and you have the vDisks on different disks.

I set up a Windows 2012 R2 VM with the OS vDisk on the cache and a data vDisk on the array. It worked fine until I copied something to the data vDisk; then I had to use the reset button to get it up and running again, since I could not stop the array and the reboot command did not work - it said it was going down for a reboot, but nothing happened.

I tried it multiple times, and every time it copied about 1-2GB to the data vDisk before it stopped and did not want to shut down.

I did not try this setup in 6.1.9, only on 6.2 beta 18 and 19.
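For anyone wanting to reproduce a similar two-vDisk layout, it usually boils down to creating a second raw image and attaching it as another virtio disk; this is only a rough sketch with hypothetical paths and domain name, not the poster's exact setup:

# Create a 100G raw image for the data disk on an array share
qemu-img create -f raw /mnt/user/domains/Win2012R2/vdisk2.img 100G

# Attach it to the VM as a second virtio disk and persist it in the XML
virsh attach-disk Win2012R2 /mnt/user/domains/Win2012R2/vdisk2.img vdb \
    --driver qemu --subdriver raw --targetbus virtio --persistent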

 

After that I deleted the VM and set it up again, this time with only one vDisk (I just made one extra partition when I installed Windows 2012 R2), and I also let unRAID decide where to put the vDisk image.

Working fine for me now :)

It has crashed suddenly, but at least I did not have to use the reset button again.

Perhaps you could try stopping and starting the array a bunch just to try and trigger the behavior.

 

The issue seemed to be happening when I was doing a lot of intense stuff at the same time. I found a game that was able to crash the system every hour or less when playing multiplayer. I just played three hours of it and so far so good.

 

Oh, and still no hardware faults found in the logs.


Booted up again in normal mode. Errors are still on the VM tab, but I am able to use the Windows icon menu to start the VM.

 

Ok, you need to boot up in safe mode again, but please comment out these lines from your go file (or make a backup and revert to the stock go file):

 

cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c 

sleep 30; blockdev --setra 2048 /dev/md*
unraid_notify start

#
# Set up daily SSD cache trim for unRaid cron
#
fromdos < /boot/custom/DailyTrim > /etc/cron.daily/DailyTrim
chmod +x /etc/cron.daily/DailyTrim

#### Snap outside array disk mount and share
/boot/config/plugins/snap/snap.sh -b

### Mount and share snap disk
/boot/config/plugins/snap/snap.sh -m vmdisk
/boot/config/plugins/snap/snap.sh -ms vmdisk

 

I probably need to add this to the OP of this thread, but if anyone has an issue with the beta who has customized their installation with plugins, scripts, or other modifications, you need to strip down to a stock setup before posting an issue.  Safe mode is OK to use if all you have are plugins, but in your case, you have heavily customized your setup and almost any of those things could be causing this breakage.  If the issue persists, the next step I will resort to is having you recreate your libvirt.img file, manually copying the XML for your VMs to a safe place before you do so.
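If it helps anyone following along, backing up the customized go file before stripping it down is a one-liner, and the stock unRAID 6 go file is only a couple of lines (reproduced roughly from memory, so double-check against a fresh install):

# Back up the customized go file first
cp /boot/config/go /boot/config/go.custom.bak

# A stock unRAID 6 go file is essentially just:
#   #!/bin/bash
#   # Start the Management Utility
#   /usr/local/sbin/emhttp &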


Just found by accident that clearing a disk now does not take the array offline - the array is available during the process. I don't think this was in the release notes.

 

I completely agree that this is a nice new feature, and needs to be mentioned ...  BUT it also needs to have a warning with it!

 

There was a very important reason this wasn't done before, and that is that Parity is invalid while clearing is being performed!  (Unless the drive has been Precleared of course!)  It was always safer the old way, keeping the array offline until the new drive is completely zeroed.

 

When the array is started, parity is assumed to be correct, so if anything happens, the bits of the parity drive can be used with the other drives to rebuild any data drive.  As soon as you add a new disk of unknown contents to the array, parity is wrong for every non-zero bit on that drive!  And it won't be correct until the last bit is zeroed, cleared.  Normally, nothing should happen during the clearing, so the likelihood of a mishap is extremely small.  But if anything goes wrong during the clearing, then the array is in a degraded state, with an invalid parity drive.  At that point, no drives can be rebuilt.
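A tiny arithmetic illustration of why the new disk has to be zeroed, using single parity (the same principle carries over to the new P+Q scheme):

# Single parity is the XOR of the data disks: P = D1 ^ D2 ^ ... ^ Dn
# XOR-ing in a zeroed block leaves parity unchanged, because X ^ 0 == X ...
printf 'parity with zeroed new block:   0x%02x\n' $(( 0xA5 ^ 0x00 ))
# ... but any non-zero content on the new disk changes what parity should be
printf 'parity with non-zero new block: 0x%02x\n' $(( 0xA5 ^ 0x3C ))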

 

On the other hand, this is not as serious as that sounds, since if the clearing is aborted for any reason, you would simply remove the new drive, then 'trust parity' again, and you should be back where you started, before adding the new drive.  If what goes wrong is a *different* drive 'red-balling', then it's vital that you either make sure the clearing proceeds to completion (thereby returning parity to correctness), or you remove the new drive and attempt to get parity trusted (if that's possible in this situation).

 

It's still a nice feature, but I wanted to make sure users understand the ramifications.  There's a small risk involved.

 

Edit: I should add that Tom may have already implemented this in a safer way, such that if anything at all goes wrong, the new drive is immediately unassigned.

 

Right, in looking at the -beta18 change log I see that mention of this feature was omitted, sorry about that.  This feature was added as a result of coding changes to support P+Q in the "sync engine".  (There is a driver thread called "mdrecoveryd" which wakes up to perform operations such as parity sync, parity check, and data rebuild.  I call this the "sync engine" and added a new capability to clear, or set to zeros, the data partition of "new" disks added to an existing array.)

 

When you add "new" devices to an existing array, if those devices have not already been pre-cleared, then "mdrecoveryd" is started to clear those disks.  When the clear operation completes, those disks are added to the array and will appear "unformatted", whereupon you can format them and bring them online while array is still Started.  Devices which have already been pre-cleared are added immediately upon Start.  If for some reason the operation fails, e.g., write error to a disk being cleared, or user cancels, the array is left Started and those new disks are simply not added.  At no time is parity left "invalid".

 

In previous releases of unRaid, 6.1 and below, this operation was initiated by Starting the array, but it wouldn't actually "start" (bring shares online, etc), until after the "clearing" completed, which these days can take many hours.  I guess most people use 'pre-clear' anyway so they would not see an operational difference.

 

BTW: the "swap disable" operation is similar in that it copies "old parity" to "new parity" before starting the array - again resulting in a multi-hour delay before shares are brought on line.  That is why the function has been temporarily removed from 6.2 until we can add this capability into the "sync engine" as well.


I'm having an issue with the Unassigned Devices plugin when mounting a remote NFS share from another server.  The remote share mounts properly and is re-shared over NFS on the unRAID server properly, but I get recurring log messages about exporting the NFS share.

 

I'm doing something here that is a bit circular, and I am wondering about the wisdom of doing it.  Let's say that UD on the unRAID server is mounting an NFS share from a NAS server locally on the unRAID server.  I mount the remote NFS share and re-share it on the unRAID server using both SMB and NFS (if enabled on unRAID).  So the remote NFS NAS share now ends up being re-shared with NFS on the unRAID server.
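For context, the round trip being described boils down to something like the following; the host name, paths, and export options are illustrative only, since the real mount and export are done by the Unassigned Devices plugin and unRAID itself, not by hand:

# Mount the NAS export locally on the unRAID box
mount -t nfs MediaServer:/Public /mnt/disks/MediaServer_Public

# Then attempt to re-export that mount point over NFS from unRAID,
# which is what exportfs is complaining about in the log below
exportfs -o rw,async,no_subtree_check,fsid=200 '*:/mnt/disks/MediaServer_Public'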

 

Anyway, here are the log entries I am getting.

 

Mar 27 07:05:12 Tower rpc.mountd[18611]: authenticated mount request from 192.168.1.3:950 for /mnt/user/Public (/mnt/user/Public)
Mar 27 07:06:01 Tower root: exportfs: /mnt/disks/MediaServer_Public does not support NFS export
Mar 27 07:06:42 Tower root: exportfs: /mnt/disks/MediaServer_Public does not support NFS export
Mar 27 07:07:01 Tower root: exportfs: /mnt/disks/MediaServer_Public does not support NFS export
Mar 27 07:07:12 Tower root: exportfs: /mnt/disks/MediaServer_Public does not support NFS export
Mar 27 07:07:23 Tower root: exportfs: /mnt/disks/MediaServer_Public does not support NFS export
Mar 27 07:08:01 Tower root: exportfs: /mnt/disks/MediaServer_Public does not support NFS export

 

The /etc/exports file:

# See exports(5) for a description.
# This file contains a list of all directories exported to other computers.
# It is used by rpc.nfsd and rpc.mountd.
"/mnt/disks/MediaServer_Public" -async,no_subtree_check,fsid=200 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)
"/mnt/user/Computer Backups" -async,no_subtree_check,fsid=103 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)
"/mnt/user/Public" -async,no_subtree_check,fsid=100 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)
"/mnt/user/iTunes" -async,no_subtree_check,fsid=101 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)

 

This does not happen on 6.1.9.

 

Diagnostics attached.

 

Hi dlandon,

 

Eric did look into this a bit and was surprised to hear that this worked for you on 6.1.9.  Did this work for you on either beta 18 or beta 19, or have all 6.2 releases exhibited this behavior?

 

One thing we found:  http://serverfault.com/questions/401312/how-to-create-an-nfs-proxy-by-using-kernel-server-client

 

The NFS protocol does not support proxies.

 

Now that article is a bit old, but likely still applies given that we're still using NFSv3.  I don't suspect NFSv4 will work any better for this.  I understand what you're trying to achieve here, but we honestly can't dedicate the time to investigate this as deeply as we'd need to in order to figure it out.  Have you tried exporting the NFS mounts over SMB?  Just curious to see if changing the protocol fixes this.


I think there is a problem when you run more than one vDisk for a VM and you have the vDisks on different disks...

 

As an FYI, I run one of my main VMs here with two virtual disks attached to it (one in a btrfs cache pool and one in the array).  I haven't noticed any issues and even just tried copying data from one to the other and back and haven't seen any problems.

 

Now it could be that one of your storage devices on the array is having problems.  Have you tried running a filesystem check?


The issue seemed to be happening when I was doing a lot of intense stuff at the same time. I found a game that was able to crash the system every hour or less when playing multiplayer. I just played three hours of it and so far so good - and still no hardware faults found in the logs.

 

Keep us posted.  This is definitely interesting.  If this turns out to be the issue, then I'll want you to reboot the system, leave irqbalance running and start the array.  Then copy the file I've attached to this message over to your flash device.  Then ssh into your server and type the following command:

 

/boot/irqdebug.sh

 

It will take a few moments to run, and when it's done a new file called irqb_debug.txt will be put on the root of your flash device.  Upload that file back here for further analysis.
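For anyone curious, this is not the attached script, but the kind of interrupt data this sort of debugging generally needs can be gathered with a few generic commands (a sketch only - irqdebug.sh itself may collect more or different information):

# Dump interrupt counters and per-IRQ CPU affinity masks to the flash drive
out=/boot/irq_info_sketch.txt
{
  echo '== /proc/interrupts =='
  cat /proc/interrupts
  echo '== per-IRQ smp_affinity =='
  for f in /proc/irq/*/smp_affinity; do
    echo "$f: $(cat "$f")"
  done
} > "$out"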

 

EDIT:  Would probably help if I attached the file ;-)  I forgot I had to zip it up first, so make sure you extract the file inside the zip to the root of the flash.

irqdebug.zip



 

Hi Jonp,

 

Is it worth me running this for a few days to check it is working before running the script, or would you prefer it earlier? It's not an issue for me to run it, reboot, and turn irqbalance back off.


Upgraded to 6.2.0-beta20 and the array would not start due to four disks missing. I have diagnostics files from before and after the upgrade.

 

My understanding from reading through the previous release posts is that this could be related to the Marvell SATA controller that is integrated into this motherboard. Four drives are connected to the Marvell controllers. Prior to the upgrade all disks functioned properly, and this has never been a problem in the past. I checked my BIOS firmware and I believe that I am on the most recent version for the GA-990FXA-UD5.

 

I have an LSI SAS2008 RAID controller card plugged into the motherboard that has been flashed to IT mode; it currently does not have any drives connected to it.

 

Should I power down and connect the drives currently attached to the Marvell SATA controllers to the SAS2008? (I have not tried this card - just flashed it recently and then installed it.)

 

Or should I downgrade the software and then look at switching those drives over to the SAS2008 card?

 

Is there some way of upgrading the driver for the Marvell SATA controller? I would still like to be able to use those ports, and they have always worked well in the past.

 

Jude,

 

If you are willing to test this again, I'd like you to try booting up beta18 and see if the issue happens.  I know you've tested 6.1.x and 6.2-beta20, but we added support for AMD IOMMUv2 in beta 19 that wasn't there in beta 18.  We'd like to know if this has anything to do with it, given that the key event in your logs from the beta20 test was this:

Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a80440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a80450 flags=0x0070]
...
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9ae0440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9ae0450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9b00440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9b00450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60450 flags=0x0070]

 

What's interesting is that neither 09:00.1 nor 03:00.1 exists in your lspci output (the parent device does, but not that function).  I'm no expert, but I think the Marvell controllers use a virtual device of their own that conflicts with IOMMU/DMA.  Perhaps this issue doesn't present itself with IOMMUv1, but it does with IOMMUv2.  Please report back after testing to let us know.
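A couple of generic checks would show whether those .1 functions exist at all and, if they do, which IOMMU group they landed in; the bus addresses are taken from the log above, so adjust them as needed:

# Does PCI function 03:00.1 (or 09:00.1) actually exist?
lspci -s 03:00.1
ls -d /sys/bus/pci/devices/0000:03:00.1 2>/dev/null

# If it exists, the symlink target ends in its IOMMU group number
readlink /sys/bus/pci/devices/0000:03:00.1/iommu_group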


Is it worth me running this for a few days to check it is working before running the script, or would you prefer it earlier? It's not an issue for me to run it, reboot, and turn irqbalance back off.

 

Your logs clearly show a call trace that points a finger at irqbalance.  My plan is to take the information the script will output and submit this as a bug to the irqbalance project here:  https://github.com/Irqbalance/irqbalance.  They (the irqbalance team) ask for all the info the script collects; I just created the script with Eric really quickly to simplify the process of collecting the info they need from an unRAID system.

 

That said, I'd like to confirm this is the root cause before we take that step.  If you've been thrashing away with your VMs for a while now and still haven't seen any issues (and like you said, by now you normally would have), then I'd say yeah, go ahead and reboot and let's get that info collected proactively.

 


That said, I'd like to confirm this is the root cause before we take that step.  If you've been thrashing away with your VMs for a while now and still haven't seen any issues (and like you said, by now you normally would have), then I'd say yeah, go ahead and reboot and let's get that info collected proactively.

 

Will do. Can you attach the file again? I can't see it in the other message


Weird, here it is.

 

EDIT:  DOH!  I had attached it to the wrong post before.  Sorry about that.

 

No worries, here is the debug file as promised

 

It's 11pm here in the UK, so luckily nobody is using the server :)

 

Thanks bigjme!


Thanks bigjme!

 

No problem. Hopefully it gives you some useful information.

I know there aren't many people running this level of hardware, so I know it's an unusual one.


Jude, why are you using SNAP anyway? It has been deprecated for nearly a year now, is unsupported, and doesn't even officially work on unRAID 6.1+.

 

Honestly, I had no idea. SNAP has always worked fine for me; I wasn't aware it had been deprecated.

 

Thanks for the heads-up. I have removed it and installed Unassigned Devices. It works great and is very easy to set up.
