bonzi

Members
  • Posts

    178
  • Gender
    Undisclosed

bonzi's Achievements

Apprentice (3/14)

4

Reputation

1

Community Answers

  1. I have been running for 4 days without a crash with just a single stick of memory. I am going to let it go for at least a few more days, then I will try putting the other sticks back in to see which one is causing problems. It's strange that memtest did not pick this up at all, but I think I will be able to figure this out now.
  2. Ok I will give that a try. Crashes are becoming more frequent now, often after a few hours and less than one day. Memtest does pass but it does seem like a hardware issue.
  3. It crashed again, this time I think we might have something more useful. Could it be that the USB ports are causing this and are failing or not drawing power properly for devices that need a lot of power? Especially I am thinking the USB port that the Coral TPU is plugged into? Thoughts? That is what it looks like to me from the log. I attached all of the log from the crash and reboot. @JorgeB if you could have a look at the first few lines and let me know if you think this is what is going on. Thanks! syslog-crash-2.rtf
  4. I had another crash last night with C-states disabled. Frustratingly, there seems to be little of interest in the syslog. The only activity I see is this, which I believe is my Coral TPU:
      Oct 28 23:11:22 Tower kernel: usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd
      Oct 28 23:11:22 Tower kernel: usb 2-2: LPM exit latency is zeroed, disabling LPM.
     At this point I really do not know what to do, except hope to get more information before the next crash. EDIT: I have also posted a screenshot of what is printed on the screen at the time of the crash.
  5. Ok, I disabled C-states in the BIOS. I have also done an overnight memtest, which passed without any issues at all. Let's see if disabling C-states fixes the problem.
  6. @JorgeB My server crashed again today. Managed to capture the syslog. Not sure what it all means. I am attaching a snip of it. I have done some googling not sure exactly what to make of it. Let me know what you think. Thanks! syslog-crash.rtf
  7. I am waiting for it to crash again. I think I will need that data to understand why this is happening. The memtest was fine: it ran overnight with no errors.
  8. Done, saw you recommend that in another thread. Good idea, that should get us more information. It passed an overnight run of memtest so all good there. Thanks!
  9. Hi, I am experiencing crashes every couple of days. Previously the system was rock solid. Nothing stands out to me in the diagnostics (attached); the problem is that I need to capture this as it happens. Does anyone have any ideas? I will run a memory test, since that seems like the most likely culprit, and report back. tower-diagnostics-20231018-1840.zip
  10. Hi Phillip, I didn't think so but I did have it set to a static IP. I set it to a different address and it is all good now.
  11. Hi, Yesterday I replaced the battery in my UPS and when I brought my server back up, I couldn't access the webgui. After trying several times, I realized that I could in fact access it about 20% of the time. I investigated further and it seemed like the flash had been corrupted so I went for a new install. However, my new install has the same exact problem and I have no idea what is going on. I am attaching my diagnostics. Would be grateful for any ideas at this point. I've been pulling my hair out for hours. EDIT: one more thing to add. Access seems to be better if I type tower.local, it will connect most of the time rather than the IP address of the server. tower-diagnostics-20230528-1912.zip
  12. Hoping someone might be able to help. I have installed Big Sur using the video instructions, and it boots up and works well with VNC. I have been trying to set up passthrough of a 580x for the past few weeks and have had no luck, but I feel like I am making progress and am likely close to making it work. Hardware is an ASRock X399 Taichi with a Threadripper 1900X. IOMMU and other virtualization features are turned on in the BIOS. I have also enabled the PCIe override and VFIO unsafe interrupts (both), which seems to be necessary. The GPU (video/audio) is bound to VFIO. I can pass the graphics card through to a Windows 10 VM. I am using the vBIOS for my card from here: https://www.techpowerup.com/vgabios/194686/dell-rx580-8192-170301 When I try to pass the 580x through to my Mac VM, it hangs on the Mac loading screen at about 25%. My XML is below. Hopefully someone knows how to fix this, as I am at a bit of a loss:

      <?xml version='1.0' encoding='UTF-8'?>
      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        <name>Macinabox BigSur</name>
        <uuid>d28b8bc3-bb06-4370-abf5-66a3bb821127</uuid>
        <description>MacOS Big Sur</description>
        <metadata>
          <vmtemplate xmlns="unraid" name="Windows 10" icon="default.png" os="osx"/>
        </metadata>
        <memory unit='KiB'>16777216</memory>
        <currentMemory unit='KiB'>16777216</currentMemory>
        <memoryBacking>
          <nosharepages/>
        </memoryBacking>
        <vcpu placement='static'>8</vcpu>
        <cputune>
          <vcpupin vcpu='0' cpuset='1'/>
          <vcpupin vcpu='1' cpuset='9'/>
          <vcpupin vcpu='2' cpuset='3'/>
          <vcpupin vcpu='3' cpuset='11'/>
          <vcpupin vcpu='4' cpuset='5'/>
          <vcpupin vcpu='5' cpuset='13'/>
          <vcpupin vcpu='6' cpuset='7'/>
          <vcpupin vcpu='7' cpuset='15'/>
        </cputune>
        <resource>
          <partition>/machine</partition>
        </resource>
        <os>
          <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
          <loader readonly='yes' type='pflash'>/mnt/user/system/custom_ovmf/Macinabox_CODE-pure-efi.fd</loader>
          <nvram>/etc/libvirt/qemu/nvram/d28b8bc3-bb06-4370-abf5-66a3bb821127_VARS-pure-efi.fd</nvram>
        </os>
        <features>
          <acpi/>
          <apic/>
        </features>
        <cpu mode='host-passthrough' check='none' migratable='on'>
          <topology sockets='1' dies='1' cores='4' threads='2'/>
          <cache mode='passthrough'/>
          <feature policy='require' name='topoext'/>
        </cpu>
        <clock offset='utc'>
          <timer name='rtc' tickpolicy='catchup'/>
          <timer name='pit' tickpolicy='delay'/>
          <timer name='hpet' present='no'/>
        </clock>
        <on_poweroff>destroy</on_poweroff>
        <on_reboot>restart</on_reboot>
        <on_crash>restart</on_crash>
        <devices>
          <emulator>/usr/local/sbin/qemu</emulator>
          <disk type='file' device='disk'>
            <driver name='qemu' type='raw' cache='writeback'/>
            <source file='/mnt/cache/VM/Macinabox BigSur/macos_disk.img'/>
            <target dev='hdc' bus='sata'/>
            <boot order='1'/>
            <address type='drive' controller='0' bus='0' target='0' unit='2'/>
          </disk>
          <controller type='pci' index='0' model='pcie-root'/>
          <controller type='pci' index='1' model='pcie-root-port'>
            <model name='pcie-root-port'/>
            <target chassis='1' port='0x8'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
          </controller>
          <controller type='pci' index='2' model='pcie-root-port'>
            <model name='pcie-root-port'/>
            <target chassis='2' port='0x9'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
          </controller>
          <controller type='pci' index='3' model='pcie-root-port'>
            <model name='pcie-root-port'/>
            <target chassis='3' port='0xb'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
          </controller>
          <controller type='pci' index='4' model='pcie-root-port'>
            <model name='pcie-root-port'/>
            <target chassis='4' port='0x13'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
          </controller>
          <controller type='pci' index='5' model='pcie-root-port'>
            <model name='pcie-root-port'/>
            <target chassis='5' port='0xa'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
          </controller>
          <controller type='pci' index='6' model='pcie-to-pci-bridge'>
            <model name='pcie-pci-bridge'/>
            <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
          </controller>
          <controller type='virtio-serial' index='0'>
            <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
          </controller>
          <controller type='sata' index='0'>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
          </controller>
          <controller type='usb' index='0' model='ich9-ehci1'>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
          </controller>
          <controller type='usb' index='0' model='ich9-uhci1'>
            <master startport='0'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
          </controller>
          <controller type='usb' index='0' model='ich9-uhci2'>
            <master startport='2'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
          </controller>
          <controller type='usb' index='0' model='ich9-uhci3'>
            <master startport='4'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
          </controller>
          <interface type='bridge'>
            <mac address='52:54:00:57:6d:8b'/>
            <source bridge='br0'/>
            <model type='e1000-82545em'/>
            <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
          </interface>
          <serial type='pty'>
            <target type='isa-serial' port='0'>
              <model name='isa-serial'/>
            </target>
          </serial>
          <console type='pty'>
            <target type='serial' port='0'/>
          </console>
          <channel type='unix'>
            <target type='virtio' name='org.qemu.guest_agent.0'/>
            <address type='virtio-serial' controller='0' bus='0' port='1'/>
          </channel>
          <input type='tablet' bus='usb'>
            <address type='usb' bus='0' port='1'/>
          </input>
          <input type='mouse' bus='ps2'/>
          <input type='keyboard' bus='ps2'/>
          <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
            <listen type='address' address='0.0.0.0'/>
          </graphics>
          <audio id='1' type='none'/>
          <video>
            <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
          </video>
          <hostdev mode='subsystem' type='pci' managed='yes'>
            <driver name='vfio'/>
            <source>
              <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
            </source>
            <rom file='/mnt/cache/VM/Macinabox BigSur/Dell.RX580.8192.170301.rom'/>
            <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
          </hostdev>
          <hostdev mode='subsystem' type='pci' managed='yes'>
            <driver name='vfio'/>
            <source>
              <address domain='0x0000' bus='0x41' slot='0x00' function='0x1'/>
            </source>
            <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
          </hostdev>
          <memballoon model='none'/>
        </devices>
        <seclabel type='dynamic' model='dac' relabel='yes'/>
        <qemu:commandline>
          <qemu:arg value='-usb'/>
          <qemu:arg value='-device'/>
          <qemu:arg value='usb-kbd,bus=usb-bus.0'/>
          <qemu:arg value='-device'/>
          <qemu:arg value='************************'/>
          <qemu:arg value='-smbios'/>
          <qemu:arg value='type=2'/>
          <qemu:arg value='-cpu'/>
          <qemu:arg value='Penryn,kvm=on,vendor=GenuineIntel,+kvm_pv_unhalt,+kvm_pv_eoi,+hypervisor,+invtsc,+pcid,+ssse3,+sse4.2,+popcnt,+avx,+avx2,+aes,+fma,+fma4,+bmi1,+bmi2,+xsave,+xsaveopt,+rdrand,check'/>
        </qemu:commandline>
      </domain>
  13. I operate a microscope at a university that collects ~5-10 TB of data daily. The company that supplies the microscope is unfortunately a bunch of total morons. They recently installed a new Windows server that acts as our fileserver, which is where the data is initially saved. They have locked us out of direct connections to the data from all of the Linux machines on our network. No amount of calm reasoning, or even angry yelling, pleading, or threats, seems to help. They only allow one single SMB mount of our data at a single fixed address. That computer is connected via a slow 1 GbE connection that runs at ~30-40 MB/s, so not even gigabit speeds. We can change the address of the computer, but all we are ever going to get is a single SMB mount of the data. The path to getting the data is the following: file server -> SMB mount on Windows machine -> copy to Linux machines for processing. Obviously our Linux machines can be networked with much faster connections, 10 GbE or better. My question is: does anyone have a good idea of how to bypass this idiocy so I can get my data in a timely fashion? It currently takes me weeks. Our IT person suggests building another Linux computer to mirror the data and send it to other places. That's super expensive, ~30k. We have ordered this, but it will likely be many more months until it is ready. It's all enterprise-grade stuff, so there are still tons of supply issues. Does anyone have any good ideas for how to get around this lunacy? Thanks!
  14. Got it, I needed to add hwaccel_args: -c:v h264_cuvid after the ffmpeg lines for my inputs. One last question: what should I put for /dev/dri/renderD128? Should I just leave it as is? It seems to work.
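
The throughput figures quoted in item 13 (~5-10 TB collected per day, ~30-40 MB/s over the single SMB mount) can be turned into rough transfer times. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes) and a perfectly sustained rate, and the function names are made up for illustration.

```python
# Rough transfer-time arithmetic for the numbers quoted in item 13.
# Assumptions (not from the post): decimal units and a perfectly
# sustained transfer rate with no protocol overhead or pauses.

def transfer_hours(terabytes: float, mb_per_s: float) -> float:
    """Hours needed to move `terabytes` at a sustained `mb_per_s`."""
    seconds = (terabytes * 1e12) / (mb_per_s * 1e6)
    return seconds / 3600


def daily_capacity_tb(mb_per_s: float) -> float:
    """Terabytes that can be moved in 24 hours at `mb_per_s`."""
    return mb_per_s * 1e6 * 86400 / 1e12


if __name__ == "__main__":
    for tb in (5, 10):
        for rate in (30, 40):
            hours = transfer_hours(tb, rate)
            print(f"{tb} TB at {rate} MB/s: {hours:.1f} h")
    print(f"Capacity at 40 MB/s: {daily_capacity_tb(40):.2f} TB/day")
```

Even in the best case here (5 TB at 40 MB/s), one day of data takes roughly 35 hours to copy, so under these assumptions the link cannot keep up with acquisition and the backlog grows, which is consistent with the "weeks" described in the post.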