michaelmcq
Members · Posts: 37 · Gender: Undisclosed
-
After a month or so of this not happening, I’ve had it 3 times this week, so I’m back to investigating 😞 Any suggestions for the best way to identify the cause? I don’t really want to replace parts that are working, and I suspect one of: backplane, motherboard, PSU.
-
Could it be PCIe lanes? I don’t understand them well enough, but I wonder: with 2 HBAs running (14 drives) plus 4 SSDs, could that cause this problem?
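One way to test the lane question is to compare each device's capable link width (LnkCap) with the width it actually negotiated (LnkSta) in the output of lspci -vv, run as root. The Python sketch below does that comparison. It assumes the usual lspci layout (a non-indented header line per device, indented capability lines) and only compares width, because link speed can drop at idle for power saving and is a weaker signal on its own. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

WIDTH_RE = re.compile(r"Width x(\d+)")


def link_widths(lspci_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each device header line to (capable width, negotiated width)."""
    widths: Dict[str, Tuple[int, int]] = {}
    device: Optional[str] = None
    cap: Optional[int] = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new device block, e.g. "01:00.0 Serial Attached SCSI ..."
            device, cap = line.strip(), None
            continue
        stripped = line.strip()
        match = WIDTH_RE.search(stripped)
        if device is None or match is None:
            continue
        if stripped.startswith("LnkCap:"):
            cap = int(match.group(1))  # maximum width the device supports
        elif stripped.startswith("LnkSta:") and cap is not None:
            widths[device] = (cap, int(match.group(1)))  # width actually negotiated


    return widths


def narrowed_links(lspci_text: str) -> List[str]:
    """Describe devices running on fewer lanes than they are capable of."""
    return [
        f"{dev}: capable x{cap}, running x{sta}"
        for dev, (cap, sta) in link_widths(lspci_text).items()
        if sta < cap
    ]
```

Feed it the live output, e.g. narrowed_links(subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout). A narrowed link on an HBA would be a clue about slot or lane sharing, not proof that it causes the dropouts.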
-
Thanks. I think there might be a correlation between this happening and me hammering the SSDs in there at the same time; they’re not on the HBAs but connected to the motherboard (Z490-A-PRO). So I was thinking it’s either power draw or something motherboard-related when its onboard drives are working hard? Off to research power consumption!
-
Thank you. I took the server down and reseated the card, and thought all was sorted, but it's gone again tonight, and this time it was both cards/all drives. Could they both fail at pretty much the same time? That seems unlikely. I don't really know what else to look for in the logs; there was nothing else in the log immediately before both cards failed. Again, they're all back after a reboot:
Jul 15 19:05:44 Tower root: Total Spundown: 1
Jul 15 19:05:44 Tower root: Entering Turbo Mode
Jul 15 19:05:44 Tower kernel: mdcmd (160): set md_write_method 1
Jul 15 19:05:44 Tower kernel:
Jul 15 19:10:44 Tower root: Total Spundown: 1
### [PREVIOUS LINE REPEATED 4 TIMES] ###
Jul 15 19:30:46 Tower emhttpd: spinning down /dev/sds
Jul 15 19:31:06 Tower emhttpd: spinning down /dev/sdl
Jul 15 19:31:18 Tower emhttpd: spinning down /dev/sdj
Jul 15 19:31:29 Tower emhttpd: spinning down /dev/sdo
Jul 15 19:31:39 Tower emhttpd: spinning down /dev/sdp
Jul 15 19:31:43 Tower emhttpd: spinning down /dev/sdk
Jul 15 19:31:57 Tower emhttpd: spinning down /dev/sdh
Jul 15 19:31:57 Tower emhttpd: spinning down /dev/sdi
Jul 15 19:35:44 Tower root: Total Spundown: 9
Jul 15 19:35:44 Tower root: Entering Normal Mode
Jul 15 19:35:44 Tower kernel: mdcmd (161): set md_write_method 0
Jul 15 19:35:44 Tower kernel:
Jul 15 19:40:44 Tower root: Total Spundown: 9
### [PREVIOUS LINE REPEATED 8 TIMES] ###
Jul 15 20:23:48 Tower emhttpd: read SMART /dev/sdo
Jul 15 20:25:44 Tower root: Total Spundown: 8
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jul 15 20:36:11 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Jul 15 20:36:11 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 15 20:36:12 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Jul 15 20:36:12 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 15 20:36:13 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Jul 15 20:36:13 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 15 20:36:14 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Jul 15 20:36:14 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 15 20:36:15 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Jul 15 20:36:15 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 15 20:36:16 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Jul 15 20:36:16 Tower kernel: mpt2sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
Jul 15 20:36:16 Tower kernel: sd 8:0:0:0: [sdf] Synchronizing SCSI cache
Jul 15 20:36:16 Tower kernel: sd 8:0:0:0: [sdf] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 15 20:36:16 Tower kernel: sd 8:0:4:0: [sdj] tag#803 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=5s
Jul 15 20:36:16 Tower kernel: sd 8:0:4:0: [sdj] tag#803 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jul 15 20:36:16 Tower kernel: sd 8:0:4:0: [sdj] tag#804 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 15 20:36:16 Tower kernel: sd 8:0:4:0: [sdj] tag#804 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jul 15 20:36:16 Tower kernel: sd 8:0:5:0: [sdk] tag#805 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 15 20:36:16 Tower kernel: sd 8:0:5:0: [sdk] tag#805 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jul 15 20:36:16 Tower kernel: sd 8:0:5:0: [sdk] tag#806 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 15 20:36:16 Tower kernel: sd 8:0:5:0: [sdk] tag#806 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jul 15 20:36:16 Tower kernel: sd 8:0:2:0: [sdh] tag#807 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 15 20:36:16 Tower kernel: sd 8:0:2:0: [sdh] tag#807 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jul 15 20:36:16 Tower kernel: sd 8:0:2:0: [sdh] tag#808 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 15 20:36:16 Tower kernel: sd 8:0:2:0: [sdh] tag#808 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
tower-diagnostics-20210715-2051.zip
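Both controllers reporting "SAS host is non-operational" in the same second is itself a clue: two independent card failures that close together would be unusual, which points (without proving anything) towards something they share, such as the PCIe bus, motherboard or power. A minimal Python sketch for pulling that timeline out of a syslog, assuming the "Mon DD HH:MM:SS host kernel: ..." layout shown above; the function name is hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# e.g. "Jul 15 20:36:11 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!"
FAULT_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<ctrl>mpt[23]sas_cm\d+): SAS host is non-operational"
)


def fault_summary(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each controller to (first 'non-operational' timestamp, number of such lines)."""
    summary: Dict[str, Tuple[str, int]] = {}
    for line in lines:
        match = FAULT_RE.match(line)
        if match is None:
            continue
        ctrl, ts = match.group("ctrl"), match.group("ts")
        # Keep the earliest timestamp seen and bump the count.
        first_ts, count = summary.get(ctrl, (ts, 0))
        summary[ctrl] = (first_ts, count + 1)
    return summary
```

Usage, assuming Unraid's syslog lives at /var/log/syslog: with open("/var/log/syslog") as f: print(fault_summary(f)). On the log above, both mpt2sas_cm0 and mpt2sas_cm1 would show a first fault at Jul 15 20:36:11.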
-
Over the last couple of days I've started seeing drive errors, which I don't always notice straight away. Sometimes it's 4 drives, but other times it's all drives. Rebooting the server brings everything back as it should be.
Jul 5 12:23:02 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 5 12:23:03 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 5 12:23:04 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 5 12:23:05 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 5 12:23:06 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: SAS host is non-operational !!!!
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
Jul 5 12:23:07 Tower kernel: sd 9:0:0:0: [sdn] Synchronizing SCSI cache
Jul 5 12:23:07 Tower kernel: sd 9:0:0:0: [sdn] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 5 12:23:07 Tower kernel: sd 9:0:1:0: [sdo] Synchronizing SCSI cache
Jul 5 12:23:07 Tower kernel: sd 9:0:1:0: [sdo] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 5 12:23:07 Tower kernel: sd 9:0:4:0: [sdr] tag#731 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=4s
Jul 5 12:23:07 Tower kernel: sd 9:0:4:0: [sdr] tag#731 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jul 5 12:23:07 Tower kernel: sd 9:0:4:0: [sdr] tag#732 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 5 12:23:07 Tower kernel: sd 9:0:4:0: [sdr] tag#732 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jul 5 12:23:07 Tower kernel: sd 9:0:5:0: [sds] tag#733 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 5 12:23:07 Tower kernel: sd 9:0:5:0: [sds] tag#733 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jul 5 12:23:07 Tower kernel: sd 9:0:5:0: [sds] tag#734 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 5 12:23:07 Tower kernel: sd 9:0:5:0: [sds] tag#734 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jul 5 12:23:07 Tower kernel: sd 9:0:3:0: [sdq] tag#735 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 5 12:23:07 Tower kernel: sd 9:0:3:0: [sdq] tag#735 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jul 5 12:23:07 Tower kernel: sd 9:0:3:0: [sdq] tag#736 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=0s
Jul 5 12:23:07 Tower kernel: sd 9:0:3:0: [sdq] tag#736 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jul 5 12:23:07 Tower kernel: sd 9:0:2:0: [sdp] Synchronizing SCSI cache
Jul 5 12:23:07 Tower kernel: sd 9:0:2:0: [sdp] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 5 12:23:07 Tower emhttpd: read SMART /dev/sdr
Jul 5 12:23:07 Tower emhttpd: read SMART /dev/sds
Jul 5 12:23:07 Tower emhttpd: read SMART /dev/sdq
Jul 5 12:23:07 Tower kernel: sd 9:0:3:0: [sdq] Synchronizing SCSI cache
Jul 5 12:23:07 Tower kernel: sd 9:0:3:0: [sdq] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 5 12:23:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 5 12:23:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 5 12:23:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 5 12:23:07 Tower kernel: sd 9:0:4:0: [sdr] Synchronizing SCSI cache
Jul 5 12:23:07 Tower kernel: sd 9:0:4:0: [sdr] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 5 12:23:07 Tower unassigned.devices: Warning: Can't get rotational setting of '/dev/sdq'.
Jul 5 12:23:07 Tower unassigned.devices: Warning: Can't get rotational setting of '/dev/sdq'.
Jul 5 12:23:07 Tower unassigned.devices: Warning: Can't get rotational setting of '/dev/sdr'.
Jul 5 12:23:07 Tower unassigned.devices: Warning: Can't get rotational setting of '/dev/sdr'.
Jul 5 12:23:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 5 12:23:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 5 12:23:07 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 5 12:23:07 Tower kernel: sd 9:0:5:0: [sds] Synchronizing SCSI cache
Jul 5 12:23:07 Tower kernel: sd 9:0:5:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221100000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: removing handle(0x0009), sas_addr(0x4433221100000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(3)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221101000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: removing handle(0x000a), sas_addr(0x4433221101000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(2)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221102000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: removing handle(0x000b), sas_addr(0x4433221102000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(1)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221104000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: removing handle(0x000c), sas_addr(0x4433221104000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(7)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221105000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: removing handle(0x000d), sas_addr(0x4433221105000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(6)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221103000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: removing handle(0x000e), sas_addr(0x4433221103000000)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(0)
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: unexpected doorbell active!
Jul 5 12:23:07 Tower kernel: mpt2sas_cm1: sending diag reset !!
Jul 5 12:23:08 Tower kernel: mpt2sas_cm1: Invalid host diagnostic register value
Jul 5 12:23:08 Tower kernel: mpt2sas_cm1: System Register set:
Jul 5 12:23:08 Tower kernel: 00000000: ffffffff
Jul 5 12:23:08 Tower kernel: 00000004: ffffffff
Jul 5 12:23:08 Tower kernel: 00000008: ffffffff
Jul 5 12:23:08 Tower kernel: 0000000c: ffffffff
Jul 5 12:23:08 Tower kernel: 00000010: ffffffff
Jul 5 12:23:08 Tower kernel: 00000014: ffffffff
Jul 5 12:23:08 Tower kernel: 00000018: ffffffff
Jul 5 12:23:08 Tower kernel: 0000001c: ffffffff
<REPEATED>
Jul 5 12:23:08 Tower kernel: 000000f8: ffffffff
Jul 5 12:23:08 Tower kernel: 000000fc: ffffffff
Jul 5 12:23:08 Tower kernel: mpt2sas_cm1: diag reset: FAILED
Jul 5 12:26:40 Tower root: Total Spundown: 8
Jul 5 12:31:41 Tower root: Total Spundown: 8
Jul 5 12:33:00 Tower kernel: md: disk5 read error, sector=9102416
Jul 5 12:33:00 Tower kernel: md: disk2 read error, sector=9102416
Jul 5 12:33:00 Tower kernel: md: disk4 read error, sector=9102416
Jul 5 12:33:00 Tower kernel: md: disk6 read error, sector=9102416
Jul 5 12:33:10 Tower emhttpd: read SMART /dev/sdj
Jul 5 12:33:10 Tower emhttpd: read SMART /dev/sdk
Jul 5 12:33:10 Tower emhttpd: read SMART /dev/sdg
Jul 5 12:33:10 Tower kernel: XFS (md5): metadata I/O error in "xfs_da_read_buf+0x9e/0xfe [xfs]" at daddr 0x8ae450 len 8 error 5
Jul 5 12:33:10 Tower kernel: XFS (md5): metadata I/O error in "xfs_da_read_buf+0x9e/0xfe [xfs]" at daddr 0x8ae450 len 8 error 5
Jul 5 12:33:10 Tower emhttpd: read SMART /dev/sdf
Jul 5 12:33:10 Tower emhttpd: read SMART /dev/sdl
Jul 5 12:33:10 Tower emhttpd: read SMART /dev/sdi
I have 2 SAS controller cards. I'd initially thought one of them might be failing, but when all drives went I thought it must be motherboard-related. Nothing has changed on the machine recently; it's been running quite nicely for a while. I'm a bit stuck as to where to look next. Diagnostics attached: tower-diagnostics-20210705-1244.zip
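Two details in this log may help narrow things down. The "enclosure logical id(...), slot(N)" messages record which enclosure slots hang off mpt2sas_cm1, so the failing controller can be mapped to physical backplane bays. And a register dump that reads back entirely as ffffffff usually means the host got no response from the card over PCIe at all, i.e. the card effectively dropped off the bus; that fits a slot, power or motherboard problem but doesn't say which. A rough Python sketch for extracting both from the lines around a failure (function names are hypothetical):

```python
import re
from typing import Iterable, List, Tuple

# e.g. "mpt2sas_cm1: enclosure logical id(0x5b8ca3a0f0160c00), slot(3)"
SLOT_RE = re.compile(
    r"(?P<ctrl>mpt[23]sas_cm\d+): enclosure logical id\((?P<enc>0x[0-9a-f]+)\), "
    r"slot\((?P<slot>\d+)\)"
)
# Register dump lines look like "... kernel: 00000000: ffffffff"
REG_RE = re.compile(r"kernel: [0-9a-f]{8}: (?P<value>[0-9a-f]{8})\s*$")


def enclosure_slots(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """(controller, enclosure slot) for every slot message, in log order."""
    return [
        (m.group("ctrl"), int(m.group("slot")))
        for m in map(SLOT_RE.search, lines)
        if m
    ]


def registers_all_ones(lines: Iterable[str]) -> bool:
    """True if a register dump is present and every value read back as ffffffff."""
    values = [m.group("value") for m in map(REG_RE.search, lines) if m]
    return bool(values) and all(v == "ffffffff" for v in values)
```

Note that the driver also prints slot messages when drives are discovered at boot, so feed it only the lines around the failure if you want just the removed drives.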
-
I don't actively use this any more. I guess you'd need to update the main script, or you may be able to set defaults in the options file, either via the CLI with --prefs-add or by editing the file directly. It'll be in your config folder, and you just want the line:
subtitles 1
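The CLI route would be something like get_iplayer --prefs-add --subtitles. For editing the file instead, here is a minimal Python sketch that sets an option idempotently. It assumes the one "name value" per line format implied by the subtitles 1 line above; set_option is a hypothetical helper, and the path you pass in is a placeholder, since the options file's location depends on your setup.

```python
from pathlib import Path


def set_option(options_file: Path, name: str, value: str) -> None:
    """Set `name value` in a line-based options file, replacing any existing entry."""
    lines = options_file.read_text().splitlines() if options_file.exists() else []
    # Drop any existing line whose first word is this option name...
    kept = [line for line in lines if line.split(maxsplit=1)[:1] != [name]]
    # ...then append the new setting (so it ends up as the last line).
    kept.append(f"{name} {value}")
    options_file.write_text("\n".join(kept) + "\n")
```

Usage: set_option(Path("/path/to/config/options"), "subtitles", "1"). Running it twice leaves a single subtitles line.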
-
The maximum resolution available to get_iplayer is 720p; the BBC doesn't make anything higher available through the web version of iPlayer, only on smart TVs etc.
-
Just an update from me: with my troublesome Docker container turned off, everything is working fine. It's not Pi-Hole for me but rather this: https://github.com/chrisns/docker-node-sonos-http-api Perhaps there are similarities, although none are obvious to me! I've not had a chance to update to RC2 yet.
-
I can’t stop them all: one of them, a Node container for the Sonos API that I added, wouldn’t stop. The others all did. I’ve since tried restarting the Docker service with /etc/rc.d/rc.docker restart, but that’s not working. Looks like I might be doing a forced reboot in a bit and then disabling that particular container.
-
Is there a Docker command to stop them all? I can’t do that at the minute, but will try over the weekend at some point. As an aside, I ran “docker stats” over SSH and it doesn’t return anything; I have to press Ctrl+C to get a prompt back. “docker ps” does return successfully.
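From a shell, the usual one-liner is docker stop $(docker ps -q): docker ps -q lists the IDs of running containers, and docker stop stops each one. The same thing as a small Python sketch, with the subprocess runner injectable so the logic can be exercised without a Docker daemon; stop_all_containers is a hypothetical name.

```python
import subprocess
from typing import Callable, List


def stop_all_containers(timeout: int = 10, run: Callable = subprocess.run) -> List[str]:
    """Stop every running container; equivalent to `docker stop $(docker ps -q)`."""
    ps = run(["docker", "ps", "-q"], capture_output=True, text=True, check=True)
    ids = ps.stdout.split()  # one container ID per line
    if ids:
        # -t: seconds to wait for a clean stop before Docker kills the container
        run(["docker", "stop", "-t", str(timeout), *ids], check=True)
    return ids
```

For the stats aside, docker stats --no-stream prints a single snapshot and exits rather than streaming until Ctrl+C.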
-
Not that it's any better than holding down the power button, but if you can still get an SSH session, you can do the following to force a reboot:
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger
https://major.io/2009/01/29/linux-emergency-reboot-or-shutdown-with-magic-commands/
-
It stopped and started without issue, but unfortunately it didn't seem to make any difference to the Docker tab.
-
I've posted these over on the main release thread, but here are my diagnostics for the same issue: https://lime-technology.com/applications/core/interface/file/attachment.php?id=39269 I wouldn't know where to start with sifting through them, but I notice we all have the same error in our Docker logs, specifically:
level=error msg="stream copy error: reading from a closed fifo"
I don't know what a log looks like without this issue, so that could be a red herring. I couldn't see anything at the same time in my syslog.
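To judge whether that message is a red herring, it can help to tally which errors the Docker daemon logs at all. Daemon log lines use key=value fields, as in the level=error msg="..." excerpt above. A rough Python sketch, assuming that format (and, for usage, Unraid's /var/log/docker.log location); the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# key=value or key="quoted value" pairs, as in the daemon's text log lines
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one log line into fields; quotes are stripped, escapes are left as-is."""
    return {
        key: raw[1:-1] if raw.startswith('"') else raw
        for key, raw in PAIR_RE.findall(line)
    }


def error_messages(lines: Iterable[str]) -> Counter:
    """Count how often each level=error message appears."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            counts[fields.get("msg", "")] += 1
    return counts
```

Usage: with open("/var/log/docker.log") as f: print(error_messages(f).most_common(5)). Comparing the time fields of those lines with when the Docker tab hung would show whether they coincide.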
-
unRAID OS version 6.5.0 Stable Release Available
michaelmcq replied to limetech's topic in Announcements
Here's my diagnostics file too, with the same issue: tower-diagnostics-20180319-1556.zip
-
I do; that doesn’t run either, it just sits there scanning. Interesting to see Yippy3000 had to force a reboot, as I did too. The GUI was stuck on stopping services.