Rick Sanchez Posted June 25, 2013
Hi everyone,
After being told that my eleven-day parity checks are not normal on an Atom, I was asked by Paul to create a topic to try to solve my issue.
Current build: Fractal Design Array R2, Intel D510MO/D510 Intel MN10, 4 GB DDR2 800 MHz, 6x 2 TB HDD
What information should I post to help diagnose the issue?
Regards,
Jon
unevent Posted June 25, 2013
A syslog would be the first place to start. If you have unMENU installed, go to the syslog page and attach a downloaded copy to a reply. If you do not have unMENU, read http://lime-technology.com/wiki/index.php?title=Troubleshooting#Capturing_your_syslog for instructions on obtaining a syslog to post.
Rick Sanchez Posted June 25, 2013
Jun 22 04:40:01 Atlantis syslogd 1.4.1: restart.
Jun 22 21:41:51 Atlantis kernel: mdcmd (47): spindown 1
Jun 22 21:41:51 Atlantis kernel: mdcmd (48): spindown 2
Jun 23 02:53:06 Atlantis kernel: mdcmd (49): spindown 1
Jun 23 02:53:06 Atlantis kernel: mdcmd (50): spindown 2
Jun 25 01:39:58 Atlantis udevd-work[13941]: rename '/dev/disk/by-path/pci-0000:05:00.0-scsi-0:0:0:0.udev-tmp' '/dev/disk/by-path/pci-0000:05:00.0-scsi-0:0:0:0' failed: No such file or directory
Jun 25 01:41:53 Atlantis udevd-work[14863]: symlink '../../sdd' '/dev/disk/by-path/pci-0000:05:00.0-scsi-0:0:0:0.udev-tmp' failed: File exists
Jun 25 02:03:33 Atlantis emhttp: title not found
Jun 25 02:05:30 Atlantis udevd-work[21871]: symlink '../../sdb' '/dev/disk/by-path/pci-0000:05:00.0-scsi-0:0:0:0.udev-tmp' failed: File exists
Jun 25 02:06:17 Atlantis udevd-work[23661]: symlink '../../sde' '/dev/disk/by-path/pci-0000:05:00.0-scsi-0:0:0:0.udev-tmp' failed: File exists
Jun 25 02:07:25 Atlantis in.telnetd[24494]: connect from 192.168.0.3 (192.168.0.3)
Jun 25 02:07:36 Atlantis login[24495]: ROOT LOGIN on '/dev/pts/0' from '192.168.0.3'
Jun 25 02:07:50 Atlantis udevd-work[24705]: symlink '../../sdb' '/dev/disk/by-path/pci-0000:05:00.0-scsi-0:0:0:0.udev-tmp' failed: File exists
This is what I get.
unevent Posted June 25, 2013
Attach the complete syslog as a file (the option is at the bottom of the reply window).
dgaschk Posted June 25, 2013
What unRAID version? The syslog is available under the Utils tab in version 5. Also see here: http://lime-technology.com/forum/index.php?topic=9880.0
garycase Posted June 25, 2013
"After being told my eleven day parity checks are not normal on an Atom ..."
Wow ... that is a major understatement!! An Atom has PLENTY of CPU power for parity computations ... the parity check speed is driven almost exclusively by your disks and disk interfaces.
My Atom build is slightly faster than yours (D525 vs D510) ... but that shouldn't really impact things. And my parity checks with 6 3TB WD Reds take 7:41.
Hopefully your full syslog will have some clues as to what's going on.
Joe L. Posted June 25, 2013
On unRAID 4.7 or 5.0 the syslog can also be viewed by browsing to //tower/log/syslog with your browser.
jowi Posted June 25, 2013
"Wow ... that is a major understatement!! An Atom has PLENTY of CPU power for parity computations ... the parity check speed is driven almost exclusively by your disks and disk interfaces. My Atom build is slightly faster than yours (D525 vs D510) ... but that shouldn't really impact things. And my parity checks with 6 3TB WD Reds take 7:41. Hopefully your full syslog will have some clues as to what's going on."
My experience with an Atom D525 is that it has plenty of power... to do ONE thing. If you are running a parity check AND copying or streaming, performance plummets to an absolute minimum. Or if you are watching/streaming a DVD or video and start a copy to or from the array, everything grinds to a complete stop.
S80_UK Posted June 25, 2013
One contributory factor might be that the motherboard you mention has a PCI slot (not PCI Express). This will limit the speed of transfer to the drives on the controller in that slot. Also, it depends on the controller itself, of course. I guess you have four drives on the PCI controller and only two on the motherboard.
BUT, even taking that into account, between one and two days should be sufficient, I would think. Something else is going on.
garycase Posted June 25, 2013
"My experience with an Atom D525 is that it has plenty of power... to do ONE thing. If you are running a parity check AND copying or streaming, performance plummets to an absolute minimum. Or if you are watching/streaming a DVD or video and start a copy to or from the array, everything grinds to a complete stop."
Not sure what configuration you tried, but that's not at all true with my SuperMicro X7SPA-H-D525-O based system. I run UnRAID with UnMenu, Cache_Dirs, and the APC & CleanPowerdown plugins, and can easily do multiple copies to/from it at once, stream a movie while doing copies, etc. The performance is just fine.
garycase Posted June 25, 2013
"One contributory factor might be that the motherboard you mention has a PCI slot (not PCI express)."
This is definitely a MAJOR factor in why parity checks run much slower than they could with an interface that could support the drive's native sustained access capabilities. HOWEVER ...
"Something else is going on."
Absolutely ... the PCI interface bottleneck does not explain eleven days!!
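A rough back-of-the-envelope check of the PCI point above. The 133 MB/s figure is the theoretical peak of a conventional 32-bit/33 MHz PCI bus; the four-drives-on-the-card layout is S80_UK's guess rather than something confirmed in the thread, and the 80% efficiency factor is purely an assumption for illustration.

```python
# Back-of-the-envelope estimate of parity-check time when several drives
# share one conventional PCI bus. All inputs are assumptions, not measurements.

PCI_PEAK_MB_S = 133.0      # theoretical peak of 32-bit / 33 MHz PCI
BUS_EFFICIENCY = 0.8       # assumed usable fraction of that peak (a guess)
DRIVES_ON_PCI_CARD = 4     # S80_UK's guess for this build
DRIVE_SIZE_TB = 2.0        # each drive in the build is 2 TB


def parity_check_hours(drive_size_tb, drives_on_bus, bus_mb_s):
    """Hours to read every drive end to end when they split the bus evenly."""
    per_drive_mb_s = bus_mb_s / drives_on_bus
    drive_size_mb = drive_size_tb * 1_000_000  # decimal TB -> MB
    return drive_size_mb / per_drive_mb_s / 3600


if __name__ == "__main__":
    usable = PCI_PEAK_MB_S * BUS_EFFICIENCY
    hours = parity_check_hours(DRIVE_SIZE_TB, DRIVES_ON_PCI_CARD, usable)
    print(f"Estimated parity check: {hours:.1f} hours ({hours / 24:.2f} days)")
```

Under these assumptions the estimate lands at a bit under one day, which fits the "between one and two days" ballpark and is nowhere near eleven days (264 hours).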
garycase Posted June 25, 2013
If you're willing to run "at risk" (e.g. without parity protection, so no fault tolerance) for a few days, try this ...
(1) Be sure you know the serial number of your parity drive -- write it down; save an image of the Web GUI; etc. ... just be sure you know which drive it is.
(2) Shut down and disconnect 3 of the drives connected to the PCI controller, so only ONE drive is connected to that controller card. Be SURE the parity drive is still connected ... either to the motherboard (best) or, if it was connected to the PCI card, be sure it's the one still connected.
(3) Boot to UnRAID, and create a new configuration with just the 3 drives now connected. Be SURE you assign your parity drive to the parity slot. NOTE: Since this config will only have 3 drives, you COULD save your flash drive and do this test with another flash drive, which you could load the latest version of UnRAID on (since you don't need a key with only 3 drives).
(4) Let UnRAID do a parity sync with this new configuration (this will take a LONG time ... but it should be hours, not days).
(5) When the parity sync has completed, run a parity check and see how long it takes.
Rick Sanchez Posted June 25, 2013
The syslog is the same as what I posted above.
Joe L. Posted June 25, 2013
"Sys log is the same as what I posted above"
Unfortunately, that is a partial log, written after the prior log had grown large and been rotated out. It does not contain the needed information. Look for syslog.1 or syslog.2 in /var/log/
What we'll need to see is how the disk is being initialized by the disk controller and any errors that might show themselves. If needed, stop the array, reboot, and then capture the system log.
Joe L.
Rick Sanchez Posted June 25, 2013
It's too big to post, so it is in a .txt file: https://www.dropbox.com/s/s0tc4megkygfim0/syslog.txt
dgaschk Posted June 25, 2013
Post a SMART report for the parity drive.
Joe L. Posted June 25, 2013
"It's to big to post so it is in a .txt file https://www.dropbox.com/s/s0tc4megkygfim0/syslog.txt"
I'm seeing a ton of ICRC errors. These are errors communicating with the disk connected to ata2. Each time the error occurs, the disk controller resets itself and tries again. These slow you down a LOT.
Pretty sure ata2 = /dev/sdc
Usually the cause of these is bundling SATA cables together (you try to make it look neat, and instead make it more likely for the cables to induce noise into each other). Cut the tie-wraps. Do NOT run the SATA cables near each other, and do not run them near the power cables.
Jun 21 19:02:00 Atlantis kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jun 21 19:02:00 Atlantis kernel: ata2.00: irq_stat 0x00020002, device error via D2H FIS
Jun 21 19:02:00 Atlantis kernel: ata2.00: failed command: WRITE DMA
Jun 21 19:02:00 Atlantis kernel: ata2.00: cmd ca/00:10:3f:00:00/00:00:00:00:00/e0 tag 0 dma 8192 out
Jun 21 19:02:00 Atlantis kernel: res 51/84:01:3f:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Jun 21 19:02:00 Atlantis kernel: ata2.00: status: { DRDY ERR }
Jun 21 19:02:00 Atlantis kernel: ata2.00: error: { ICRC ABRT }
Jun 21 19:02:00 Atlantis kernel: ata2: hard resetting link
Jun 21 19:02:01 Atlantis emhttp: shcmd (1423): :>/etc/samba/smb-shares.conf
Jun 21 19:02:01 Atlantis emhttp: Restart SMB...
Jun 21 19:02:01 Atlantis emhttp: shcmd (1424): killall -HUP smbd
Jun 21 19:02:01 Atlantis emhttp: shcmd (1425): ps axc | grep -q rpc.mountd
Jun 21 19:02:01 Atlantis emhttp: _shcmd: shcmd (1425): exit status: 1
Jun 21 19:02:01 Atlantis emhttp: shcmd (1426): /usr/local/sbin/emhttp_event svcs_restarted
Jun 21 19:02:01 Atlantis emhttp_event: svcs_restarted
Jun 21 19:02:02 Atlantis kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Jun 21 19:02:02 Atlantis kernel: ata2.00: configured for UDMA/100
Jun 21 19:02:02 Atlantis kernel: ata2: EH complete
Jun 21 19:02:02 Atlantis kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jun 21 19:02:02 Atlantis kernel: ata2.00: irq_stat 0x00020002, device error via D2H FIS
Jun 21 19:02:02 Atlantis kernel: ata2.00: failed command: WRITE DMA EXT
Jun 21 19:02:02 Atlantis kernel: ata2.00: cmd 35/00:00:c7:04:00/00:04:00:00:00/e0 tag 0 dma 524288 out
Jun 21 19:02:02 Atlantis kernel: res 51/84:e0:c7:04:00/00:02:00:00:00/e0 Emask 0x10 (ATA bus error)
Jun 21 19:02:02 Atlantis kernel: ata2.00: status: { DRDY ERR }
Jun 21 19:02:02 Atlantis kernel: ata2.00: error: { ICRC ABRT }
Jun 21 19:02:02 Atlantis kernel: ata2: hard resetting link
Jun 21 19:02:05 Atlantis kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Jun 21 19:02:05 Atlantis kernel: ata2.00: configured for UDMA/100
Jun 21 19:02:05 Atlantis kernel: ata2: EH complete
Jun 21 19:02:05 Atlantis kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jun 21 19:02:05 Atlantis kernel: ata2.00: irq_stat 0x00020002, device error via D2H FIS
Jun 21 19:02:05 Atlantis kernel: ata2.00: failed command: WRITE DMA EXT
Jun 21 19:02:05 Atlantis kernel: ata2.00: cmd 35/00:88:c7:08:00/00:03:00:00:00/e0 tag 0 dma 462848 out
Jun 21 19:02:05 Atlantis kernel: res 51/84:78:c7:08:00/00:03:00:00:00/e0 Emask 0x10 (ATA bus error)
Jun 21 19:02:05 Atlantis kernel: ata2.00: status: { DRDY ERR }
Jun 21 19:02:05 Atlantis kernel: ata2.00: error: { ICRC ABRT }
Jun 21 19:02:05 Atlantis kernel: ata2: hard resetting link
It is basically saying that the checksum across the SATA cable is failing. Looks like it is /dev/sdc
Joe L.
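For a long syslog, tallying these errors per ATA port can show at a glance which link is misbehaving. The following is only a minimal sketch in Python, meant to be run on a workstation against a downloaded copy of the log; the regular expressions are assumptions based on the line format in the excerpt above, and the function and variable names are made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the excerpt above, e.g.
#   "... kernel: ata2.00: error: { ICRC ABRT }"
#   "... kernel: ata2: hard resetting link"
ICRC_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: error: \{[^}]*\bICRC\b[^}]*\}")
RESET_LINE = re.compile(r"kernel: (ata\d+): hard resetting link")


def count_link_errors(lines):
    """Return (icrc_counts, reset_counts), each a Counter keyed by ATA port."""
    icrc, resets = Counter(), Counter()
    for line in lines:
        m = ICRC_LINE.search(line)
        if m:
            icrc[m.group(1)] += 1
            continue
        m = RESET_LINE.search(line)
        if m:
            resets[m.group(1)] += 1
    return icrc, resets


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Jun 21 19:02:00 Atlantis kernel: ata2.00: status: { DRDY ERR }",
    "Jun 21 19:02:00 Atlantis kernel: ata2.00: error: { ICRC ABRT }",
    "Jun 21 19:02:00 Atlantis kernel: ata2: hard resetting link",
    "Jun 21 19:02:02 Atlantis kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)",
    "Jun 21 19:02:02 Atlantis kernel: ata2.00: error: { ICRC ABRT }",
    "Jun 21 19:02:02 Atlantis kernel: ata2: hard resetting link",
]

if __name__ == "__main__":
    icrc, resets = count_link_errors(SAMPLE)
    for port in sorted(set(icrc) | set(resets)):
        print(f"{port}: {icrc[port]} ICRC errors, {resets[port]} link resets")
```

To scan a real log, pass an open file object instead of `SAMPLE`, since `count_link_errors` only needs an iterable of lines.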
unevent Posted June 26, 2013
Could be a cable issue like Joe L. mentioned, a failing drive, the Linux driver (sil24) for the Silicon Image controller in your system, or the Silicon Image controller itself. Has it always been this way (long parity checks)? Did it start acting up after an unRAID version change, or did it just occur one day with no other changes made to the system?
Check your cables, then try moving the drive and data cable together to a different SATA port and see what happens. If the errors move with the drive, then it is the drive or the cable; swap out the data cable next. If the errors stay with the port, then you have a bad card/port.
Please post a SMART report for the drives as well. It will shed further light on the issue. A quick search of the forum for obtaining SMART reports will yield instructions if you are unfamiliar.
Rick Sanchez Posted June 26, 2013
Is it affecting the parity drive only?
Joe L. Posted June 26, 2013
"Is it affecting the parity drive only?"
The parity drive has no issues, but the parity process is taking forever since one of the disks being read has issues.
dgaschk Posted June 26, 2013
The parity drive is having write failures:
Jun 21 19:04:44 Atlantis kernel: handle_stripe write error: 55576/0, count: 1
Jun 21 19:04:44 Atlantis kernel: md: disk0 write error
It should have a red dot.
Joe L. Posted June 26, 2013
"The parity drive is having write failures:
Jun 21 19:04:44 Atlantis kernel: handle_stripe write error: 55576/0, count: 1
Jun 21 19:04:44 Atlantis kernel: md: disk0 write error
It should have a red dot."
Oops... I'm wrong... you are right... That is an issue. (I did not spot that in the syslog.)
Joe L.
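Lines like the ones dgaschk quoted are easy to miss in a large log, so here is a minimal Python sketch for pulling them out. The pattern is an assumption based on the single "md: disk0 write error" line quoted above, the mapping of slot 0 to the parity drive is taken from this thread's reading of that line, and the function names are hypothetical.

```python
import re

# Modelled on the quoted line: "... kernel: md: disk0 write error"
MD_WRITE_ERROR = re.compile(r"kernel: md: disk(\d+) write error")


def disks_with_write_errors(lines):
    """Return the sorted md slot numbers that reported write errors."""
    slots = set()
    for line in lines:
        m = MD_WRITE_ERROR.search(line)
        if m:
            slots.add(int(m.group(1)))
    return sorted(slots)


def describe_slot(slot):
    """Label a slot; slot 0 as parity is an assumption drawn from this thread."""
    return "parity" if slot == 0 else f"disk{slot}"


# The two lines quoted in the post above, used as sample input.
SAMPLE = [
    "Jun 21 19:04:44 Atlantis kernel: handle_stripe write error: 55576/0, count: 1",
    "Jun 21 19:04:44 Atlantis kernel: md: disk0 write error",
]

if __name__ == "__main__":
    for slot in disks_with_write_errors(SAMPLE):
        print(f"write errors reported on {describe_slot(slot)}")
```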
Rick Sanchez Posted June 26, 2013
OK. What can/should I do? Also, is there a way to differentiate the drives in my case?
dgaschk Posted June 27, 2013
Post a SMART report for the parity disk.
Joe L. Posted June 27, 2013
"Ok. What can/should I do? Also is there a way to differentiate the drives in my case?"
Yes, each disk has its serial number printed on the outside of its housing. You can match those against the serial numbers shown on your device assignment page.