nars Posted January 25, 2013

rc10 won't boot for me. I can't telnet into unRAID, webGUI won't come up either. I'm using an Intel NIC. Unfortunately I don't have an external monitor right now so I can't diagnose the issue. Unplugged the flash drive, copied back rc9a and it boots fine again.

Note that you may need to re-run the .bat to reinstall syslinux.

Thanks, somehow I missed it. I'm using a Mac, so I guess finding a Windows machine to run the .bat is just too much hassle, I'll wait for the 5.0 final.

It may be better to test it before final, as I'm not sure that is really the problem you are having. Also, I checked more carefully and in fact there is no syslinux change from rc9a to rc10 (the change was from rc8a to rc9a), so I think it makes no sense that it requires running the .bat, unless you also didn't run it when you updated to rc9a?

Also, are you just replacing bzimage and bzroot (and readme.txt if you want)? You should not need to replace any other files. If your existing syslinux is working fine then you probably don't need to update it, BUT then you should also NOT update menu.c32 (part of syslinux), and make sure you don't delete the existing ldlinux.sys on the flash (created when you ran the .bat). I guess that is what caused the problems for the people who had to re-run the .bat.
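The "replace only bzimage and bzroot" advice can be sketched as a small script. This is a minimal illustration, not an official upgrade tool: the function name and both directory arguments are hypothetical, and it assumes the release zip has already been extracted and the flash drive is mounted on the machine running Python.

```python
import shutil
from pathlib import Path

# The two files the posts above say to replace; everything else on the
# flash (ldlinux.sys, menu.c32, config) is deliberately left untouched.
KERNEL_FILES = ("bzimage", "bzroot")


def update_kernel_files(release_dir, flash_dir):
    """Copy only bzimage and bzroot from an extracted release to the flash."""
    release, flash = Path(release_dir), Path(flash_dir)
    missing = [name for name in KERNEL_FILES if not (release / name).is_file()]
    if missing:
        raise FileNotFoundError("release is missing: " + ", ".join(missing))
    copied = []
    for name in KERNEL_FILES:
        # copyfile copies contents only, which is all a FAT flash drive needs
        shutil.copyfile(release / name, flash / name)
        copied.append(name)
    return copied
```

Note that this only fits an upgrade between releases with the same syslinux version; per the discussion above, a jump across the rc8a to rc9a syslinux change may still need the .bat and menu.c32.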
limetech (Author) Posted January 26, 2013

It appears that spindown of disks is hit and miss: it does not always happen even though I see the spin down command executed in the log file. After the above, all disks spin down but one, in this case disk 10, which keeps spinning. In another case, another disk is affected. Following that, because unRAID thinks the disk has spun down, it never tries to spin it down again. If I use the command '/root/mdcmd spindown 10' it spins down immediately and stays spun down. Is this a potential bug, or is there something I can look at on my side?

Are the colored status balls in the webGui consistent with the disk spinning state? Also, I'd like to see the full system log.

Hi Tom, Yes, the colored status balls correctly reflect the spinning state. However, if I look at unmenu, interestingly it shows the disk as spun down (but unmenu is wrong, probably due to a different way of checking the spinning state). I am now removing SimpleFeatures (which I had planned to do for a long time). Will come back with the full system log if the issue persists.

Hi Tom, OK, I removed SF yesterday evening, but some customizations are still running (VBox, CrashPlan and TVHeadend). Please see the system log attached. FYI at 22:28:17 I was the one who manually spun up all the drives. This morning I see all the drives but the cache drive spinning. Let me know if you want me to test with fully vanilla unRAID (by removing every customization), but those apps never caused any issue like that before.

Yes, please test with vanilla.

The end of the syslog is "curious", my analysis:

Jan 24 22:08:33 Tower in.telnetd[27827]: connect from 192.168.9.145 (192.168.9.145)
Jan 24 22:08:36 Tower login[27828]: ROOT LOGIN on '/dev/pts/0' from 'BUD8288N'
Jan 24 22:12:43 Tower in.telnetd[32070]: connect from 192.168.9.145 (192.168.9.145)
Jan 24 22:12:45 Tower login[32071]: ROOT LOGIN on '/dev/pts/0' from 'BUD8288N'
Jan 24 22:14:02 Tower kernel: mdcmd (51): spindown 0
Jan 24 22:14:02 Tower emhttp: shcmd (78): /usr/sbin/hdparm -y /dev/sdd &> /dev/null
Jan 24 22:14:34 Tower kernel: mdcmd (52): spindown 1
Jan 24 22:14:45 Tower kernel: mdcmd (53): spindown 2
Jan 24 22:14:55 Tower kernel: mdcmd (54): spindown 3
Jan 24 22:15:06 Tower kernel: mdcmd (55): spindown 4
Jan 24 22:15:16 Tower kernel: mdcmd (56): spindown 5
Jan 24 22:15:27 Tower kernel: mdcmd (57): spindown 6
Jan 24 22:15:27 Tower kernel: mdcmd (58): spindown 8
Jan 24 22:17:00 Tower in.telnetd[4199]: connect from 192.168.9.145 (192.168.9.145)
Jan 24 22:17:02 Tower login[4202]: ROOT LOGIN on '/dev/pts/0' from 'BUD8288N'
Jan 24 22:21:18 Tower kernel: mdcmd (59): spindown 7
Jan 24 22:21:39 Tower kernel: mdcmd (60): spindown 9
Jan 24 22:28:17 Tower emhttp: Spinning up all drives...
Jan 24 22:28:17 Tower emhttp: shcmd (79): /usr/sbin/hdparm -S0 /dev/sdd &> /dev/null
Jan 24 22:28:17 Tower kernel: mdcmd (61): spinup 0
Jan 24 22:28:17 Tower kernel: mdcmd (62): spinup 1
Jan 24 22:28:17 Tower kernel: mdcmd (63): spinup 2
Jan 24 22:28:17 Tower kernel: mdcmd (64): spinup 3
Jan 24 22:28:17 Tower kernel: mdcmd (65): spinup 4
Jan 24 22:28:17 Tower kernel: mdcmd (66): spinup 5
Jan 24 22:28:17 Tower kernel: mdcmd (67): spinup 6
Jan 24 22:28:17 Tower kernel: mdcmd (68): spinup 7
Jan 24 22:28:17 Tower kernel: mdcmd (69): spinup 8
Jan 24 22:28:17 Tower kernel: mdcmd (70): spinup 9
Jan 24 22:28:17 Tower kernel: mdcmd (71): spinup 10
Jan 24 22:58:24 Tower kernel: mdcmd (72): spindown 0
Jan 24 22:58:24 Tower kernel: mdcmd (73): spindown 1
Jan 24 22:58:24 Tower kernel: mdcmd (74): spindown 2
Jan 24 22:58:25 Tower kernel: mdcmd (75): spindown 3
Jan 24 22:58:25 Tower kernel: mdcmd (76): spindown 4
Jan 24 22:58:26 Tower kernel: mdcmd (77): spindown 5
Jan 24 22:58:26 Tower kernel: mdcmd (78): spindown 6
Jan 24 22:58:27 Tower kernel: mdcmd (79): spindown 7
Jan 24 22:58:27 Tower kernel: mdcmd (80): spindown 8
Jan 24 22:58:27 Tower kernel: mdcmd (81): spindown 9
Jan 24 22:58:28 Tower emhttp: shcmd (80): /usr/sbin/hdparm -y /dev/sdd &> /dev/null
Jan 24 22:58:50 Tower kernel: mdcmd (82): spindown 10
Jan 24 23:54:25 Tower kernel: mdcmd (83): spindown 10
Jan 25 01:01:26 Tower tvheadend[3986]: htsp: 192.168.9.4 [ XBMC Media Center ]: Disconnected
Jan 25 05:10:32 Tower emhttp: shcmd (81): /usr/sbin/hdparm -y /dev/sdd &> /dev/null

At 22:08:33 you connected and it took you 3 seconds to type the password and log in. I guess you logged right out. At 22:12:43 you connected and logged in again. About a minute and a half later all the disks were spun down, but I don't see an entry that said this happened via clicking 'Spin Down' button (I should see "Spinning down all drives..." message if so). Besides, they start spinning down more-or-less 10 seconds apart.
Why are they spinning down here? Because they hit the inactivity time-out? At 22:17:02 you log in again, and at 22:28:17 you manually click the 'Spin Up' button, ok. At 22:58:25 they all spin down again. This is consistent with spin-down delay set at 30 minutes. But disk10 does not spin down until 22:58:50, so something is accessing it. Then it does spin down again at 23:54:25, so something must have been accessing it again earlier. Also something accesses the cache drive, because it doesn't spin down again until the next day at 5:10:32. I think a plugin or host-side app accessing the server must be causing this.
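The timing analysis above can be reproduced mechanically. The following is a rough sketch that pulls the `mdcmd` spinup/spindown lines out of a syslog and reports how long after its last logged spinup each disk was spun down. The function names are made up for illustration, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
#   Jan 24 22:58:50 Tower kernel: mdcmd (82): spindown 10
SPIN_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"mdcmd \(\d+\): (spindown|spinup) (\d+)$"
)


def spin_events(lines, year=2013):
    """Return (timestamp, action, disk) for every mdcmd spin line."""
    events = []
    for line in lines:
        m = SPIN_RE.match(line.strip())
        if m:
            # The year is assumed; syslog timestamps omit it.
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), int(m.group(3))))
    return events


def spindown_delays(events):
    """For each spindown, seconds since that disk's last logged spinup."""
    last_up = {}
    delays = []
    for stamp, action, disk in events:
        if action == "spinup":
            last_up[disk] = stamp
        else:
            up = last_up.get(disk)
            seconds = None if up is None else int((stamp - up).total_seconds())
            delays.append((disk, seconds))
    return delays
```

On the log above, disks 0 to 9 come out at roughly 1800 seconds (the 30 minute delay), while disk 10 is later, which is the same hint of extra access that the analysis points out.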
joyless Posted January 26, 2013

Also, I checked more carefully and in fact there is no syslinux change from rc9a to rc10 (the change was from rc8a to rc9a), so I think it makes no sense that it requires running the .bat, unless you also didn't run it when you updated to rc9a?

I upgraded from rc8a to rc9 and then to rc9a without running the .bat file.

Also, are you just replacing bzimage and bzroot (and readme.txt if you want)? You should not need to replace any other files. If your existing syslinux is working fine then you probably don't need to update it, BUT then you should also NOT update menu.c32 (part of syslinux), and make sure you don't delete the existing ldlinux.sys on the flash (created when you ran the .bat). I guess that is what caused the problems for the people who had to re-run the .bat.

I just replaced bzimage and bzroot, nothing else. I should have access to an external monitor and some time today, so I'll try again.
joyless Posted January 26, 2013

OK, I managed to find out why rc10 wouldn't "boot": it loads the Intel e100 module, but ethernet won't come up. Unfortunately I can't find the wireless keyboard dongle, so I can't copy the logs or investigate the issue further. On the other hand, it also loads the r8169 module, which didn't work previously but now works just fine, so I'm on rc10 right now, but using a different NIC.
EMKO Posted January 26, 2013

OK, I managed to find out why rc10 wouldn't "boot": it loads the Intel e100 module, but ethernet won't come up. Unfortunately I can't find the wireless keyboard dongle, so I can't copy the logs or investigate the issue further. On the other hand, it also loads the r8169 module, which didn't work previously but now works just fine, so I'm on rc10 right now, but using a different NIC.

Yeah, it won't boot for me either. Looks like I will have to open the case to get the USB flash drive.
EMKO Posted January 26, 2013

Well, I re-added the files and ran the .bat file, and now it's working.
optiman Posted January 26, 2013

I'm on rc8a, going to try rc10 test, which I guess is the latest and greatest. I copied the two main files and, after reading all the posts here, I went ahead and ran the batch file again. Will run the new permissions utility as soon as it comes up and report back.
RobJ Posted January 26, 2013

OK, I managed to find out why rc10 wouldn't "boot": it loads the Intel e100 module, but ethernet won't come up. Unfortunately I can't find the wireless keyboard dongle, so I can't copy the logs or investigate the issue further. On the other hand, it also loads the r8169 module, which didn't work previously but now works just fine, so I'm on rc10 right now, but using a different NIC.

For users with multiple network chipsets, including both onboard and addon cards, be aware that different kernels may identify them in a different order, and this can keep your UnRAID server off the network after an upgrade. UnRAID always uses eth0 for network comms, in effect using whatever the kernel found and set up first as eth0. If you only have one NIC, then it's eth0 and everything is fine. Unfortunately, if you have more than one, the kernel has not always been consistent as to which one it identifies first, so your preferred NIC may be set up as eth1, not eth0. You should be able to see that in your syslog.

There are several ways to fix this. The easiest is to try moving the network cable to the other network connector. If it works, then it is the new eth0, and you will be back on the network almost immediately. However, if you prefer your server to use the same network chipset it already was using, then you will have to either remove the other network chipset (if it is an addon card), or disable it in the BIOS settings (if it is onboard).
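One way to see which chipset ended up as eth0 is to read the kernel's sysfs entries. The sketch below is only an illustration and assumes a Linux system with Python 3 available, which a stock unRAID 5 install does not include; the function name is hypothetical, while the `/sys/class/net/<iface>/device/driver` and `address` entries are standard Linux sysfs.

```python
import os


def list_nics(sysfs_net="/sys/class/net"):
    """Return (interface, driver, mac) for each hardware-backed interface."""
    nics = []
    for name in sorted(os.listdir(sysfs_net)):
        base = os.path.join(sysfs_net, name)
        driver_link = os.path.join(base, "device", "driver")
        if not os.path.exists(driver_link):
            continue  # virtual interfaces such as lo have no device/driver
        # The driver entry is a symlink whose target is named after the module
        driver = os.path.basename(os.path.realpath(driver_link))
        with open(os.path.join(base, "address")) as fh:
            mac = fh.read().strip()
        nics.append((name, driver, mac))
    return nics


if __name__ == "__main__":
    for iface, driver, mac in list_nics():
        print(f"{iface}: driver={driver} mac={mac}")
```

If the driver shown next to eth0 is not the chipset the cable is plugged into, that matches the ordering problem described above.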
RobJ Posted January 26, 2013

Migrating from unRAID 4.7 to unRAID 5.0

Kevin (stchas) has created a nice guide to upgrading to v5, and I would like to make sure it has maximum visibility, both to those who have experienced the upgrade already and have ideas for improvement, and to all those about to move to v5 when it goes Final.

When I first saw him working on it, my first reaction was "ho hum, why do we need this, all you need to do is copy bzroot and bzimage and run NewPerms". But he has covered a lot of ground, including many of the gotchas such as changing the URL you use to access the WebGui, fixing the line drawing characters in Midnight Commander, and setting up NFS. Because he also uses a number of addons, he includes them, and discusses topics such as moving to Cache-only shares and the new Plugin system, and how to start using them.

There is still some polish needed, as well as some mention of the other issues some have faced, such as needing to run the make_bootable batch file if the upgraded server does not boot, and the need for the MEM=4095M parameter for some who experience very slow writes on certain motherboards.

Feedback, suggestions and additional tips for the page should be posted in Kevin's thread announcing it, found here: How-To: Migrate from unRAID 4.7 to unRAID 5.0.

I would like to see anyone who had problems upgrading originally to v5 or recently to RC10 review the page, and either add your tips to the wiki page or comment on them in Kevin's thread. If comprehensive enough, this could be THE guide for all future upgrades to v5. If more upgraders use the guide, we might even see fewer support requests!
RobJ Posted January 27, 2013

A brief note on parity checks ...

I could not have explained it better. Thanks. I wonder if that explanation should go in the Wiki somewhere.

I've added it to the Improving unRAID Performance wiki page, in a new Parity Check Speed section, third bullet. I only linked to here; others (especially Gary?) can improve/expand it. Feel free to edit, correct, expand the entire page. I updated it a little for v5, but it still could use more work.
optiman Posted January 27, 2013

OK, the upgrade from rc8 to rc10-test went OK. At first the server wouldn't boot. I had copied the two main files and then ran the make_boot.bat and thought I was OK. I ended up having to also copy the menu.c32 file, and then it booted just fine. I ran the new permissions utility, which took about 2.5 hours. I then did a reboot, just to check that everything works.

The server came right up, but I noticed that the webgui and shares took a long time to load compared to previous versions. Before, when I saw the login prompt on my console, it was only a matter of seconds before the webgui and shares were available. After the upgrade it took about 2 minutes, which seemed like a really long time. I was about to try launching the command to start the webgui manually, but it finally came up. Again, it went OK, it just took a long time to load for some reason. Anyone else see this after upgrading?

I will do a parity check and test copies over SMB next and report back.
S80_UK Posted January 27, 2013

I've added it to the Improving unRAID Performance wiki page, in a new Parity Check Speed section, third bullet. I only linked to here; others (especially Gary?) can improve/expand it. Feel free to edit, correct, expand the entire page. I updated it a little for v5, but it still could use more work.

Thanks, I added a paragraph about controllers: motherboard vs PCI-e (x1, x4, x8, x16) vs PCI.
RobJ Posted January 27, 2013

Thanks, I added a paragraph about controllers: motherboard vs PCI-e (x1, x4, x8, x16) vs PCI.

Very nice!
olympia Posted January 28, 2013

Hi Tom, just to report back on this issue. After a lot of testing and many more grey hairs, I tracked this down to the good old smarthistory application, which had always been running reliably in the background. I had this application scheduled daily (with wake -ON options), running at 4:40 and spinning up all disks to read and record some SMART parameters. However, it appears that smarthistory spins up the disks in a way that leaves unRAID thinking they are all still spun down, so unRAID does not command them to spin down again until there is an "official" wake-up on those drives that makes unRAID aware they are spinning. Disabling smarthistory solved the issue. Apologies for the false "alert"!
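A mismatch like this, where unRAID believes a disk is spun down while it is actually spinning, can be checked against the drive itself with `hdparm -C`, which reports the power mode without waking the disk. The sketch below is an assumption-laden illustration: the function names and the "spun_down" label for unRAID's belief are hypothetical, and it must run as root on a system that has both hdparm and Python 3.

```python
import re
import subprocess

# hdparm -C prints a line such as " drive state is:  standby"
STATE_RE = re.compile(r"drive state is:\s*(\S+)")


def parse_drive_state(hdparm_output):
    """Extract the power state (e.g. 'standby', 'active/idle') or None."""
    m = STATE_RE.search(hdparm_output)
    return m.group(1) if m else None


def drive_state(device):
    """Ask the drive for its power mode via 'hdparm -C <device>'."""
    result = subprocess.run(
        ["hdparm", "-C", device], capture_output=True, text=True, check=True
    )
    return parse_drive_state(result.stdout)


def believed_down_but_spinning(believed, actual):
    """Devices recorded as 'spun_down' whose real state is 'active/idle'."""
    return sorted(
        dev for dev, state in believed.items()
        if state == "spun_down" and actual.get(dev) == "active/idle"
    )
```

Running the check shortly after a scheduled job such as the 4:40 smarthistory run would show whether that job left disks spinning behind unRAID's back.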
optiman Posted January 28, 2013

After this update, I see no difference in copy speeds using SMB. I will test the parity check next.

So again, does anyone else have their server take an extra long time to load the webgui and publish the shares (about 2 min)? Before this update, it was much faster at both of those. Thanks!
RobJ Posted January 28, 2013

So again, does anyone else have their server take an extra long time to load the webgui and publish the shares (about 2 min)? Before this update, it was much faster at both of those.

Webgui display time and shares available time delay are 2 different things, and I can't think of a connection unless you have a bad drive that is having a hard time on the initial setup. The shares are commonly delayed if there are hard disk issues to be dealt with, or lots of transactions to be replayed, and I suppose possibly issues with an addon in its setup. As always, we really need to see a syslog!
optiman Posted January 29, 2013

OK, syslog coming after my parity check and a fresh reboot. Are you sure about the webgui and the shares not being related? It's my understanding that the webgui start command (/usr/local/sbin/emhttp &) is required to start the webgui interface and that it must be running before any shares are published. If that is correct, they are very much related. Again, this wasn't the case before the update, and I have no SMART data or other performance issues that indicate a hard drive issue. Other than the slow emhttp and share thing, this build seems good.

By the way, after you get the console login prompt, how long before you can access the webgui via IE and connect to a share?
Joe L. Posted January 29, 2013

OK, syslog coming after my parity check and a fresh reboot. Are you sure about the webgui and the shares not being related? It's my understanding that the webgui start command (/usr/local/sbin/emhttp &) is required to start the webgui interface and that it must be running before any shares are published. If that is correct, they are very much related. Again, this wasn't the case before the update, and I have no SMART data or other performance issues that indicate a hard drive issue. Other than the slow emhttp and share thing, this build seems good. By the way, after you get the console login prompt, how long before you can access the webgui via IE and connect to a share?

Revert to a stock configuration and see if the delay is still there. I'll bet one of your add-ons is delaying the completion of emhttp's start-up events.

Joe L.
RobJ Posted January 29, 2013

OK, syslog coming after my parity check and a fresh reboot. Are you sure about the webgui and the shares not being related? It's my understanding that the webgui start command (/usr/local/sbin/emhttp &) is required to start the webgui interface and that it must be running before any shares are published. If that is correct, they are very much related. Again, this wasn't the case before the update, and I have no SMART data or other performance issues that indicate a hard drive issue. Other than the slow emhttp and share thing, this build seems good. By the way, after you get the console login prompt, how long before you can access the webgui via IE and connect to a share?

The Webgui is usually available very quickly. Once the kernel finishes identifying and setting up all of the hardware and drivers and other system modules, it tries to start and configure networking (if it can find a carrier), then loads plugins if any, then loads the Webgui. emhttp then loads the array configuration, and sets up the drive list and array structure, then (if autostart is enabled) begins to mount all of the drives. Once drives are mounted, it then sets up Samba and NFS and the disk shares and begins setting up the User shares.

The plugins do precede emhttp start, and they can take a while, especially if it's the first time for any of them. They may have to download the files and dependencies for one or more plugins, which can take a long time, but should not be repeated on the next boot.

On my system, the Webgui is available long before the console login appears, because I have a few sleeps (one is long) in my go file, and the console login does not appear until the go file is complete.
MartinQ Posted January 29, 2013

RC11 is out: http://lime-technology.com/forum/index.php?topic=25609.0

Changes from 5.0-rc10 to 5.0-rc11
---------------------------------
- emhttp: fixed spurious "title not found" log entries
- emhttp: ensure new parity disk for 'swap disable' has a valid partition table
- emhttp: fixed worker thread (format/clear/copy) inconsistent progress
- emhttp: default timeZone "America/Los_Angeles" (eliminate first-boot error message)
- flash boot: add menu item to boot kernel limiting memory use to 4GB
- linux: use kernel 3.4.26 (for various disk controller and NIC driver bug fixes)
- linux: added "Intel PIIX4 and compatible I2C driver" (i2c-piix4) per user request
- linux: changed cpufreq drivers from modules to built-ins
- shfs: fixed crash by replacing non-thread-safe readdir() with readdir_r()
- shfs: use st_ino field to record object disk location
- slack: add 10-sec timeout waiting for USB flash to appear as suggested by forum user Barzija
- webGui: added very simple vsftp support
- webGui: indexer: display disk location of objects
KeeWay Posted February 1, 2013

I have had a drive go disabled with the red ball. When I stopped the array it looked like disk2 was not assigned and not selectable in the dropdown box, but it was listed below as missing. I then took the syslog and rebooted the unRAID box. Since the reboot the drive is still disabled, but when I stop the array it is listed in the dropdown box as selected. What are my options? The drive shows good under the health menus, and below is the disk report.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   144   021    Pre-fail  Always       -       9708
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2980
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       24512
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       41
193 Load_Cycle_Count        0x0032   168   168   000    Old_age   Always       -       96037
194 Temperature_Celsius     0x0022   123   111   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       17

Thanks,
James

syslog.zip
itimpi Posted February 1, 2013

The thing that springs to mind is that there are sectors pending re-allocation. unRAID normally expects that value to be zero. Personally I like to get the data off such a disk and then put it through a pre-clear cycle to decide if I need to RMA the disk.
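The check described here, looking for a non-zero pending sector count, can be sketched as a small parser over a SMART attribute table like the one posted above. This is an illustration only: the function names are hypothetical, and watching Reallocated_Sector_Ct and Offline_Uncorrectable in addition to Current_Pending_Sector is an assumption of this sketch, not something stated in the thread.

```python
# Attributes whose raw value is expected to stay at zero on a healthy disk.
# Only Current_Pending_Sector is named in the thread; the other two are an
# assumption added for illustration.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Map attribute name to raw value from a smartctl-style attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw values that are not plain integers are skipped
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}
```

Fed the report from the previous post, it flags the 10 pending sectors and the 6 offline uncorrectable sectors while leaving the zero reallocated count alone.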
KeeWay Posted February 2, 2013

I see that count too, but it seems low. I guess I will replace disk2 with a spare I have, just in case. Just seeing if there was something strange that might be RC10 related.

Thanks,
James
PeterB Posted February 2, 2013

I have had a drive go disabled with the red ball. When I stopped the array it looked like disk2 was not assigned and not selectable in the dropdown box, but it was listed below as missing. I then took the syslog and rebooted the unRAID box. Since the reboot the drive is still disabled, but when I stop the array it is listed in the dropdown box as selected.

That is precisely the way that the system should behave once there has been a write error on a drive. Boot the system with that drive disconnected, so that it is 'forgotten'. Then boot again with the drive connected and the contents of the drive should be rebuilt from parity. This will also force the pending sectors to be re-allocated if the write errors are persistent. However, it would be better to rebuild onto another, tested, drive and then run multiple preclears on the old drive in order to determine whether it can be safely reused.
dheg Posted February 11, 2013

In case Tom, or someone else, is keeping tabs on open issues: I reported very slow parity checks in RC10 here. I was coming from 4.7 with parity checks lasting about 8h, while in RC10 they lasted well over 12h. My syslog had thousands of "attempting task abort!" error messages.

One particularity of my system is that I have an M1015 board with an Intel expander (RES2CV240). I found this post explaining how to update the firmware of the expander (from PH11 to PH13), so I did that this weekend. My parity check has gone down to 6h 42min or 82.9MB/s (with 7 Green 2TB WD drives: 5 data + 1 parity + 1 cache) and my syslog doesn't have a single "attempting task abort!" error.

I guess this is as good as it gets to confirm the slow parity checks are not RC10 related, at least for me. I hope it helps others.
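The reported figures are easy to sanity-check: average parity check speed is simply the parity disk size divided by the elapsed time. The short sketch below assumes a 2TB disk means 2,000,000,000,000 bytes and that MB/s is decimal; the function name is illustrative.

```python
def parity_speed_mb_s(disk_bytes, hours, minutes=0):
    """Average speed in decimal MB/s for reading disk_bytes in the given time."""
    seconds = hours * 3600 + minutes * 60
    return disk_bytes / seconds / 1_000_000


TWO_TB = 2_000_000_000_000  # assumed decimal size of a "2TB" drive

if __name__ == "__main__":
    for label, h, m in (("after firmware update", 6, 42), ("4.7", 8, 0), ("RC10 before update", 12, 0)):
        print(f"{label}: {parity_speed_mb_s(TWO_TB, h, m):.1f} MB/s")
```

Under those assumptions 6h 42min works out to about 82.9 MB/s, which matches the figure in the post, while 8h and 12h correspond to roughly 69 MB/s and 46 MB/s.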