unRAID Server Release 5.0-rc10 Available


limetech


rc10 won't boot for me. I can't telnet into unRAID, and the webGUI won't come up either. I'm using an Intel NIC. Unfortunately I don't have an external monitor right now, so I can't diagnose the issue. I unplugged the flash drive, copied rc9a back, and it boots fine again.

 

Note that you may need to re-run the .bat to reinstall syslinux.

Thanks, somehow I missed it.

I'm using a Mac, so finding a Windows machine to run the .bat is just too much hassle. I'll wait for the 5.0 final :)

 

It may be better to test it before the final, as I'm not sure that is really the problem you are having :)  Also, I checked again, and in fact there is no syslinux change from rc9a to rc10 (the change was from rc8a to rc9a), so it makes no sense that it would require running the .bat, unless you also didn't run it when you updated to rc9a? Also, are you just replacing bzimage and bzroot (and readme.txt if you want)? You should not need to replace any other files. If your existing syslinux is working fine you probably don't need to update it, BUT then you should also NOT update menu.c32 (part of syslinux), and make sure you don't delete the existing ldlinux.sys on the flash (created when you ran the .bat). I guess that is what caused the problems for the people who had to re-run the .bat.
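For reference, the minimal upgrade being discussed amounts to copying just two files (a sketch; it assumes the flash is mounted at /boot, as unRAID does, and the extraction path is purely illustrative):

# Replace only the kernel and root-filesystem images; leave syslinux,
# menu.c32 and ldlinux.sys on the flash untouched.
cp /tmp/unraid-5.0-rc10/bzimage /boot/bzimage
cp /tmp/unraid-5.0-rc10/bzroot  /boot/bzroot
sync    # flush the writes to the flash before rebooting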


 

It appears that disk spin-down is not always happening (kind of hit and miss), even though I see the spin-down command executed in the log file.

After the above, all disks spin down but one; in this case disk 10 keeps spinning.

In another case, another disk is affected.

 

Following that, because unRAID thinks the disk has already spun down, it never tries to spin it down again.

 

If I run the command '/root/mdcmd spindown 10' manually, it spins down immediately and stays spun down.
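(Side note: a quick way to cross-check what the drive itself reports, independent of unRAID's notion of its state, is hdparm's -C flag; the device name below is just a placeholder:)

# "standby" means the drive is spun down; "active/idle" means it is spinning.
hdparm -C /dev/sdk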

 

Is this a potential bug, or is there something I can look at on my side?

Are the colored status balls in the webGui consistent with the disk spinning state?  Also, I'd like to see the full system log.

 

Hi Tom,

 

Yes, the colored status balls correctly reflect the spinning state. However, if I look at unMENU, interestingly it shows the disk as spun down (but unMENU is wrong, probably because it checks the spin state differently).

 

I am now removing SimpleFeatures (something I had already planned for a long time). I will come back with the full system log if the issue persists.

 

Hi Tom,

 

OK, I removed SF yesterday evening, but some customizations are still running (VBox, CrashPlan and TVHeadend).

Please see the system log attached. FYI, at 22:28:17 I was the one who manually spun up all the drives.

 

This morning all the drives are spun down except the cache drive, which is still spinning.

 

Let me know if you want me to test with fully vanilla unRAID (by removing every customization), but those apps never caused any issue like this before.

 

Yes, please test with vanilla.  The end of the syslog is "curious"; here is my analysis:

 

Jan 24 22:08:33 Tower in.telnetd[27827]: connect from 192.168.9.145 (192.168.9.145)
Jan 24 22:08:36 Tower login[27828]: ROOT LOGIN  on '/dev/pts/0' from 'BUD8288N'
Jan 24 22:12:43 Tower in.telnetd[32070]: connect from 192.168.9.145 (192.168.9.145)
Jan 24 22:12:45 Tower login[32071]: ROOT LOGIN  on '/dev/pts/0' from 'BUD8288N'
Jan 24 22:14:02 Tower kernel: mdcmd (51): spindown 0
Jan 24 22:14:02 Tower emhttp: shcmd (78): /usr/sbin/hdparm -y /dev/sdd &> /dev/null
Jan 24 22:14:34 Tower kernel: mdcmd (52): spindown 1
Jan 24 22:14:45 Tower kernel: mdcmd (53): spindown 2
Jan 24 22:14:55 Tower kernel: mdcmd (54): spindown 3
Jan 24 22:15:06 Tower kernel: mdcmd (55): spindown 4
Jan 24 22:15:16 Tower kernel: mdcmd (56): spindown 5
Jan 24 22:15:27 Tower kernel: mdcmd (57): spindown 6
Jan 24 22:15:27 Tower kernel: mdcmd (58): spindown 8
Jan 24 22:17:00 Tower in.telnetd[4199]: connect from 192.168.9.145 (192.168.9.145)
Jan 24 22:17:02 Tower login[4202]: ROOT LOGIN  on '/dev/pts/0' from 'BUD8288N'
Jan 24 22:21:18 Tower kernel: mdcmd (59): spindown 7
Jan 24 22:21:39 Tower kernel: mdcmd (60): spindown 9
Jan 24 22:28:17 Tower emhttp: Spinning up all drives...
Jan 24 22:28:17 Tower emhttp: shcmd (79): /usr/sbin/hdparm -S0 /dev/sdd &> /dev/null
Jan 24 22:28:17 Tower kernel: mdcmd (61): spinup 0
Jan 24 22:28:17 Tower kernel: mdcmd (62): spinup 1
Jan 24 22:28:17 Tower kernel: mdcmd (63): spinup 2
Jan 24 22:28:17 Tower kernel: mdcmd (64): spinup 3
Jan 24 22:28:17 Tower kernel: mdcmd (65): spinup 4
Jan 24 22:28:17 Tower kernel: mdcmd (66): spinup 5
Jan 24 22:28:17 Tower kernel: mdcmd (67): spinup 6
Jan 24 22:28:17 Tower kernel: mdcmd (68): spinup 7
Jan 24 22:28:17 Tower kernel: mdcmd (69): spinup 8
Jan 24 22:28:17 Tower kernel: mdcmd (70): spinup 9
Jan 24 22:28:17 Tower kernel: mdcmd (71): spinup 10
Jan 24 22:58:24 Tower kernel: mdcmd (72): spindown 0
Jan 24 22:58:24 Tower kernel: mdcmd (73): spindown 1
Jan 24 22:58:24 Tower kernel: mdcmd (74): spindown 2
Jan 24 22:58:25 Tower kernel: mdcmd (75): spindown 3
Jan 24 22:58:25 Tower kernel: mdcmd (76): spindown 4
Jan 24 22:58:26 Tower kernel: mdcmd (77): spindown 5
Jan 24 22:58:26 Tower kernel: mdcmd (78): spindown 6
Jan 24 22:58:27 Tower kernel: mdcmd (79): spindown 7
Jan 24 22:58:27 Tower kernel: mdcmd (80): spindown 8
Jan 24 22:58:27 Tower kernel: mdcmd (81): spindown 9
Jan 24 22:58:28 Tower emhttp: shcmd (80): /usr/sbin/hdparm -y /dev/sdd &> /dev/null
Jan 24 22:58:50 Tower kernel: mdcmd (82): spindown 10
Jan 24 23:54:25 Tower kernel: mdcmd (83): spindown 10
Jan 25 01:01:26 Tower tvheadend[3986]: htsp: 192.168.9.4 [ XBMC Media Center ]: Disconnected
Jan 25 05:10:32 Tower emhttp: shcmd (81): /usr/sbin/hdparm -y /dev/sdd &> /dev/null

 

At 22:08:33 you connected and it took you 3 seconds to type the password and log in.  I guess you logged right out.

At 22:12:43 you connected and logged in again.

 

About a minute and a half later all the disks were spun down, but I don't see an entry saying this happened via the 'Spin Down' button (I should see a "Spinning down all drives..." message if so).  Besides, they start spinning down more or less 10 seconds apart.  Why are they spinning down here? Because they hit the inactivity time-out?

 

At 22:17:02 you log in again, and

at 22:28:17 you manually click the 'Spin Up' button, ok.

 

At 22:58:25 they all spin down again.  This is consistent with a spin-down delay set at 30 minutes.  But disk10 does not spin down until 22:58:50, so something is accessing it.  Then it spins down again at 23:54:25, so something must have been accessing it again earlier.  Something is also accessing the cache drive, because it doesn't spin down again until 5:10:32 the next day.

 

I think this must be a plugin or a host-side app accessing the server that is causing this.
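(For the vanilla test, a simple way to watch these spin events live from a telnet session; the spindown/spinup entries are the same ones visible in the syslog above:)

# Follow the syslog and show only spin-related activity as it happens.
tail -f /var/log/syslog | grep -iE 'spindown|spinup|hdparm'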


Also, I checked again, and in fact there is no syslinux change from rc9a to rc10 (the change was from rc8a to rc9a), so it makes no sense that it would require running the .bat, unless you also didn't run it when you updated to rc9a?

I upgraded from rc8a to rc9 and then to rc9a without running the .bat file.

 

Also, are you just replacing bzimage and bzroot (and readme.txt if you want)? You should not need to replace any other files. [...] I guess that is what caused the problems for the people who had to re-run the .bat.

I just replaced bzimage and bzroot, nothing else.

I should have access to external monitor and some time today, so I'll try again.


OK, I managed to find out why rc10 wouldn't "boot": it loads the Intel e100 module, but ethernet won't come up.

Unfortunately I can't find my keyboard's wireless dongle, so I can't copy the logs or investigate the issue further.

 

On the other hand, it also loads the r8169 module, which didn't work previously but now works just fine, so I'm on rc10 right now, using a different NIC.

 


OK, I managed to find out why rc10 wouldn't "boot": it loads the Intel e100 module, but ethernet won't come up. [...]

 

Yeah, it won't boot for me either :( Looks like I will have to open the case to get to the USB flash drive.


OK, I managed to find out why rc10 wouldn't "boot": it loads the Intel e100 module, but ethernet won't come up. [...]

 

For users with multiple network chipsets, including both onboard and addon cards, be aware that different kernels may identify them in a different order, and this can keep your unRAID server off the network after an upgrade.

unRAID always uses eth0 for network comms, in effect using whatever the kernel found and set up first as eth0.  If you only have one NIC, then it's eth0 and everything is fine.  Unfortunately, if you have more than one, the kernel has not always been consistent about which one it identifies first, so your preferred NIC may be set up as eth1, not eth0.  You should be able to see that in your syslog.

There are several ways to fix this.  The easiest is to try moving the network cable to the other network connector.  If it works, then that connector is the new eth0, and you will be back on the network almost immediately.  However, if you prefer your server to use the same network chipset it was already using, then you will have to either remove the other network chipset (if it is an addon card) or disable it in the BIOS settings (if it is onboard).
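To see which chipset the kernel actually assigned to eth0 after an upgrade, something along these lines works from the console (a sketch; interface and driver names will differ per system):

# Show every interface the kernel created, configured or not:
ifconfig -a
# Then see which driver claimed eth0/eth1 during boot:
dmesg | grep -i 'eth[01]'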


Migrating from unRAID 4.7 to unRAID 5.0

 

Kevin (stchas) has created a nice guide to upgrading to v5, and I would like to make sure it has maximum visibility, both to those who have experienced the upgrade already and have ideas for improvement, and for all those about to move to v5 when it goes Final.

 

When I first saw him working on it, my first reaction was "ho hum, why do we need this, all you need to do is copy bzroot and bzimage and run NewPerms".  But he has covered a lot of ground, including many of the gotchas, such as changing the URL you use to access the WebGui, fixing the line-drawing characters in Midnight Commander, and setting up NFS.  Because he also uses a number of addons, he includes them, and discusses topics such as moving to Cache-only shares, the new Plugin system, and how to start using them.

 

There is still some polish needed, as well as some mention of the other issues some have faced, such as needing to run the make_bootable batch file if the upgraded server does not boot, and the need for the MEM=4095M parameter for some who experience very slow writes on certain motherboards.
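For anyone wanting the memory workaround before RC11's dedicated boot-menu entry, it amounts to a one-line kernel parameter in syslinux.cfg on the flash; a sketch (your existing append line may already carry other options):

label unRAID OS (4GB memory limit)
  kernel bzimage
  append mem=4095M initrd=bzroot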

 

Feedback, suggestions, and additional tips for the page should be posted in Kevin's thread announcing it, found here:  How-To: Migrate from unRAID 4.7 to unRAID 5.0.  I would like anyone who had problems upgrading, originally to v5 or recently to RC10, to review the page and either add their tips to the wiki page or comment on them in Kevin's thread.  If comprehensive enough, this could be THE guide for all future upgrades to v5.  If more upgraders use the guide, we might even see fewer support requests!


A brief note on parity checks ...

I could not have explained it better.  Thanks.  I wonder if that explanation should go in the Wiki somewhere.

 

I've added it to the Improving unRAID Performance wiki page, in a new Parity Check Speed section, third bullet.  I only linked to here; others (especially Gary? :) ) can improve/expand it.  Feel free to edit, correct, or expand the entire page.  I updated it a little for v5, but it could still use more work.


OK, the upgrade from rc8 to rc10-test went OK.  At first the server wouldn't boot: I had copied the two main files and then run make_boot.bat and thought I was set.  I ended up having to also copy the menu.c32 file, and then it booted just fine.  I ran the New Permissions utility, which took about 2.5 hours.  I then did a reboot, just to check that everything works.  The server came right up, but I noticed that the webGui and shares took a long time to load compared to previous versions.  Before, when I saw the login prompt on my console, it was only a matter of seconds before the webGui and shares were available.  After the upgrade it took about 2 minutes, which seemed like a really long time.  I was about to try launching the command to start the webGui manually, but it finally came up.  Again, everything went OK, it just took a long time to load for some reason.  Anyone else see this after upgrading?

 

I will do a parity check and test SMB copies next and report back.

 


I've added it to the Improving unRAID Performance wiki page, in a new Parity Check Speed section, third bullet. [...]

 

Thanks, I added a paragraph about controllers: motherboard vs. PCI-e (x1, x4, x8, x16) vs. PCI.


 

It appears that disk spin-down is not always happening (kind of hit and miss), even though I see the spin-down command executed in the log file. [...]

I think this must be a plugin or a host-side app accessing the server that is causing this.

 

Hi Tom,

 

just to report back on this issue.

 

After a lot of testing and much more grey hair, I tracked this down to the good old smarthistory application, which had always run just fine and very reliably in the background. I had it scheduled daily (with wake-on options), running at 4:40 and spinning up all disks to read and record some SMART parameters.

 

However, it appears that smarthistory spins up the disks in a way that leaves unRAID still thinking they are all spun down, so it never commands them to spin down again until an "official" wake-up on those drives makes unRAID aware that they are spinning. Disabling smarthistory solved the issue.
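(For what it's worth: if smarthistory shells out to smartctl, which is an assumption on my part, the silent spin-ups could probably be avoided with its -n standby option, which skips the query whenever the drive reports it is already spun down:)

# Poll SMART attributes without waking spun-down disks;
# smartctl exits early on any drive that is in standby.
for d in /dev/sd[a-z]; do
    smartctl -n standby -A "$d"
done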

 

Apologies for the false "alert"!


After this update, I see no difference in copy speeds using SMB.  I will test a parity check next.

 

So again, has anyone else had their server take an extra long time to load the webGui and publish the shares (about 2 min)?  Before this update, it was much faster at both.

 

Thanks!


So again, has anyone else had their server take an extra long time to load the webGui and publish the shares (about 2 min)?  Before this update, it was much faster at both.

 

WebGui display time and shares-available time are two different things, and I can't think of a connection unless you have a bad drive that is having a hard time during initial setup.  The shares are commonly delayed if there are hard disk issues to be dealt with, or lots of transactions to be replayed, and possibly issues with an addon during its setup.

 

As always, we really need to see a syslog!


Ok, syslog coming after my parity check and a fresh reboot.

 

Are you sure the webGui and the shares aren't related?  It's my understanding that the webGui start command (/usr/local/sbin/emhttp &) is required to start the webGui interface, and that it must be running before any shares are published.  If that is correct, they are very much related.

 

Again, this wasn't the case before the update, and I have no SMART data or other performance issues that indicate a hard drive problem.

 

Other than the slow emhttp and share thing, this build seems good.

 

By the way, after you get the console login prompt, how long is it before you can access the webGui via IE and connect to a share?
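(If it helps anyone comparing timings, you can watch for emhttp from the console while waiting; a sketch:)

# Is emhttp up yet? The [e] trick stops grep from matching itself.
ps -ef | grep '[e]mhttp'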

 

 


Are you sure the webGui and the shares aren't related? [...] By the way, after you get the console login prompt, how long is it before you can access the webGui via IE and connect to a share?

Revert back to a stock configuration and see if the delay is still there.

 

I'll bet one of your add-ons is delaying the completion of emhttp's start-up events.

 

Joe L.


[...] By the way, after you get the console login prompt, how long is it before you can access the webGui via IE and connect to a share?

 

The Webgui is usually available very quickly.  Once the kernel finishes identifying and setting up all of the hardware and drivers and other system modules, it tries to start and configure networking (if it can find a carrier), then loads plugins if any, then loads the Webgui.  emhttp then loads the array configuration, and sets up the drive list and array structure, then (if autostart is enabled) begins to mount all of the drives.  Once drives are mounted, it then sets up Samba and NFS and the disk shares and begins setting up the User shares.

 

The plugins do precede emhttp start, and they can take a while, especially if it's the first time for any of them.  They may have to download the files and dependencies for one or more plugins, which can take a long time, but should not be repeated on the next boot.

 

On my system, the Webgui is available long before the console login appears, because I have a few sleeps (one is long) in my go file, and the console login does not appear until the go file is complete.
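For context, the arrangement described above looks roughly like this (a sketch of a /boot/config/go file; the sleep durations are purely illustrative):

#!/bin/bash
# /boot/config/go -- runs at boot; the console login prompt
# does not appear until this script completes.
/usr/local/sbin/emhttp &   # start the webGui first, in the background
sleep 20                   # illustrative pause before custom add-ons
# ... start add-ons here ...
sleep 180                  # a longer pause, delaying the console login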


RC11 is out:

http://lime-technology.com/forum/index.php?topic=25609.0

 

"

Changes from 5.0-rc10 to 5.0-rc11

---------------------------------

- emhttp: fixed spurious "title not found" log entries

- emhttp: ensure new parity disk for 'swap disable' has a valid partition table

- emhttp: fixed worker thread (format/clear/copy) inconsistent progress

- emhttp: default timeZone "America/Los_Angeles" (eliminate first-boot error message)

- flash boot: add menu item to boot kernel limiting memory use to 4GB

- linux: use kernel 3.4.26 (for various disk controller and NIC driver bug fixes)

- linux: added "Intel PIIX4 and compatible I2C driver" (i2c-piix4) per user request

- linux: changed cpufreq drivers from modules to built-ins

- shfs: fixed crash by replacing non-thread-safe readdir() with readdir_r()

- shfs: use st_ino field to record object disk location

- slack: add 10-sec timeout waiting for USB flash to appear as suggested by forum user Barzija

- webGui: added very simple vsftp support

- webGui: indexer: diplay disk location of objects

"


I have had a drive go disabled with the red ball. When I stopped the array, it looked like disk2 was not assigned and not selectable in the dropdown box, but it was listed below as missing. I then captured the syslog and rebooted the unRAID box.  Since the reboot the drive still seems to be disabled, but when I stop the array it is listed in the dropdown box as selected.

 

What are my options?

 

The drive shows good under the health menus, and below is the disk report.

 

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f  200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027  145   144   021    Pre-fail  Always   -           9708
  4 Start_Stop_Count        0x0032  098   098   000    Old_age   Always   -           2980
  5 Reallocated_Sector_Ct   0x0033  200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e  200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032  067   067   000    Old_age   Always   -           24512
 10 Spin_Retry_Count        0x0032  100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032  100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032  100   100   000    Old_age   Always   -           79
192 Power-Off_Retract_Count 0x0032  200   200   000    Old_age   Always   -           41
193 Load_Cycle_Count        0x0032  168   168   000    Old_age   Always   -           96037
194 Temperature_Celsius     0x0022  123   111   000    Old_age   Always   -           29
196 Reallocated_Event_Count 0x0032  200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032  200   200   000    Old_age   Always   -           10
198 Offline_Uncorrectable   0x0030  200   200   000    Old_age   Offline  -           6
199 UDMA_CRC_Error_Count    0x0032  200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008  200   200   000    Old_age   Offline  -           17

 

 

Thanks,

James

syslog.zip


I have had a drive go disabled with the red ball. [...] Since the reboot the drive still seems to be disabled, but when I stop the array it is listed in the dropdown box as selected.

 

That is precisely the way that the system should behave once there has been a write error on a drive.  Boot the system with that drive disconnected, so that it is 'forgotten'.  Then boot again with the drive connected and the contents of the drive should be rebuilt from parity.  This will also force the pending sectors to be re-allocated if the write errors are persistent.

 

However, it would be better to rebuild onto another, tested, drive and then run multiple preclears on the old drive in order to determine whether it can be safely reused.
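If you do rebuild onto the same drive, it is worth watching whether those pending sectors actually clear afterwards (a sketch; substitute your drive's real device name):

# Pending sectors should drop toward 0 as the rebuild rewrites them;
# a rising Reallocated_Sector_Ct means the drive is remapping instead.
smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'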


In case Tom, or someone else, is keeping tabs on open issues.

 

I reported very slow parity checks in RC10 here. I was coming from 4.7, where parity checks lasted about 8h, while in RC10 they lasted well over 12h. My syslog had thousands of 'attempting task abort!' error messages.

 

One particularity of my system is that I have an M1015 board with an Intel expander (RES2CV240). I found this post explaining how to update the firmware of the expander (from PH11 to PH13), so I did that this weekend. My parity check has gone down to 6h 42min, or 82.9 MB/s (with 7 WD Green 2TB drives: 5 data + 1 parity + 1 cache), and my syslog doesn't have a single 'attempting task abort!' error.
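(Those numbers are self-consistent, for what it's worth: 6h 42min is 24,120 seconds, and reading 2 TB of parity, about 2,000,000 MB, in that time works out to 2,000,000 / 24,120 ≈ 82.9 MB/s.)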

 

I guess this is as good as it gets to confirm the slow parity checks are not RC10-related, at least for me  8).

 

I hope it helps others  ;)

 

