Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add



Note from Joe L:

I reached the character limit for a single post in my original preclear_disk thread and wanted more room for release notes.  I split the thread so that I could add new material without discarding all of the user posts that follow.  I would love to merge the threads back into a single one again, but the forum will only merge them in date order, and that would put my new release notes near the end of a very long set of pages.  For that reason, the original thread with only the release notes is locked, and this one was split from it.

The original preclear_disk.sh thread with the release notes is here:

http://lime-technology.com/forum/index.php?topic=2817.0

 

 

What is the possibility of including this in the unmenu awk interface? Maybe in a future version.

 

My thoughts (provided it is possible) would be to include a button beside each disk on the "Disk Mgmt" page in the "Drive Partitions - Not In Protected Array" section, along with radio buttons for the options of "no read/write", "1 cycle", "5 cycles", "20 cycles" and "test"...

 

Cheers, and keep up the good work!

 

Matt


That is a long-term goal, although I would put it on its own plug-in page.  The biggest issue is displaying progress as it performs the clear.  I take advantage of a feature of the "dd" command to get its status while it writes to the drive, and I would need to re-write that section.
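For the curious, GNU dd prints its I/O statistics to stderr when it receives a USR1 signal.  A minimal sketch of the idea (the device name is illustrative, and zeroing a real disk destroys its contents):

dd if=/dev/zero of=/dev/sdX bs=2048k &    # start the clear in the background
dd_pid=$!
while kill -0 $dd_pid 2>/dev/null         # while dd is still running...
do
  kill -USR1 $dd_pid                      # ...ask it to print records/bytes copied so far
  sleep 10
done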

 

Then, I would need a way to start and stop the process from the web browser, and a way to get periodic status updates as it progresses.  I know I don't want to submit a task and wait 200+ hours for the browser to return...  ;)

 

First I just want to make sure it is doing the correct "pre-cleared" signature, and not creating a black-hole somewhere in the universe.  ;) 

 

Before I tackle anything else, I need to get the next version of unmenu.awk published.  bjp999 recently added a bit of logic and figured out how to get it to accept a POST to a plug-in as well as a GET.  I've added that to the next version, along with a few fixes.

(non-geek translation... unmenu.awk can now handle more complicated data entry forms)

 

Glad you like unmenu.awk.  It certainly is turning out to be interesting. 

 

Joe L.


 

Perhaps submit it to the background via batch.

The process can write its PID to /var/run and a log file to /var/log.

The browser interface section can refresh from the log and, if need be, use the PID file to send a kill to the process and its children.
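A rough sketch of that idea (the file names, and stopping via the PID file, are illustrative; the script does not do this today):

echo "/boot/preclear_disk.sh /dev/sdX" | batch    # queue the job to run detached from the browser

# inside the script itself:
echo $$ > /var/run/preclear.pid                   # record the PID so the web page can control us
exec >> /var/log/preclear.log 2>&1                # send all further output to a log the page can poll

# the web interface can then display the tail of the log, and stop a run with:
kill -TERM $(cat /var/run/preclear.pid)           # (signal any child processes as well, e.g. via a process group)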

 


 

Sounds like it could work...  I'll put it on the list if somebody does not get to it first...

 

I need to get my own array hardware working properly first.  I've determined that one drive-tray slot locks up the server when a disk is installed in it.  Not sure if it is the cable, the controller, or the drive-tray connector in the case... Time will tell.  (I can't experiment as much in the evenings, as we are using the server to watch movies...)

 

Joe L.


I re-seated my existing disk controller cables and interface card in an attempt to diagnose the DMA errors I've been experiencing since trying to expand my array into the last 4 empty slots.  As I stated earlier, I was getting a DMA error that simply locked up the server.

 

Since these 4 slots are not yet assigned to anything in my array, my data is safe... I do need to test these slots by reading and writing to disks in them, and this preclear_disk.sh script is perfect for this.  I can keep a disk far more active than otherwise, and at no risk to the overall parity protection of my array.

 

Last night I re-ran a pre-clear cycle on my tiny 8Gig test drive.  It is on the end connector of the first cable of the disk controller.  It ran successfully in about 25 minutes.  I then tried a pre-clear of a much larger 750Gig drive, on the end of the second cable off the same Promise IDE controller.

 

As you might have guessed, the 750Gig drive took quite a bit longer to pre-read/clear/post-read than my 8Gig drive.  It took just under 10 hours for 1 cycle.  It also experienced some changes to the SMART data. 

 

The preclear_disk.sh script is designed to take a SMART status report when it starts, and another at the end, and to show you any differences between them if they exist.  In my example screen-shot, the Raw_Read_Error_Rate and Seek_Error_Rate are unchanged, but their "raw values" (the last value on each line) changed.  These are not likely to be problems.  The Airflow_Temperature_Cel changed... also not likely to be a problem.  There was an increase in the Hardware_ECC_Recovered counter.  I'll need to keep an eye on that.  It indicates the hardware in the disk corrected an error it detected while reading the disk.  The unRAID OS never even knew anything happened, as the error-correction code in the drive's firmware handled the error.

 

Makes you kind of wonder if all this is also happening on the disks in our Windows PCs, and we are never notified of it until one fails to boot...

 

Here is a screen shot of how it looked when it was done:

[screenshot: completed preclear report showing the SMART differences]

 

I'm going to run this 750Gig drive through a few more pre-read/clear/post-read cycles to see if it changes any more, or if I get any more DMA errors.   First, I'm going to save a copy of my syslog, as the SMART reports are logged there.  That way, if I do have another DMA error lockup, the SMART report in the saved syslog will be available next time for comparison.

 

Note: the SMART difference output is in "diff" format.  The lines with a leading "<" are from the "before" SMART report; the lines with a leading ">" are from the "after" SMART report.  Lines that are unchanged are not shown at all.
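In other words, the report is essentially what you would get from something like this (a sketch; the script's exact commands may differ):

smartctl -a /dev/sdX > /tmp/smart_before.txt      # capture SMART data before the preclear
smartctl -a /dev/sdX > /tmp/smart_after.txt       # ...and again after it finishes
diff /tmp/smart_before.txt /tmp/smart_after.txt   # "<" lines = before, ">" lines = after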

 

Joe L.


I just loaded the new 4.4 version of unRAID.  Apparently, Tom has not included the "ncurses" package.

 

This will break the display of the preclear_disk.sh script.

 

To fix it, you can either install the "ncurses" package, or, a lot easier, change these few lines in the script

 

from:

clearscreen=`tput clear`
goto_top=`tput cup 0 1`
screen_line_three=`tput cup 3 1`
bold=`tput smso`
norm=`tput rmso`
ul=`tput smul`
noul=`tput rmul`

 

to:

if [ -x /usr/bin/tput ]
then
  clearscreen=`tput clear`
  goto_top=`tput cup 0 1`
  screen_line_three=`tput cup 3 1`
  bold=`tput smso`
  norm=`tput rmso`
  ul=`tput smul`
  noul=`tput rmul`
else
  # no tput available: fall back to the equivalent raw ANSI escape sequences
  clearscreen=`echo -n -e "\033[H\033[2J"`    # home the cursor and clear the screen
  goto_top=`echo -n -e "\033[1;2H"`           # move the cursor to row 1, column 2
  screen_line_three=`echo -n -e "\033[4;2H"`  # move the cursor to row 4, column 2
  bold=`echo -n -e "\033[7m"`                 # reverse video (stand-out) on
  norm=`echo -n -e "\033[27m"`                # reverse video off
  ul=`echo -n -e "\033[4m"`                   # underline on
  noul=`echo -n -e "\033[24m"`                # underline off
fi

I'll post a new version of the preclear script shortly with these changes.

 

Edit: updated version now attached to first post in this thread.

 

Joe L.


This is an excellent utility. I'm using it now to clear my new 1.5TB disk. This is gonna take a while ;)

I'll bet it will take a while... 

Please let us know how long it does take.... (I hope you only did one cycle, at least at first)

 

I figure your disk is twice the capacity of mine, but probably twice as fast, so 10 hours or so for one cycle is my guess....

 

Joe L.


I noticed that the 4.4-final and 4.5-beta releases do not include a working "smartctl" command.  This will not prevent the preclear script from running, but you will be unable to learn whether the disk's SMART attributes changed during the preclear process.

 

To fix the smartctl program, all you need to do is install the missing library it needs.  It can be downloaded from:

http://slackware.cs.utah.edu/pub/slackware/slackware-12.0/slackware/a/cxxlibs-6.0.8-i486-4.tgz
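If you want to confirm the problem first, you can check smartctl's library dependencies (assuming the usual Slackware install location):

ldd /usr/sbin/smartctl
# a line such as "libstdc++.so.6 => not found" is the missing C++ runtime
# library that the cxxlibs package supplies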

 

Then, using File Explorer on your Windows PC, you can open

\\tower\flash

and create a "packages" folder.  Copy or move the downloaded file there.  From Windows it will be at

\\tower\flash\packages\cxxlibs-6.0.8-i486-4.tgz

If you log in via the system console, or via telnet, the flash drive is mounted at /boot.  The new directory you created is therefore /boot/packages, and your file will be at /boot/packages/cxxlibs-6.0.8-i486-4.tgz

 

Once it is downloaded and saved as cxxlibs-6.0.8-i486-4.tgz, you can install it by changing to the directory where you saved it.  (I keep all my downloaded packages in /boot/packages.)  After logging in on the system console or via telnet, I change directory by typing:

cd /boot/packages

and install it by typing:

installpkg cxxlibs-6.0.8-i486-4.tgz

 

As an alternative, if I did not want to change directory to where I put the file, I could just give the full path to the downloaded file like this:

installpkg /boot/packages/cxxlibs-6.0.8-i486-4.tgz

Once it is installed, the smartctl program will work until you reboot, at which point you will need to re-install it.
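If you would rather not re-install it by hand after every reboot, one option (my suggestion here, not an official instruction) is to append the install command to your go script so it runs at every boot:

echo "installpkg /boot/packages/cxxlibs-6.0.8-i486-4.tgz" >> /boot/config/go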

 

Joe L.


 

Running for 13:20 hours now. Post-read @ 30%

Approx. 3 minutes per percent, so I guess still 3.5 hours to go.

 

 


I'm running a preclear_disk cycle on a 750Gig SATA drive I have plugged into a new 2-port PCI-Bus SATA controller.  It is a very inexpensive controller card, and only rated at SATA 1.0 speeds, but I figure I am very limited by the PCI bus, so it really does not matter.

 

I'm 9 hours, 20 minutes into the process, and 85% of the way through the post-read.

 

Joe L.


It took 10 hours, 2 minutes to pre-read/clear/post-read my 750Gig SATA drive.  Interestingly, that is almost exactly the same time it took for an IDE-based drive of the same size.  Read speeds averaged in the 75-80MB/s range, and write speeds averaged in the mid-60MB/s through mid-70MB/s range.  It shows the PCI bus can keep up with a single SATA drive, and we are still mostly limited by the drive itself.
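As a rough sanity check (treating one cycle as three full passes over the drive): 750,000 MB at about 75 MB/s is roughly 10,000 seconds, or 2.8 hours per pass, so three passes plus overhead lands right around the 10 hours observed.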

 

I stopped my unRAID array, assigned the newly cleared 750Gig SATA drive to an empty slot on the devices page, then went back to the main page.  I was presented with a display showing a "blue" indicator for the new drive.

 

The text alongside the "Start" button indicated the drive would be cleared when the array was started if it was not already pre-cleared.

I checked the "checkbox" under the "Start" button (to enable it) and then clicked on the "Start" button to start the array.

 

The screen indicated that it was starting.  After I refreshed it, it showed the new disk as "Unformatted" and the array was up and running.  My array was off-line for perhaps a minute as I assigned the new drive.

 

Only a minute of down-time is a HUGE improvement, as in the past it took about 4 hours of down-time to add a 750Gig drive while it was being cleared.  In addition, I had some confidence in the drive, as any marginal sectors would already have been identified.

 

I clicked on the "Format" button, and in a minute or two more I saw the new disk was available for data.

 

Joe L.


I clicked on the "Format" button, and in a minute or two more I saw the new disk was available for data.

 

Does it pay to have a FORMAT option in the pre-clear script?

 

No, it would not pay at all to have a FORMAT option. 

 

If the drive were formatted, the partition-type byte would be set to "83" (Linux) within the 16 bytes that define the first partition in the MBR.  The drive would then not have a valid pre-clear signature, and the unRAID software would go about clearing it when you assigned it to the array.  After it cleared the drive, you would still have to format it.  For a large drive, you are facing 4 or more hours of down-time while the drive is cleared.
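If you are curious, you can inspect those MBR bytes yourself with a read-only command like this (the device name is illustrative):

dd if=/dev/sdX bs=512 count=1 2>/dev/null | od -A d -t x1
# the first partition entry occupies bytes 446-461; the partition-type byte
# is at offset 450 (0x83 = Linux), and bytes 510-511 hold the 55 aa MBR signature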

 

The only other way to add a "formatted" drive to the array would be to use the button labeled "Restore".  The array would be online quickly, but you would lose parity protection for many hours while it performed a full parity calculation.  On my array, that takes over 12 hours... I'd rather keep the array protected, so this is not the best way for me to add a new drive.

 

Joe L.


Worked like a charm :D

 

Added disk in just a couple of minutes fully functional.

 

Tnx for this great utility.

Well... "a couple of minutes" plus 16 hours, 47 minutes of pre-clearing time.  ;) ;D ;D ;D

 

Glad to hear it all went smoothly.  I guess you were in too much of a hurry to try 20 cycles... :(  It would have only taken 14 days....
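(16 hours 47 minutes per cycle times 20 cycles is about 336 hours, which really is just about 14 days.)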

 

Joe L.


I just received another 1.5TB drive from Amazon.  Sadly, this one had the older firmware with the known issues.  I updated the firmware and proceeded to use SpinRite 6 to check the drive.  After about an hour, SpinRite reported that an additional 400+ hours (more than 16 days) were needed to complete its very thorough check!  So I decided to choose a middle ground for testing: I loaded the drive into my array and am running this excellent utility.  Hopefully in about 17 hours I will get some good news.

 

Thanks Joe L. !

 

Regards,  Peter


Good news and bad news.  First the good: the entire process took only 6:34:14 to do the 1.5TB Seagate.  The bad news is that the post-read portion ended after the 40% mark.  Following that, the S.M.A.R.T. report listed quite a few errors.  Can anyone help me understand whether I should send this drive back?

 

First the last Post-Read progress information:

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 40% complete. 

(  608,670,720,000  of  1,500,301,910,016  bytes read )

Elapsed Time:  6:31:49

 

Next, the Post-Read summary:

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  6:34:14

============================================================================

==

== Disk /dev/sda has been successfully precleared

==

============================================================================

 

Now the S.M.A.R.T. error count:

 

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

54c54

<  1 Raw_Read_Error_Rate    0x000f  100  100  006    Pre-fail  Always      -      2194400

---

>  1 Raw_Read_Error_Rate    0x000f  103  099  006    Pre-fail  Always      -      42336756

57,58c57,58

<  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0

<  7 Seek_Error_Rate        0x000f  100  253  030    Pre-fail  Always      -      63626

---

>  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      15

>  7 Seek_Error_Rate        0x000f  100  253  030    Pre-fail  Always      -      261274

62c62

< 187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0

---

> 187 Reported_Uncorrect      0x0032  076  076  000    Old_age  Always      -      24

64c64

< 189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0

---

> 189 High_Fly_Writes        0x003a  075  075  000    Old_age  Always      -      25

66,68c66,68

< 195 Hardware_ECC_Recovered  0x001a  100  100  000    Old_age  Always     

< 197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

< 198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0

---

> 195 Hardware_ECC_Recovered  0x001a  052  049  000    Old_age  Always     

> 197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      1

> 198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      1

72c72,170

< No Errors Logged

---

 

Finally, the last 5 errors (out of 24 apparently):

 

> ATA Error Count: 24 (device log contains only the most recent five errors)

> CR = Command Register [HEX]

> FR = Features Register [HEX]

> SC = Sector Count Register [HEX]

> SN = Sector Number Register [HEX]

> CL = Cylinder Low Register [HEX]

> CH = Cylinder High Register [HEX]

> DH = Device/Head Register [HEX]

> DC = Device Command Register [HEX]

> ER = Error register [HEX]

> ST = Status register [HEX]

> Powered_Up_Time is measured from power on, and printed as

> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

> SS=sec, and sss=millisec. It "wraps" after 49.710 days.

>

> Error 24 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

>  When the command that caused the error occurred, the device was active or idle.

>

>  After command completion occurred, registers were:

>  ER ST SC SN CL CH DH

>  -- -- -- -- -- -- --

>  40 51 00 ff ff ff 0f

>

>  Commands leading to the command that caused the error were:

>  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

>  -- -- -- -- -- -- -- --  ----------------  --------------------

>  60 00 00 ff ff ff 4f 00      06:55:09.851  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      06:55:09.831  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      06:55:09.811  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      06:55:09.791  SET FEATURES [set transfer mode]

>  27 00 00 00 00 00 e0 02      06:55:09.771  READ NATIVE MAX ADDRESS EXT

>

> Error 23 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

>  When the command that caused the error occurred, the device was active or idle.

>

>  After command completion occurred, registers were:

>  ER ST SC SN CL CH DH

>  -- -- -- -- -- -- --

>  40 51 00 ff ff ff 0f

>

>  Commands leading to the command that caused the error were:

>  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

>  -- -- -- -- -- -- -- --  ----------------  --------------------

>  60 00 00 ff ff ff 4f 00      06:55:06.474  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      06:55:06.454  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      06:55:06.434  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      06:55:06.414  SET FEATURES [set transfer mode]

>  27 00 00 00 00 00 e0 02      06:55:06.394  READ NATIVE MAX ADDRESS EXT

>

> Error 22 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

>  When the command that caused the error occurred, the device was active or idle.

>

>  After command completion occurred, registers were:

>  ER ST SC SN CL CH DH

>  -- -- -- -- -- -- --

>  40 51 00 ff ff ff 0f

>

>  Commands leading to the command that caused the error were:

>  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

>  -- -- -- -- -- -- -- --  ----------------  --------------------

>  60 00 00 ff ff ff 4f 00      06:55:02.987  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      06:55:02.967  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      06:55:02.947  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      06:55:02.927  SET FEATURES [set transfer mode]

>  27 00 00 00 00 00 e0 02      06:55:02.907  READ NATIVE MAX ADDRESS EXT

>

> Error 21 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

>  When the command that caused the error occurred, the device was active or idle.

>

>  After command completion occurred, registers were:

>  ER ST SC SN CL CH DH

>  -- -- -- -- -- -- --

>  40 51 00 ff ff ff 0f

>

>  Commands leading to the command that caused the error were:

>  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

>  -- -- -- -- -- -- -- --  ----------------  --------------------

>  60 00 00 ff ff ff 4f 00      06:54:59.692  READ FPDMA QUEUED

>  60 00 00 ff ff ff 4f 00      06:54:59.690  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      06:54:59.670  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      06:54:59.650  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      06:54:59.630  SET FEATURES [set transfer mode]

>

> Error 20 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

>  When the command that caused the error occurred, the device was active or idle.

>

>  After command completion occurred, registers were:

>  ER ST SC SN CL CH DH

>  -- -- -- -- -- -- --

>  40 51 00 ff ff ff 0f

>

>  Commands leading to the command that caused the error were:

>  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

>  -- -- -- -- -- -- -- --  ----------------  --------------------

>  60 00 00 ff ff ff 4f 00      06:54:56.314  READ FPDMA QUEUED

>  60 00 00 ff ff ff 4f 00      06:54:56.313  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      06:54:56.293  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      06:54:56.273  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      06:54:56.253  SET FEATURES [set transfer mode]

============================================================================

 

 

Of course, running the Seagate tools test indicates no problems.  Had this tool not existed, I would not have known any of this.  Was ignorance bliss?  I will re-run this as well as try some other tests.  Luckily, I do not have an immediate need for this drive yet.

 

Thanks and regards,  Peter

 


Good news and bad news.  First the good: the entire process took only 6:34:14 to do the 1.5TB Seagate.  The bad news is that the post-read portion ended after the 40% mark.

It aborted the post-read when a request to read 2000 blocks of data returned after reading fewer than 2000 blocks... so 60% of the remaining blocks were never post-read.  We do not know about the pre-read; it could have aborted early too.  (I don't currently track whether it got to the end, but clearly I need to, as the display is overwritten by the next phase.)  Odds are as good as any that the pre-read aborted too, especially given the short total elapsed time.

Following that, the S.M.A.R.T. report listed quite a few errors.  Can anyone help me understand whether I should send this drive back?

 


It appears to me that the drive has already reallocated 15 sectors and has 1 more sector pending re-allocation.  The "High_Fly_Writes" are not too good either.

Hard to say what to do...  I'd run a few more preclear cycles before deciding.  It sure might be a candidate for return, but they might not take it if their utility does not indicate it is over their failure "threshold".

 

In the mean-time, I'll see about modifying the script to force it to continue reading past early "read" aborts by the drive.  In the unRAID server itself, the read failure would have resulted in the same data block being reconstructed from parity and then re-written to the drive; that would have forced the sector re-allocation.  That this "read failure" happened during the post-read is troubling, as all re-allocation should have taken place in the zeroing phase.  (Assuming the pre-read completed, that is.)
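One likely ingredient of such a change (an assumption on my part, not a committed design) is dd's conv=noerror,sync option, which keeps reading past bad sectors instead of aborting, padding the failed blocks with zeros:

dd if=/dev/sdX of=/dev/null bs=2048k conv=noerror,sync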

At this point I'm guessing it did not complete either.

 

I'm learning how drives and SMART firmware react to this script just as you are... glad to have helped, at least in identifying a possibly flaky drive.

 

Joe L.
