Author Topic: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add  (Read 163242 times)

Offline Joe L.

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 18774
A utility to "burn-in" a new disk before adding it to your array has been requested several times.   Also requested is a process to "pre-clear" a hard disk before adding it to your array.  When a special "signature" is detected, the lengthy "clearing" step otherwise performed by unRAID is skipped.


This process is named:
preclear_disk.sh
It is attached to this post; look for the download link near the end of this first long post.
The current version is 1.15.  If you have an older version, please download the newest one.  Versions prior to 1.14 could not properly handle disks larger than 2.2TB.
Versions prior to 1.15 did not work properly on 64-bit unRAID.


If you are running unRAID 4.7 onward, in the absence of either a "-a" or "-A" option specified on the command line, preclear_disk.sh will use the alignment preference you specified in the unRAID settings screen as its default. 
(-a will force MBR-unaligned; -A will force MBR-4k-aligned.)

 
(The link for the attachment is only visible after you log in as a user of this forum)
Download it to your PC, unzip it there, then copy preclear_disk.sh to your flash drive, into the same folder that currently has bzroot and bzimage.
On Linux, that folder would be
/boot
From Windows file explorer, the folder would be at
\\tower\flash
If you just plug the flash drive into your Windows PC, it will be the top-level folder on the drive.

Once a disk has been successfully pre-cleared, you can "quickly" add it to your array by following these steps:
1. Stop your array.
2. Either do a screen print, or make note of the disk assignments on the "Devices" page. (just for your own records)
3. Assign the pre-cleared disk to the array in a new, previously un-assigned slot,
   or assign the pre-cleared disk to the array in place of a smaller or failing disk.
4. Start the array by pressing "Start"  (you may need to check the box under it to enable it when adding a drive)   Note: If running an older 4.X version of unRAID, DO NOT press any button labeled "restore" as it will immediately invalidate parity and start an entirely new parity calc process leaving your data unprotected until it is complete.   Always press "Start"

5. If the pre-cleared drive is being added to a new slot in the array, once the array is started you will be presented with a "Format" button.  Press the format button and within a few minutes the new pre-cleared disk will have been formatted and added to your array. (Disks are bigger now, so they take longer to format.)
5a. If the drive is replacing an existing drive, the contents of the existing drive will be reconstructed onto the new drive when you press "Start" and no "Format" button will be present.  This re-construction of contents will take many hours for a large disk (similar to the time it takes to do a full parity check).

Do NOT use anything other than the Format button on the main unRAID management console page to format a pre-cleared disk. The "Format" button will appear after you start the array after assigning a pre-cleared drive.   Do NOT format the drive on your own using a reiserfs linux command, or using a button in unMENU. If you do, the pre-clear will be invalid, and you may even fool unRAID into thinking the disk is cleared when it is not.  This would let you think you have parity protection when you do not.


How does it work?

The script:
1. gets a SMART report
2. pre-reads the entire disk
3. writes zeros to the entire disk
4. sets the special signature recognized by unRAID
5. verifies the signature
6. post-reads the entire disk
7. optionally repeats the process for additional cycles  (if you specified the "-c NN" option, where NN = a number from 1 to 20, default is to run 1 cycle)
8. gets a final SMART report
9. compares the SMART reports, alerting you to differences.

All the time it is working, it presents a status display of its progress.

A very old 8 Gig Quantum Fireball I used in testing did 10 cycles in 4 hours.  It read the drive at about 25MB/s.   Most modern disks can be read at about 80MB/s.  A single cycle on a 2TB drive may take over 30 hours.
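
For a rough feel of where those hours come from, here is a minimal sketch (not part of preclear_disk.sh) that computes the time for one full pass over a disk at a constant transfer rate. The function name pass_hours is made up for illustration, and it assumes decimal units (1 GB = 1000 MB).

```shell
# pass_hours SIZE_GB RATE_MB_PER_S
# Hours needed to read or write the whole disk once at a constant rate.
# Assumption: decimal units, 1 GB = 1000 MB.
pass_hours() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 3600 }'
}

pass_hours 2000 80    # one pass over a 2TB disk at 80MB/s, prints 6.9
```

A cycle makes three passes over the disk (pre-read, zeroing, post-read), so the constant-rate floor for a 2TB disk is about 21 hours. Real runs take longer because the transfer rate drops toward the end of the disk and the read phases deliberately add extra head movement.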

The process of "reading" the disk can be optionally skipped, but I recommend you do it with any new drive.  It allows the SMART firmware on the drive to identify any bad sectors and mark them for re-allocation.  The actual re-allocation takes place when the drive is cleared, and then a post-read lets it identify any remaining bad sectors.

If you don't care about finding the errors up front, while it is easy to RMA a drive, or if you don't have the time to wait, use the "-n" option to skip the pre/post read steps.  As I said, I recommend always doing the pre/post reads.

When pre/post reading the disk, I intersperse reading the beginning block, random blocks of data, the linear set of blocks of the entire drive, and the last block on the device.  I purposely keep the disk head moving a LOT more than if just reading each block in turn.  This is to identify hardware that is marginal.  If your disk or cables, or controller, or power supply cannot cope with constant activity... you probably want to know it before you assign the disk to a spot in your array.
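
The read pattern described above might be sketched roughly as follows. This is an illustration only, not the script's actual code: the block size, the "every 8th block" interval, and all function names are assumptions, and it is written to run against an ordinary scratch file (for a real disk the size would have to come from something like blockdev --getsize64 rather than wc). All reads go to /dev/null, so nothing is written.

```shell
BS=65536   # hypothetical block size in bytes

# read_block TARGET BLOCK_INDEX : read one block and discard it
read_block() {
    dd if="$1" of=/dev/null bs="$BS" skip="$2" count=1 2>/dev/null
}

# random_index TOTAL_BLOCKS : print an integer in [0, TOTAL_BLOCKS)
random_index() {
    r=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
    echo $(( r % $1 ))
}

# exercise_reads TARGET : linear read with first/random/last block reads mixed in
exercise_reads() {
    size=$(wc -c < "$1")
    total=$(( size / BS ))
    [ "$total" -gt 0 ] || return 1
    i=0
    while [ "$i" -lt "$total" ]; do
        read_block "$1" "$i" || return 1                 # next block of the linear read
        if [ $(( i % 8 )) -eq 0 ]; then                  # every 8th block, force some seeks
            read_block "$1" 0 || return 1                # beginning block
            read_block "$1" "$(random_index "$total")" || return 1   # random block
            read_block "$1" $(( total - 1 )) || return 1 # last block
        fi
        i=$(( i + 1 ))
    done
}
```

The random index is kept strictly below the block count so that no read is requested past the last block.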

If you wish to perform more than one cycle of read/clear/read, then use the "-c count" option where you can specify a number between 1 and 20.    Do note that if it takes 30 hours to do one cycle on a large disk, 20 will take 600 hours... that is 25 days... try one cycle before you kick it off for 20.

If you wish to test if a disk is already pre-cleared, you can use the "-t" option.  It runs in a few seconds and will let you know if the pre-cleared signature is present.   (The pre-clear signature varies based on the disk geometry and size... it is not as easy as you might initially think to generate it in a shell script.)

You will either need to kick this preclear_disk.sh script off from the system console, or from a telnet session.  You must leave the session open as it runs. (and it will typically run for many hours)

You are protected from an inadvertent error of giving the wrong drive by several sanity checks.  It will not pre-clear any drive assigned on your "Devices" page.  It will not process any drive that is currently mounted.  It must be an un-assigned drive physically connected to your array and otherwise accessible.
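
As an illustration of the "not currently mounted" check, here is a minimal sketch. It is not taken from the script, and how preclear_disk.sh really performs the check is not shown in this post. The function name and the optional second argument (an alternate mounts table, handy for trying it out) are assumptions; by default it reads /proc/mounts.

```shell
# disk_is_mounted DEVICE [MOUNTS_FILE]
# Exit status 0 if DEVICE, or a numbered partition of it (e.g. /dev/sda1 for
# /dev/sda), is listed as a mounted source. MOUNTS_FILE defaults to /proc/mounts.
disk_is_mounted() {
    awk -v d="$1" '
        index($1, d) == 1 {
            rest = substr($1, length(d) + 1)     # whatever follows the disk name
            if (rest ~ /^[0-9]*$/) found = 1     # nothing (whole disk) or a partition number
        }
        END { exit found ? 0 : 1 }
    ' "${2:-/proc/mounts}"
}

# Example use:
#   if disk_is_mounted /dev/sdc; then echo "refusing: /dev/sdc is mounted"; fi
```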

To invoke this script you simply list the name of a disk you wish cleared as an argument to the command, as in this example:
cd /boot
preclear_disk.sh /dev/hdk

(you will need to use the three letter device name for the disk being precleared.  On SATA drives, this will be sda, sdb, sdc, etc...  On IDE drives, or SATA drives in IDE emulation mode it will be hda, hdb, hdc, etc...   Use
preclear_disk.sh -l
to list the disks on your server available to be cleared.)

If on version 4.7 of unRAID and you are clearing an ADVANCED FORMAT drive that works best with a 4k alignment of its data, use the "-A" option as in
preclear_disk.sh -A /dev/sda
This will result in a starting sector of 64 for the resulting partition that will be used for unRAID's file system.  This is NOT backwards compatible with any version of unRAID prior to 4.7.

If you want a bit more help, simply type:
preclear_disk.sh -?   

Prior to doing anything to read or write the disk, you will be asked to confirm it is the disk you wish to clear.
You must answer "Yes" (Capital "Y" lower case "es")  Here is the confirmation screen.
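
The exact-match rule for that answer can be sketched like this. It is an illustration only; the prompt text and the function name are made up and are not the script's own.

```shell
# confirm_clear : read one line from standard input and succeed only if it is
# exactly "Yes" (capital Y, lower case es). Anything else counts as a refusal.
confirm_clear() {
    printf 'Answer Yes to continue: ' >&2
    IFS= read -r answer || return 1
    [ "$answer" = "Yes" ]
}

# Example use:
#   confirm_clear || { echo "Clearing will NOT be performed"; exit 2; }
```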


Here is the status display as it is pre-reading a new 750 Gig Seagate drive in my array.  Looks like I've got about an hour to go in this step.


Here it is, 3 hours, 38 minutes into the process...
It is on step 2 of 10 of clearing the drive. (the step where the bulk of the drive is zeroed)
The script has cleared 48 GB of the 750 Gig.  It looks like it is writing to the disk at 70MB/s.  The status display here is updated every 10 seconds.


5 hours, 40 minutes into the process on this 750 Gig drive...  The clearing process has cleared 534GB of the 750.  It seems to have slowed a tiny bit as it works its way to the higher numbered cylinders on the disk.  It is now writing about 66MB/s.


Several hours have passed..., I got close... I had to re-start due to a hardware issue (bad power splitter)
In any case, the status display at that point looked like this:


Here is a screen-shot as it is in the post-read phase.   (This screen-shot is while clearing the much smaller disk I've been using for my tests)


Finally, here is a screen shot when the pre-clear is completed.


Just a few warnings...   
1. This utility is not supported by Tom at Lime-Technology, however he did supply me a code segment showing what was checked when looking for a pre-cleared signature on a disk.   
2. It is possible for you to clear a disk you did not intend. If you have multiple disks that have data installed in your server, but they are not assigned to your array, and you want to clear one but not the other, and you give the wrong device, the wrong disk will be cleared.  If you do, sorry... I tried to protect you from yourself.  Do be careful.

It is possible to crash your server if your hardware is not up to it.  I've been fighting a crash of that type in one of the slots on my array that turned out to be caused by an intermittent "Y" power splitter to that drive tray.

1st Edit: attached version no longer dependent upon ncurses. (apparently, ncurses is not included in unRAID 4.4)

2nd Edit: I noticed that the 4.4final and 4.5beta releases of unRAID do not have a working "smartctl" command.  To get it working you must install a missing library file it uses. If you do not, this script will still clear the drive, but the feature where it compares SMART reports from before and after the clearing process will not work.
See this post for more details: http://lime-technology.com/forum/index.php?topic=2817.msg23548#msg23548

3rd Edit Dec 18, 2008, 4:44 PM EST: Modified the script to not abort the read phase if a "read-error" occurs. It will attempt to continue to read the remaining portion of the disk.

4th edit July 21, 2009. Version 0.9.3   
-- Worked around the bug in "bash" that caused the script to stop at 88% on some disks.
-- I also fixed a bug where I was improperly passing an argument to the "dd" commands intended to torture/exercise the disk by reading random blocks interspersed with the linear read of the entire disk.  The good news is that they now work; the bad news is, they now work, and they will slow down the pre- and post-read processing times slightly (because they are working).
-- Added new "mail" notification option as submitted by forum member jbuszkie.   You must have a working "mail" command to use this.

5th edit: August 31, 2009.  New version 0.9.6  (yes, 0.9.4 and 0.9.5 were internal versions as jbuszkie and I tested)
Version .9.4 - Enable SMART monitoring, just in case it is disabled on a drive.
Version .9.5 - Added disk temperature display and disk read speed display.
Version .9.6 - Enhanced the mail reporting to include some statistics (time, MB/s, temp, etc.)
                 - Fixed a bug with using zero.txt and concurrent tests. Each test will use its own file.
                 - Changed the read block size to be bigger than 1,000,000 if it would otherwise be smaller, to improve read speed

More instructions on how to install and run are in this post for those who might need more detail:
It also describes how to run multiple preclear_disk.sh processes at the same time on multiple disks.

Edit: Sept 25, 2009. Version .9.7
I made another improvement to the preclear script.  It now validates that the disk is all zeros in the post-read phase.  (Up until now, it just validated the pre-clear signature.)

This added check will add 10-15% to the time needed to clear a drive.  It will be skipped if you use the -n option to not do any pre or post read.
It will also be skipped with the new -N option, which still does the normal pre-read and post-read.
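
To show the idea of an "all zeros" check, here is a minimal sketch that works on an ordinary file. It is not the script's own code, and the function name is made up. A real disk also carries the pre-clear signature at its start, which this sketch knows nothing about.

```shell
# is_all_zeros FILE : exit status 0 if every byte of FILE is zero.
# Compares FILE against a stream of zero bytes of exactly the same length.
is_all_zeros() {
    size=$(( $(wc -c < "$1") ))
    head -c "$size" /dev/zero | cmp -s - "$1"
}

# Example use on a scratch image file:
#   is_all_zeros /tmp/scratch.img && echo "verified: all zeros"
```

Comparing against a zero stream of matching length avoids the end-of-file mismatch that a plain cmp against /dev/zero would report.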

Edit: Oct 06, 2009. Version .9.8.
Added options to set the read/write block size and block count for use by users with servers with limited RAM and resources.  In general making the sizes smaller will result in a longer run time, but it will use less memory.

New options are:
       -w size  = write block size in bytes.  If not specified, default is 2048k.
       -r size  = read block size in bytes.  If not specified, default is one cylinder at a time (heads * sectors * 512).
       -b count = number of blocks to read at a time.  If not specified, default is 200 blocks at a time.
These new -w, -r, and -b options are described in this post: http://lime-technology.com/forum/index.php?topic=2817.msg39972#msg39972
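
To put a number on the default read block size, here is the heads * sectors * 512 arithmetic as a small sketch. The 255 heads and 63 sectors per track used in the example are an assumed geometry, not values taken from this post; use whatever your own disk reports.

```shell
# cylinder_bytes HEADS SECTORS_PER_TRACK
# Bytes in one cylinder, assuming 512-byte sectors.
cylinder_bytes() {
    echo $(( $1 * $2 * 512 ))
}

cylinder_bytes 255 63    # assumed geometry, prints 8225280
```

With that geometry, the default of 200 blocks at a time works out to 200 * 8225280 = 1645056000 bytes per group of reads, which gives an idea of why smaller -r and -b values may suit a server with limited RAM.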

Edit: Jan 14, 2011.   Version .9.9
New options are:
      -A = force starting sector to be on sector 64 for 4k alignment.
      -l  = list disks and affiliated devices that are potential for clearing.  Makes it easier to identify correct device for new unRAID users.

The listing of potential disks looks like this:
root@Tower:/boot#  preclear_disk.sh -l
========================================
 Disks not assigned to the unRAID array
  (potential candidates for clearing)
========================================
     /dev/sdc = ata-Maxtor_6Y250P0_Y63KH45E
     /dev/hdj = ata-QUANTUM_FIREBALLlct15_08_611020017228


     
New feature is a vastly simplified output report. It will take the place of the current "diff" of the beginning and end smart report.

** Changed attributes in files: /tmp/smart_start_sda  /tmp/smart_finish_sda
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   111     119            6        ok          30377685
         Spin_Retry_Count =   100     100           97        near_thresh 0
        Unknown_Attribute =    99     100           99        FAILING_NOW 0
  Airflow_Temperature_Cel =    72      73           45        ok          28
   Hardware_ECC_Recovered =    57      29            0        ok          30377685

*** Failing SMART Attributes in /tmp/smart_finish_sda ***
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
184 Unknown_Attribute       0x0032   099   099   099    Old_age   Always   FAILING_NOW 0

 10 sectors were pending re-allocation before the start of the preclear.
 5 sectors are pending re-allocation at the end of the preclear,
    a change of -5 in the number of sectors pending re-allocation.
 1 sector had been re-allocated before the start of the preclear.
 3 sectors are re-allocated at the end of the preclear,
    a change of 2 in the number of sectors re-allocated.


It will only print lines for attributes that change, or are failing, or where the new-value is within 25 of the failure threshold.
It will not print lines where the initial value was 253, or 200, or 100, as those are frequently initialized values.
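
One possible reading of those rules, as a sketch only (this is not the script's code). The assumptions are: an attribute counts as failing when its new value is at or below the threshold, "within 25" includes exactly 25, and the 253/200/100 exclusion applies only to lines that would otherwise be printed purely because the value changed. Function names are made up, and values must be plain decimals without leading zeros.

```shell
# attr_status NEW THRESH : print FAILING_NOW, near_thresh, or ok
attr_status() {
    if [ "$1" -le "$2" ]; then
        echo FAILING_NOW
    elif [ $(( $1 - $2 )) -le 25 ]; then
        echo near_thresh
    else
        echo ok
    fi
}

# should_print NEW OLD THRESH : exit status 0 if the line belongs in the report
should_print() {
    [ "$(attr_status "$1" "$3")" != ok ] && return 0   # failing or near threshold
    [ "$1" -eq "$2" ] && return 1                      # unchanged and healthy
    case "$2" in
        253|200|100) return 1 ;;                       # frequently initialized values
    esac
    return 0                                           # a genuine change
}
```

Checked against the sample report above, this reading gives ok for Raw_Read_Error_Rate (111, threshold 6), near_thresh for Spin_Retry_Count (100, threshold 97), and FAILING_NOW for the attribute at 99 with threshold 99.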

New feature: the individual SMART reports are named after their affiliated disk device.  They are also in /var/log/smart_start_sdX and /var/log/smart_finish_sdX so you can see them with your browser at:
   //tower/log/smart_start_sdX and //tower/log/smart_finish_sdX

The pre and post SMART reports are saved in both the syslog and in individual files in the /tmp directory.   Both will be erased when you reboot, so if you wish to view them, grab them before you reboot the server.

Edit: Jan 15, 2011   .9.9a  Improved output report based on user feedback.

Edit: Jan 16, 2011   .9.9b  Fixed report when using "-l" option to ignore new fields in disk.cfg when listing disks not in array.
                           Additional minor changes in output report.

Edit: Jan 16, 2011   .9.9c  Fixed report when using "-l" option when used on 5.X version of unRAID with different "ls" date format.
                           Additional improvements in output report.

Edit: Jan 17, 2011  .9.9d  Additional improvements in the output report to make it easier to read and understand.

Edit: Jan 23, 2011  1.1   - Added -C 63 option to quickly convert a precleared disk from a sector 64 to 63 start.
                          - Added -C 64 option to quickly convert a precleared disk from a sector 63 to 64 start.
                          - Added display of command line arguments to confirmation screen.
                          - Added display of preclear_disk.sh version to display screen.
                          - Added -W option to skip "preread" and start with "write" of zeros to the drive.
                          - Added -V option to skip the "preread" and "clear" and only perform the post-read verify.
                          - Some improvement to make sure the logged results in the syslog are more complete.

Edit: Jan 29, 2011  1.2   - Fixed "-l" option to list drives even when there is no "ata-" entry in /dev/disk/by-id.
                          - Minor change to output report to eliminate report lines for smart values that are initializing.

Edit: Feb 1, 2011   1.3   - Added logic to read desired "default" Partition Type from /boot/config.
                          - Added logic to save dated copies of the final preclear and SMART reports to a "preclear_reports" subdirectory on the flash drive.
                          - Added "-R" option to suppress the copy of final reports to a "preclear_reports" directory on the flash drive. (they are always in /tmp until you reboot)

Edit: Feb 4, 2011   1.4   - Added "-D" option to suppress use of "-d ata" on smartctl commands.
                          - Added "-d device_type" to allow use of alternate device_types as arguments to smartctl.
                          - Added "-z" option to zero the MBR and do nothing else. (remainder of the drive will not be cleared)

Edit: Feb 8, 2011   1.5   - Added Model/Serial number of disk to output report.
                          - Fixed default argument to smartctl when "-d" and "-D" options not given.
                          - Added intermediate report of sectors pending re-allocation.

Edit: Feb 8, 2011   1.6   - Fixed "-l" command and identification of assigned array disks when used on unRAID 5.0beta4 onward.

Edit: Feb 28, 2011  1.7   - Fixed "-l" command and identification of assigned array disks when used on unRAID 5.0beta5 onward.

Edit: March 13, 2011 1.8 - Changes to range of random blocks read to not read past last block on disk.   Added confirmation prompt to continue even if smartctl returns an error return status

Edit: March 19, 2011 1.9 - Fixed parse of default partition type and disk.cfg.

Edit:  additional version changes are described in the next post.  (this post got too big to add additional content)
« Last Edit: April 05, 2014, 01:02:35 PM by Joe L. »

Offline Joe L.

I've split this thread in order to be able to continue with the revision history of preclear_disk.sh

You can find the balance of the original thread here: http://lime-technology.com/forum/index.php?topic=13054.0

Edit: May 1, 2011    1.10 version submitted by bjp999. (he added changes to allow integration with unMENU's MyMain)
Edit: May 18, 2011  1.11  Added changes submitted by bjp999 to allow the display of preclear status on the unMENU MyMain plugin.
                            Modified saved report files to be named after disk serial number rather than linux device since device names change
                            as hardware changes.  It will make historical use of the report files easier.   
                            A new "-S" option will revert to use the older report names with the linux device if you prefer.
Edit: Aug 19, 2011  1.12 - Added ability to create GPT partitions on disks > 2.2TB.
                           Fixed detection of 4k default setting in unRAID if no -A or -a given.
Edit: Aug 28, 2011  1.13 - Deployed correct fixed GPT version...  (1.12 was actually a 1.11 version variant... Sorry)
Edit: Nov 12, 2013  1.14  - Added text describing how -A and -a options are not used or needed on disks > 2.2TB.
                                          Added additional logic to detect assigned drives in the newest of 5.0 releases.
Edit: April 05, 2014  1.15 - Fixed issue with inability to work properly on 64-bit unRAID 6.X.


The screen display in MyMain will show progress while pre-clearing a disk and a final summary when the clearing process is complete:


Yes, it looks like I've got one Seagate disk that is running hot.    Do not be surprised by the read and write times for the old IDE based 8Gig drive I use for testing.  It was never very fast.

To get the updated display, you'll need to update your version of unMENU and also download the newest version of the preclear_disk.sh script.

Joe L.

« Last Edit: April 05, 2014, 01:04:21 PM by Joe L. »

Offline Joe L.

I just finished my first preclear runs. Which values should I look at to see if a drive is safe to use?
For the most part you are looking for ANY individual parameter that is FAILING_NOW   (that would be bad)
And, you are looking for re-allocated sectors, or sectors pending re-allocation.    The "raw" counts on those columns are actual counts.

The "raw" column on many parameters is meaningful to only the manufacturer.  Do not worry if you see raw read errors, ALL drives have them, some report them, some do not.

If you see the "normalized" value changing and getting closer to the affiliated error threshold, be attentive to the rate of change.  The exceptions are those parameters where the failure threshold is only a few counts from the initial starting value.  (The spin-up-retry failure threshold is often set very close to the initial value, as only a few failures to spin up to speed indicate a drive that is pending a possible complete failure.)

Many manufacturers have factory starting values of 253, and change to 100 or 200 once the drive has a few hours on it.  This is perfectly normal.

Any sectors pending re-allocation AFTER a preclear  are particularly bad.   Any un-readable sectors identified in the pre-read phase should have been re-allocated in the zeroing (writing) phase.  Any remaining after the preclear would have been identified in the post-read phase. (indicating what was written could not be read back)  An additional pre-clear should be performed, and if the numbers do not stabilize (additional non-readable sectors are found) then the disk should be returned as defective.
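
If you want to pull that number out of the saved reports yourself, something along these lines may help. It is a sketch, not the script's code: it assumes the saved reports contain the usual smartctl attribute table (ten columns, with RAW_VALUE last, as in the failing-attribute sample earlier) and that the attribute is named Current_Pending_Sector, which is the name smartctl normally prints for it. The function names are made up.

```shell
# raw_attr REPORT_FILE ATTRIBUTE_NAME : print the RAW_VALUE column for one attribute
raw_attr() {
    awk -v name="$2" '$2 == name { print $10; exit }' "$1"
}

# pending_change START_REPORT FINISH_REPORT
# Change in the number of sectors pending re-allocation (finish minus start).
# A missing attribute is treated as 0.
pending_change() {
    before=$(raw_attr "$1" Current_Pending_Sector)
    after=$(raw_attr "$2" Current_Pending_Sector)
    echo $(( ${after:-0} - ${before:-0} ))
}

# Example use:
#   pending_change /tmp/smart_start_sda /tmp/smart_finish_sda
```

A negative result means fewer sectors are pending than at the start. What matters most is whether the finish report's own count is zero.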

Of course, if the post-read did not find all zeros as expected, that is also bad.  Sometimes it can be traced to a bad power supply, or bad (or misconfigured) RAM in the server, but just as often a disk will not return what was written, but something else, and show no other error.  Those drives can cause hair-loss, since if you were to install them in your array you'll get random parity errors as the drive returns random bits now and then.  Be very cautious if a disk returns unexpected values after being cleared.

Joe L.
« Last Edit: May 14, 2013, 06:56:59 AM by Joe L. »