Pre-Clear Disks



Hello,

 

I am in the process of preclearing a disk.  I set the cycle count to 20, thinking I really wanted to test the heck out of this drive before putting it into production.  Now that it has taken 7 days to complete 10 cycles, I am rethinking my strategy and am looking to cancel it after the 10th cycle.

 

Do I have to wait for the remaining 10 cycles for me to see the reports and place this new drive into service?  The reason I decided on an extended test was due to this report:

 

========================================================================1.13
== invoked as: ./preclear_disk -A /dev/sdd
==
== Disk /dev/sdd has NOT been successfully precleared
== Postread detected un-expected non-zero bytes on disk==
== Ran 1 cycle
==
== Using :Read block size = 8225280 Bytes
== Last Cycle's Pre Read Time  : 5:38:33 (147 MB/s)
== Last Cycle's Zeroing time   : 5:10:23 (161 MB/s)
== Last Cycle's Post Read Time : 14:10:44 (58 MB/s)
== Last Cycle's Total Time     : 25:00:42
==
== Total Elapsed Time 25:00:42
==
== Disk Start Temperature: 33C
==
== Current Disk Temperature: -->49<--C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdd  /tmp/smart_finish_sdd
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   114     100            6        ok          78450040
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
  Airflow_Temperature_Cel =    51      67           45        near_thresh 49
      Temperature_Celsius =    49      33            0        ok          49
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.
============================================================================

 

Should I be worried about the Spin_Retry_Count and End-to-End_Error indicators?  The Airflow_Temperature_Cel indicator is flagged only because the drive is not mounted in the case yet.

 

Can I cancel this and still keep the testing already done, so that adding the drive does not take my array down for a day while unRAID "reformats" it?  This drive will be replacing a 2TB parity drive.

 

Thanks for all your help.

 

Sideband Samurai

Link to comment

Cancel any time in the post-read phase.  It will still be marked as pre-cleared.

 

You are toasting that drive... 45C is a bit high for my taste.

 

Joe L.

Link to comment

I have the following SMART report for review.  Is this drive OK to use in production?

 

root@DavyJones:/boot/preclear_reports# smartctl --all /dev/sdd
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F1LWRA
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Sep 25 20:36:51 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  89) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       199533168
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   060   060   030    Pre-fail  Always       -       1071071
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       217
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 26/53)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   043   053   000    Old_age   Always       -       43 (0 26 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       206781200466136
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       64465865376
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       136363348988

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Thanks for your advice and help.

 

Sincerely,

 

Sideband Samurai

Link to comment

Well it looks like I have answered my own question.

 

The values in my report are Normal ... with the exception of the temperature which was abnormally high because of how I had the hard drive installed.
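
For anyone reading the SMART table later: as far as I know, smartctl treats an attribute as failed only when its normalized VALUE drops to or below THRESH. The Python sketch below is only an illustration, not part of smartctl or preclear_disk. The helper names `parse_attribute`, `margin` and `is_failing` are hypothetical, and the rows are copied from the report above. It shows that Spin_Retry_Count and End-to-End_Error sit near their thresholds simply because the thresholds are set close to the starting value of 100.

```python
# Rows copied from the smartctl report earlier in this thread.
ROWS = """\
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       199533168
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 26/53)
"""


def parse_attribute(line):
    """Split one attribute row into the fields used below."""
    parts = line.split()
    return {
        "name": parts[1],
        "value": int(parts[3]),    # normalized value, higher is better
        "worst": int(parts[4]),    # lowest normalized value seen so far
        "thresh": int(parts[5]),   # vendor failure threshold
        "raw": " ".join(parts[9:]),
    }


def margin(attr):
    """How far the normalized value sits above the failure threshold."""
    return attr["value"] - attr["thresh"]


def is_failing(attr):
    # Assumption: a threshold of 0 marks an informational attribute
    # that is never reported as failing.
    return attr["thresh"] > 0 and attr["value"] <= attr["thresh"]


if __name__ == "__main__":
    for line in ROWS.splitlines():
        attr = parse_attribute(line)
        print(f"{attr['name']:<24} margin={margin(attr):>3} "
              f"failing={is_failing(attr)}")
```

A small margin with a raw value of 0 is what a healthy drive of this model reports, so the "near_thresh" status on those two attributes is not a warning by itself.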

 

Seagate provides an excellent explanation of what Raw_Read_Error_Rate is and what is expected.  This RAW_VALUE is normally high for this brand of drives.

 

Here is a link to the article on Seagate's site:

 

http://forums.seagate.com/t5/Desktop-HDD-Desktop-SSHD/Seagate-s-Seek-Error-Rate-Raw-Read-Error-Rate-and-Hardware-ECC/td-p/122382

 

So the drive is ready to become my parity drive, and I will install it tonight.

 

Sincerely,

 

Sideband Samurai

Link to comment

To keep the drive's temperature under control, I purchased this at Fry's for $17.00:

 

It's not a permanent solution, but it allows me to pre-clear disks without taking the system covers off.  Seems to work well so far.

 

fei6.jpg

 

and here is the drive in action:

 

kwcr.jpg

 

================================================================== 1.13
=                unRAID server Pre-Clear disk /dev/sde
=               cycle 1 of 5, partition start on sector 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it
=  **** This will take a while... you can follow progress below:
=
=
=
=
=
=
=
=
Disk Temperature: 43C, Elapsed Time:  10:41:29
320479+4 records in
320479+4 records out
672098189312 bytes (672 GB) copied, 3422.8 s, 196 MB/s
Wrote  672,098,189,312  bytes out of  3,000,592,982,016  bytes (22% Done)

 

As you can see, the drive temp is down to around 43C instead of 50C, so it is running much cooler.  This drive has been running for 10 hours.
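
As a side note, the numbers in the progress lines above can be checked with a few lines of Python. This is a rough sketch with a hypothetical helper name (`zeroing_progress`), and it assumes the write rate stays constant for the rest of the disk, which is optimistic because drives usually slow down toward the inner tracks.

```python
# Figures copied from the dd progress lines in the log above.
WRITTEN_BYTES = 672_098_189_312
TOTAL_BYTES = 3_000_592_982_016
ELAPSED_SECONDS = 3422.8


def zeroing_progress(written, total, elapsed_s):
    """Return (percent done, rate in MB/s, estimated hours remaining)."""
    rate = written / elapsed_s                 # bytes per second so far
    remaining_s = (total - written) / rate     # assumes the rate holds
    return 100 * written / total, rate / 1e6, remaining_s / 3600


if __name__ == "__main__":
    pct, mb_per_s, hours_left = zeroing_progress(
        WRITTEN_BYTES, TOTAL_BYTES, ELAPSED_SECONDS)
    print(f"{pct:.0f}% done at {mb_per_s:.0f} MB/s, "
          f"roughly {hours_left:.1f} h of zeroing left")
```

The percentage and rate agree with what the script printed (22% and 196 MB/s); the time estimate covers only the zeroing step of this cycle, not the post-read.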

 

--Sideband Samurai

Link to comment

You're correct r.e. the SMART data -- all looks fine.

 

The hot-swap cage is doing okay with the temps too ... at least relative to what you were seeing before.  I prefer to keep them under 40, but 43 isn't bad -- the thermal spec for most modern drives is 60, although I certainly don't like to get anywhere near that.

 

Link to comment

GaryCase,

 

Thanks for the confirmation and assurance.

 

I am currently pre-clearing my second 3TB drive.  I am testing it for 5 cycles this time.  It's funny, the BIOS shows the drive as 800GB instead of 3TB.  unRAID shows the drive as 3TB though.  I am not worried about it, as this system was cobbled together to test the WAF (Wife Acceptance Factor).  It went over very well, and now I am in the process of getting new hardware and upgrading.  I am running short on space, so that's why I am installing the 3TB drives.

 

Sideband Samurai

Link to comment

Since this falls under the same subject, I wanted to ask about pre-clearing errors:

 

I have pre-cleared my second 3TB drive, and received the following error:

 

==  ST3000DM001-1CH166    W1F12655
== Disk /dev/sde has NOT been precleared successfully
== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
== skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
== skip=326200 count=200 bs=8225280 returned 32768 instead of 00000

 

Is this a problem?  Note the pre-clear script reports /dev/sde has NOT been precleared successfully.

 

I saw this error before with the previous 3TB drive which is now the Parity drive.  I just want to make sure everything is ok before proceeding.

 

I am going to reboot the server to install the 5.0 release, then I will perform a parity check.  The last check, on the 30th of last month, reported no errors with the new 3TB drive in place as the parity drive.

 

For further information, here is the full pre-clear report:

 

================================================================== 1.13
=                unRAID server Pre-Clear disk /dev/sde
=               cycle 2 of 2, partition start on sector 1
=
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 39C, Elapsed Time:  48:46:46
========================================================================1.13
==  ST3000DM001-1CH166    W1F12655
== Disk /dev/sde has NOT been precleared successfully
== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
== skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
== skip=326200 count=200 bs=8225280 returned 32768 instead of 00000
============================================================================
** Changed attributes in files: /tmp/smart_start_sde  /tmp/smart_finish_sde
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
          High_Fly_Writes =    95      96            0        ok          5
  Airflow_Temperature_Cel =    61      63           45        near_thresh 39
      Temperature_Celsius =    39      37            0        ok          39
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 2.
0 sectors were pending re-allocation after post-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 2.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.

 

Also the pre-clear test:

 

root@DavyJones:/boot# preclear_disk -t /dev/sde
Pre-Clear unRAID Disk /dev/sde
################################################################## 1.13
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F12655
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes

Disk /dev/sde: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1  4294967295  2147483647+   0  Empty
Partition 1 does not end on cylinder boundary.
Partition 1 does not start on physical sector boundary.
########################################################################
========================================================================1.13
==
== DISK /dev/sde IS PRECLEARED with a GPT Protective MBR
==
============================================================================
root@DavyJones:/boot#

 

Your advice is greatly appreciated

 

Sincerely,

 

Sideband Samurai

Link to comment

Since it wasn't precleared you will not be able to add it to the array without unRAID itself trying to clear it.  As to why it didn't clear I can't really say. 

 

I can say that when I had that problem, it was suggested to me to update the firmware on my M1015 HBA and also the firmware on my SAS Expander (Intel RES2SV240).  I ended up swapping the expander with a different one and preclearing a different model of drive, and it worked, so I am not sure which change fixed it for me.

Link to comment

Darn!  I am using an old hp xw4300 workstation.  There is no new bios update that I can see (I am running the current version). 

 

My BIOS is reporting that the attached hard drive is 800GB, not 3TB.  The other 2TB drives actually show up as 2TB drives.  So I am thinking that the 3TB hard drives, though they appear to be working, may not actually be absolutely happy.

 

I have a PCI to sata HBA I can use to support the 3TB drives for now.  At least I hope it supports 3TB drives.

 

It's strange though: unRAID shows these as 3TB drives, and the 3TB drive I just put into production does not appear to be having any issues.  I have checked the parity twice with 0 errors.

 

Should I be worried?

 

If so, my plan would be to add the PCI to SATA HBA, attach the new drive tray to the HBA, then re-run a pre-clear to see if it clears ok.  If it does then I can remove the existing parity, and re-add the re-tested 3TB drive that is on the PCI to SATA HBA, as the NEW parity drive, rebuild the parity, re-check the array to make sure I still get no errors.

 

If everything goes well, I will do the same thing with the existing 3TB parity drive and make it a data drive.

 

What does everyone think?  Any problems with this?  I am not totally worried about performance right now, as I am in the process of building a whole new system that these drives will go into.  I am just waiting on extra funds for the important parts (CPU and RAM).  I just need more space so I can put my wife's horror collection on the media server (some 60 DVDs).

 

-- Sideband Samurai

Link to comment

As long as UnRAID is seeing the drives as 3TB you're okay.  That simply means your BIOS doesn't support drives > 2TB ... but your controller does, and Linux uses its own driver without relying on BIOS disk routines.
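
One plausible reading of the 800GB figure, offered as an assumption and not something confirmed in this thread, is that the BIOS keeps the sector count in 32 bits, so the count wraps around at 2^32 sectors. A short Python check using the sector count and sector size from the fdisk output earlier in the thread (the function name `wrapped_capacity_bytes` is made up for this sketch):

```python
SECTOR_BYTES = 512                # logical sector size from the fdisk output
TOTAL_SECTORS = 5_860_533_168     # "total 5860533168 sectors"
LBA32_LIMIT = 2 ** 32             # sectors countable in a 32-bit field


def wrapped_capacity_bytes(total_sectors, sector_bytes=SECTOR_BYTES):
    """Capacity reported if the sector count is truncated to 32 bits."""
    return (total_sectors % LBA32_LIMIT) * sector_bytes


if __name__ == "__main__":
    real = TOTAL_SECTORS * SECTOR_BYTES
    seen = wrapped_capacity_bytes(TOTAL_SECTORS)
    print(f"real capacity : {real:,} bytes")
    print(f"32-bit wrapped: {seen:,} bytes (about {seen / 1e9:.0f} GB)")
```

The wrapped value comes out at roughly 800GB, which matches what the BIOS displays, while the full sector count gives the 3,000,592,982,016 bytes that unRAID and smartctl report.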

 

Note that if it works for 3TB, it will also work with 4TB, 5TB, 6TB, .... as the larger drives become available.

 

Link to comment

 

But why did the script return these errors and report that the disk had NOT been precleared successfully, as seen below?

 

ST3000DM001-1CH166    W1F12655

== Disk /dev/sde has NOT been precleared successfully

== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
== skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
== skip=326200 count=200 bs=8225280 returned 32768 instead of 00000

 

Link to comment

I don't know -- hopefully Joe L will comment on that for you.

 

It may have to do with the lack of BIOS support -- I don't know if Joe's script uses the BIOS disk routines or not.  But the SMART report looks just fine; and if UnRAID is using the drive okay (as you indicated it was) then I simply wouldn't worry about it.

 

Link to comment

As I said, I also changed the model of HD that I was clearing.  I went from WD to Hitachi.  The WD wouldn't clear; the Hitachi did.  As mentioned, I swapped SAS Expanders as well.  But if it was the hard drive change that let me preclear, then that suggests the WD had buggy firmware, and that is a possibility for YOUR drive as well.  You might see if there is a firmware update for the drive too.

 

I suggest that because I have a 3TB WD Green drive that alternates between zero and 65535 pending sectors depending on the preclear cycle.  One cycle ends with zero the next with 65535 then back to zero.  I've run at least 6 cycles and the last 4 or so were in that pattern.  Joe L suggested to me that it was likely because of buggy firmware so that is why I would look into it in your case as well. 

 

Basically, look to update your BIOS, HDD controller firmware and drive firmware.  If none of that is possible or makes a difference, then try clearing it on a different PC.  When it happened to me, I just removed the drive and took it to my preclear station, where most of my other drives have been precleared.  You can use a free version of unRAID on another box to do your preclears; you don't have to do it from a registered flash.

 

As a last ditch effort you can try a standard Windows long format first before you preclear.  I had a WD 3TB Red that wouldn't preclear on any PC and as a last ditch effort to make it work I formatted it in Windows.  For me it was failing on the write step in the preclear process so I thought I would try a Windows long format to see if it would work.  When the Windows format worked I then tried the preclear again and it worked as well.  Basically since Windows ignores errors that Linux does not I figure the WD Red just needed to be kicked in the a$$ to get it working.

 

Last suggestions I've got for you.

Link to comment

I would not use any disk that did not pass pre-clear. Pre-clear writes the signature before the post-read. It does not revoke the signature if the post-read fails. This is why the pre-clear can fail but the test passes. The test is only looking for the signature. This disk has failed pre-clear and should not be used. It is entirely possible that HW problems are causing this issue.
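
To make that ordering concrete, here is a toy Python model. It is not the preclear_disk script and does not touch any real disk; `toy_preclear` and `toy_signature_test` are invented names, and the "disk" is just a byte array. It only illustrates how a signature check can pass while the post-read verification has failed.

```python
def toy_preclear(size, bad_offsets=()):
    """Toy model of one cycle: zero, write signature, then post-read."""
    disk = bytearray(size)            # zeroing step: every byte is 0
    signature_written = True          # signature goes on before the post-read
    for offset in bad_offsets:        # simulate bytes that read back non-zero
        disk[offset] = 0x80
    post_read_ok = not any(disk)      # post-read expects zeros everywhere
    # The signature is NOT revoked when the post-read fails.
    return {"signature": signature_written, "post_read_ok": post_read_ok}


def toy_signature_test(result):
    """Stands in for the -t check, which looks only at the signature."""
    return result["signature"]


if __name__ == "__main__":
    result = toy_preclear(1024, bad_offsets=[100])
    print("signature test passes:", toy_signature_test(result))
    print("post-read verified   :", result["post_read_ok"])
```

In the simulated failure the signature test still passes while the post-read does not, which is the situation shown by the -t output earlier in this thread.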

Link to comment

I would not use any disk that did not pass pre-clear.

 

I fundamentally agree -- but in this case that MAY be because it's using a BIOS read routine that does not understand the disk size, so is reading from a different location than actually expected ... whereas bypassing the BIOS routines (as UnRAID does) works fine.

 

Not sure that's the case -- that's why I noted it'd be nice if Joe would comment on whether or not the script is using BIOS access.  But it seems likely that all is actually just fine, since the disk is working perfectly in the system.

 

Link to comment

All very good comments, thank you for the advice.

 

1.  I did update the BIOS on the XW4300; it was on 1.06 and was updated to 1.12, but this did not fix the issue because the system still reports 800GB for a 3TB drive.

 

2.  I started using a SiL3114 host adapter by Sabrent.  It's a PCI to SATA HBA.  When I restarted the pre-clear, it showed the same exact numbers as the motherboard ports, except it reported that a 3TB drive was attached.  I did not allow the pre-clear to complete.

 

3.  I will pre-clear this disk on a different system.  If it works, then I will replace the existing parity drive with this 3TB disk. 

 

4.  I will check for firmware updates for my particular drive model and see if that fixes my problem.

 

5.  I already have a 3TB drive that did not pass pre-clear running as parity.  I put it in production without knowing that I should not have.  I have run 3 parity checks with no errors.  I will be replacing this drive if number 3 is successful.

 

I too would be interested in what Joe L. has to say about the issue.  Maybe the signature should be wiped out if the pre-clear is unsuccessful, to prevent the drive's installation.  Although all that would do is cause the array to not be mounted for 24 hours, at least when you run the -t test you could see that the drive is not ready to be put in production.

 

Thanks again!

 

Sideband Samurai

Link to comment

Well here is an update.

 

I performed option number 4 (check for a firmware update).  Seagate reported none available and one available under certificate.  I downloaded the one under certificate, as it was dated July of this year and I have had the drives in storage since November 2012.

 

The firmware version shipped was CC43.  The version I installed was CC29.  I think this is a downgrade of the firmware, and I have no way to restore the shipped version, as Seagate does not provide the "updated" firmware.

 

Currently I have started pre-clearing the CC29 hard drive on a different system.  Its BIOS is also reporting 800GB instead of 3TB.  It seems I am just unlucky in finding a system that will show 3TB in the BIOS.  From previous posts, I don't think this is really the actual issue.

 

It will take 24 hrs to re-preclear the drive.

 

-- Sideband Samurai

Link to comment

 

but why did the script return these errors, and that the disk had NOT been precleared successfully as seen below?

 

ST3000DM001-1CH166    W1F12655

== Disk /dev/sde has NOT been precleared successfully

== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
== skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
== skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
== skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
== skip=326200 count=200 bs=8225280 returned 32768 instead of 00000

They indicate a problem... possibly with the disk electronics, but more likely with the disk controller or system RAM.

 

The script  wrote zeros.  In those locations indicated, it read back 32768.    If you use that disk in your array you will likely pull out your hair with constant parity errors as the values read back from the disk are not those written.

 

Since the returned value always seems to be a power of 2, I suspect a marginal "bit" in the electronics.  (or system RAM)
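
The "power of 2" observation is easy to check. The Python sketch below uses the values from the report; the helper names are hypothetical, and it assumes that skip= and bs= carry their usual dd meaning (skip counts blocks of bs bytes from the start of the disk). It confirms that 32768 has exactly one bit set and shows where on the disk each reported region begins.

```python
# Values taken from the preclear report quoted in this thread.
BLOCK_SIZE = 8_225_280            # bs= in the report, bytes per block
DISK_BYTES = 3_000_592_982_016    # user capacity of the drive
RETURNED = 32768                  # value read back where 0 was expected
BAD_SKIPS = [151000, 161000, 171800, 179400, 183000,
             187000, 193000, 220800, 289600, 326200]


def is_power_of_two(n):
    """True when exactly one bit of n is set."""
    return n > 0 and (n & (n - 1)) == 0


def flipped_bit(n):
    """Index of the highest set bit (0 = least significant)."""
    return n.bit_length() - 1


def region_start(skip, bs=BLOCK_SIZE):
    """Byte offset where a dd-style read with this skip= begins."""
    return skip * bs


if __name__ == "__main__":
    print("single bit set:", is_power_of_two(RETURNED),
          "-> bit", flipped_bit(RETURNED))
    for skip in BAD_SKIPS:
        start = region_start(skip)
        print(f"skip={skip}: starts at byte {start:,} "
              f"({100 * start / DISK_BYTES:.1f}% into the disk)")
```

The same single bit (bit 15) showing up at widely separated offsets fits a fault in the data path, such as RAM or the controller, better than a handful of bad spots on the platters.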

 

I'd start with a memory test, through several cycles, to ensure it is not RAM, followed by a systematic analysis of the remaining hardware.  Based on the report that you had the same issue with the prior drive, it is likely either the disk controller, or the system RAM.

Link to comment

Joe L.

 

Excellent analysis.  The system ONLY has 512MB of RAM.  I had forgotten that it was that short of RAM.  I have NOT had a problem with it for the year or so it has been in production.  It has been really reliable all through the RC phase of the 5.0 release.

 

I am currently pre-clearing another drive in another system.  With firmware CC29, it seems to be performing much better than with the CC43 firmware.  One main difference is that this drive is pre-clearing on a different system with lots (8GB) of RAM.  It's 11 hrs into the pre-clear process and on the last step, so we will see soon enough.  My drive is a ST3000DM001-1CH1 3TB hard drive.

 

As for parity checks, they all have been successful with zero errors.  I have run a total of 3.  For now, I have the server shut down while I complete the preclear on this new drive.

 

Joe, I have previous posts about whether or not the -A option is required in the pre-clear script.  I was concerned because, without the -A option, it tells me that the partition will not be aligned.  Can you take a look at my previous posts on this thread and give me your opinion?

 

Thanks for your time.

 

Sincerely,

 

Sideband Samurai

Link to comment

Joe L.

 

Thanks for your suggestion, I have definitely confirmed that I have a RAM issue.  I just started retesting memory with just one stick installed and have found the bad module right off the bat.

 

I will swap the other module in just to make sure they both are not defective and get rid of the bad module.  This really makes me feel lots better and also explains the kernel panics I started seeing recently.

 

I will still continue to post on this thread until I have the system up on all my 3TB drives.

 

-- Sideband Samurai

Link to comment
