SidebandSamurai Posted September 26, 2013

Hello,

I am in the process of preclearing a disk. I accidentally set the count to 20, thinking I really wanted to test the heck out of this drive before putting it into production. Now that it has taken 7 days to reach 10 cycles, I am rethinking my strategy and would like to cancel after the 10th cycle. Do I have to wait for the remaining 10 cycles before I can see the reports and place this new drive into service?

The reason I decided on an extended test was this report:

========================================================================1.13
== invoked as: ./preclear_disk -A /dev/sdd
==
== Disk /dev/sdd has NOT been successfully precleared
== Postread detected un-expected non-zero bytes on disk==
== Ran 1 cycle
==
== Using :Read block size = 8225280 Bytes
== Last Cycle's Pre Read Time  : 5:38:33 (147 MB/s)
== Last Cycle's Zeroing time   : 5:10:23 (161 MB/s)
== Last Cycle's Post Read Time : 14:10:44 (58 MB/s)
== Last Cycle's Total Time     : 25:00:42
==
== Total Elapsed Time 25:00:42
==
== Disk Start Temperature: 33C
==
== Current Disk Temperature: -->49<--C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdd /tmp/smart_finish_sdd
ATTRIBUTE                 NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
Raw_Read_Error_Rate     =   114     100            6        ok          78450040
Spin_Retry_Count        =   100     100           97        near_thresh 0
End-to-End_Error        =   100     100           99        near_thresh 0
Airflow_Temperature_Cel =    51      67           45        near_thresh 49
Temperature_Celsius     =    49      33            0        ok          49
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.
============================================================================

Should I be worried about the Spin_Retry_Count and End-to-End_Error indicators? The Airflow_Temperature_Cel indicator is just because the drive is not mounted in the case yet.

Can I cancel this and still keep all my testing, so that it does not take my array down for a day while it "reformats" the drive? This drive will be replacing a 2TB parity drive.

Thanks for all your help.

Sideband Samurai

Joe L. Posted September 26, 2013

Cancel any time in the post-read phase. It will still be marked as pre-cleared.

You are toasting that drive... 45C is a bit high for my taste.

Joe L.

SidebandSamurai Posted September 26, 2013

Joe L.,

I canceled it, but looking back at my original post, the report indicated that the disk had NOT been successfully pre-cleared. Have you seen anything like this? Also, I did not get any report from the last 10 cycles, so I don't know the health of the drive.

Thanks for your fast response,

Sideband Samurai

SidebandSamurai Posted September 26, 2013

I have the following SMART report for review. Is this drive OK to use in production?

root@DavyJones:/boot/preclear_reports# smartctl --all /dev/sdd
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F1LWRA
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Sep 25 20:36:51 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  89) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       199533168
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   060   060   030    Pre-fail  Always       -       1071071
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       217
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 26/53)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   043   053   000    Old_age   Always       -       43 (0 26 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       206781200466136
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       64465865376
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       136363348988

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Thanks for your advice and help.

Sincerely,
Sideband Samurai

SidebandSamurai Posted September 27, 2013

*bump*

Does anybody have an answer to my questions?

Thank you.

Sideband Samurai

SidebandSamurai Posted September 27, 2013

Well, it looks like I have answered my own question. The values in my report are normal, with the exception of the temperature, which was abnormally high because of how I had the hard drive installed.

Seagate provides an excellent explanation of what Raw_Read_Error_Rate is and what to expect. The RAW_VALUE is normally high for this brand of drive. Here is a link to the article on Seagate's site:

http://forums.seagate.com/t5/Desktop-HDD-Desktop-SSHD/Seagate-s-Seek-Error-Rate-Raw-Read-Error-Rate-and-Hardware-ECC/td-p/122382

So the drive is ready to become my parity drive, and I will install it tonight.

Sincerely,
Sideband Samurai

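For anyone puzzling over the "near_thresh" rows, one way to read them is to compare each attribute's normalized VALUE with its THRESH column. The sketch below is not part of preclear_disk or smartmontools: `parse_attributes` and `margin_report` are hypothetical helper names, and the sample rows are copied from the smartctl attribute table posted earlier in this thread. It assumes the smartmontools convention that an attribute counts as failed only when its normalized value is less than or equal to a non-zero threshold.

```python
import re

# A few rows copied from the `smartctl --all /dev/sdd` attribute table.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       199533168
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   047   045    Old_age   Always       -       43 (Min/Max 26/53)
"""

# ID, name, flag, then the three normalized columns VALUE / WORST / THRESH.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
)


def parse_attributes(text):
    """Return (name, value, worst, thresh) tuples for each attribute row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(
                (m["name"], int(m["value"]), int(m["worst"]), int(m["thresh"]))
            )
    return rows


def margin_report(rows):
    """Map attribute name -> (margin above threshold, failed flag)."""
    report = {}
    for name, value, _worst, thresh in rows:
        # A threshold of 0 is treated here as "no threshold defined".
        failed = thresh > 0 and value <= thresh
        report[name] = (value - thresh, failed)
    return report


if __name__ == "__main__":
    for name, (margin, failed) in margin_report(parse_attributes(SAMPLE)).items():
        print(f"{name:26s} margin={margin:4d} {'FAILED' if failed else 'ok'}")
```

Read this way, Spin_Retry_Count and End-to-End_Error sit close to their thresholds only because the thresholds themselves are high (97 and 99); both are still at 100 with a raw value of 0, so nothing in these rows shows any degradation.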
SidebandSamurai Posted September 27, 2013

In order to keep the drive's temperature under control, I purchased this at Fry's for $17.00. It's not a permanent solution, but it allows me to pre-clear disks without taking the system covers off. It seems to work well so far.

And here is the drive in action:

================================================================== 1.13
=                unRAID server Pre-Clear disk /dev/sde
=               cycle 1 of 5, partition start on sector 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it
=  **** This will take a while... you can follow progress below:
=
=
=
=
=
=
=
=
Disk Temperature: 43C, Elapsed Time:  10:41:29
320479+4 records in
320479+4 records out
672098189312 bytes (672 GB) copied, 3422.8 s, 196 MB/s
Wrote  672,098,189,312  bytes out of  3,000,592,982,016  bytes (22% Done)

As you can see, the drive temperature is down to around 43C instead of 50C, so that is much cooler. This drive has been running for 10 hours.

--Sideband Samurai

garycase Posted September 27, 2013

You're correct re the SMART data -- all looks fine.

The hot-swap cage is doing okay with the temps too ... at least relative to what you were seeing before. I prefer to keep them under 40C, but 43C isn't bad -- the thermal spec for most modern drives is 60C, although I certainly don't like to get anywhere near that.

SidebandSamurai Posted September 27, 2013

GaryCase,

Thanks for the confirmation and assurance. I am currently pre-clearing my second 3TB drive. I am testing it for 5 cycles this time.

It's funny: the BIOS shows the drive as 800GB instead of 3TB. unRAID shows the drive as 3TB, though. I am not worried about it, as this system was cobbled together to test the WAF (Wife Acceptance Factor). It went over very well, and now I am in the process of getting new hardware and upgrading. I am running short on space now, which is why I am installing the 3TB drives.

Sideband Samurai

SidebandSamurai Posted October 2, 2013

Since this falls under the same subject, I wanted to ask about pre-clearing errors. I have pre-cleared my second 3TB drive and received the following error:

== ST3000DM001-1CH166 W1F12655
== Disk /dev/sde has NOT been precleared successfully
== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
skip=326200 count=200 bs=8225280 returned 32768 instead of 00000

Is this a problem? Note that the pre-clear script reports that /dev/sde has NOT been precleared successfully. I saw this error before with the previous 3TB drive, which is now the parity drive. I just want to make sure everything is OK before proceeding.

I am going to reboot the server to install the 5.0 release, and then I will perform a parity check. The last check, on the 30th of last month, reported no errors with the new 3TB drive in place as the parity drive.

For further information, here is the full pre-clear report:

================================================================== 1.13
=                unRAID server Pre-Clear disk /dev/sde
=               cycle 2 of 2, partition start on sector 1
=
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 39C, Elapsed Time:  48:46:46
========================================================================1.13
== ST3000DM001-1CH166 W1F12655
== Disk /dev/sde has NOT been precleared successfully
== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
skip=326200 count=200 bs=8225280 returned 32768 instead of 00000
============================================================================
** Changed attributes in files: /tmp/smart_start_sde /tmp/smart_finish_sde
ATTRIBUTE                 NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
Spin_Retry_Count        =   100     100           97        near_thresh 0
End-to-End_Error        =   100     100           99        near_thresh 0
High_Fly_Writes         =    95      96            0        ok          5
Airflow_Temperature_Cel =    61      63           45        near_thresh 39
Temperature_Celsius     =    39      37            0        ok          39
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 2.
0 sectors were pending re-allocation after post-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 2.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.

Also, the pre-clear test:

root@DavyJones:/boot# preclear_disk -t /dev/sde
Pre-Clear unRAID Disk /dev/sde
################################################################## 1.13
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F12655
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes

Disk /dev/sde: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1  4294967295  2147483647+   0  Empty
Partition 1 does not end on cylinder boundary.
Partition 1 does not start on physical sector boundary.
########################################################################
========================================================================1.13
==
== DISK /dev/sde IS PRECLEARED with a GPT Protective MBR
==
============================================================================
root@DavyJones:/boot#

Your advice is greatly appreciated.

Sincerely,
Sideband Samurai

BobPhoenix Posted October 2, 2013

Since it wasn't precleared, you will not be able to add it to the array without unRAID itself trying to clear it.

As to why it didn't clear, I can't really say. I can say that when I had that problem, it was suggested that I update the firmware on my M1015 HBA and also the firmware on my SAS expander (Intel RES2SV240). I ended up swapping the expander for a different one and preclearing a different model of drive, and it worked, so I am not sure which change fixed it for me.

SidebandSamurai Posted October 2, 2013

Darn! I am using an old HP xw4300 workstation. There is no new BIOS update that I can see (I am running the current version). My BIOS is reporting that the attached hard drive is 800GB, not 3TB. The other 2TB drives actually show up as 2TB drives. So I am thinking that the 3TB hard drives, though they appear to be working, may not actually be entirely happy. I have a PCI-to-SATA HBA I can use to support the 3TB drives for now. At least I hope it supports 3TB drives.

It's strange, though: unRAID shows these as 3TB drives, and the 3TB drive I just put into production does not appear to be having any issues. I have checked parity twice with 0 errors.

Should I be worried? If so, my plan would be to add the PCI-to-SATA HBA, attach the new drive tray to the HBA, and re-run a pre-clear to see if it clears OK. If it does, I can remove the existing parity drive, add the re-tested 3TB drive on the PCI-to-SATA HBA as the NEW parity drive, rebuild parity, and re-check the array to make sure I still get no errors. If everything goes well, I would do the same thing to the existing 3TB parity drive and make it a data drive.

What does everyone think? Any problems with this? I am not too worried about performance right now, as I am in the process of building a whole new system that these drives will go into. I am just waiting on extra funds for the important parts (CPU and RAM). I just need more space so I can put my wife's horror collection on the media server (some 60 DVDs).

-- Sideband Samurai

garycase Posted October 3, 2013

As long as unRAID is seeing the drives as 3TB, you're okay. That simply means your BIOS doesn't support drives > 2TB ... but your controller does, and Linux uses its own driver without relying on BIOS disk routines.

Note that if it works for 3TB, it will also work with 4TB, 5TB, 6TB, ... as the larger drives become available.

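The 800GB figure shown by the BIOS happens to be what you get if the drive's sector count is truncated to 32 bits. That the xw4300 BIOS actually does this is an assumption, not something confirmed in this thread; the sketch below only works the arithmetic, using the sector count reported by `preclear_disk -t` (5860533168 sectors of 512 bytes). The function name `truncated_capacity_bytes` is made up for illustration.

```python
SECTOR_BYTES = 512             # logical sector size from the preclear_disk -t output
TOTAL_SECTORS = 5_860_533_168  # "total 5860533168 sectors" for the 3TB drive


def truncated_capacity_bytes(total_sectors, lba_bits=32, sector_bytes=SECTOR_BYTES):
    """Capacity seen by firmware that keeps only the low `lba_bits` bits
    of the sector count (assumed behaviour, for illustration only)."""
    return (total_sectors % (1 << lba_bits)) * sector_bytes


if __name__ == "__main__":
    full = TOTAL_SECTORS * SECTOR_BYTES
    limit = (1 << 32) * SECTOR_BYTES
    seen = truncated_capacity_bytes(TOTAL_SECTORS)
    print(f"full capacity   : {full:,} bytes")
    print(f"32-bit LBA limit: {limit:,} bytes")
    print(f"after truncation: {seen:,} bytes")
```

Under that assumption the truncated capacity comes to 801,569,726,464 bytes, which is close to the 800GB the BIOS displays. A typical 2TB drive (3,907,029,168 sectors, a figure not taken from this thread) is below the 2^32-sector limit, which would fit with the 2TB drives being reported correctly.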
SidebandSamurai Posted October 3, 2013

But why did the script return these errors, and report that the disk had NOT been precleared successfully, as seen below?

== ST3000DM001-1CH166 W1F12655
== Disk /dev/sde has NOT been precleared successfully
== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
skip=326200 count=200 bs=8225280 returned 32768 instead of 00000

garycase Posted October 3, 2013

I don't know -- hopefully Joe L. will comment on that for you. It may have to do with the lack of BIOS support -- I don't know if Joe's script uses the BIOS disk routines or not.

But the SMART report looks just fine; and if unRAID is using the drive okay (as you indicated it was), then I simply wouldn't worry about it.

BobPhoenix Posted October 3, 2013

As I said, I also changed the model of hard drive that I was clearing. I went from WD to Hitachi. The WD wouldn't clear; the Hitachi did. As mentioned, I swapped SAS expanders as well. But if it was the hard drive change that let me preclear, then that suggests the WD had buggy firmware, and that is a possibility for YOUR drive as well. You might see if there is a firmware update for the drive too.

I suggest that because I have a 3TB WD Green drive that alternates between zero and 65535 pending sectors depending on the preclear cycle. One cycle ends with zero, the next with 65535, then back to zero. I've run at least 6 cycles and the last 4 or so were in that pattern. Joe L. suggested to me that it was likely because of buggy firmware, so that is why I would look into it in your case as well.

Basically, look to update your BIOS, HDD controller firmware and drive firmware. If none of that is possible or makes a difference, then try clearing it on a different PC. When it happened to me, I just removed the drive and took it to my preclear station, where most of my other drives have been precleared. You can use a free version of unRAID on another box to do your preclears; you don't have to do it from a registered flash.

As a last-ditch effort, you can try a standard Windows long format before you preclear. I had a WD 3TB Red that wouldn't preclear on any PC, and as a last-ditch effort to make it work I formatted it in Windows. For me it was failing on the write step in the preclear process, so I thought I would try a Windows long format to see if it would work. When the Windows format worked, I tried the preclear again and it worked as well. Basically, since Windows ignores errors that Linux does not, I figure the WD Red just needed to be kicked in the a$$ to get it working.

Those are the last suggestions I've got for you.

dgaschk Posted October 3, 2013

I would not use any disk that did not pass pre-clear.

Pre-clear writes the signature before the post-read. It does not revoke the signature if the post-read fails. This is why the pre-clear can fail but the test passes. The test is only looking for the signature.

This disk has failed pre-clear and should not be used. It is entirely possible that hardware problems are causing this issue.

garycase Posted October 3, 2013

dgaschk wrote: "I would not use any disk that did not pass pre-clear."

I fundamentally agree -- but in this case that MAY be because it's using a BIOS read routine that does not understand the disk size, so it is reading from a different location than expected ... whereas bypassing the BIOS routines (as unRAID does) works fine.

I'm not sure that's the case -- that's why I noted it'd be nice if Joe would comment on whether or not the script uses BIOS access. But it seems likely that all is actually just fine, since the disk is working perfectly in the system.

BobPhoenix Posted October 3, 2013

I'm just guessing here too. But since preclear is a script (AWK), it would be using standard Linux read and write routines to do the I/O, so if unRAID is fine, the script should be fine as well.

SidebandSamurai Posted October 3, 2013

All very good comments, thank you for the advice.

1. I did update the BIOS on the xw4300. It was on 1.06 and was updated to 1.12, but this did not fix the issue, because the system still reports 800GB for a 3TB drive.
2. I started using a SiL3114 host adapter by Sabrent. It's a PCI-to-SATA HBA. When I restarted the pre-clear, it showed exactly the same numbers as on the motherboard ports, except that it reported that a 3TB drive was attached. I did not allow the pre-clear to complete.
3. I will pre-clear this disk on a different system. If it works, I will replace the existing parity drive with this 3TB disk.
4. I will check for firmware updates for my particular drive model and see if that fixes my problem.
5. I already have a 3TB drive that did not pass pre-clear running as parity. I put it into production without knowing that I should not have. I have run 3 parity checks with no errors. I will be replacing this drive if number 3 is successful.

I too would be interested in what Joe L. has to say about the issue. Maybe the signature should be wiped out if the pre-clear is unsuccessful, to prevent the drive from being installed. Although all that would do is cause the array to be unavailable for 24 hours, at least when you ran the -t test you would see that the drive is not ready to be put into production.

Thanks again!

Sideband Samurai

dgaschk Posted October 4, 2013

Whatever the cause, these disks should not be in your array. Using another system to preclear won't make them work correctly in the server. If an HBA in the server can pre-clear the drive, then the drive must be used with that HBA. Don't move the drives to an incompatible SATA port.

SidebandSamurai Posted October 4, 2013

Well, here is an update. I performed option number 4 (check for a firmware update). Seagate reported none available, and one available under certificate. I downloaded the one under certificate, as it was dated July of this year and I have had the drives in storage since November 2012. The firmware version shipped was CC43. The version I installed was CC29. I think this is a downgrade of the firmware, and I have no way to restore the shipped version, as Seagate does not provide the "updated" firmware.

Currently I have started pre-clearing the CC29 hard drive on a different system. Its BIOS is also reporting 800GB instead of 3TB. It seems I am just unlucky in finding a system that will show 3TB in the BIOS. From previous posts, I don't think this is really the actual issue.

It will take 24 hours to re-preclear the drive.

-- Sideband Samurai

Joe L. Posted October 4, 2013

SidebandSamurai wrote:
But why did the script return these errors, and report that the disk had NOT been precleared successfully, as seen below?

== ST3000DM001-1CH166 W1F12655
== Disk /dev/sde has NOT been precleared successfully
== skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
skip=161000 count=200 bs=8225280 returned 32768 instead of 00000
skip=171800 count=200 bs=8225280 returned 32768 instead of 00000
skip=179400 count=200 bs=8225280 returned 32768 instead of 00000
skip=183000 count=200 bs=8225280 returned 32768 instead of 00000
skip=187000 count=200 bs=8225280 returned 32768 instead of 00000
skip=193000 count=200 bs=8225280 returned 32768 instead of 00000
skip=220800 count=200 bs=8225280 returned 32768 instead of 00000
skip=289600 count=200 bs=8225280 returned 32768 instead of 00000
skip=326200 count=200 bs=8225280 returned 32768 instead of 00000

They indicate a problem... possibly with the disk electronics, but more likely with the disk controller or system RAM.

The script wrote zeros. In the locations indicated, it read back 32768. If you use that disk in your array, you will likely pull out your hair with constant parity errors, as the values read back from the disk are not those written.

Since the returned value always seems to be a power of 2, I suspect a marginal "bit" in the electronics (or system RAM).

I'd start with a memory test, through several cycles, to ensure it is not RAM, followed by a systematic analysis of the remaining hardware. Based on the report that you had the same issue with the prior drive, it is likely either the disk controller or the system RAM.

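The two observations above can be checked mechanically: the returned value is a power of two, and each failing line points at a region of the disk. The sketch below is only an illustration and is not taken from the preclear script. It assumes the `skip`, `count` and `bs` fields mirror dd arguments, so that a region starts at `skip * bs` bytes and spans `count * bs` bytes; the helper names `parse_failures`, `single_bit` and `byte_range` are hypothetical.

```python
import re

# Two of the failing lines from the report, copied verbatim.
REPORT = """\
skip=151000 count=200 bs=8225280 returned 32768 instead of 00000
skip=326200 count=200 bs=8225280 returned 32768 instead of 00000
"""

LINE = re.compile(
    r"skip=(\d+) count=(\d+) bs=(\d+) returned (\d+) instead of (\d+)"
)


def parse_failures(text):
    """Yield (skip, count, bs, returned, expected) as integers."""
    for m in LINE.finditer(text):
        yield tuple(int(g) for g in m.groups())


def single_bit(value):
    """Return the bit index if exactly one bit is set, else None."""
    if value > 0 and value & (value - 1) == 0:
        return value.bit_length() - 1
    return None


def byte_range(skip, count, bs):
    """Half-open byte range [start, end) covered by one failing read,
    assuming dd-style semantics for skip/count/bs."""
    start = skip * bs
    return start, start + count * bs


if __name__ == "__main__":
    for skip, count, bs, returned, expected in parse_failures(REPORT):
        start, end = byte_range(skip, count, bs)
        print(f"bytes {start:,}..{end:,}: returned {returned} "
              f"(expected {expected}), single set bit = {single_bit(returned)}")
```

The reported value 32768 is 2^15, so exactly one bit (bit 15) differs from the expected zero in every failing line, which fits the "marginal bit" suspicion. Under the dd-style reading, each failing line covers about 1.6 GB of the disk (200 blocks of 8225280 bytes), so the lines narrow the problem to regions rather than to individual sectors.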
SidebandSamurai Posted October 4, 2013

Joe L.,

Excellent analysis. The system ONLY has 512MB of RAM. I had forgotten that it was that short of RAM. I have NOT had a problem with it for the year or so it's been in production. It has been really reliable all through the RC phase of the 5.0 release.

I am currently pre-clearing another drive in another system. With firmware CC29, it seems to be performing much better than with the CC43 firmware. One main difference is that this drive is pre-clearing on a different system with lots (8GB) of RAM. It's 11 hours into the pre-clear process and on the last step, so we will see soon enough. My drive is an ST3000DM001-1CH1 3TB hard drive.

As for parity checks, they have all been successful with zero errors. I have run a total of 3.

For now, I have the server shut down while I complete the preclear on this new drive.

Joe, I have previous posts about whether or not the -A option is required in the pre-clear script. I was concerned that without the -A option, it tells me that the partition will not be aligned. Can you take a look at my previous posts on this thread and give me your opinion?

Thanks for your time.

Sincerely,
Sideband Samurai

SidebandSamurai Posted October 5, 2013

Joe L.,

Thanks for your suggestion. I have definitely confirmed that I have a RAM issue. I just started retesting memory with only one stick installed and found the bad module right off the bat. I will swap the other module in to make sure they are not both defective, and get rid of the bad module.

This really makes me feel a lot better, and it also explains the kernel panics I started seeing recently.

I will continue to post on this thread until I have the system up on all my 3TB drives.

-- Sideband Samurai