vca

  1. As bjp999 suggests, there might be an issue with a drive not always returning the same data each time a block is read. I had a battle with this, reported here: http://lime-technology.com/forum/index.php?topic=11515.msg109840#msg109840 (though this sort of thing appears to be very rare, so it might not be your case at all). If this is the cause, you have to test all the drives by reading the blocks in the region where the parity error is reported. You do this many times; if one of the drives has this fault, the reads will occasionally return different data (even though the drives are not being written to). A sketch of such a test is below. Regards, Stephen
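     A minimal sketch of that repeated-read test, assuming bash with dd and md5sum available (the device name and starting sector are placeholders; substitute the values from your parity check report):

        # Read the same 1MiB region five times and checksum each pass.
        # DEVICE and START_SECTOR are hypothetical values - replace them
        # with the real device and the sector from the parity error report.
        DEVICE=/dev/sdX
        START_SECTOR=123456789
        for pass in 1 2 3 4 5; do
            dd if=$DEVICE bs=512 skip=$START_SECTOR count=2048 2>/dev/null | md5sum
        done
        # All five checksums should be identical; if they differ, the
        # drive is returning inconsistent read data and cannot be trusted.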
  2. From the SMART report your drive shows:

        196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
        197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
        198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
        199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 274

     So the drive has only one bad block, and it has already been remapped; the surface is good. But it has 274 UDMA errors, which usually point to a problem with the SATA or power cabling. So it's time to check (reseat or perhaps replace) the SATA and power cables. If there is a power splitter in the cable, look at it too. Regards, Stephen
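     After reseating, the relevant counters can be re-checked from the console; smartctl is a standard tool, though the device name here is a placeholder:

        # Re-read just the error counters of interest. If UDMA_CRC_Error_Count
        # keeps climbing after the cables are reseated, the cabling (or the
        # controller port) is still suspect.
        smartctl -A /dev/sdX | grep -E 'Reallocated|Pending|Uncorrectable|UDMA_CRC'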
  3. The preclear of a pair of 4TB Seagate desktop drives finished on the weekend. So here are the results for the old and new (32-bit) preclears. Note the old preclear was done on a 2-pass basis.

        == invoked as: ./preclear_disk.sh -c 2 /dev/sdd
        == ST4000DM000-1F2168 Z30093E3
        == Disk /dev/sdd has been successfully precleared
        == with a starting sector of 1
        == Ran 2 cycles
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 10:57:55 (101 MB/s)
        == Last Cycle's Zeroing time : 10:02:21 (110 MB/s)
        == Last Cycle's Post Read Time : 22:53:14 (48 MB/s)
        == Last Cycle's Total Time : 32:56:35
        ==
        == Total Elapsed Time 76:56:37

        == invoked as: ./pc15b.sh -f /dev/sdc
        == ST4000DM000-1F2168 Z30093E3
        == Disk /dev/sdc has been successfully precleared
        == with a starting sector of 1
        == Ran 1 cycle
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 11:07:18 (99 MB/s)
        == Last Cycle's Zeroing time : 9:55:48 (111 MB/s)
        == Last Cycle's Post Read Time : 11:39:35 (95 MB/s)
        == Last Cycle's Total Time : 32:43:43
        ==
        == Total Elapsed Time 32:43:43

        == invoked as: ./preclear_disk.sh -c 2 /dev/sdc
        == ST4000DM000-1F2168 W3002WDC
        == Disk /dev/sdc has been successfully precleared
        == with a starting sector of 1
        == Ran 2 cycles
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 10:56:18 (101 MB/s)
        == Last Cycle's Zeroing time : 9:36:47 (115 MB/s)
        == Last Cycle's Post Read Time : 23:18:09 (47 MB/s)
        == Last Cycle's Total Time : 32:55:57
        ==
        == Total Elapsed Time 76:35:13

        == invoked as: ./pc15b.sh -f /dev/sdd
        == ST4000DM000-1F2168 W3002WDC
        == Disk /dev/sdd has been successfully precleared
        == with a starting sector of 1
        == Ran 1 cycle
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 11:07:22 (99 MB/s)
        == Last Cycle's Zeroing time : 9:56:03 (111 MB/s)
        == Last Cycle's Post Read Time : 11:39:59 (95 MB/s)
        == Last Cycle's Total Time : 32:44:25
        ==
        == Total Elapsed Time 32:44:25

     Regards, Stephen
  4. I have two of these drives. I used one in my unRAID server for about a year without any issues, but recently I replaced it with the NAS version (I'll use the desktop version for backup storage). The thing that was bothering me about these drives was the UDMA_CRC_Error_Count, though most of that may have come from one cable problem. I just finished retesting these drives by preclearing them, with no indication of trouble. Both drives also show large values for Seek_Error_Rate, and both have the 60/30 numbers for the worst and thresh normalized values, so I figure these are typical of this particular drive. Here are the SMART reports from my drives so you can compare (note the newer version of unRAID has an updated version of the smart tool that gives better attribute names than the version you have):

     First drive:

        ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate 0x000f 120 099 006 Pre-fail Always - 2137384
        3 Spin_Up_Time 0x0003 092 092 000 Pre-fail Always - 0
        4 Start_Stop_Count 0x0032 095 095 020 Old_age Always - 5765
        5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
        7 Seek_Error_Rate 0x000f 071 060 030 Pre-fail Always - 13846410
        9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 6620
        10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
        12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 36
        183 Runtime_Bad_Block 0x0032 098 098 000 Old_age Always - 2
        184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
        187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
        188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
        189 High_Fly_Writes 0x003a 098 098 000 Old_age Always - 2
        190 Airflow_Temperature_Cel 0x0022 071 059 045 Old_age Always - 29 (Min/Max 21/34)
        191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
        192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 9
        193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 16430
        194 Temperature_Celsius 0x0022 029 041 000 Old_age Always - 29 (0 20 0 0 0)
        197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
        198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
        199 UDMA_CRC_Error_Count 0x003e 200 195 000 Old_age Always - 6949
        240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 1089h+21m+15.431s
        241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 51506131152
        242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 222555473047

     Second drive:

        1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 125981968
        3 Spin_Up_Time 0x0003 092 091 000 Pre-fail Always - 0
        4 Start_Stop_Count 0x0032 098 098 020 Old_age Always - 2281
        5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
        7 Seek_Error_Rate 0x000f 067 060 030 Pre-fail Always - 5927616
        9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2670
        10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
        12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 31
        183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
        184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
        187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
        188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
        189 High_Fly_Writes 0x003a 091 091 000 Old_age Always - 9
        190 Airflow_Temperature_Cel 0x0022 069 050 045 Old_age Always - 31 (Min/Max 21/36)
        191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
        192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 8
        193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 6649
        194 Temperature_Celsius 0x0022 031 050 000 Old_age Always - 31 (0 21 0 0 0)
        197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
        198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
        199 UDMA_CRC_Error_Count 0x003e 200 194 000 Old_age Always - 668
        240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 475h+23m+44.493s
        241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 44747895302
        242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 147844353977

     Regards, Stephen
  5. Here's my first result from the 32-bit version of the first beta, on a pair of old WD 2TB Green drives:

        == invoked as: ./pc15b.sh -f /dev/sdc
        == WDCWD20EARS-00J2GB0 WD-WCAYY0100121
        == Disk /dev/sdc has been successfully precleared
        == with a starting sector of 63
        == Ran 1 cycle
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 8:08:34 (68 MB/s)
        == Last Cycle's Zeroing time : 10:56:21 (50 MB/s)
        == Last Cycle's Post Read Time : 8:00:06 (69 MB/s)
        == Last Cycle's Total Time : 27:06:01

        == invoked as: ./pc15b.sh -f /dev/sdd
        == WDCWD20EARS-00MVWB0 WD-WCAZA6293604
        == Disk /dev/sdd has been successfully precleared
        == with a starting sector of 63
        == Ran 1 cycle
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 8:09:39 (68 MB/s)
        == Last Cycle's Zeroing time : 10:57:49 (50 MB/s)
        == Last Cycle's Post Read Time : 7:59:14 (69 MB/s)
        == Last Cycle's Total Time : 27:07:44

     I'll rerun these drives through an old preclear next. And here are the results with the old preclear:

        == invoked as: ./preclear_disk.sh /dev/sdc
        == WDCWD20EARS-00J2GB0 WD-WCAYY0100121
        == Disk /dev/sdc has been successfully precleared
        == with a starting sector of 63
        == Ran 1 cycle
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 7:37:40 (72 MB/s)
        == Last Cycle's Zeroing time : 11:13:52 (49 MB/s)
        == Last Cycle's Post Read Time : 15:02:26 (36 MB/s)
        == Last Cycle's Total Time : 33:54:57
        ==
        == Total Elapsed Time 33:54:57

        == invoked as: ./preclear_disk.sh /dev/sdd
        == WDCWD20EARS-00MVWB0 WD-WCAZA6293604
        == Disk /dev/sdd has been successfully precleared
        == with a starting sector of 63
        == Ran 1 cycle
        ==
        == Using :Read block size = 8388608 Bytes
        == Last Cycle's Pre Read Time : 7:37:22 (72 MB/s)
        == Last Cycle's Zeroing time : 11:13:41 (49 MB/s)
        == Last Cycle's Post Read Time : 14:51:50 (37 MB/s)
        == Last Cycle's Total Time : 33:43:53
        ==
        == Total Elapsed Time 33:43:53

     So the new preclear cut the second (post-read) pass time from 15 hours to 8 hours, which is great. One odd thing: the pre-read time is about 30 minutes longer with the new code. Stephen
  6. Just finished a two-pass preclear (the old, slow version) on a pair of 4TB Seagate NAS drives; it took about 75 hours to run. Looking forward to a faster version. Regards, Stephen
  7. I've been switching from WD Greens (I've replaced 4 so far) to WD Reds and Seagate NAS drives. Both run as cool as the Greens, have about the same power requirements, and run faster. Regards, Stephen
  8. From your SMART report:

        ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        5 Reallocated_Sector_Ct 0x0033 041 041 140 Pre-fail Always FAILING_NOW 1265
        196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 829
        197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
        198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
        199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
        200 Multi_Zone_Error_Rate 0x0008 191 191 000 Old_age Offline - 1833

     The above lines are of great concern. It's actually rather rare that we get to see a report with the "FAILING_NOW" state set; usually we see drives with far fewer errors, and rarely with more, probably because by the time a drive gets to this point it fails rapidly... Given that Current_Pending_Sector is zero, I think your drive has successfully remapped all the bad sectors it has found (though I'm not certain that none of your data has been corrupted). But as the Reallocated_Sector_Ct is so high, there might not be many spare sectors left in case further bad spots develop. Certainly WD will RMA this drive (if it is still in warranty); I've done RMAs with them on drives with far less badness. One further note of caution: the one WD drive I have had that showed a significant value for Multi_Zone_Error_Rate failed completely after another 50 hours of heavy use. Copy your data off this drive as soon as you can and then replace it. Regards, Stephen
  9. I'd be a bit worried about your disk, or maybe your power supply:

        12 Power_Cycle_Count 0x0032 071 071 020 Old_age Always - 29950

     This shows the drive has gone through almost 30,000 power cycles! And since it has only logged about 5,000 hours (roughly 300,000 minutes), that works out to about one power cycle every 10 minutes, which seems very strange. Most of my drives (Seagate, WD, Hitachi) have fewer than 20 power cycles in several years of use. Perhaps there is a problem with your power supply or the power connector to the drive? Regards, Stephen
  10. The quantity of errors is not particularly alarming right now, but if you see more appearing during a few passes of preclearing then it's time to either RMA the drive (if the cost makes sense) or toss it in the bin. If it survives several preclear passes (see the example below), it's probably still safe to use, perhaps as an extra backup copy or a drive to experiment with. Regards, Stephen
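      As a rough sketch, a multi-pass retest can be requested with the preclear script's -c option (the pass count and device name here are examples only):

         # Run three consecutive preclear cycles on the suspect drive, then
         # compare the SMART reports from before and after for new reallocations.
         ./preclear_disk.sh -c 3 /dev/sdX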
  11. You might also have bad RAM in either the unRAID box or the computer these files were copied from. I would run a memory tester on all the machines that these files were copied through; one option is sketched below. Regards, Stephen
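      Beyond booting into Memtest86, the userland memtester utility can exercise part of RAM on a running system, if you have it installed (the size and pass count below are just examples):

         # Test 2GB of RAM for 3 passes without taking the machine down.
         # Leave enough memory free for the OS; a boot-time Memtest86 run
         # is still more thorough since it can cover nearly all of the RAM.
         memtester 2048M 3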
  12. I'm doing preclears right now and for the next week... I've got a pair of new Seagate NAS 4TB drives to burn in, a pair of Seagate Desktop 4TB drives to retest, and a pair of old WD 2TB Greens that I'm taking out of service (so I'm preclearing them to erase and test them). The 4TB NAS drives are just at 26% complete on the post-read of the first of 2 cycles and have taken about 26 hours so far; they'll probably take about 34-38 hours if I recall correctly. So I'd be interested in running the new beta. Regards, Stephen
  13. Unscrambling the important part of the report:

        ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate 0x002f 119 095 006 Pre-fail Always - 223323636
        3 Spin_Up_Time 0x0023 097 097 000 Pre-fail Always - 0
        4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 104
        5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
        7 Seek_Error_Rate 0x002f 075 060 030 Pre-fail Always - 36417414
        9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2743
        10 Spin_Retry_Count 0x0033 100 100 097 Pre-fail Always - 0
        12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 38
        180 Unknown_HDD_Attribute 0x002b 100 100 000 Pre-fail Always - 2112145650
        183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
        184 End-to-End_Error 0x0032 100 100 097 Old_age Always - 0
        187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 217
        188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
        189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
        190 Airflow_Temperature_Cel 0x0022 081 070 045 Old_age Always - 19 (Min/Max 14/28)
        194 Temperature_Celsius 0x0022 019 040 000 Old_age Always - 19 (0 8 0 0 0)
        195 Hardware_ECC_Recovered 0x003a 059 041 000 Old_age Always - 223323636
        196 Reallocated_Event_Count 0x0032 100 100 036 Old_age Always - 2
        197 Current_Pending_Sector 0x0032 098 098 000 Old_age Always - 112
        198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 99
        199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

     The following lines are of concern:

        ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
        187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 217
        196 Reallocated_Event_Count 0x0032 100 100 036 Old_age Always - 2
        197 Current_Pending_Sector 0x0032 098 098 000 Old_age Always - 112
        198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 99

     There are 112 currently identified bad blocks that have not been remapped (Current_Pending_Sector), which I think puts this drive into "do not trust" territory, especially as the drive is not very old (2,743 hours). Seeing it is only a 250GB drive, it's probably not worth the bother of doing an RMA. Copy your data off it soon! Regards, Stephen
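      One way to double-check a drive in this state (not mentioned in the post itself, but standard smartctl usage) is a SMART long self-test:

         # Start an offline long self-test; expect it to take several hours.
         smartctl -t long /dev/sdX
         # When it finishes, read back the self-test log; a reported
         # failing LBA confirms the pending sectors are real surface damage.
         smartctl -l selftest /dev/sdX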
  14. When the tape drive I was using for backups started to die, back in about 2004 or 2005, I ended up writing my own backup utility, initially to store the backups to DVDs; then, as the cost of hard drives dropped, I switched to using external drives. The utility is written in Python, and I use it to back up my unRAID server to removable drives attached to my Windows desktop. It is built on the notion of a single full backup followed by an unlimited number of incrementals, so while the first backup takes a lot of time, the incrementals run pretty quickly. Typically I run an incremental pass on the weekend to grab all the new media files, a process that might take half an hour or so.

      The backups are written in user-configurable chunks, typically about 500MB (the system will automatically split large files across multiple chunks), to a drive in my Windows desktop machine. From there they get copied to an external drive in one of my backup media sets. I have two media sets; one is kept at a remote location (to further protect against fire, flood, or theft, but not far enough away to protect against a meteor strike). Periodically I take the external drive I am currently saving backups to over to the remote location, swap it for the last disk in that set, and bring that disk back. When I return with the swapped disk, I update it with the backup chunks that were kept on the workstation in its absence; then I can delete those from the workstation and repeat the process. In this way I have quadruple redundancy for all the backed-up data almost all the time:

         1. the unRAID disk where the data resides
         2. the unRAID parity protection (not truly a copy, but close)
         3. the copy on the workstation's internal cache drive
         4. the copy on the local external drive

      Once the data is swapped off-site, items 3 and 4 become the local external drive and the remote external drive.

      About once every year or two I restart the whole process, because by then I'll have some higher-capacity drives that I can use to remove the older (and smaller) backup drives from service. The last time I did this I was able to retire a handful of 500GB drives, replacing them with 2TB units that I had removed from the unRAID box when I started moving to 4TB drives.

      The data on the external drives is checksummed both at the chunk level and at the individual file level. The database that manages this also has a SHA1 hash of every individual file, so in theory I could use it to check against the current contents of the unRAID server without having to access any of the external drives, but I've not written that code yet. The backup utility is called ArcvBack and is available at: http://arcvback.com/arcvback.html It currently uses Python 2.5; one of these days I'll have to update it to the Python 3.x series. Regards, Stephen
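      The file-level check described above can be approximated with standard tools. A minimal sketch, assuming the hashes are exported to a sha1sum-style manifest (ArcvBack's actual database format is not shown here, and the share path is a placeholder):

         # Build a manifest of SHA1 hashes for every file on a share...
         find /mnt/user/media -type f -print0 | xargs -0 sha1sum > manifest.sha1

         # ...and later verify the live files against it. Any mismatched
         # or unreadable file is reported, which catches silent corruption.
         sha1sum -c --quiet manifest.sha1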
  15. I'm using one with the X9SCM-IIF motherboard; it works fine. Speed should be the same as your motherboard ports, up to about 120MB/s when you have 8 drives doing a parity check. Note that I have found putting both SATA2 and SATA3 drives on this card at the same time causes a major slowdown in parity checking: when I had a SATA2 drive attached my parity check speed was only 60MB/s; after I moved that drive to the motherboard the speed rose to 105MB/s. Also, you must set the tunable in the disk settings to something like 1024, otherwise parity check speed will be really bad; it was only 40MB/s for me at the default value of 384. Stephen