Hard drive errors in log and SMART check.


Hello,

 

My cache disk recently started spitting out these errors in the log file:

 

Jan 25 12:59:10 kenny kernel: hdb: task_pio_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jan 25 12:59:10 kenny kernel: hdb: task_pio_intr: error=0x40 { UncorrectableError }, LBAsect=360971415, sector=360971415
Jan 25 12:59:10 kenny kernel: hdb: possibly failed opcode: 0x29
Jan 25 12:59:10 kenny kernel: end_request: I/O error, dev hdb, sector 360971415
Jan 25 12:59:10 kenny kernel: Buffer I/O error on device hdb1, logical block 45121419
Jan 25 12:59:23 kenny kernel: hdb: task_pio_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jan 25 12:59:23 kenny kernel: hdb: task_pio_intr: error=0x40 { UncorrectableError }, LBAsect=360971927, sector=360971927
Jan 25 12:59:23 kenny kernel: hdb: possibly failed opcode: 0x29
Jan 25 12:59:23 kenny kernel: end_request: I/O error, dev hdb, sector 360971927
Jan 25 12:59:23 kenny kernel: Buffer I/O error on device hdb1, logical block 45121483
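The kernel reports each failure twice: once as an absolute sector on hdb and once as a logical block on hdb1. A minimal sketch of how the two numbers appear to relate, assuming hdb1 starts at sector 63 and the filesystem uses 4096-byte blocks. Both values are inferred from the figures in the log, not stated anywhere in it, and the function name is hypothetical:

```python
SECTOR_SIZE = 512      # bytes per ATA sector
BLOCK_SIZE = 4096      # assumed filesystem block size (inferred, not in the log)
PARTITION_START = 63   # assumed first sector of hdb1 (inferred, not in the log)


def disk_sector_to_fs_block(sector, part_start=PARTITION_START, block_size=BLOCK_SIZE):
    """Map an absolute disk sector to a logical block inside the partition."""
    sectors_per_block = block_size // SECTOR_SIZE
    offset = sector - part_start
    if offset < 0:
        raise ValueError("sector lies before the partition start")
    return offset // sectors_per_block


# Sectors taken from the two end_request lines above.
print(disk_sector_to_fs_block(360971415))  # 45121419
print(disk_sector_to_fs_block(360971927))  # 45121483
```

Under those two assumptions, both results match the "logical block" values in the Buffer I/O error lines.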

 

I ran a full SMART check, and the following error repeated at different LBAs:

 

Error 1258 occurred at disk power-on lifetime: 58815 hours (2450 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 16 fc 83 f0  Error: UNC at LBA = 0x0083fc16 = 8649750

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  29 00 08 0f fc 83 15 00   5d+16:02:20.950  READ MULTIPLE EXT
  29 00 08 0f fc 83 15 00   5d+16:02:20.950  READ MULTIPLE EXT
  29 00 08 07 fc 83 15 00   5d+16:02:20.950  READ MULTIPLE EXT
  29 00 08 ff fb 83 15 00   5d+16:02:20.950  READ MULTIPLE EXT
  29 00 08 f7 fb 83 15 00   5d+16:02:20.950  READ MULTIPLE EXT
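For reference, the LBA that smartctl prints can be rebuilt from the register dump. A minimal sketch, assuming the 28-bit layout in which the low nibble of DH holds LBA bits 24-27, followed by CH, CL and SN. The function names are hypothetical, and the error-bit table lists only a few bits of the ER register:

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the SN, CL, CH and DH register bytes into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_error_register(er):
    """Return the names of the error bits set in ER (partial table)."""
    names = {0x01: "AMNF", 0x04: "ABRT", 0x10: "IDNF", 0x40: "UNC"}
    return [name for bit, name in names.items() if er & bit]


# Register values from the "40 51 08 16 fc 83 f0" line above.
lba = lba28_from_registers(sn=0x16, cl=0xFC, ch=0x83, dh=0xF0)
print(hex(lba), lba)                 # 0x83fc16 8649750
print(decode_error_register(0x40))   # ['UNC']
```

This agrees with the "UNC at LBA = 0x0083fc16 = 8649750" summary that smartctl printed for the same registers.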

 

 

Is this a dying hard drive, or is there a specific disk check I can run to help fix the problem?

SMART status Info for /dev/hdb

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Model Family:    Western Digital Caviar SE family

Device Model:    WDC WD2000JB-00GVA0

Serial Number:    WD-WMAL81014852

Firmware Version: 08.02D08

User Capacity:    200,049,647,616 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  6

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Thu Jan 26 21:44:22 2012 EST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

See vendor-specific Attribute list for marginal Attributes.

 

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                 (5778) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  75) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x001f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   199   001   051    Pre-fail  Always   In_the_past 10
  3 Spin_Up_Time            0x0007   115   081   021    Pre-fail  Always       -       4783
  4 Start_Stop_Count        0x0032   096   096   040    Old_age   Always       -       4521
  5 Reallocated_Sector_Ct   0x0033   190   190   140    Pre-fail  Always       -       157
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   020   020   000    Old_age   Always       -       58847
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       328
194 Temperature_Celsius     0x0022   101   085   000    Old_age   Always       -       49
196 Reallocated_Event_Count 0x0032   171   171   000    Old_age   Always       -       29
197 Current_Pending_Sector  0x0012   193   193   000    Old_age   Always       -       234
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       500
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
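To pull the sector-related counters out of a table like the one above, a short parser is enough. A minimal sketch, assuming every attribute row has exactly ten whitespace-separated fields with a plain integer in the last one (true for the rows in this report, but not for every drive). The watch list of IDs 5, 196, 197 and 198 is an illustrative choice, not something smartctl defines, and a nonzero raw value is not by itself a pass/fail verdict:

```python
# A few rows copied from the attribute table above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   199   001   051    Pre-fail  Always   In_the_past 10
  5 Reallocated_Sector_Ct   0x0033   190   190   140    Pre-fail  Always       -       157
197 Current_Pending_Sector  0x0012   193   193   000    Old_age   Always       -       234
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       500
"""

WATCHED = {5, 196, 197, 198}  # hypothetical watch list of sector-health counters


def parse_attributes(text):
    """Parse attribute rows into dicts, skipping headers and blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": int(fields[9]),
        })
    return rows


def nonzero_watched(rows):
    """Return (id, name, raw) for watched attributes with a nonzero raw value."""
    return [(r["id"], r["name"], r["raw"])
            for r in rows if r["id"] in WATCHED and r["raw"] > 0]


print(nonzero_watched(parse_attributes(SAMPLE)))
# [(5, 'Reallocated_Sector_Ct', 157), (197, 'Current_Pending_Sector', 234)]
```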

 

SMART Error Log Version: 1

ATA Error Count: 1321 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.
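The 49.710-day wrap mentioned in the legend is what a 32-bit millisecond counter would give. A minimal sketch of the formatting, assuming that counter width (the legend states the wrap point but not the width) and using a hypothetical function name:

```python
WRAP_MS = 2 ** 32  # assumed 32-bit millisecond counter


def format_powered_up_time(ms):
    """Format milliseconds since power-on as DDd+hh:mm:SS.sss, with wrap-around."""
    ms %= WRAP_MS
    days, rem = divmod(ms, 86_400_000)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    # The day prefix is omitted when it is zero, as in the error entries of this report.
    return f"{days}d+{clock}" if days else clock


print(round(WRAP_MS / 86_400_000, 3))     # 49.71
print(format_powered_up_time(489740950))  # 5d+16:02:20.950
print(format_powered_up_time(37996750))   # 10:33:16.750
```

Because the counter wraps, two entries with similar timestamps are not necessarily close together in real time if the drive has been powered on for more than about 49.7 days.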

 

Error 1321 occurred at disk power-on lifetime: 58846 hours (2451 days + 22 hours)

  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 08 37 3c f8 f0  Error: UNC 8 sectors at LBA = 0x00f83c37 = 16268343

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 37 3c f8 0e 00      10:33:16.750  READ DMA EXT

  25 00 08 2f 3c f8 0e 00      10:33:16.750  READ DMA EXT

  25 00 08 27 3c f8 0e 00      10:33:16.750  READ DMA EXT

  25 00 08 1f 0d f8 00 00      10:33:16.750  READ DMA EXT

  25 00 08 1f 3c f8 0e 00      10:33:16.750  READ DMA EXT

 

Error 1320 occurred at disk power-on lifetime: 58846 hours (2451 days + 22 hours)

  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 80 37 3c f8 f0  Error: UNC 128 sectors at LBA = 0x00f83c37 = 16268343

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 80 cf 3b f8 0e 00      10:33:09.950  READ DMA EXT

  25 00 80 cf 3b f8 0e 00      10:33:09.950  READ DMA EXT

  10 00 3f 00 00 00 00 00      10:33:09.950  RECALIBRATE [OBS-4]

  25 00 80 cf 3b f8 0e 00      10:33:09.950  READ DMA EXT

  25 00 80 cf 3b f8 0e 00      10:33:09.950  READ DMA EXT

 

Error 1319 occurred at disk power-on lifetime: 58846 hours (2451 days + 22 hours)

  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  01 51 80 37 3c f8 f0  Error: AMNF 128 sectors at LBA = 0x00f83c37 = 16268343

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 80 cf 3b f8 0e 00      10:33:07.950  READ DMA EXT

  10 00 3f 00 00 00 00 00      10:33:07.950  RECALIBRATE [OBS-4]

  25 00 80 cf 3b f8 0e 00      10:33:07.950  READ DMA EXT

  25 00 80 cf 3b f8 0e 00      10:33:07.950  READ DMA EXT

  25 00 08 bf 0f f8 00 00      10:33:07.950  READ DMA EXT

 

Error 1318 occurred at disk power-on lifetime: 58846 hours (2451 days + 22 hours)

  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  01 51 80 37 3c f8 f0  Error: AMNF 128 sectors at LBA = 0x00f83c37 = 16268343

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 80 cf 3b f8 0e 00      10:33:05.900  READ DMA EXT

  25 00 80 cf 3b f8 0e 00      10:33:05.900  READ DMA EXT

  25 00 08 bf 0f f8 00 00      10:33:05.900  READ DMA EXT

  35 00 20 17 b1 c0 09 00      10:33:05.900  WRITE DMA EXT

  35 00 08 8f aa c0 09 00      10:33:05.900  WRITE DMA EXT

 

Error 1317 occurred at disk power-on lifetime: 58846 hours (2451 days + 22 hours)

  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  01 51 80 37 3c f8 f0  Error: AMNF 128 sectors at LBA = 0x00f83c37 = 16268343

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 80 cf 3b f8 0e 00      10:33:03.800  READ DMA EXT

  25 00 08 bf 0f f8 00 00      10:33:03.800  READ DMA EXT

  35 00 20 17 b1 c0 09 00      10:33:03.800  WRITE DMA EXT

  35 00 08 8f aa c0 09 00      10:33:03.800  WRITE DMA EXT

  25 00 80 8f 3a f8 0e 00      10:33:03.800  READ DMA EXT

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                       Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure       90%        58818         22806528
# 2  Short offline       Completed without error          00%        47812         -
# 3  Short offline       Aborted by host                  50%        47812         -
# 4  Short offline       Completed without error          00%        47812         -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
