disk 5 showing SMART failure, what to do about it?

JustinChase · February 17, 2015

Here is the Disk Error Log from the Dashboard > Health page/tab...

Disk 5 attached to port: sdk

ATA Error Count: 11 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 7839 hours (326 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a8 c0 1b 01  Error: UNC at LBA = 0x011bc0a8 = 18596008

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a8 c0 1b 41 00      00:00:09.292  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:00:09.292  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:09.291  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:00:09.291  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:00:09.291  SET FEATURES [set transfer mode]

Error 10 occurred at disk power-on lifetime: 7839 hours (326 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a8 c0 1b 01  Error: UNC at LBA = 0x011bc0a8 = 18596008

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a8 c0 1b 41 00      00:00:05.606  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:00:05.606  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:05.605  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:00:05.605  IDENTIFY DEVICE
  ef 10 06 00 00 00 00 00      00:00:05.605  SET FEATURES [Enable SATA feature]

Error 9 occurred at disk power-on lifetime: 7792 hours (324 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 20 48 6b 01  Error: UNC at LBA = 0x016b4820 = 23808032

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 f0 2e 6b e1 00   4d+14:16:45.981  READ DMA EXT
  25 00 00 f0 2a 6b e1 00   4d+14:16:45.955  READ DMA EXT
  35 00 00 f0 26 6b e1 00   4d+14:16:45.953  WRITE DMA EXT
  35 00 00 f0 22 6b e1 00   4d+14:16:45.950  WRITE DMA EXT
  35 00 00 f0 1e 6b e1 00   4d+14:16:45.948  WRITE DMA EXT

Error 8 occurred at disk power-on lifetime: 7770 hours (323 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 2b 8a 00  Error: UNC at LBA = 0x008a2bf8 = 9055224

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 20 29 8a e0 00   3d+16:38:57.359  READ DMA EXT
  25 00 00 20 25 8a e0 00   3d+16:38:57.322  READ DMA EXT
  35 00 08 18 22 8a e0 00   3d+16:38:57.320  WRITE DMA EXT
  35 00 00 18 1e 8a e0 00   3d+16:38:57.318  WRITE DMA EXT
  35 00 00 18 1a 8a e0 00   3d+16:38:57.316  WRITE DMA EXT

Error 7 occurred at disk power-on lifetime: 7695 hours (320 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 48 44 00 00  Error: UNC at LBA = 0x00004448 = 17480

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 18 3a 00 e0 00      13:34:47.358  READ DMA EXT
  25 00 00 18 36 00 e0 00      13:34:47.316  READ DMA EXT
  25 00 00 18 32 00 e0 00      13:34:47.279  READ DMA EXT
  35 00 10 ff ff ff ef 00      13:34:47.275  WRITE DMA EXT
  35 00 08 ff ff ff ef 00      13:34:47.272  WRITE DMA EXT
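
For anyone reading the log above, the LBA printed on each "Error: UNC" line can be recovered from the register dump next to it. A minimal sketch, assuming 28-bit LBA addressing where the low nibble of DH carries the top four address bits (the function name is made up for illustration):

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine the ATA SN, CL, CH and DH registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers reported for Error 11: SN=a8 CL=c0 CH=1b DH=01
print(lba28_from_registers(0xA8, 0xC0, 0x1B, 0x01))  # 18596008, matching the log
```

Errors 10 and 11 decode to the same LBA, so both are the same sector failing to read on two consecutive power-ups.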

Squid · February 17, 2015

Post the output from the disk attributes section

JustinChase · February 17, 2015

ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x000f	109	099	006	Pre-fail	Always	Never	23094688
3	Spin Up Time	0x0003	091	091	000	Pre-fail	Always	Never	0
4	Start Stop Count	0x0032	098	098	020	Old age	Always	Never	2284
5	Reallocated Sector Ct	0x0033	100	100	010	Pre-fail	Always	Never	40
7	Seek Error Rate	0x000f	076	060	030	Pre-fail	Always	Never	48771086
9	Power On Hours	0x0032	088	088	000	Old age	Always	Never	10695
10	Spin Retry Count	0x0013	100	100	097	Pre-fail	Always	Never	0
12	Power Cycle Count	0x0032	100	100	020	Old age	Always	Never	279
183	Runtime Bad Block	0x0032	100	100	000	Old age	Always	Never	0
184	End-to-End Error	0x0032	100	100	099	Old age	Always	Never	0
187	Reported Uncorrect	0x0032	089	089	000	Old age	Always	Never	11
188	Command Timeout	0x0032	100	100	000	Old age	Always	Never	0 0 0
189	High Fly Writes	0x003a	076	076	000	Old age	Always	Never	24
190	Airflow Temperature Cel	0x0022	068	050	045	Old age	Always	Never	32 (Min/Max 22/33)
191	G-Sense Error Rate	0x0032	100	100	000	Old age	Always	Never	0
192	Power-Off Retract Count	0x0032	100	100	000	Old age	Always	Never	546
193	Load Cycle Count	0x0032	093	093	000	Old age	Always	Never	14948
194	Temperature Celsius	0x0022	032	050	000	Old age	Always	Never	32 (0 16 0 0 0)
197	Current Pending Sector	0x0012	100	100	000	Old age	Always	Never	0
198	Offline Uncorrectable	0x0010	100	100	000	Old age	Offline	Never	0
199	UDMA CRC Error Count	0x003e	200	200	000	Old age	Always	Never	0
240	Head Flying Hours	0x0000	100	253	000	Old age	Offline	Never	2856h+18m+54.573s
241	Total LBAs Written	0x0000	100	253	000	Old age	Offline	Never	140104783128
242	Total LBAs Read	0x0000	100	253	000	Old age	Offline	Never	518723927939

I noticed these 2 are orange...

5 Reallocated Sector Ct

187 Reported Uncorrect

I never noticed that before. Are these actually bad, fixable, or nothing to be too concerned about?

Squid · February 17, 2015

You've got 40 reallocated sectors, 0 pending, and 11 reported uncorrectable.

Myself, I'd just keep an eye on them and only worry if they begin to increase further. The drive is not near its (manufacturer specified) thresholds for those items.

That being said, I believe that Backblaze throws a drive out once the reported uncorrectable count goes above 0.
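
The "not near its thresholds" point can be checked mechanically: smartctl treats an attribute as failed once its normalized VALUE drops to or below THRESH. A rough sketch, assuming rows are tab-separated as in the table posted above (the helper name is hypothetical):

```python
from typing import Tuple


def threshold_margin(row: str) -> Tuple[str, int]:
    """Return (attribute name, VALUE - THRESH) for one tab-separated row."""
    fields = row.split("\t")
    name, value, thresh = fields[1], int(fields[3]), int(fields[5])
    return name, value - thresh


rows = [
    "5\tReallocated Sector Ct\t0x0033\t100\t100\t010\tPre-fail\tAlways\tNever\t40",
    "187\tReported Uncorrect\t0x0032\t089\t089\t000\tOld age\tAlways\tNever\t11",
]
for row in rows:
    name, margin = threshold_margin(row)
    # A margin of 0 or less would mean the drive itself reports the attribute as failed.
    print(f"{name}: {margin} above threshold")
```

Note that this only looks at the normalized values; the raw counts (40 and 11 here) are what the rest of the thread suggests watching for growth.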

Squid · February 17, 2015

Also, based on these occurrences in the error log:

  60 00 08 a8 c0 1b 41 00      00:00:05.606  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:00:05.606  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:00:05.605  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:00:05.605  IDENTIFY DEVICE
  ef 10 06 00 00 00 00 00      00:00:05.605  SET FEATURES [Enable SATA feature]

You're having (or have had) a cabling / power issue with the drive. I would check your syslog and see whether there are any drive error messages in it.

JustinChase · February 17, 2015

Thanks for the quick responses.

5 was just the line number in the report

JustinChase · February 17, 2015

I didn't see anything, but I just rebooted recently.

syslog.txt

dgaschk · February 19, 2015

The last issue was over 2000 hours ago. Have you noticed a problem with the server lately?
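
That figure follows from numbers posted earlier in the thread: attribute 9 reports 10695 power-on hours, and the most recent logged error (Error 11) occurred at 7839 hours. As a quick check:

```python
power_on_hours = 10695   # attribute 9 raw value from the SMART table
last_error_hours = 7839  # "Error 11 occurred at disk power-on lifetime"

hours_since_error = power_on_hours - last_error_hours
days, hours = divmod(hours_since_error, 24)
print(f"{hours_since_error} hours ago, about {days} days and {hours} hours")
```

These are powered-on hours, so the calendar time since the last error is at least that long.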

JustinChase · February 19, 2015

No issues at all until today. The server 'froze' twice today, but I updated to beta13 yesterday, so I suspect that more than this disk being the issue. I'm reverting to beta12 soon.

bungee91 · February 19, 2015

Wanted to chime in just in case we are seeing similar problems.

I recently noticed 3 of my disks giving the thumbs down (I have 11 in the array) that were absolutely green before.

They are all flagged for reallocated sectors: 163, 16, and 1.

I noticed this came up RIGHT after updating Dynamix from the plugins update screen; I had never had a single red thumbs down before that.

It is very possible the drives have had this for a while and the update changed the notification of this (?), IDK?

I had recently moved and the server was in the car, hitting bumps, etc... so I can't be positive it is related, but pretty sure it is.

I relocated the server to the new house, set it up, all was well and ran for a day or so..... Update Dynamix and any other plugins (as applicable) reboot, 3 red thumbs down... Boo!.. =)

Also I am still on beta 12, will wait a little with what I've been reading in regards to 13!

bungee91 · February 19, 2015

One other thing, if you run a "SMART extended self-test" does it finish?

Mine seems to get stuck at 10% and just spin, however the short test finishes with no errors.

Squid · February 19, 2015

The original GUI that came out with b12 didn't work quite right. The GUI update fixed quite a number of things, and also made it so that any disk that had even a single reallocated sector got the red thumbs down (I don't agree with this). What you want to do is configure email alerts, and the system will email you when there are any changes on the "big 5" SMART attributes.

The drives themselves didn't necessarily get worse in the move. The GUI update which you just did is just telling you what's going on.

Squid · February 19, 2015

Also, if you start getting a ton of emails / notifications about attributes 195, 225, then you should also do the update as described in this thread: http://lime-technology.com/forum/index.php?topic=37817.0

JustinChase · February 19, 2015

Your server is probably spinning down the disks before the test finishes. I had to extend my spin down period to about 5 hours for the extended tests to finish.

**Don't forget to change it back when you're done testing the drives (like I did).

SSD · February 19, 2015

I would suggest running a parity check every few days and comparing the before and after reallocated sector and reported uncorrectable counts. If they hold steady for three consecutive parity checks, I'd trust the drive for now and continue to monitor. But if every parity check or two is increasing the counts, I'd RMA it if possible or look to retire it. I consider it something like a disk pothole. Once it occurs it just gets bigger and bigger!
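
The "hold steady for three consecutive parity checks" rule can be written down as a small check. This is only a sketch: the function name and the way the counts are recorded are assumptions, and the (48, 11) reading is invented to show the failing case.

```python
def counts_held_steady(snapshots, checks=3):
    """True if the last `checks` parity checks left both counters unchanged.

    `snapshots` is a list of (reallocated, reported_uncorrectable) raw values:
    one recorded before the first check and one after every check.
    """
    if len(snapshots) < checks + 1:
        return False  # not enough history yet
    recent = snapshots[-(checks + 1):]
    return all(s == recent[0] for s in recent)


history = [(40, 11), (40, 11), (40, 11), (40, 11)]
print(counts_held_steady(history))               # True
print(counts_held_steady(history + [(48, 11)]))  # False: hypothetical increase
```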

My experience is zero is the number of reallocated sectors you want. Even one is a sign that it is on the road to deterioration.

bonienl · February 19, 2015

The latest version, B13, initially shows a thumbs down upon reboot for any drive with a reallocated sector count greater than zero. Once a warning notification has been given, it sets the initial count as the threshold and turns the icon into a thumbs up. Only when the reallocated sector count increases after this are further warning notifications and thumbs down given.
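
The behaviour described here can be modelled in a few lines. This is an illustrative sketch only, not the actual Dynamix code; the class and method names are invented.

```python
class ReallocatedSectorMonitor:
    """Warn once for a non-zero count, then only when it grows past the baseline."""

    def __init__(self):
        self.baseline = 0  # starts at zero on every boot

    def check(self, count: int) -> bool:
        """Return True (thumbs down, send a warning) if count exceeds the baseline."""
        if count > self.baseline:
            self.baseline = count  # the warned-about count becomes the new threshold
            return True
        return False


monitor = ReallocatedSectorMonitor()
print(monitor.check(40))  # True: first sighting of a non-zero count
print(monitor.check(40))  # False: unchanged since the warning
print(monitor.check(41))  # True: the count increased
```

Because the baseline in this model lives only in memory, a reboot would produce the initial warning again.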

Squid · February 19, 2015

Good improvement... Have to admit that I didn't notice that subtle change.

I think you should add code to save disks.ini to the flash drive on shutdown and restore it on power-up, so that the system carries on from boot to boot. That way, if an attribute changes during the boot you'd be notified, but if everything stayed the same you wouldn't get the extraneous warning.

bungee91 · February 19, 2015

This is all awesome information (and sorry for the derailment!), thanks guys for explaining!

The fact that I moved the server and it was sitting (parked) in a cold trunk for a week just added another variable to this.

I understand now, and at least I know what is going on instead of just assuming all was well.

I'd consider it a "bug" if the smart extended test cannot be completed without changing the drive spin down time, so hopefully if it is still the case they will fix it for that use case.

Squid · February 19, 2015

It's been that way forever, and I wouldn't hold your breath on a fix being implemented. But maybe a note could be added to the GUI to warn users of that.

WeeboTech · February 25, 2015

This is something I had warned Tom about, i.e. smart long tests and the drive being forced to spin down.

In doing some tests I found a possible 'keep the drive busy' method.

Ideally when a smart test is triggered it comes back with an estimated time of completion.

This should be used as some value for a callout to emhttp to disable the spindown timer until that time.

It's also pollable via smart capabilities.

i.e.

root@unRAIDb:/tmp# smartctl -c /dev/sdj
<snip>
Extended self-test routine
recommended polling time:        ( 735) minutes.
<snip>

So if the SMART capabilities were polled, the long test triggered, and the emhttp spin-down timer disabled for 735 minutes, the SMART long test could/should complete.
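
Pulling that number out of the `smartctl -c` output is straightforward. A minimal sketch, assuming the text layout shown in the snippet above (the function name is made up, and nothing here talks to emhttp):

```python
import re


def extended_test_minutes(smartctl_c_output: str):
    """Return the extended self-test polling time in minutes, or None if absent."""
    match = re.search(
        r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes",
        smartctl_c_output,
    )
    return int(match.group(1)) if match else None


sample = """Extended self-test routine
recommended polling time:        ( 735) minutes.
"""
minutes = extended_test_minutes(sample)
print(minutes)  # 735
```

The returned value is what a caller would use as the length of time to hold off the spin-down timer.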

Another option would be to have emhttp do a dd call to the drive every time it checks the status of the SMART log (or both).

An example of keeping the drive busy would be:

root@unRAIDb:/tmp# grep 'sdj ' /proc/diskstats  > diskstats 
root@unRAIDb:/tmp# grep 'sdj ' /proc/diskstats  | diff diskstats -

Notice there were no changes.

root@unRAIDb:/tmp# dd iflag=direct if=/dev/sdj of=/dev/null bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.00128997 s, 794 kB/s

root@unRAIDb:/tmp# grep 'sdj ' /proc/diskstats  | diff diskstats - 
1c1
<    8     144 sdj 7032323 596289509 4826546513 135141600 18444243 2311265383 18637677016 86167290 0 51909060 221299630
---
>    8     144 sdj 7032324 596289509 4826546515 135141600 18444243 2311265383 18637677016 86167290 0 51909060 221299630

and shown here there are changes.

So the dd iflag=direct bypasses the buffer cache thus allowing emhttp to see the drive as being busy.
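
The same before/after comparison can be done on the counters directly instead of with diff. A small sketch using the two /proc/diskstats lines from above; it relies on the standard field order (after major, minor and device name come reads completed, reads merged, sectors read), and the function name is made up.

```python
def read_activity(before: str, after: str):
    """Return (extra reads completed, extra sectors read) between two diskstats lines."""
    b, a = before.split(), after.split()
    # Columns: major, minor, name, reads completed, reads merged, sectors read, ...
    return int(a[3]) - int(b[3]), int(a[5]) - int(b[5])


before = "   8     144 sdj 7032323 596289509 4826546513 135141600 18444243 2311265383 18637677016 86167290 0 51909060 221299630"
after = "   8     144 sdj 7032324 596289509 4826546515 135141600 18444243 2311265383 18637677016 86167290 0 51909060 221299630"
print(read_activity(before, after))  # (1, 2): one read covering two 512-byte sectors
```

Two 512-byte sectors add up to the 1024 bytes that the dd command requested.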

JustinChase · February 25, 2015

I just upgraded to 14a, and the notes talk about how to set up notifications, which I hadn't set up before today. I got it all working, but now I see 4 warnings (see attached). 1 of them is the same as the OP, but there are 3 others.

What should I do to get these issues resolved?

Squid · February 25, 2015

On the first boot of b13+ (with notifications properly set up), the system will inform you if any of 5 SMART attributes (5, 187, 188, 197, 198) have a non-zero value. After that, the system will not notify you again until either you reboot or one of those attributes changes.

Of what you showed in the screenshot, the only one I'd be concerned about is the cache drive.
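
The first-boot check described here amounts to filtering five attribute IDs for non-zero raw values. A sketch under that assumption, using disk 5's numbers from earlier in the thread (this is not the actual unRAID code, and the names are invented):

```python
WATCHED_IDS = {5, 187, 188, 197, 198}


def nonzero_watched(raw_values: dict) -> dict:
    """Return the watched attributes whose raw value is non-zero."""
    flagged = {}
    for attr_id, raw in raw_values.items():
        if attr_id not in WATCHED_IDS:
            continue
        # Raw values can be multi-part strings such as "0 0 0"; use the first number.
        first = int(str(raw).split()[0])
        if first != 0:
            flagged[attr_id] = first
    return flagged


disk5 = {5: 40, 187: 11, 188: "0 0 0", 197: 0, 198: 0, 9: 10695}
print(nonzero_watched(disk5))  # {5: 40, 187: 11}
```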

JustinChase · February 25, 2015

Thanks for the feedback. I guess I'll need to see what I can do about the cache disk, without replacing it, if possible.

JustinChase · February 26, 2015

I just finished an extended SMART test on the cache drive, no errors reported.

Num	Test Description	Status	Remaining	LifeTime(hours)	LBA of first error
1	Extended offline	Completed without error	00%	35870	None

Squid · February 26, 2015

It was mainly because of the 32k reported uncorrectable errors. I'd still keep an eye on it.
