SMART errors on SSD cache disk

homejones · October 31, 2014

Hi,

Long-time user, currently on 5.0.5. I installed an SSD cache disk about 6 months ago and have been using Dynamix for some time now. I recently logged into the server and saw a number of messages reported by Dynamix (the messages below repeat in this order every 2 minutes):

unRAID Cache disk SMART message: 10/30/2014 12:56

Notice: Cache disk passed SMART health check

M4-CT128M4SSD2_00000000115209006A26 (sdj)

unRAID Cache disk SMART failure: 10/30/2014 12:58

Alert: Cache disk failed SMART health check

M4-CT128M4SSD2_00000000115209006A26 (sdj)

This is my log from today:

Oct 31 09:45:10 Tower last message repeated 195 times

Oct 31 09:45:11 Tower emhttp: clear: 2% complete

Oct 31 09:45:12 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 09:45:43 Tower last message repeated 117 times

Oct 31 09:46:44 Tower last message repeated 191 times

Oct 31 09:47:45 Tower last message repeated 189 times

Oct 31 09:47:56 Tower last message repeated 40 times

Oct 31 09:47:57 Tower emhttp: clear: 3% complete

Oct 31 09:47:58 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 09:48:29 Tower last message repeated 93 times

Oct 31 09:49:30 Tower last message repeated 202 times

Oct 31 09:50:31 Tower last message repeated 175 times

Oct 31 09:50:49 Tower last message repeated 63 times

Oct 31 09:50:50 Tower emhttp: clear: 4% complete

Oct 31 09:50:51 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 09:51:22 Tower last message repeated 92 times

Oct 31 09:52:24 Tower last message repeated 180 times

Oct 31 09:53:05 Tower last message repeated 131 times

Oct 31 09:53:05 Tower dhcpcd[1061]: eth0: renewing lease of 10.0.1.100

Oct 31 09:53:05 Tower dhcpcd[1061]: eth0: acknowledged 10.0.1.100 from 10.0.1.1

Oct 31 09:53:05 Tower dhcpcd[1061]: eth0: leased 10.0.1.100 for 14400 seconds

Oct 31 09:53:06 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 09:53:38 Tower last message repeated 86 times

Oct 31 09:53:46 Tower last message repeated 33 times

Oct 31 09:53:47 Tower emhttp: clear: 5% complete

Oct 31 09:53:48 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 09:54:19 Tower last message repeated 95 times

Oct 31 09:55:20 Tower last message repeated 186 times

Oct 31 09:56:22 Tower last message repeated 186 times

Oct 31 09:56:46 Tower last message repeated 66 times

Oct 31 09:56:47 Tower emhttp: clear: 6% complete

Oct 31 09:56:47 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 09:57:18 Tower last message repeated 93 times

Oct 31 09:58:19 Tower last message repeated 179 times

Oct 31 09:59:20 Tower last message repeated 184 times

Oct 31 09:59:40 Tower last message repeated 74 times

Oct 31 09:59:41 Tower emhttp: clear: 7% complete

Oct 31 09:59:41 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Oct 31 10:00:12 Tower last message repeated 99 times

Oct 31 10:01:13 Tower last message repeated 167 times

Is my cache drive dying? Anything specific I can do to fix this (aside from replacing)? Thank you.
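To see how much of a log like the one above is the smartctl warning, the "last message repeated N times" lines can be folded back into the message they refer to. This is only a rough sketch: `tally_syslog` is a hypothetical helper name, and it assumes the log uses the "month day time host" prefix shown above and is readable at /var/log/syslog.

```shell
# tally_syslog: count syslog messages, adding each
# "last message repeated N times" line to the message that preceded it.
# Reads the files given as arguments, or stdin if none are given.
tally_syslog() {
    awk '
    NF == 0 { next }                      # skip blank lines
    {
        # Drop the "Mon DD HH:MM:SS host " prefix (first four fields).
        msg = $0
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ /, "", msg)
        if (match(msg, /^last message repeated [0-9]+ times/)) {
            split(msg, parts, " ")        # parts[4] is the repeat count
            if (last != "") count[last] += parts[4]
        } else {
            last = msg
            count[msg]++
        }
    }
    END { for (m in count) printf "%d\t%s\n", count[m], m }
    ' "$@" | sort -rn
}
```

For example, `tally_syslog /var/log/syslog | head` lists the most frequent messages first, which makes it easy to judge whether the deprecated-ioctl warning is the only thing flooding the log.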

WeeboTech · October 31, 2014

Run smartctl -a on the drive device (/dev/sdj, from what I can see in your post) and post the output here.

homejones · October 31, 2014

Thank you - will do and post back.

Just realized I accidentally posted this in the Unraid 6 forum - mods, please move as necessary. Thank you.

homejones · October 31, 2014

root@Tower:~# smartctl -a /dev/sdj

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)

=== START OF INFORMATION SECTION ===

Vendor: /8:0:0:0

Product:

>> Terminate command early due to bad response to IEC mode page

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

root@Tower:~# smartctl -a /dev/sdj -T permissive

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)

=== START OF INFORMATION SECTION ===

Vendor: /8:0:0:0

Product:

scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

>> Terminate command early due to bad response to IEC mode page

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported

Device does not support Self Test logging

homejones · October 31, 2014

... running two successive health tests from the command line results in:

root@Tower:~# smartctl -H /dev/sdj

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)

=== START OF READ SMART DATA SECTION ===

SMART Health Status: OK

root@Tower:~# smartctl -H /dev/sdj

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)

=== START OF READ SMART DATA SECTION ===

Log Sense failed, IE page [scsi response fails sanity test]
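Since the two runs above disagree, it may help to repeat the check a number of times and count how often it passes. The following is a minimal sketch, not a definitive test: `count_health_ok` is a hypothetical helper, and the `SMARTCTL` variable is an assumed override so the loop can be tried against a stand-in command.

```shell
# count_health_ok: run the SMART health check several times and report
# how many runs passed.
# Usage: count_health_ok <device> [runs]
# SMARTCTL is a hypothetical override for the smartctl binary.
count_health_ok() (
    dev="$1"
    runs="${2:-10}"
    ok=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # SCSI-style output prints "SMART Health Status: OK";
        # ATA-style output prints "... test result: PASSED".
        if "${SMARTCTL:-smartctl}" -H "$dev" 2>/dev/null |
            grep -Eq 'SMART Health Status: OK|test result: PASSED'; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok/$runs passed"
)
```

Running `count_health_ok /dev/sdj 20` would then give a rough pass rate; a drive that answers inconsistently points more toward a communication problem than a stable failed state, though that is an interpretation, not a diagnosis.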

WeeboTech · October 31, 2014

I might look further in the log to see if there were other ATA errors. Maybe there are SATA interface errors.

If you can still access the cache drive, I would back it up and/or run the mover.

After that I would probably reboot the server. If the drive got into a bad state, a power cycle may reset it, or you may lose it 100%.

Which is why you should back it up if it's accessible.
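The check for other ATA or SATA interface errors mentioned above could look something like the sketch below. It assumes the kernel's libata messages are written to /var/log/syslog with the usual "ataN:" or "ataN.NN:" prefix; `find_ata_errors` is a hypothetical helper name, and the keyword list is only a starting point.

```shell
# find_ata_errors: list kernel log lines that look like libata link or
# command errors (for example "exception Emask", "failed command",
# "hard resetting link").
# Usage: find_ata_errors [logfile]   (defaults to /var/log/syslog)
find_ata_errors() {
    grep -iE 'kernel: .*ata[0-9]+(\.[0-9]+)?: .*(exception|failed command|hard resetting link|SError|I/O error)' \
        "${1:-/var/log/syslog}"
}
```

No output (and a non-zero exit status from grep) means none of these patterns were found, which does not rule out a problem but makes a cabling or port fault less likely.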

homejones · October 31, 2014

Hoping it's an I/O issue. I am swapping motherboards next weekend, so let's see what happens when I do that. Thank you!

WeeboTech · October 31, 2014

Do a power cycle on your server, then run smartctl -a on the drive and post the syslog. Let's see if that resets something internally.

I used to have an OCZ turbo model that would go offline like that intermittently.

In addition, smartd would constantly report sectors going offline. It sure did make me nervous, as that was my VMware partition, which had an XP instance on it.
