Potential Samsung F4 issues.



Question for the experts: will the mover script pick up this corruption?

No script will pick up on the problem...  No error is returned, but the data "written" to the disk never reaches the disk platter.

 

It is even worse than that... If you do a parity "check" it will read the zeros (or whatever else was) in the sectors that should have been written, report a parity error, and then update parity to reflect the bad data on the disk.  Ouch.

 

If you disable the F4 disk (un-assign it), start the array without it, then stop and re-assign it, unRAID will re-construct the drive from the parity disk in combination with the other disks, potentially writing the correct data back to the F4.  You'll be without parity protection until it completes, but at least your data will be good when it is done.

 

This will only work if you have not over-written parity with the stock unRAID "Check" button before attempting the drive re-construction.

 

Joe L.

Link to comment

So rsync doesn't check that it has written the data correctly before deleting the original files?  Or do I not understand the actual problem? :)

I did not think of that... I'm not sure if it reads it back when on the same server, or just does the checksum when used across servers.

 

HOWEVER... since the file was just written, it would still be in the Linux buffer cache, and a read-back would not even go to the disk itself unless the file was too large to be buffered in full.  The file would be read back from memory, not from the physical disk.
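
A minimal sketch of making a read-back more likely to come from the disk instead of memory. It assumes a Linux kernel that exposes /proc/sys/vm/drop_caches and a root shell; the function name checksum_from_disk is hypothetical. It only empties the kernel's page cache and does nothing about the drive's own on-board cache.

```shell
# Hypothetical helper: print the MD5 of a file after asking the kernel to
# drop its page cache, so the read is more likely to come from the disk.
checksum_from_disk() {
    file=$1
    sync    # flush dirty pages to the device first
    # Dropping caches needs root; skip quietly if that is not possible.
    if [ -w /proc/sys/vm/drop_caches ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi
    # md5sum prints "<hash>  <name>"; keep only the hash.
    md5sum "$file" | cut -d ' ' -f 1
}
```

Comparing the value printed for the copy with the one for the original (for example `checksum_from_disk /mnt/disk1/some.file`, a made-up path) gives a check that does not rely on data still sitting in memory.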

Link to comment

Yeah, I'd pretty much come to that conclusion too, Joe.  Seems I have 800GB of data with a question mark hanging over it...  time to get the original media out, methinks :)

If you trust the rest of your disks,

Stop the array

un-assign the F4 disk

Start the array  (this will cause unRAID to forget the serial number of the F4 disk so it can be used as its own replacement)

Stop the array

re-assign the F4 disk

Start the array.

Let it re-construct the F4 disk.  It will fix any of the "data blocks" that were never written to the disk.
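
Why the rebuild can repair blocks that never reached the platter: with single parity, the parity block is the XOR of the matching block on every data disk, so any one disk's block can be recomputed from parity plus the others. A toy illustration with one byte per disk follows; the values and variable names are made up and this is not unRAID's actual code.

```shell
# Toy single-parity example: one byte per "disk" (values are made up).
d1=$((0x3C)); d2=$((0xA5)); d3=$((0x0F))

# Parity as it was computed when the writes were issued.
parity=$(( d1 ^ d2 ^ d3 ))

# Suppose d2 never reached the platter and the disk still holds zeros.
on_disk_d2=0

# Rebuilding d2 from parity and the other disks restores the intended byte.
rebuilt_d2=$(( parity ^ d1 ^ d3 ))

# A correcting parity check would instead recompute parity from what is
# actually on disk, after which a rebuild can only reproduce the bad byte.
corrected_parity=$(( d1 ^ on_disk_d2 ^ d3 ))
rebuilt_after_check=$(( corrected_parity ^ d1 ^ d3 ))

printf 'rebuilt: 0x%02X  rebuilt after correcting check: 0x%02X\n' \
    "$rebuilt_d2" "$rebuilt_after_check"
```

The second half shows why the procedure only helps if parity has not already been updated to match the bad data.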

 

Yes you'll be without parity protection until the disk is re-constructed, but since you are really writing back exactly what was on the disk you can recover (somewhat) if another disk were to fail by forcing parity to be trusted.  You are really only going to "potentially" update the blocks that were not correctly written originally.

 

Joe L.

Link to comment

Actually, thinking about it, under what conditions would this problem occur under normal unRAID operation?  Refreshing main.htm while writing to the dodgy disk? Using smartctl or hdparm from the command line while writing, obviously.

 

At this point I'm thinking I'll just disable write cache for the drive and suck up any corruption if and when I find it.  Parity check ran on the 1st without listing any errors and I've watched maybe 50% of the films on that drive without noticing anything.

 

Thanks for the instructions Joe but my monthly Parity check ran 3 days ago so I think, if I understand correctly, any bad data is now part of the array :)

Link to comment

Oh, this is depressing... especially on December 4th.  Thanks for the command line to turn off write caching, Chris.

 

By turning off the caching as described, will it remain off if the server is restarted?  If not, how can I ensure that it does?

 

I know that I have never issued these commands at the command-line prompt... but are they something that unMENU or even unRAID makes use of?

 

 

Link to comment

I am building a new server and I have 7 brand new, untouched F4s. I have one Hitachi 7K2000 that will act as parity.

 

How do I put them into the server to avoid any problems?

 

Turn off write caching before you write any data to them:

 

hdparm -W 0 /dev/sda  (replace sda with your disk)

 

Then wait for new firmware from Samsung.

 

By turning off the caching as described, will it remain off if the server is restarted?  If not, how can I ensure that it does?

 

The setting survives a reboot on my system, yes.
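
Whether the setting survives a power cycle may differ between drives and controllers, so re-applying it at every boot is the cautious option. A minimal sketch, assuming the list of affected devices is maintained by hand; the function name, the HDPARM variable and the /dev/sdX placeholders are hypothetical. On unRAID the call could be appended to the /boot/config/go startup script.

```shell
# Hypothetical helper: turn off the drive write cache on each device given.
# HDPARM can be overridden (e.g. for a dry run); it defaults to hdparm.
HDPARM=${HDPARM:-hdparm}

disable_write_cache() {
    rc=0
    for dev in "$@"; do
        # -W 0 disables the drive's write cache, as in the command quoted above.
        "$HDPARM" -W 0 "$dev" || { echo "could not update $dev" >&2; rc=1; }
    done
    return "$rc"
}
```

Usage would look like `disable_write_cache /dev/sdX /dev/sdY`. Device letters can change between boots, so the stable names under /dev/disk/by-id may be a safer way to refer to the drives.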

 

Link to comment

This is absolutely ridiculous!! There doesn't seem to be any HDD that is safe to use, apart from maybe the Hitachi 7K2000 2TB, but that drive is noisy, runs extremely hot and is expensive!

 

What about the Seagate ST32000542AS ?  or is there some problem with that one too?

 

Apparently that one has only 50,000 load/unload cycles. There was something about the firmware.

Link to comment

The problem could not be reproduced with the above test if any of the following conditions are met:

 

* Disk write cache is disabled.

 

* NCQ is disabled. This may not always be true as the c't lab also reported problems with NCQ disabled.

 

* A modified test version of smartctl which does not issue IDENTIFY DEVICE commands is used. Then all other SMART and non-SMART commands used by smartctl work without any data loss.

 

Christian Franke

 

 

 

NCQ is disabled on my system. I run virtual machines and have 5 of these 204UI disks, and I have never noticed the issue; you'd think I would, considering a virtual machine would be very sensitive to data corruption.

 

Also, according to Christian, putting the PC in IDE mode will alleviate the issue.

 

Since I'm running so many of these disks, I'm going to spend some serious time trying to make this issue show up.

Link to comment

Well I can recreate the issue and can confirm that it doesn't occur if you turn off write caching.  AVOID THESE DISKS

 

Methodology :-

 

Copy large file onto the Samsung disk from another

Run smartctl -i /dev/sdf a few times in another window

Run md5sum on source and destination files.  Different checksums reported.

 

After issuing hdparm -W 0 /dev/sdf repeat the above several times, checksums are always the same.
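
The comparison step can be scripted so that a mismatch is hard to miss. A minimal sketch follows; copies_match is a hypothetical name. As noted earlier in the thread, a freshly written file may be served from the buffer cache, so the result is only meaningful if the destination is actually read from the disk.

```shell
# Hypothetical helper: succeed only if both files have the same MD5 checksum.
# Returns 2 if either file cannot be read.
copies_match() {
    src_sum=$(md5sum < "$1") || return 2
    dst_sum=$(md5sum < "$2") || return 2
    [ "$src_sum" = "$dst_sum" ]
}
```

For example, `copies_match /mnt/source/big.file /mnt/test/big.file || echo "CHECKSUMS DIFFER"`, where both paths are made up.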

 

EDIT : Just in case there is any doubt, I do not recommend doing the above on your live array as it will invalidate parity!

Link to comment

If you run a parity check with one of these disks in the array, it would "correct" parity to fit the corrupted data.  Parity will show one or more sync errors as a result.

 

Therefore, I would recommend only running read-only parity checks.  If you only had one F4 you would be able to use unRAID to reverse the corruption using parity and the other disks, but if you have several of them you wouldn't have enough information.

Link to comment
