I re-seated my existing disk controller cables and interface card in an attempt to diagnose the DMA errors I've been experiencing since I tried to expand my array into the last 4 empty slots. As I stated earlier, I was getting a DMA error that just locked up the server.
Since these 4 slots are not yet assigned to anything in my array, my data is safe... I do need to test these slots by reading and writing to disks in them, and the preclear_disk.sh script is perfect for this. It keeps a disk far more active than normal use would, and at no risk to the overall parity protection of my array.
Last night I re-ran a pre-clear cycle on my tiny 8Gig test drive. It is on the end connector of the first cable of the disk controller. It ran successfully in about 25 minutes. I then tried a pre-clear of a much larger 750Gig drive. It is on the end of the second cable off of the same Promise IDE controller.
As you might have guessed, the 750Gig drive took quite a bit longer to pre-read/clear/post-read than my 8Gig drive. It took just under 10 hours for 1 cycle. It also experienced some changes to the SMART data.
The preclear_disk.sh script is designed to take a SMART status report when it starts, and another at its end, and to show you any differences between them if they exist. In my example screen-shot, the Raw_Read_Error_Rate and Seek_Error_Rate attributes are otherwise unchanged; only their "raw value" (the last value on the line) changed. These are not likely to be problems. The Airflow_Temperature_Cel changed... also not likely to be a problem. There was an increase in the Hardware_ECC_Recovered counter. I'll need to keep an eye on that. It indicates the hardware in the disk corrected an error it detected while reading the disk. The unRAID OS never even knew anything had happened, as the error-correction code in the drive's firmware handled the error.
Makes you kind of wonder if all this is also happening on disks in our Windows PCs, and we are not notified of it until the PC fails to boot...
Here is a screen shot of how it looked when it was done:
I'm going to run this 750Gig drive through a few more pre-read/clear/post-read cycles to see if it changes any more, or if I get any more DMA errors. First, I'm going to save a copy of my syslog, as the SMART reports are logged there. That way, if I do have another DMA error lockup, the SMART report in the saved syslog will be available next time for comparison.
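For anyone wanting to do the same, here is a minimal sketch of saving a timestamped copy of the syslog. The function name save_syslog is made up for this post, and the paths in the example call (/var/log/syslog as the log, /boot as a place that survives a reboot) are assumptions about a typical setup, so adjust them for your own server.

```shell
# save_syslog SRC DEST_DIR
# Hypothetical helper: copies the log file SRC into DEST_DIR under a
# timestamped name and prints the path of the copy.
save_syslog() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_dir/syslog-$stamp.txt"
    cp "$src" "$dest" && printf '%s\n' "$dest"
}

# Example call (both paths are assumptions, not taken from the script):
# save_syslog /var/log/syslog /boot
```

The timestamp in the file name keeps each saved copy separate, so an earlier SMART report is not overwritten by a later one.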
Note: the SMART difference output is in "diff" format. The lines with a leading "<" are from the before SMART report, the lines with a leading ">" are from the after SMART report. Lines that are unchanged are not shown at all.
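To illustrate the "diff" format described above, here is a small sketch that compares two saved reports. It is not the code from preclear_disk.sh; the function name smart_diff is made up, and the two sample files contain invented attribute values purely for demonstration. On a real system the two files would be SMART reports captured before and after the run.

```shell
# smart_diff BEFORE AFTER
# Hypothetical helper: prints only the lines that differ between two
# saved reports. "<" lines come from BEFORE, ">" lines from AFTER.
smart_diff() {
    diff "$1" "$2"
}

# Made-up sample data standing in for real SMART reports.
before_file=$(mktemp)
after_file=$(mktemp)
printf '%s\n' 'Spin_Up_Time            100' \
              'Hardware_ECC_Recovered  5' > "$before_file"
printf '%s\n' 'Spin_Up_Time            100' \
              'Hardware_ECC_Recovered  9' > "$after_file"

# diff exits non-zero when the files differ, so "|| true" keeps that
# from being treated as a failure.
changes=$(smart_diff "$before_file" "$after_file" || true)
printf '%s\n' "$changes"

rm -f "$before_file" "$after_file"
```

With the sample data above, this prints:

```
2c2
< Hardware_ECC_Recovered  5
---
> Hardware_ECC_Recovered  9
```

The unchanged Spin_Up_Time line does not appear at all, and "2c2" just means line 2 of the first file was changed into line 2 of the second.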