Finding parity errors after back-to-back tests



About 3 months ago I was having issues with my parity; I replaced a bad drive and everything had been clean since. A few days ago I had a power outage, after which the server ran a parity check (I can't recall whether it was correcting or non-correcting), and ever since I have been unable to get a clean parity check. I'm not sure which drive is causing these issues (hopefully it's just a drive and not another hardware problem).

 

Anyone have a moment to look at some SMART reports for me?

 

EDIT: I do have a drive that every now and then triggers a high-temperature email (~40C). It's pretty much empty because of the split levels I have set, so it only spins up during a parity check or after a reboot. Could this cause it?

SMART_reports_08022013.txt

Link to comment

Not at home right now, but if the memtest comes up clean, do I need to unassign all drives and reassign them 2 at a time, running a parity rebuild and check in between, to try to isolate where the errors are coming from?

 

EDIT: memtest has been running for 12h 36m and has completed 6 passes so far with no errors. I think I'll let it run overnight tonight as well.

 


 

 

Link to comment

Parity is already corrupt.

 

The data on 3 data disks is good. One of the data disks may be corrupt, or the parity disk may be bad. Once the bad disk is determined, if it's a data disk, the contents of the drive will have to be copied to a new disk. A Windows recovery tool may help.

 

Always assign the parity drive as parity or you will lose data.

 

Set a New Config (see image attached)

 

Assign parity, disk1 and disk2. Start the array and build parity. Check parity. If there are zero errors then the issue is with disk3 or disk4.

 

Set a New Config.

 

This run will have parity and either disk1 or disk3 assigned depending on the previous result.

 

If all trials result in errors, then try with a new parity disk.
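
To illustrate why a parity check alone cannot say which disk is at fault, here is a minimal sketch. It assumes plain XOR parity computed across the same byte offset of every data disk, which is the general idea behind single-parity arrays; the disk contents and the helper names `parity_of` and `check` are made up for the example and are not unRAID code.

```python
from functools import reduce
from operator import xor

def parity_of(blocks):
    """XOR the same-offset bytes of every data disk into a parity block."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def check(blocks, parity):
    """Return the byte offsets where stored parity disagrees with the data."""
    expected = parity_of(blocks)
    return [i for i, (p, e) in enumerate(zip(parity, expected)) if p != e]

# Toy "disks": three equal-sized data blocks (hypothetical contents).
disk1 = bytes([0x10, 0x22, 0x3C, 0x47])
disk2 = bytes([0xA0, 0x0B, 0xC1, 0x5D])
disk3 = bytes([0x0F, 0xF0, 0x55, 0xAA])
parity = parity_of([disk1, disk2, disk3])

# Flip one bit at offset 2, once on a data disk and once on the parity disk.
bad_disk2 = disk2[:2] + bytes([disk2[2] ^ 0x01]) + disk2[3:]
bad_parity = parity[:2] + bytes([parity[2] ^ 0x01]) + parity[3:]

errors_if_data_bad = check([disk1, bad_disk2, disk3], parity)
errors_if_parity_bad = check([disk1, disk2, disk3], bad_parity)
print(errors_if_data_bad, errors_if_parity_bad)
```

Both cases report a mismatch at the same offset, so the check shows that something disagrees but not which disk is wrong. That is why the procedure above rebuilds and checks parity against small subsets of disks.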

 

I did have a question regarding this. When I assign diskN designations, do I need to use my original config?

 

For example, I set parity, disk1 and disk2. If this is good, then I leave parity and disk1. Do I then set my disk3 as "disk2", or do I leave disk2 blank and assign my disk3 as "disk3"?

Link to comment

Okay, parity + disk1 + disk2 = 142 errors.

 

I'm running another test with parity + disk3 and disk4 to see if it errors again (to see if the parity drive is bad). I'm thinking disk2 is the problem, as I've had issues with that slot in the past.

 

If 0 errors, I'll run parity + disk1 again. If 0 errors again, this would confirm the issue is with disk2. In that case I would move disk2 to another slot* (see below), assign all drives again, rebuild/resync parity and check it. If 0 errors, that means everything is good and it is indeed a bad slot.

 

I'm confident it's not the drive, now that this is the 2nd issue I'm having with that same slot. The previous drive I had in there might be good as well (I still have it).

 

*I can do this, right? Put the hard drive in a different port and reassign it as disk2 just the same?

Link to comment

I think the disk itself is fine, just a bad sata cable or something. I can't move it to another slot (I have 20 bays, all connected) and assign it as disk2 and do a parity sync? If that's the case I suppose I could move a sata cable instead..

 


 

Link to comment

Yes.

Link to comment

Hmm okay, now I'm a little confused. I was sure I knew what the problem was, but the test for disk3 and disk4 finally finished and I'm seeing 1 error. I'm going to run disk5 and disk6, and if it errors again, I guess that would point to a bad parity drive, right?

Link to comment

parity + disk1 + disk2 = 142 errors

parity + disk3 + disk4 = 1 error

parity + disk5 + disk6 = 0 errors

 

Is that 1 error "acceptable"? It kind of ruins my theory of a bad disk2 slot, and the 0-error test for disk5 and disk6 rules out a bad parity drive, right? Or does a bad parity drive not always show errors on every test?
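
One way to read the three results above is as a simple elimination exercise. The following is only a sketch: `narrow_suspects` is a hypothetical helper, and it assumes the fault is persistent. An intermittent problem (cable, controller, RAM, power) can produce a clean run by chance, so a 0-error trial is evidence rather than proof.

```python
def narrow_suspects(trials):
    """trials: list of (data_disks, error_count), all run with the same parity disk.

    Disks that took part in a clean trial are treated as cleared;
    disks seen only in failing trials stay on the suspect list.
    """
    cleared = set()
    failing = set()
    for disks, errors in trials:
        if errors == 0:
            cleared |= set(disks)
        else:
            failing |= set(disks)
    suspects = sorted(failing - cleared)
    # The parity disk is in every trial, so one clean trial means it
    # was read consistently at least once.
    parity_seen_clean = any(errors == 0 for _, errors in trials)
    return suspects, parity_seen_clean

# The three trials reported in this thread.
trials = [
    ({"disk1", "disk2"}, 142),
    ({"disk3", "disk4"}, 1),
    ({"disk5", "disk6"}, 0),
]
suspects, parity_seen_clean = narrow_suspects(trials)
print(suspects, parity_seen_clean)
```

With these numbers, disk1 through disk4 all remain suspects and the parity disk has been seen clean once, so the next step would be single-disk trials (parity + disk1, parity + disk2, and so on) rather than drawing a conclusion yet.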

Link to comment

If you are cleanly stopping the array (using the "Stop Array" button before powering down) then NO errors are acceptable. 

 

The single error might be caused by an unreadable sector on a disk, but you'll need to compare SMART reports on the drives involved, taken before and after the parity check, to see if one occurred during the check.
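
Comparing two SMART reports by eye is tedious, so here is a rough sketch of automating it. It assumes the reports contain the attribute table in the layout printed by `smartctl -A` (ten whitespace-separated columns, raw value in the tenth). The helper names, the choice of watched attributes, and the sample values are illustrative assumptions, not taken from the attached report.

```python
# Attributes that commonly point at bad sectors or cabling problems.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)

def parse_attributes(report_text):
    """Map attribute name -> integer raw value for each table row found."""
    attrs = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Table rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs

def changed(before_text, after_text, watched=WATCHED):
    """Return {name: (before, after)} for watched attributes that moved."""
    before = parse_attributes(before_text)
    after = parse_attributes(after_text)
    return {
        name: (before[name], after[name])
        for name in watched
        if name in before and name in after and before[name] != after[name]
    }

# Hypothetical excerpts taken before and after a parity check.
BEFORE = """
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""
AFTER = BEFORE.replace(
    "Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0",
    "Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1",
)

print(changed(BEFORE, AFTER))
```

A pending or reallocated sector count that rises during the check would point at the disk itself, while a rising `UDMA_CRC_Error_Count` is more typical of a cable or connector problem.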

Link to comment

Thanks guys. I currently have the complete array back up because I didn't expect the testing to take so long.. nearly a full day for a parity sync and parity check, 2 drives at a time. I did another parity sync with the complete array and it's parity checking now. I understand I probably don't have any redundancy right now if I'm getting errors during the other tests.

 

If I do replace all my SATA cables, I don't need to keep them in the same configuration, right? Meaning, each HDD doesn't need to be plugged into the same port on the mobo or RAID card, as long as I assign the proper disks to the proper assignments later on? It looks like I can if I follow this: http://lime-technology.com/wiki/index.php/FAQ#What_is_the_safe_way_to_rearrange_disk_numbers.2C_assignments.2C_slots.2C_etc.3F

 

But I'm also unsure if that would work if I have errors.

Link to comment
I understand I probably don't have any redundancy right now if I'm getting errors during the other tests.
If you are writing anything to the array, you could be corrupting what you write as well. I understand it takes a long time to do these tests, but if you value your data, you have to get to the bottom of the issue. Something is still causing data corruption as long as you continue to get errors.
Link to comment


Which physical SATA port a drive is plugged into does not matter at all.

Link to comment

Okay, the confusion keeps building, lol  :o

 

Last checked on Mon Aug 5 19:23:51 2013 EDT (today), finding 0 errors.

> Duration: 7 hours, 46 minutes, 57 seconds. Average speed: 71.4 MB/sec

 

After assigning all my drives in the original setup and starting the array, it did the parity sync. The parity check just finished and came up clean. I'm running another check immediately after to be sure, but maybe my unRAID server just needed a reboot or two to clear out the cobwebs? Does that make any sense?
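
As a quick sanity check on the status line quoted above, the duration and average speed can be multiplied out to see how much data the check covered. This sketch assumes that MB means 10^6 bytes and that the check read the parity disk from start to finish; the thread does not state the drive size.

```python
# Figures from the status line: 7 hours, 46 minutes, 57 seconds at 71.4 MB/sec.
hours, minutes, seconds = 7, 46, 57
avg_mb_per_s = 71.4

elapsed_s = hours * 3600 + minutes * 60 + seconds
implied_tb = elapsed_s * avg_mb_per_s / 1_000_000  # MB -> TB, decimal units

print(elapsed_s, round(implied_tb, 2))
```

The result comes out at roughly 2 TB, which suggests the check ran over the whole disk rather than stopping early.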

Link to comment
