drawde Posted August 2, 2013
About 3 months ago I was having issues with my parity; I replaced a bad drive and everything had been clean since. A few days ago I experienced a power outage, after which a parity check ran (I can't recall if it was a correct or nocorrect check), and ever since I have been unable to get a clean parity check. I'm not sure which drive is causing these issues (hopefully it's just a drive and not another hardware issue). Does anyone have a moment to look at some SMART reports for me?
EDIT: I do have a drive that every now and then triggers a high-temperature email (~40C). It's pretty much an empty drive due to the split levels I have set, so it only spins up during a parity check or after a reboot. Could this cause it?
Attachment: SMART_reports_08022013.txt

dgaschk Posted August 2, 2013
Run memtest overnight. See here: http://lime-technology.com/forum/index.php?topic=28652.msg255110#msg255110

drawde Posted August 2, 2013
Quote: "Run memtest overnight. See here: http://lime-technology.com/forum/index.php?topic=28652.msg255110#msg255110"
For logistical reasons I can't run memtest tonight, but I definitely will tomorrow. Looking through that thread, it looks like I'm gonna have a busy weekend, lol. I think I might be hoping for bad RAM at this point.

drawde Posted August 2, 2013
Started memtest around 11AM this morning. I won't be home until late, so it should have completed a decent number of passes by then. Are any of those SMART reports looking bad?

dgaschk Posted August 3, 2013
The SMART reports look fine. Attach a syslog; zip it if needed.

drawde Posted August 3, 2013
I'm not at home right now, but if the memtest comes up clean, do I need to unassign all drives and reassign them 2 at a time, running a parity check/rebuild in between, to try to isolate where the errors are coming from?
EDIT: memtest has been running for 12h 36m and has done 6 passes so far with no errors. I think I'll let it run overnight tonight as well.

drawde Posted August 3, 2013
I'm attaching my syslog from a fresh boot. Memtest found 0 errors after 12+ hours.
Attachment: syslog-2013-08-03.txt

drawde Posted August 3, 2013
Quote: "Parity is already corrupt. The data on 3 data disks is good. One of the data disks may be corrupt. Or the parity disk may be bad. Once the bad disk is determined, if it's a data disk, the contents of the drive will have to be copied to a new disk. A Windows recovery tool may help. Always assign the parity drive as parity or you will lose data. Set a New Config (see image attached). Assign parity, disk1 and disk2. Start the array and build parity. Check parity. If there are zero errors then the issue is with disk 3 or 4. Set a New Config. This run will have parity and either disk1 or disk3 assigned depending on the previous result. If all trials result in errors then try with a new parity disk."
I did have a question regarding this. When I assign diskN designations, do I need to use my original config? For example, I set parity, disk1 and disk2. If this is good, then I leave parity and disk1. Do I then set my disk3 as "disk2", or do I leave disk2 blank and assign my disk3 as "disk3"?

drawde Posted August 3, 2013
Bump, please. Anyone know? I'm just finishing up my first test (parity + disk1 + disk2); no errors so far on the parity check.

dgaschk Posted August 3, 2013
The order of the data disks does not matter.

drawde Posted August 3, 2013
Thanks. I was just worried that if I set my disk3 as disk2, it would try to overwrite my stuff with disk2's content.

dgaschk Posted August 3, 2013
I suggest you leave them in their starting positions so you don't get them mixed up. The drive slots don't have to be filled in order.

drawde Posted August 3, 2013
Ah OK, I get it. When I do the New Config, I'm assigning a couple of drives at a time and it rebuilds the parity based on those drives, so the order doesn't matter.

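Why the order of the data disks does not matter can be shown with a toy model. This is a minimal sketch, assuming single parity computed as a byte-wise XOR across the data disks; the three 4-byte "disks" and the helper `xor_parity` are hypothetical stand-ins for illustration, not anything taken from the server in this thread.

```python
from functools import reduce
from itertools import permutations


def xor_parity(disks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Hypothetical 4-byte "disks"; real drives are just much longer byte strings.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
disk3 = bytes([0xFF, 0x00, 0xFF, 0x00])

parity = xor_parity([disk1, disk2, disk3])

# XOR is commutative and associative, so every ordering of the same
# disks produces the same parity.
order_independent = all(
    xor_parity(list(p)) == parity for p in permutations([disk1, disk2, disk3])
)

# XOR-ing the parity with the surviving disks reproduces the missing one,
# which only works while the parity itself is valid.
rebuilt_disk2 = xor_parity([parity, disk1, disk3])
```

Under that assumption, assigning a drive to a different slot changes only its label, not the parity that gets built from it.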
drawde Posted August 3, 2013
Okay, parity + disk1 + disk2 = 142 errors. I'm running another test with parity + disk3 + disk4 to see if it errors again (to see if the parity drive is bad). I'm thinking disk2 is the problem, as I've had issues with that slot in the past.
If 0 errors, I'll run parity + disk1 again. If 0 errors again, this would confirm the issue is with disk2. In that case I would move disk2 to another slot* (see below), assign all drives again, rebuild/resync parity and check it. If 0 errors, that means everything is good and it is indeed a bad slot. I'm confident it's not the drive, now that this is the 2nd issue I'm having with that same slot. The previous drive I had in there might be good as well (I still have it).
*I can do this, right? Put the hard drive in a different port and reassign it as disk2 just the same?

dgaschk Posted August 4, 2013
You'll have to copy the files off of disk2 because it's not possible to rebuild from parity at this point. There are several Windows recovery tools that may help with getting the data off the disk.

drawde Posted August 4, 2013
I think the disk itself is fine, just a bad SATA cable or something. I can't move it to another slot (I have 20 bays, all connected), assign it as disk2 and do a parity sync? If that's the case, I suppose I could move a SATA cable instead.

dgaschk Posted August 4, 2013
Quote: "I think the disk itself is fine, just a bad SATA cable or something. I can't move it to another slot (I have 20 bays, all connected), assign it as disk2 and do a parity sync? If that's the case, I suppose I could move a SATA cable instead."
Yes.

drawde Posted August 4, 2013
Hmm okay, now I'm a little confused. I was sure I knew what the problem was, but the test for disk3 and disk4 finally finished and I'm seeing 1 error. I'm gonna run disk5 and disk6, and if it errors again, I guess that would point to a bad parity drive, right?

drawde Posted August 5, 2013
parity + disk1 + disk2 = 142 errors
parity + disk3 + disk4 = 1 error
parity + disk5 + disk6 = 0 errors
Is that 1 error "acceptable"? Because it kinda ruins my theory of a bad disk2 slot. Also, the 0-error test for disk5 and disk6 rules out a bad parity drive, right? Or does a bad parity drive not always show errors on every test?

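The elimination logic behind these three runs can be sketched in a few lines. This is only an illustration under a strong simplifying assumption: a faulty disk or slot produces errors in every run it takes part in. Intermittent faults (cables, RAM, power) break that assumption, so a clean run does not strictly prove a component is good. The function name `suspects` is hypothetical.

```python
# Error counts per test run, keyed by the disks that were assigned.
results = {
    ("parity", "disk1", "disk2"): 142,
    ("parity", "disk3", "disk4"): 1,
    ("parity", "disk5", "disk6"): 0,
}


def suspects(results):
    """Return disks that never took part in an error-free run."""
    seen = set()
    cleared = set()
    for members, errors in results.items():
        seen.update(members)
        if errors == 0:
            # Assumption: a faulty member would have caused errors here.
            cleared.update(members)
    return sorted(seen - cleared)


print(suspects(results))  # ['disk1', 'disk2', 'disk3', 'disk4']
```

With these numbers the parity drive is cleared by the disk5 + disk6 run, while disk1 through disk4 all remain open, which matches the confusion described in the post: the single error on disk3 + disk4 keeps a second pair under suspicion.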
Joe L. Posted August 5, 2013
If you are cleanly stopping the array (using the "Stop Array" button before powering down), then NO errors are acceptable. The single error might be caused by an unreadable sector on a disk, but you'll need to compare SMART reports on the drives involved to see if one occurred during the parity check.

drawde Posted August 5, 2013
Thanks guys. I currently have the complete array back up because I didn't expect the testing to take so long: nearly a full day for a parity sync and parity check, 2 drives at a time. I did another parity sync with the complete array and it's parity checking now. I understand I probably don't have any redundancy right now if I'm getting errors during the other tests.
If I do replace all my SATA cables, I don't need to keep them in the same configuration, right? Meaning, each HDD doesn't need to be plugged into the same port on the mobo or RAID card, as long as I assign the proper disks to the proper assignments later on? It looks like I can if I follow this: http://lime-technology.com/wiki/index.php/FAQ#What_is_the_safe_way_to_rearrange_disk_numbers.2C_assignments.2C_slots.2C_etc.3F
But I'm also unsure whether that would work if I have errors.

JonathanM Posted August 5, 2013
Quote: "I understand I probably don't have any redundancy right now if I'm getting errors during the other tests."
If you are writing anything to the array, you could be corrupting what you write as well. I understand it takes a long time to do these tests, but if you value your data, you have to get to the bottom of the issue. Something is still causing data corruption as long as you continue to get errors.

dgaschk Posted August 5, 2013
Quote: "If I do replace all my SATA cables, I don't need to keep them in the same configuration, right? Meaning, each HDD doesn't need to be plugged into the same port on the mobo or RAID card, as long as I assign the proper disks to the proper assignments later on? [...] But I'm also unsure whether that would work if I have errors."
Which physical SATA port a drive is plugged into does not matter at all.

drawde Posted August 5, 2013
Okay, the confusion keeps building, lol.
Last checked on Mon Aug 5 19:23:51 2013 EDT (today), finding 0 errors.
Duration: 7 hours, 46 minutes, 57 seconds. Average speed: 71.4 MB/sec
After assigning all my drives in the original setup and starting the array, it did the parity sync. The parity check just finished and came up clean. I'm running another check immediately after to be sure, but maybe my unRAID server just needed a reboot or two to clear out the cobwebs? Does that make any sense?

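As a quick sanity check on that status line, the duration and average speed can be multiplied out to see how much data the check covered. This is a rough sketch that assumes the reported average is simply total data read divided by elapsed time, in decimal megabytes; the thread does not state the drive sizes, so the result is only an estimate.

```python
# Figures copied from the status line in the post above.
hours, minutes, seconds = 7, 46, 57
average_mb_per_sec = 71.4

elapsed_seconds = hours * 3600 + minutes * 60 + seconds  # 28017
total_mb = elapsed_seconds * average_mb_per_sec          # about 2,000,414 MB
total_tb = total_mb / 1_000_000                          # decimal terabytes

print(f"{elapsed_seconds} s at {average_mb_per_sec} MB/s is about {total_tb:.2f} TB")
```

That works out to roughly 2 TB, so the check appears to have covered the full span of a parity drive of about that size rather than stopping early.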