Blade Posted February 1, 2010

Ever since updating to 4.5 final I have been getting parity errors because of the HPA on my drive. Before 4.5 final it was not an issue at all; I just lived with the first data drive losing a bit of space to the HPA from the Gigabyte motherboard. I have since disabled the BIOS backup in the unRAID server's BIOS, but when the parity check ran today I got errors:

  Last checked on 2/1/2010 12:07:33 PM, finding 3102043 errors

How can I resolve this and get it back to 0 errors? I have 11 data drives and 1 parity drive. Thanks.
Joe L. Posted February 1, 2010

> Ever since updating to 4.5 final I have been getting parity errors because of the HPA on my drive. [...] How can I resolve this and get it back to 0 errors?

Step 1. Post a syslog.
Step 2. Run a second parity check. The first should have already fixed the parity. If the second finds no additional parity errors, you are done.
Step 3. Stop the array, reboot, and make sure the HPA is not added once more.
Step 4. Post a syslog.

Joe L.
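For readers wondering why a single correcting check should be enough: unRAID parity is a byte-wise XOR across all data disks, and a correcting check simply rewrites any parity byte that disagrees with the recomputed XOR, so a second pass should find nothing left to fix. A minimal illustrative sketch of the idea (not unRAID's actual code):

```python
from functools import reduce

def compute_parity(disks):
    """XOR corresponding bytes of all data disks to form the parity disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

# Three tiny "data disks" of equal size.
disks = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = compute_parity(disks)

# Simulate a sync error: the stored parity disagrees with the data.
stored = bytearray(parity)
stored[1] ^= 0xFF
errors = sum(a != b for a, b in zip(stored, parity))
print(errors)  # one sync error found on the first (correcting) check

# The correcting check overwrites stored parity with the recomputed value,
# so a recheck finds zero errors.
stored = bytearray(compute_parity(disks))
print(sum(a != b for a, b in zip(stored, parity)))
```

This is why errors that keep reappearing on every check (as happens later in this thread) point to something actively changing the disks between checks, not to a failure of the correction itself.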
Blade Posted February 1, 2010

Here is my syslog right now. I will run another parity check tonight. Can I just run it from the http://tower web page? I think this is right. Can anything be accessing the tower when a parity check is running?

[Attachment: syslog.zip]
Joe L. Posted February 1, 2010

> Can I just run it from the http://tower web page? I think this is right.

Yes.

> Can anything be accessing the tower when a parity check is running?

You can use the server as usual.
Blade Posted February 1, 2010

I just started a parity check and it has already found over 500 sync errors. I have no idea why the errors are not getting fixed. Please help.
Blade Posted February 1, 2010

OMG, 1.8% complete and well over 15,000 sync errors.

UPDATE: 4.2% complete and over 35,000 sync errors. Something is definitely wrong. I need help. Why is the parity check not fixing the sync errors?
prostuff1 Posted February 1, 2010

> Why is the parity check not fixing the sync errors?

Need a syslog.
Blade Posted February 1, 2010

I posted a zip of it a few posts up.
Blade Posted February 1, 2010

Here is a syslog of the parity check I just started. I stopped the parity check, though, because I was still seeing a lot of errors.

[Attachment: syslog.zip]
Blade Posted February 2, 2010

Here is a little info on my system:

- 1 parity drive and 11 data drives
- 6 SATA ports on the motherboard
- A Rosewill RC-218 SATA II card (retail), with 4 drives on it
- A Syba SD-SA2PEX-2IR PCI Express SATA II controller card (retail), with 2 drives on it

I did not have a problem until I upgraded to 4.5 final, which I guess ignores the HPA or something. I did have my Gigabyte motherboard set to use the BIOS backup on data disk1, which was no problem until I upgraded to 4.5 final. I had been using 4.5 beta 12 for the longest time without a single parity error. I have since upgraded the Gigabyte BIOS and set it not to back up the BIOS. I am really lost on this one, but I really want good parity in case a drive fails. I would hate to lose all of the hours I spent loading my Blu-rays.
Blade Posted February 2, 2010

I ran a full parity check again last night and it finished this morning with 1.5 million sync errors. I tried running the check again this morning and it keeps giving me lots of errors. I am wondering whether switching back to 4.5 beta 12 would help me or not. Should I just re-enable the BIOS backup in the BIOS? All of the data is perfectly fine; I just have no idea how to fix this.
Blade Posted February 2, 2010

Here is the syslog from the parity check I ran last night.

[Attachment: syslog.zip]
Joe L. Posted February 2, 2010

It is a bit confusing to me too. I'm trying to understand what is happening. I would expect parity sync errors one time, but not again and again, unless the BIOS is continuing to add its data to the area it thinks it reserved.

The issue is that the size of the reiserfs partition on the disk, as stated in the partition table, is larger than the partition with the added HPA. What apparently happened is that the file system was created before the HPA was added. Then the BIOS added the HPA, cutting off the last megabyte of space from the disk, but NOT changing the partition size as defined in the partition table in the MBR. So:

- The disk was originally manufactured with a size of 500107862016 bytes (976773168 sectors of 512 bytes).
- The disk was installed in the unRAID array, and partition 1 was created with 976773105 sectors. (It starts at sector 63; sectors 1 through 62 are unused, and sector 0 is the MBR.)
- We do not know what size the reiserfs is within the partition, but let's assume it was created when the disk was full size; it therefore expects to be able to use the entire set of 976773105 sectors.
- Then your BIOS added an HPA, making the disk size be reported as 500106780160 bytes (1081856 bytes, or 2113 sectors, smaller).

Apparently the older version of Linux did not look at the disparity between partition size and reported disk size; the newer kernel does. It tries to help by using the full size of the disk when it detects that the first partition extends beyond the artificially shortened physical disk but fits in the actual physical disk. It does that here:

  Jan 1 12:00:23 Tower kernel: hdc: Host Protected Area detected.
  Jan 1 12:00:23 Tower kernel: ^Icurrent capacity is 976771055 sectors (500106 MB)
  Jan 1 12:00:23 Tower kernel: ^Inative capacity is 976773168 sectors (500107 MB)
  Jan 1 12:00:23 Tower kernel: hdc: 976771055 sectors (500106 MB) w/16384KiB Cache, CHS=60801/255/63
  Jan 1 12:00:23 Tower kernel: hdc: cache flushes supported
  Jan 1 12:00:23 Tower kernel: hdc: hdc1
  Jan 1 12:00:23 Tower kernel: hdc: p1 size 976773105 exceeds device capacity, enabling native capacity
  Jan 1 12:00:23 Tower kernel: hdc: detected capacity change from 500106780160 to 500107862016

The fix is either to get rid of the HPA and make sure the reiserfs is not corrupt in the larger space, or to keep the HPA, re-size the partition to fit the smaller space, and again check/fix the file system to fit in the partition.

As far as the parity errors go, I'm still not certain what is going on. I'd expect it to fix itself the first time it runs, and over a million parity errors could occur once, but they should not occur a second time, as the first check should have corrected the parity. You might send support@lime-technology an e-mail pointing Tom to this thread; he may have ideas about the parity calcs. In the interim, I'd revert back to the older version of unRAID you were using and see if the parity errors still occur. Right now I'd make copies of any critical files on your server, as I do not trust your ability to recover from a disk failure.

Basically, I think you need to:

1. Disable the HPA creation going forward (BIOS config option/update).
2. Permanently reset the HPA. You can probably do that using the hdparm command.
3. Ensure the partition is sized correctly. (It probably is already, as the disk is being detected at its actual full size.)
4. Check the reiserfs file system to be certain it has no corruption.
5. Check parity once more.
6. Reboot, and make sure the BIOS does not put the HPA on again, or on a different disk.
7. Re-check parity a final time.

Joe L.
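The numbers quoted from the syslog above are internally consistent, which you can verify directly. This is just a restatement of the arithmetic in Python (512-byte sectors assumed, as in the syslog):

```python
SECTOR = 512  # bytes per sector

native_sectors = 976773168  # full manufactured size of the disk
hpa_sectors    = 976771055  # capacity reported with the HPA in place
part_start     = 63         # partition 1 starts at sector 63
part_size      = 976773105  # partition 1 size from the MBR

# The HPA hides 2113 sectors (1081856 bytes) at the end of the disk.
hidden = native_sectors - hpa_sectors
print(hidden, hidden * SECTOR)

# The partition runs exactly to the native end of the disk...
part_end = part_start + part_size
print(part_end == native_sectors)

# ...so with the HPA active it "exceeds device capacity", which is
# why the kernel re-enables the native capacity.
print(part_end > hpa_sectors)

# Byte sizes match the syslog's "capacity change" message.
print(hpa_sectors * SECTOR, native_sectors * SECTOR)
```

Running this reproduces the 2113-sector/1081856-byte gap and the 500106780160 to 500107862016 capacity change from the log.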
Blade Posted February 2, 2010

Thanks, Joe. I appreciate you looking at this. I am really lost on this one; I am very new to unRAID and have only had it running for 6 months or so. I sent an email to Tom and pointed him to this thread. I really hope I can salvage this and get my parity correct. I would hate to lose my data on 11 disks. I have disabled the BIOS backup in the Gigabyte BIOS settings. This definitely started with the 4.5 final install. I really need a step-by-step procedure, as I do not trust myself to go off and figure this one out. God, I hope Tom can help.
Blade Posted February 3, 2010

OK, so no reply from Tom yet. I hope he got my email.
lionelhutz Posted February 4, 2010

I would immediately go back to the version of unRAID that worked and ensure you once again have a proper parity build/check. Then I would boot from a boot CD and remove the HPA; I used the Ultimate Boot Disk and one of its HD utilities the last time I had to do this. Reboot unRAID and do a file system check on that drive. Check the array again to ensure you're still getting a good parity check. If the above works, then you are ready to upgrade again.

There is a command to tell unRAID to do a parity check without doing any correcting. You can use this to test without hurting any data or the parity. So if you did happen to lose a drive, you could go back to the working version of unRAID and then replace it. This is why I think getting back to a system that has a good parity check is something you should do right away.

If you start to work through this and need help with any of the steps, let us know how you're making out. Someone can tell you the next thing to do.

One thing to note: if the motherboard is still writing the HPA on each boot, you'll get an error that it can only be set once or something like that, which means you will not be able to erase it.

Peter
Joe L. Posted February 4, 2010

Before you do anything, if you have any critical data that is irreplaceable on the server, make a backup copy elsewhere.

Then, a quick question: did you add a boot parameter to config/syslinux.cfg in an attempt to work around an HPA on an earlier release of unRAID, and forget it is there? Perhaps the new 4.5 release is actually using it now. I mention this because I see this line in the syslog:

  Jan 1 12:00:23 Tower kernel: Kernel command line: initrd=bzroot rootdelay=10 libata.ignore_hpa=1 BOOT_IMAGE=bzimage

Apparently this line is being respected by the SATA driver on /dev/sdh, as seen here:

  Jan 1 12:00:23 Tower kernel: ata7.00: HPA unlocked: 1953523055 -> 1953525168, native 1953525168
  Jan 1 12:00:23 Tower kernel: ata7.00: ATA-8: WDC WD10EADS-00M2B0, 01.00A01, max UDMA/133
  Jan 1 12:00:23 Tower kernel: ata7.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
  Jan 1 12:00:23 Tower kernel: ata7.00: configured for UDMA/133
  Jan 1 12:00:23 Tower kernel: scsi 7:0:0:0: Direct-Access ATA WDC WD10EADS-00M 01.0 PQ: 0 ANSI: 5
  Jan 1 12:00:23 Tower kernel: sd 7:0:0:0: [sdh] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
  Jan 1 12:00:23 Tower kernel: sd 7:0:0:0: [sdh] Write Protect is off
  Jan 1 12:00:23 Tower kernel: sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00
  Jan 1 12:00:23 Tower kernel: sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

You also seem to have an HPA on one of your older PATA disks, /dev/hdc (the same "Host Protected Area detected" lines I quoted in my previous post).

Perhaps you can try without the added boot code in syslinux.cfg. (You'll need to change it, then reboot.)

Next, before you do anything to reset the HPA, it is best to know how the disk was partitioned. (Was it partitioned with the HPA in place, or without it? Does the partition end at the physical end of the disk, or at the artificial HPA end of the disk?) Here are several commands you can run to help us know how the disks are currently partitioned:

  sfdisk -g /dev/hdc
  sfdisk -g /dev/sdh
  blockdev --getsz /dev/hdc
  blockdev --getsz /dev/sdh
  fdisk -l -u /dev/hdc
  fdisk -l -u /dev/sdh
  od -x -A d /dev/hdc | head
  od -x -A d /dev/sdh | head

Those are all different ways of displaying the geometry of the drives and the current partitioning. If the partitions are the correct size, then we can go forward and check the file system for errors; otherwise, we need to fix the partitioning. None of the above commands will modify the disk; all they do is read it.

As a double-check, you can download, unzip, and invoke a script I wrote for another thread to examine a disk's partition table and see if it is correctly partitioned. You can find the script attached to this post: http://lime-technology.com/forum/index.php?topic=5072.msg47122#msg47122

You would run it on your disks as follows:

  unraid_partition_disk.sh /dev/sdh
  unraid_partition_disk.sh /dev/hdc

The unraid_partition_disk.sh script makes no change to the disk unless you request it to by using the "-p" option, and even then it only re-creates the partition table and MBR if you respond to an "are you sure" prompt, so it is safe to use on your existing disks as shown in the example above (without the -p option). Do not make any changes to the disks until after you first report back on their current partitioning. (In other words, forget there is a "-p" option for now.)

This post describes how to use the hdparm command to reset an HPA: http://lime-technology.com/forum/index.php?topic=5072.msg46903#msg46903

Obviously, you'll need to use the correct values for your drives, but you do not need to use another distribution or boot up a different OS. To see the current native size and HPA you would type:

  hdparm -N /dev/sdh

To set the disk to use its full native size you would use:

  hdparm -N p1953525168 /dev/sdh

(The "1953525168" is the native size as reported in your syslog for that disk. Preceding it with a "p" is the hdparm syntax to make the change permanent.) Follow it with

  hdparm -N /dev/sdh

to see if it worked. For your smaller /dev/hdc disk, the hdparm command to reset the HPA would be:

  hdparm -N p976773168 /dev/hdc

followed by

  hdparm -N /dev/hdc

to see if it was effective. You can read about the hdparm command here: http://lime-technology.com/forum/index.php?topic=4194

Please verify the numbers I've given against the output of an initial hdparm -N on each drive. If the HPA can be removed/reset to the full size of the drive, then a reboot might get you to where all the disks will mount. (There still might be corruption of the file system, so the first priority is to get the disk to report its full size.)

So, there are some preliminary steps you can take to get rid of the HPA on /dev/sdh. Before you do anything, if you have any critical data that is irreplaceable on the server, make a backup copy elsewhere. As already mentioned, if you can remove the ignore_hpa boot code and proceed on 4.5 without any issues, do that first, and post another syslog. If need be, revert back to your older release and get to where parity checks are clean (although you'll probably need to do them twice: first to set things correctly, second to verify). We really want to get to where we do not suspect any hardware issues.

If it makes you feel any better, the million parity errors are probably all in the HPA area of the disks (the last megabyte or so being reserved) and would not affect your recovery of a failed disk... but the fact that they do not go away sure doesn't make me feel comfortable. That still has me stumped.

Joe L.
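If you want to sanity-check the hdparm -N output before issuing the permanent resize, a small parser can compare the two sector counts it reports. The "max sectors = current/native" line shape below is an assumption based on common hdparm versions, so verify it against your own console output first:

```python
import re

def parse_hdparm_n(output):
    """Parse 'hdparm -N' output of the (assumed) form
    ' max sectors   = 976771055/976773168, HPA is enabled'
    and report whether an HPA is clipping the drive."""
    m = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", output)
    if not m:
        raise ValueError("unrecognized hdparm -N output")
    current, native = int(m.group(1)), int(m.group(2))
    return {
        "current": current,              # sectors visible right now
        "native": native,                # sectors the drive really has
        "hidden_sectors": native - current,
        "hpa_active": current < native,
    }

# The values Joe L. quoted from the syslog for /dev/hdc:
info = parse_hdparm_n(" max sectors   = 976771055/976773168, HPA is enabled")
print(info["hpa_active"], info["hidden_sectors"])
```

With the /dev/hdc numbers from this thread, the parser reports an active HPA hiding 2113 sectors, matching the kernel's "current capacity" vs. "native capacity" messages.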
Blade Posted February 4, 2010

This is my syslinux.cfg file:

  default menu.c32
  menu title Lime Technology LLC
  prompt 0
  timeout 50
  label unRAID OS
    menu default
    kernel bzimage
    append initrd=bzroot rootdelay=10 libata.ignore_hpa=1
  label Memtest86+
    kernel memtest
Joe L. Posted February 4, 2010

Apparently you added the libata.ignore_hpa=1 to what was originally there? Or did it come that way on a drive you purchased or software you downloaded?
Blade Posted February 4, 2010

It was added a while ago when I saw some responses to the HPA issue with Gigabyte motherboards.

So my first step should be the following: change my syslinux.cfg file to this and reboot:

  default menu.c32
  menu title Lime Technology LLC
  prompt 0
  timeout 50
  label unRAID OS
    menu default
    kernel bzimage
    append initrd=bzroot rootdelay=10
  label Memtest86+

I want to do one thing at a time and then report back. If this is the first thing I should do, let me know and I will do it and post a syslog upon reboot. Thanks.
Joe L. Posted February 4, 2010

> So my first step should be the following: I should change my syslinux.cfg file to the following and reboot [...]

I would FIRST run all the commands I gave to print the existing partitioning on those two drives. I'd hate for an HPA to be recognized and mess you up in some other way (corrupted file system?), since I'm certain your syslog is saying the partition goes to the physical end of the drive, not the artificially smaller end made by the presence of the HPA.

Joe L.
Joe L. Posted February 4, 2010

Then I'd revert back to the earlier version of unRAID, leaving the syslinux.cfg as it is for now, just to be sure the newer 4.5 version is not causing your issues. Do you remember adding the ignore_hpa boot code? Was it on the older release?

Joe L.
Blade Posted February 4, 2010

Yes, I added that a while back when the HPA issue surfaced. I am removing it now and rebooting. I will post back shortly.
Blade Posted February 4, 2010

OK, I removed that from the syslinux.cfg file and rebooted. I kept the unRAID version the same. I now have a message saying "Replacement disk is too small." and disk1, which is /dev/sdh, is red. So right now my tower is not started. I have attached the syslog.

[Attachment: syslog.zip]
Joe L. Posted February 4, 2010

You'll probably need to put it back in the syslinux.cfg for now and reboot once more. (Obviously, it has an effect.) I guess you did not read my previous post in time to print the existing partitioning before changing anything.
This topic is now archived and is closed to further replies.