rpf717 Posted August 22, 2013
Thanks in advance for your help. Yesterday I upgraded to 5.0 and had no issues with the server; it was behaving normally. Today I powered down, removed a 1TB drive, replaced it with a 3TB drive, and powered back on. The Web GUI recognized that the 1TB drive was missing, and I selected the 3TB drive to replace it. I clicked the "start array and rebuild/expand" button (or whatever it is called), but then nothing happened. The drives didn't all spin up to rebuild the 1TB onto the new 3TB. I could no longer access the Web GUI, the shares all disappeared, and I could only access the flash drive over the network. I have attached the syslog (syslog-2013-08-22.txt). Any ideas?
mobias1313 Posted August 22, 2013
Do you have a 3TB parity disk currently?
Joe L. Posted August 23, 2013
I added a post in the 5.0 bugs thread here: http://lime-technology.com/forum/index.php?topic=29035.0
Send lime-tech an e-mail pointing him to this thread if you do not see Tom respond to the bug report.
Joe L.
tr0910 Posted August 23, 2013
rpf717 wrote: "I clicked the "start array and rebuild/expand" or whatever it is, but then nothing happens. The drives didn't all spin up to rebuild the 1TB onto the new 3TB. I could no longer access the Web GUI, the shares all disappeared and I could only access the flash drive over the network."
I have seen something similar to this too. Just this morning one of my servers had a disabled 2TB disk. I pulled it and replaced it with a 3TB one to match the rest of the server. For at least 10 minutes the server was unresponsive. Telnet access still worked, and unMenu worked but was very slow to respond. Finally it started the rebuild process and began behaving normally. (Using rc16 on a TamSolutions 24-bay server, Intel Xeon 5130, 4GB RAM, with ten 3TB disks installed.) How long did you try to access it?
RobJ Posted August 24, 2013
The problem seems likely to be related to corruption in the GPT of the replacement drive. Here are the relevant parts of the syslog:

Aug 22 19:30:53 Tower kernel: ata1.00: ATA-9: WDC WD30EFRX-68AX9N0, WD-WMC1T3169627, 80.00A80, max UDMA/133
Aug 22 19:30:53 Tower kernel: ata1.00: 5860533168 sectors, multi 16: LBA48 NCQ (depth 0/32)
...
Aug 22 19:30:53 Tower kernel: scsi 1:0:0:0: Direct-Access ATA WDC WD30EFRX-68A 80.0 PQ: 0 ANSI: 5
Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
...
Aug 22 19:30:53 Tower kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.
Aug 22 19:30:53 Tower kernel: GPT:1565565871 != 5860533167
Aug 22 19:30:53 Tower kernel: GPT:Alternate GPT header not at the end of the disk.
Aug 22 19:30:53 Tower kernel: GPT:1565565871 != 5860533167
Aug 22 19:30:53 Tower kernel: GPT: Use GNU Parted to correct GPT errors.
Aug 22 19:30:53 Tower kernel: sdb: sdb1 sdb2
Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
...
Aug 22 19:30:53 Tower emhttp: Device inventory:
Aug 22 19:30:53 Tower emhttp: WDC_WD30EFRX-68AX9N0_WD-WMC1T3169627 (sdb) 2930266584
...
Aug 22 19:32:57 Tower kernel: mdcmd (2): import 1 8,16 2930266532 WDC_WD30EFRX-68AX9N0_WD-WMC1T3169627
Aug 22 19:32:57 Tower kernel: md: import disk1: [8,16] (sdb) WDC_WD30EFRX-68AX9N0_WD-WMC1T3169627 size: 2930266532
Aug 22 19:32:57 Tower kernel: md: disk1 wrong (this is normal, it's a replacement)
...
Aug 22 19:33:23 Tower emhttp: writing GPT on disk (sdb), with partition 1 offset 64, erased: 0
Aug 22 19:33:23 Tower emhttp: shcmd (36): sgdisk -Z /dev/sdb &> /dev/null
Aug 22 19:33:24 Tower emhttp: shcmd (37): sgdisk -o -a 64 -n 1:64:0 /dev/sdb |& logger
Aug 22 19:33:24 Tower kernel: sdb: unknown partition table
Aug 22 19:33:24 Tower logger: ^GCaution: invalid main GPT header, but valid backup; regenerating main header
Aug 22 19:33:24 Tower logger: from backup!
Aug 22 19:33:24 Tower logger:
Aug 22 19:33:24 Tower logger: Caution! After loading partitions, the CRC doesn't check out!
Aug 22 19:33:24 Tower logger: ^GWarning! Main partition table CRC mismatch! Loaded backup partition table
Aug 22 19:33:24 Tower logger: instead of main partition table!
Aug 22 19:33:24 Tower logger:
Aug 22 19:33:24 Tower logger: Warning! One or more CRCs don't match. You should repair the disk!
Aug 22 19:33:24 Tower logger:
Aug 22 19:33:24 Tower logger: Invalid partition data!
Aug 22 19:33:25 Tower logger: Information: Creating fresh partition table; will override earlier problems!
Aug 22 19:33:25 Tower logger: The operation has completed successfully.
Aug 22 19:33:25 Tower emhttp: shcmd (38): udevadm settle
Aug 22 19:33:25 Tower kernel: sdb: unknown partition table
Aug 22 19:33:25 Tower emhttp: Start array...
Aug 22 19:33:25 Tower kernel: mdcmd (56): start UPGRADE_DISK
Aug 22 19:33:25 Tower kernel: md: do_run: lock_rdev error: -6
Aug 22 19:33:25 Tower kernel: md1: stopping
Aug 22 19:33:25 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 000001d4
Aug 22 19:33:25 Tower kernel: IP: [<f867412c>] do_stop+0x54/0xd4 [md_mod]
Aug 22 19:33:25 Tower kernel: *pdpt = 0000000037490001 *pde = 0000000000000000
Aug 22 19:33:25 Tower kernel: Oops: 0000 [#1] SMP
...

Your replacement drive had 2 existing partitions, but also had errors in the GPT.
You can see the attempts to overwrite it and fix it, but the kernel repeatedly says "kernel: sdb: unknown partition table", so it apparently was not fixed. The rebuild then begins without a good partition table and crashes immediately.

I do have one other recommendation. I see ata_piix being used, which usually means that SATA mode is set to IDE emulation in your BIOS SATA settings. I strongly urge you to change that to AHCI, or to anything other than an IDE emulation mode. It should be slightly faster and a little safer.
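For anyone who wants to check the GPT numbers in the syslog above, here is a rough Python sketch that pulls the "GPT:X != Y" pairs out of the kernel lines and converts them to sizes. The helper names `gpt_mismatches` and `describe` are made up for this illustration, and the 512-byte sector size is taken from the "512-byte logical blocks" line in the log. The 32-bit wrap check is only an inference from the arithmetic; nothing in this thread confirms how the old partition table was written.

```python
import re

SECTOR_BYTES = 512  # from "[sdb] 5860533168 512-byte logical blocks" in the syslog

# Matches kernel lines such as "GPT:1565565871 != 5860533167"
GPT_MISMATCH = re.compile(r"GPT:\s*(\d+)\s*!=\s*(\d+)")


def gpt_mismatches(log_text):
    """Return (recorded_lba, actual_last_lba) pairs found in the log text."""
    return [(int(a), int(b)) for a, b in GPT_MISMATCH.findall(log_text)]


def describe(recorded_lba, actual_last_lba):
    """Convert the two last-LBA values into sizes and test for a 32-bit wrap."""
    recorded_sectors = recorded_lba + 1   # LBAs are zero-based
    actual_sectors = actual_last_lba + 1
    return {
        "recorded_bytes": recorded_sectors * SECTOR_BYTES,
        "actual_bytes": actual_sectors * SECTOR_BYTES,
        # True when the recorded size equals the real size modulo 2**32 sectors
        "wraps_at_32_bits": recorded_sectors == actual_sectors % 2**32,
    }


sample = """\
Aug 22 19:30:53 Tower kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.
Aug 22 19:30:53 Tower kernel: GPT:1565565871 != 5860533167
"""

for recorded, actual in gpt_mismatches(sample):
    print(describe(recorded, actual))
```

With the values from this syslog, the old GPT describes a disk of roughly 801.6 GB on a 3.00 TB drive, and the recorded size equals the real sector count modulo 2**32. That would fit a table written while the drive was attached to something that only handled 32-bit sector counts, but it is a guess from the numbers, not something the log proves.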
limetech Posted October 30, 2013
I can't reproduce the GPT issue exactly, but I do see the bug which caused the kernel crash. Fixed in 5.0.1. If this GPT issue occurs again, the array just won't start. If anyone sees this issue, please send me an email: tomm@lime-technology.