Nothing happening after replacing/upgrading to larger disc

rpf717 · August 22, 2013

Thanks in advance for your help.

Yesterday I upgraded to 5.0 and had no issues with the server; it was behaving normally. Today I powered down, removed a 1TB drive, replaced it with a 3TB drive, and powered back on. In the Web GUI, the server recognized that the 1TB drive was missing, and I selected the 3TB drive to replace it. I clicked "start array and rebuild/expand" (or whatever the exact wording is), but then nothing happened. The drives never spun up to rebuild the 1TB contents onto the new 3TB drive. After that I could no longer access the Web GUI, the shares all disappeared, and I could only reach the flash drive over the network.

I have attached the syslog.

Any ideas?

syslog-2013-08-22.txt

mobias1313 · August 22, 2013

Do you have a 3tb parity disk currently?

rpf717 · August 22, 2013

I do.

Joe L. · August 23, 2013

I added a post in the 5.0 bugs thread here:

http://lime-technology.com/forum/index.php?topic=29035.0

Send lime-tech an e-mail pointing him to this thread if you do not see Tom respond to the bug report.

Joe L.

tr0910 · August 23, 2013

I clicked "start array and rebuild/expand" (or whatever the exact wording is), but then nothing happened. The drives never spun up to rebuild the 1TB contents onto the new 3TB drive. After that I could no longer access the Web GUI, the shares all disappeared, and I could only reach the flash drive over the network.

I have seen something similar to this too. Just this morning one of my servers had a disabled 2TB disk. I pulled it and replaced it with a 3TB one to match the rest of the server. For at least 10 minutes, the server was unresponsive. Telnet access still worked, and unMenu worked but was very sluggish. Finally it started the rebuild process and began behaving normally. (Using rc16 on a TamSolutions 24-bay server: Intel Xeon 5130, 4GB RAM, with 10 3TB disks installed.)

How long did you try to access it?

RobJ · August 24, 2013

The problem seems likely to be related to corruption in the GPT of the replacement drive. Here are the relevant parts of the syslog:

Aug 22 19:30:53 Tower kernel: ata1.00: ATA-9: WDC WD30EFRX-68AX9N0, WD-WMC1T3169627, 80.00A80, max UDMA/133

Aug 22 19:30:53 Tower kernel: ata1.00: 5860533168 sectors, multi 16: LBA48 NCQ (depth 0/32)

...

Aug 22 19:30:53 Tower kernel: scsi 1:0:0:0: Direct-Access ATA WDC WD30EFRX-68A 80.0 PQ: 0 ANSI: 5

Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0

Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)

...

Aug 22 19:30:53 Tower kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.

Aug 22 19:30:53 Tower kernel: GPT:1565565871 != 5860533167

Aug 22 19:30:53 Tower kernel: GPT:Alternate GPT header not at the end of the disk.

Aug 22 19:30:53 Tower kernel: GPT:1565565871 != 5860533167

Aug 22 19:30:53 Tower kernel: GPT: Use GNU Parted to correct GPT errors.

Aug 22 19:30:53 Tower kernel: sdb: sdb1 sdb2

Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk

...

Aug 22 19:30:53 Tower emhttp: Device inventory:

Aug 22 19:30:53 Tower emhttp: WDC_WD30EFRX-68AX9N0_WD-WMC1T3169627 (sdb) 2930266584

...

Aug 22 19:32:57 Tower kernel: mdcmd (2): import 1 8,16 2930266532 WDC_WD30EFRX-68AX9N0_WD-WMC1T3169627

Aug 22 19:32:57 Tower kernel: md: import disk1: [8,16] (sdb) WDC_WD30EFRX-68AX9N0_WD-WMC1T3169627 size: 2930266532

Aug 22 19:32:57 Tower kernel: md: disk1 wrong (this is normal, it's a replacement)

...

Aug 22 19:33:23 Tower emhttp: writing GPT on disk (sdb), with partition 1 offset 64, erased: 0

Aug 22 19:33:23 Tower emhttp: shcmd (36): sgdisk -Z /dev/sdb &> /dev/null

Aug 22 19:33:24 Tower emhttp: shcmd (37): sgdisk -o -a 64 -n 1:64:0 /dev/sdb |& logger

Aug 22 19:33:24 Tower kernel: sdb: unknown partition table

Aug 22 19:33:24 Tower logger: ^GCaution: invalid main GPT header, but valid backup; regenerating main header

Aug 22 19:33:24 Tower logger: from backup!

Aug 22 19:33:24 Tower logger:

Aug 22 19:33:24 Tower logger: Caution! After loading partitions, the CRC doesn't check out!

Aug 22 19:33:24 Tower logger: ^GWarning! Main partition table CRC mismatch! Loaded backup partition table

Aug 22 19:33:24 Tower logger: instead of main partition table!

Aug 22 19:33:24 Tower logger:

Aug 22 19:33:24 Tower logger: Warning! One or more CRCs don't match. You should repair the disk!

Aug 22 19:33:24 Tower logger:

Aug 22 19:33:24 Tower logger: Invalid partition data!

Aug 22 19:33:25 Tower logger: Information: Creating fresh partition table; will override earlier problems!

Aug 22 19:33:25 Tower logger: The operation has completed successfully.

Aug 22 19:33:25 Tower emhttp: shcmd (38): udevadm settle

Aug 22 19:33:25 Tower kernel: sdb: unknown partition table

Aug 22 19:33:25 Tower emhttp: Start array...

Aug 22 19:33:25 Tower kernel: mdcmd (56): start UPGRADE_DISK

Aug 22 19:33:25 Tower kernel: md: do_run: lock_rdev error: -6

Aug 22 19:33:25 Tower kernel: md1: stopping

Aug 22 19:33:25 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 000001d4

Aug 22 19:33:25 Tower kernel: IP: [<f867412c>] do_stop+0x54/0xd4 [md_mod]

Aug 22 19:33:25 Tower kernel: *pdpt = 0000000037490001 *pde = 0000000000000000

Aug 22 19:33:25 Tower kernel: Oops: 0000 [#1] SMP

...

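Your replacement drive had 2 existing partitions (sdb1 and sdb2), but also had errors in its GPT: the backup header was recorded at sector 1565565871 rather than at the last sector of the disk, 5860533167. You can see emhttp's attempts to wipe and rewrite the table with sgdisk, but the kernel still reports "sdb: unknown partition table" afterwards, so apparently it was not fixed. The array start (UPGRADE_DISK) then proceeds without a good partition table, and the md driver crashes immediately.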
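The numbers in the GPT messages can be checked by hand. The following is a minimal sketch in Python that pulls the sector counts out of two kernel lines copied from the excerpt above and compares them; the function name and the dictionary keys are made up for illustration. The backup header should sit on the last sector (total sectors minus one), and the value actually recorded implies a disk exactly 2^32 sectors smaller. That much is plain arithmetic. Reading it as "the drive was once partitioned behind a controller or enclosure that truncated the sector count to 32 bits" is only a guess; nothing in the syslog confirms it.

```python
import re

# Two kernel lines copied verbatim from the syslog excerpt above.
LOG = """\
Aug 22 19:30:53 Tower kernel: sd 1:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
Aug 22 19:30:53 Tower kernel: GPT:1565565871 != 5860533167
"""

def check_backup_header(log):
    """Compare the backup-header LBA recorded in the GPT with the disk's last LBA."""
    # Total number of 512-byte sectors reported by the sd driver.
    total = int(re.search(r"\] (\d+) 512-byte logical blocks", log).group(1))
    # "GPT:<recorded> != <expected>" as printed by the kernel's partition scanner.
    recorded, expected = map(int, re.search(r"GPT:(\d+) != (\d+)", log).groups())
    return {
        "total_sectors": total,
        "expected_last_lba": total - 1,        # backup header belongs on the last sector
        "kernel_expected": expected,
        "recorded_backup_lba": recorded,
        "implied_sectors": recorded + 1,       # disk size the old GPT was written for
        "shortfall": total - (recorded + 1),   # how many sectors the old GPT "missed"
    }

info = check_backup_header(LOG)
for key, value in info.items():
    print(f"{key}: {value}")
print("shortfall is exactly 2**32 sectors:", info["shortfall"] == 2**32)
```

If that guess is right, the old table describes a much smaller disk than the one now attached, which would fit the kernel's complaint that the alternate header is not at the end of the disk.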

I do have one other recommendation. I see ata_piix being used, which usually means that the SATA mode in your BIOS is set to IDE emulation. I strongly urge you to change that to AHCI, or to anything other than an IDE emulation mode. It should be slightly faster, and a little safer.
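For anyone who wants to check their own syslog for the same thing, here is a rough sketch in Python. It only looks for the driver names ata_piix and ahci anywhere in the log lines; the function name is made up, and the assumption that a mention of the driver name means that driver is handling a controller is mine, so treat the result as a hint rather than proof.

```python
import re

# Match the two driver names as whole words (so "libahci" does not count as "ahci").
DRIVER_RE = re.compile(r"\b(ata_piix|ahci)\b")

def sata_drivers_seen(lines):
    """Return the set of SATA driver names mentioned in the given syslog lines."""
    seen = set()
    for line in lines:
        seen.update(DRIVER_RE.findall(line))
    return seen

# Usage against a saved syslog (file name taken from the attachment in this thread):
#   with open("syslog-2013-08-22.txt") as f:
#       print(sata_drivers_seen(f))
```

A result containing ata_piix suggests at least one controller is running in IDE emulation; after switching the BIOS to AHCI, the same check should show ahci instead.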

limetech · October 30, 2013

I can't reproduce the GPT issue exactly, but I do see the bug that caused the kernel crash. It is fixed in 5.0.1. If this GPT issue occurs again, the array simply won't start. If anyone sees this issue, please send me an email: tomm@lime-technology.
