unRAID Server release 4.5 "final" Available


limetech


Do I need to reboot the server after the change? I would assume so.

 

I was having problems with losing contact with the server before the upgrade also. The NIC is on the motherboard and nothing has changed with the setup since I built it.

 

Yes, please reboot.  These things sometimes happen where a small timing change in the linux kernel or a NIC driver causes little 'glitches' to show up.  I'll end up putting a better fix in, but for now I'd like to verify this is the problem.

Link to comment

I was editing my other post while you were posting.

 

Anyway I was wondering what the safe way is to reboot when I can't get at the web interface. I can telnet but don't know the commands.

 

I have made the change to the go script.

 

Thank You

Link to comment

My system hangs when I try to stop the array using the unRaid GUI. Here is a snippet of my syslog...

 

Dec 16 11:37:47 unraid sshd[6188]: lastlog_filetype: Couldn't stat /var/log/lastlog: No such file or directory
Dec 16 11:37:47 unraid sshd[6188]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Dec 16 11:37:47 unraid sshd[6188]: lastlog_filetype: Couldn't stat /var/log/lastlog: No such file or directory
Dec 16 11:37:47 unraid sshd[6188]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Dec 16 11:51:58 unraid ntpd[1298]: synchronized to 205.209.166.11, stratum 2
Dec 16 12:27:43 unraid ntpd[1298]: synchronized to 216.45.57.38, stratum 2
Dec 16 13:01:00 unraid unmenu[1367]: ls: cannot access /boot/custom/etc/rc.d/*: No such file or directory
Dec 16 13:01:23 unraid emhttp: shcmd (35): /etc/rc.d/rc.samba stop | logger
Dec 16 13:01:23 unraid emhttp: shcmd (36): /etc/rc.d/rc.nfsd stop | logger
Dec 16 13:01:24 unraid emhttp: Spinning up all drives...
Dec 16 13:01:24 unraid emhttp: shcmd (37): sync
Dec 16 13:01:24 unraid kernel: mdcmd (10174): spinup 0
Dec 16 13:01:24 unraid kernel: mdcmd (10175): spinup 1
Dec 16 13:01:24 unraid kernel: mdcmd (10176): spinup 2
Dec 16 13:01:24 unraid kernel: mdcmd (10177): spinup 3
Dec 16 13:01:24 unraid kernel: mdcmd (10178): spinup 4
Dec 16 13:01:24 unraid kernel: mdcmd (10179): spinup 5
Dec 16 13:01:24 unraid kernel: mdcmd (10180): spinup 6
Dec 16 13:01:24 unraid kernel: mdcmd (10181): spinup 7
Dec 16 13:01:32 unraid emhttp: shcmd (38): umount /mnt/user >/dev/null 2>&1
Dec 16 13:01:32 unraid emhttp: _shcmd: shcmd (38): exit status: 1
Dec 16 13:01:32 unraid emhttp: shcmd (39): rmdir /mnt/user >/dev/null 2>&1
Dec 16 13:01:32 unraid emhttp: _shcmd: shcmd (39): exit status: 1
Dec 16 13:01:32 unraid emhttp: Retry unmounting user share(s)...
Dec 16 13:01:33 unraid emhttp: shcmd (40): umount /mnt/user >/dev/null 2>&1
Dec 16 13:01:33 unraid emhttp: _shcmd: shcmd (40): exit status: 1
Dec 16 13:01:33 unraid emhttp: shcmd (41): rmdir /mnt/user >/dev/null 2>&1
Dec 16 13:01:33 unraid emhttp: _shcmd: shcmd (41): exit status: 1
Dec 16 13:01:33 unraid emhttp: Retry unmounting user share(s)...
Dec 16 13:01:34 unraid emhttp: shcmd (42): umount /mnt/user >/dev/null 2>&1

It just repeats like that over and over.
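A repeating "umount ... exit status: 1" usually means something still has the mount busy. As a rough sketch only (busy_pids is a made-up helper name, not part of unRAID), the following lists processes whose working directory is at or below a given path. It assumes a Linux /proc filesystem and checks only working directories, so it will miss processes that merely hold a file open there; where they are installed, tools such as fuser or lsof give a fuller picture.

```shell
# Hypothetical helper: print the PID of every process whose current
# working directory is the given directory or somewhere beneath it.
busy_pids() {
  dir=$1
  for p in /proc/[0-9]*; do
    # Processes we may not inspect, or that have exited, are skipped.
    cwd=$(readlink "$p/cwd" 2>/dev/null) || continue
    case "$cwd" in
      "$dir"|"$dir"/*) echo "${p#/proc/}" ;;
    esac
  done
}
```

Something like busy_pids /mnt/user from a telnet session would then show candidates, for example a shell that was left sitting in a share directory.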

Link to comment

Anyway I was wondering what the safe way is to reboot when I can't get at the web interface. I can telnet but don't know the commands.

 

From telnet or console, type these commands:

 

killall emhttp
emhttp &

 

This will kill the webGui process and then re-start it.  Hopefully you will be able to now communicate with the webGui in order to Stop and Reboot.  If not, I'll give more instructions.

Link to comment

Well, I did manage to get into the web interface, stop the array, and reboot it. Now I can't get into the web interface (by IP or by name) or telnet. Sometimes I can get into the files through Windows for a little bit. I can ping it all the time, no problem.

 

Also, these problems were happening before I jumped to 4.5, so I don't want you to think it is something with 4.5. I was just thinking maybe I would get lucky and 4.5 would fix whatever is happening.

 

So right now I can't even get you a syslog...

 

Scratch that, all of a sudden I got in long enough to get a log. It took a couple of tries, though.

Link to comment

Please describe your network. Do you have a router or a switch? Did you make the cables connecting them, or purchase them?

 

Are they Cat-5e, or older Cat-5? Do you have fixed IP addresses on your LAN, or do you have a DHCP server to assign them dynamically? If you are using fixed addresses and two machines on your LAN were accidentally assigned the same address, you would get collisions and horrible results...

 

What IP addresses are you using locally? Is it 192.168.x.x or 10.1.x.x, or something else?

 

Your problems sound more like networking problems than unRAID problems. It could be a bad cable, or one wired with the wrong pins in the connector. (There are two standards for wiring the connector, one for telephone use, the other for LAN use. They "pair" the conductors differently. Using the telephone "pairing" for LAN use would give very poor results, if you got a connection at all. Sound familiar?) I have a crimping tool and make my own cables. Guess what the color-coded guide shows in its lid: the "telephone" standard. Following it for a LAN cable would be a huge mistake.

 

Joe L.

Link to comment

Well, I thought I might have found my problem, but no. The router's DHCP range was set to start at the same IP as the server. I thought maybe it was causing a conflict, though I think it shouldn't give out an address that is in use. Either way, I changed it and it didn't seem to help.

 

Linksys WRV200 Router

TrendNet P24C6 Switch

TrendNet P24C6 Patch Panel

All Cat6 wired A Standard

 

Router IP 192.168.77.1

Computers are all static 192.168.77.10-20

Printer static 192.168.77.50

Server is Static IP 192.168.77.100

Router DHCP is set to start at 192.168.77.101-110

 

I have tried eliminating the switch and hooking direct to the router, no change.

 

I am going to try switching the cable to the server now. I am actually going to swap it with the one running to my PC, which has been no problem. That way it will also use a different port on the switch.

 

 

Update.......

Switched the cable: different port in the wall, totally different cabling, and no help. I agree that it does seem like a network thing.

But right now I can get into the UnMenu web GUI but not the normal unRAID GUI, and that has happened before. Even if I click on "unRaid Main" in UnMenu, it doesn't come up.

 

Got another SYSLOG

Link to comment


Router IP 192.168.77.100

Computers are all static 192.168.77.10-20

Printer static 192.168.77.50

Server is Static IP 192.168.77.100

Router DHCP is set to start at 192.168.77.101-110

 

Hi, your router and your server have the same IP. Your router has a web interface on port 80. Did you try to correct this?
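With a list of static assignments like the one quoted, a clash of this kind can be checked mechanically. This is only an illustrative sketch: find_duplicate_ips is a made-up helper, and the "name address" input format is an assumption, not something the router or unRAID produces. The sample addresses are the ones quoted in this post.

```shell
# Hypothetical helper: read "name address" lines on stdin and report
# every address that is assigned to more than one name.
find_duplicate_ips() {
  awk 'NF >= 2 { count[$2]++; names[$2] = names[$2] " " $1 }
       END { for (ip in count) if (count[ip] > 1) print ip ":" names[ip] }'
}

# Addresses as quoted above (the router entry is the suspect one).
printf '%s\n' \
  "router 192.168.77.100" \
  "server 192.168.77.100" \
  "printer 192.168.77.50" | find_duplicate_ips
# prints: 192.168.77.100: router server
```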

Link to comment

Since upgrading to this version, I have had problems (I assume) with Joe's powerdown script.

 

Every morning now I have a parity check start when the server boots up.

 

Dec 16 23:00:00 Tower cache_dirs: killing cache_dirs process 2033
Dec 16 23:00:10 Tower logger: Powerdown initiated
Dec 16 23:00:10 Tower logger: Shutting down Samba
Dec 16 23:00:10 Tower logger: Syncing the drives
Dec 16 23:00:24 Tower logger: Killing active pids on the array drives
Dec 16 23:00:24 Tower logger: root      4327  4313  0 22:57 ?        00:00:00 find /mnt/disk3/MP3 -noleaf
Dec 16 23:00:24 Tower logger: Umounting the drives
Dec 16 23:00:27 Tower logger: Stopping the Array
Dec 16 23:00:27 Tower kernel: mdcmd (6128): stop 
Dec 16 23:00:27 Tower kernel: md: 2 devices still in use.
Dec 16 23:00:28 Tower logger: cmdOper=stop
Dec 16 23:00:28 Tower logger: cmdResult=failed
Dec 16 23:00:28 Tower logger: 
Dec 16 23:00:29 Tower logger: Saving current syslog to /boot/logs/syslog.txt

 

There was nothing accessing the drives at 23:00, other than I guess the cache_dirs script.

Link to comment

Tom, first of all, thank you very much for your continuous and great efforts!

 

I have two questions/comments in regard to logging:

1. Authenticated mount requests for NFS shares are getting logged. Is this on purpose? The syslog seems very spammed by this.

2. It seems that folders on the cache drive beginning with "." are getting logged during the mover operation (they are not moved, but logged). That was an issue with 4.4.2, but I used the mover script from one of the 4.5 betas, as Joe L. suggested, and it was OK. Was this little bug reintroduced at some point?

Link to comment

There was nothing accessing the drives at 23:00, other than I guess the cache_dirs script.

You are correct.  You need to stop the cache_dirs program first...  (or any other program running currently accessing the disks)

 

Once we get a true pre-shutdown trigger in version 5.0 of unRAID, we'll be able to more cleanly shut down processes like cache_dirs that we invoke.  At that time, we'll all be making changes in how we start and stop add-on processes.

 

Joe L.

Link to comment

Sorry that was a typo, router is 192.168.77.1    too many numbers in my head......

Link to comment

Hi Joe,

 

It does kill the cache_dirs script, you can see it on the first line of the log above.

 

I've simply added /boot/custom/bin/cache_dirs -q to the top of the powerdown script. Is that not enough?

Link to comment

Add a line to give it time to stop.  It only looks for the absence of the lock file (which the -q removes) once each time through the loop, and the loop runs every few seconds.

 

If you are on 4.5final, you can invoke cache_dirs with the "-B" flag so it will not create the child processes it does otherwise to keep you from seeing the old "Unformatted" messages when attempting to stop the array.

 

So, after the cache_dirs -q in your powerdown sequence, add a line something like this:

sleep 10

before it continues onward.
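A fixed sleep either waits too long or not long enough. As an alternative sketch (wait_while is a made-up helper, not something in the powerdown script or cache_dirs), the script could poll a condition once a second and give up after a timeout:

```shell
# Hypothetical helper: wait_while SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second for as long as it keeps succeeding.
# Returns 0 as soon as COMMAND fails (the condition has cleared),
# or 1 if it is still succeeding after SECONDS seconds.
wait_while() {
  timeout=$1
  shift
  waited=0
  while "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

After the cache_dirs -q line, something like wait_while 30 pidof -x cache_dirs could then stand in for the sleep, assuming pidof is available and finds the script under that name. It returns as soon as cache_dirs is gone, so the common case is quicker and a slow exit still gets up to 30 seconds.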

Link to comment

Hi Joe, thanks for the info, although I already have sleep 10 in the script, and am invoking with -B:

 

/boot/custom/bin/cache_dirs -w -B

 

However, it doesn't seem to solve my issues stopping the array or using the powerdown script.

 

Should I change to sleep 20, or longer?

Link to comment

I upgraded my Unraid to the 4.5 final. I was on 4.5 beta6.

When I restarted the machine, the array did not restart automatically.

So I went to the web page and it said upgraded disk. I started the array and it is rebuilding my disk1 drive in the array.

I did not make any hardware changes.

 

Can someone tell me why it would do this?

 

Thx

Link to comment

Post your syslog.  It is the only way to learn what it thinks happened.  From your description disk1 changed in some way.

 

Joe L.

Link to comment

Here's a link to my SAS problem and history http://lime-technology.com/forum/index.php?topic=3109.0

 

Basically I have an LSI SAS card that works but is having some issues.  Tom sent me a 4.5.1 to test some changes he made to get rid of one error, and another has shown up.

 

The good news is I've tested all 6 of my disks on the card and got a 50MB/s sync rate.  I was also able to simulate a failed disk, and the parity drive did its job: I had full access to the data.

 

Here are the errors now...if others want to help troubleshoot.

 

After start up

 

Dec 16 22:00:10 Backup kernel: md5: import: scsi_inquiry (std inquiry) error: -14

Dec 16 22:00:10 Backup kernel: md5: import: scsi_inquiry (vpd: unit ser no) error: -14

 

disk_temperature: ioctl (smart_enable): Invalid argument

 

Trying to spin down

 

md: disk3: ATA_OP_STANDBYNOW1 ioctl error: -22

Link to comment

I upgraded my Unraid to the 4.5 final. I was on 4.5 beta6.

When I restarted the machine, the array did not restart automatically.

So I went to the web page and it said upgraded disk. I started the array and it is rebuilding my disk1 drive in the array.

I did not make any hardware changes.

 

Can someone tell me why it would do this?

 

A very interesting action took place here, that caused a resize of your Disk 1, which resulted in the disk being inconsistent with the disk config in unRAID, caused a rewrite of the MBR, then a rebuild of the disk, which was perhaps unnecessary.  Your Disk 1 had a Gigabyte HPA installed, and is connected to the first SATA port on the motherboard.  No other drives have HPA's.  Here are the relevant lines:

Dec 17 16:36:25 Tower kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Dec 17 16:36:25 Tower kernel: ata7.00: HPA unlocked: 1953523055 -> 1953525168, native 1953525168

Dec 17 16:36:25 Tower kernel: ata7.00: ATA-8: WDC WD10EADS-00M2B0, 01.00A01, max UDMA/133

 

Dec 17 16:36:25 Tower kernel: md: import disk1: [8,96] (sdg) WDC WD10EADS-00M2B0                          WD-WMAV50195436 offset: 63 size: 976762552

Dec 17 16:36:25 Tower kernel: md: disk1 wrong

 

Dec 17 16:37:16 Tower emhttp: writing mbr on disk 1 (/dev/sdg)

Dec 17 16:37:16 Tower emhttp: re-reading /dev/sdg partition table

Dec 17 16:37:16 Tower kernel:  sdg: sdg1

Dec 17 16:37:17 Tower kernel: mdcmd (31): start UPGRADE_DISK

 

Dec 17 16:37:17 Tower kernel: md: recovery thread rebuilding disk1 ...

 

Dec 17 16:37:19 Tower emhttp: resized: /mnt/disk1

 

There is some concern here, because we really don't know yet what actually happened, and therefore don't know yet what will happen when this rebuild reaches the last megabyte of this drive.  If the size change is artificial, that is, the kernel is saying that this *should* be the true size, but the hard drive firmware has not truly removed the HPA, then there are going to be drive errors at the end of the rebuild, when the drive refuses writes to that area.  If this latest kernel now includes logic to actually remove the HPA *AND* make the Gigabyte board turn off this "BIOS backup in an HPA" feature, then this is a great new feature of the kernel, and the rebuild should write zeroes into that area, clearing it.  I have to wonder though if this is going to stop the Gigabyte BIOS from trying to create an HPA again on the *next* boot.  It will be good to hear from other Gigabyte board owners with HPA's.  What is especially interesting here, is what happens at the end of this drive, and what happens on the next boot.
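For scale, the size of the area in question can be worked out from the "HPA unlocked" kernel line. This is just arithmetic on the two sector counts in the log, and it assumes 512-byte sectors, which the log itself does not state:

```shell
# Sector counts copied from the "HPA unlocked" kernel line above.
hpa_limited=1953523055   # size the drive reported with the HPA in place
native=1953525168        # native size once the HPA is unlocked
sector_bytes=512         # assumption: 512-byte logical sectors

hidden_sectors=$((native - hpa_limited))
hidden_bytes=$((hidden_sectors * sector_bytes))
echo "HPA size: $hidden_sectors sectors, $hidden_bytes bytes"
# prints: HPA size: 2113 sectors, 1081856 bytes
```

That is a little over one megabyte, which is why only the very end of the rebuild is of interest.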

 

There is another possibility, did you perhaps find a BIOS setting that disabled this feature, and just changed it now?  Perhaps the new kernel detects that and tries to recover the space.

 

Just to be clear, there is and was nothing wrong with the drive, but the kernel has attempted to remove the HPA, which changes the size of the drive, and that makes unRAID think the drive has changed.

 

I feel I need to caution you here, as to the action you took, especially so quickly.  Any time that the Web Management indicates an action or status of a drive that is not in accord with our understanding of that drive, you really should step back and try to find out what happened first, before proceeding.  When it said that the drive needed to be rebuilt, this in effect was similar to it saying that the drive needs formatting, and you would not want to proceed very quickly if you unexpectedly saw that message.  A request to rebuild is effectively asking to completely overwrite a drive, in effect losing everything that was stored there (although we hope it will overwrite with what is already there).

The first step to take is to check the Device assignments, to make sure that the new kernel has not changed the order of drive detection, and now a different drive is assigned there.  I don't think we have had a catastrophic case like that yet, rather, device changes have simply resulted in unassigned drives, but still if the parity drive had somehow been assigned now as Disk 1, it could have resulted in the complete loss of Disk 1.  I would want to make absolutely sure that the Disk 1 I will overwrite with the contents of Disk 1, is really the correct drive and serial number.  After that, I would want some idea of why it is trying to overwrite this disk.  It could be valid, or not, and I would very much want to know if it should not be overwritten/rebuilt.

 

In this case, after verifying the drive assignments, all you needed to do was run the Trust My Array procedure.  It would have reported a number of parity errors at the very end of the drive, but that is expected.

Link to comment

Not sure I understood all that LOL.

Yes that drive is on the 1st SATA port and does have the HPA on it. I have a Gigabyte motherboard but I did not change anything on the hardware. All I did was upgrade unraid to the 4.5 final.

The rebuild is in progress and I hope it goes ok. The drive that is rebuilding is my drive I use for backups.

 

Link to comment
