limetech Posted December 16, 2009 Author

Quote: Do I need to reboot the server after the change? I would assume so. I was having problems with losing contact with the server before the upgrade also. The NIC is on the motherboard and nothing has changed with the setup since I built it.

Yes, please reboot. These things sometimes happen where a small timing change in the Linux kernel or a NIC driver causes little 'glitches' to show up. I'll end up putting a better fix in, but for now I'd like to verify this is the problem.
Mopar_Mudder Posted December 16, 2009

Quote: Yes, please reboot. These things sometimes happen where a small timing change in the Linux kernel or a NIC driver causes little 'glitches' to show up. I'll end up putting a better fix in, but for now I'd like to verify this is the problem.

I was editing my other post while you were posting. Anyway, I was wondering what the safe way to reboot is when I can't get at the web interface. I can telnet in but don't know the commands. I have made the change to the go script.

Thank you
kapperz Posted December 16, 2009

My system hangs when I try to stop the array using the unRaid GUI. Here is a snippet of my syslog...

Dec 16 11:37:47 unraid sshd[6188]: lastlog_filetype: Couldn't stat /var/log/lastlog: No such file or directory
Dec 16 11:37:47 unraid sshd[6188]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Dec 16 11:37:47 unraid sshd[6188]: lastlog_filetype: Couldn't stat /var/log/lastlog: No such file or directory
Dec 16 11:37:47 unraid sshd[6188]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Dec 16 11:51:58 unraid ntpd[1298]: synchronized to 205.209.166.11, stratum 2
Dec 16 12:27:43 unraid ntpd[1298]: synchronized to 216.45.57.38, stratum 2
Dec 16 13:01:00 unraid unmenu[1367]: ls: cannot access /boot/custom/etc/rc.d/*: No such file or directory
Dec 16 13:01:23 unraid emhttp: shcmd (35): /etc/rc.d/rc.samba stop | logger
Dec 16 13:01:23 unraid emhttp: shcmd (36): /etc/rc.d/rc.nfsd stop | logger
Dec 16 13:01:24 unraid emhttp: Spinning up all drives...
Dec 16 13:01:24 unraid emhttp: shcmd (37): sync
Dec 16 13:01:24 unraid kernel: mdcmd (10174): spinup 0
Dec 16 13:01:24 unraid kernel: mdcmd (10175): spinup 1
Dec 16 13:01:24 unraid kernel: mdcmd (10176): spinup 2
Dec 16 13:01:24 unraid kernel: mdcmd (10177): spinup 3
Dec 16 13:01:24 unraid kernel: mdcmd (10178): spinup 4
Dec 16 13:01:24 unraid kernel: mdcmd (10179): spinup 5
Dec 16 13:01:24 unraid kernel: mdcmd (10180): spinup 6
Dec 16 13:01:24 unraid kernel: mdcmd (10181): spinup 7
Dec 16 13:01:32 unraid emhttp: shcmd (38): umount /mnt/user >/dev/null 2>&1
Dec 16 13:01:32 unraid emhttp: _shcmd: shcmd (38): exit status: 1
Dec 16 13:01:32 unraid emhttp: shcmd (39): rmdir /mnt/user >/dev/null 2>&1
Dec 16 13:01:32 unraid emhttp: _shcmd: shcmd (39): exit status: 1
Dec 16 13:01:32 unraid emhttp: Retry unmounting user share(s)...
Dec 16 13:01:33 unraid emhttp: shcmd (40): umount /mnt/user >/dev/null 2>&1
Dec 16 13:01:33 unraid emhttp: _shcmd: shcmd (40): exit status: 1
Dec 16 13:01:33 unraid emhttp: shcmd (41): rmdir /mnt/user >/dev/null 2>&1
Dec 16 13:01:33 unraid emhttp: _shcmd: shcmd (41): exit status: 1
Dec 16 13:01:33 unraid emhttp: Retry unmounting user share(s)...
Dec 16 13:01:34 unraid emhttp: shcmd (42): umount /mnt/user >/dev/null 2>&1

It just repeats like that over and over.
limetech Posted December 17, 2009 Author

Quote: Anyway, I was wondering what the safe way to reboot is when I can't get at the web interface. I can telnet in but don't know the commands. I have made the change to the go script.

From telnet or console, type these commands:

killall emhttp
emhttp &

This will kill the webGui process and then restart it. Hopefully you will now be able to communicate with the webGui in order to Stop and Reboot. If not, I'll give more instructions.
limetech Posted December 17, 2009 Author

Quote: My system hangs when I try to stop the array using the unRaid GUI. Here is a snippet of my syslog...

You have an add-on, or some other server (rsync?), or a telnet/console session open with one of your share directories as the 'current directory'. That keeps /mnt/user busy, which is why the umount in your log keeps failing with exit status 1.
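One way to look for the kind of culprit described above is to ask the kernel which processes have their current directory under the share mount. This is only a sketch: it assumes a Linux /proc filesystem and that the shares live under /mnt/user as in the quoted syslog, and procs_with_cwd_under is a made-up helper name, not an unRAID command.

```shell
#!/bin/sh
# Hypothetical helper: print "PID CWD" for every process whose current
# working directory is at or below the given path. Relies on /proc/<pid>/cwd.
procs_with_cwd_under() {
    local prefix="${1%/}" pid_dir cwd
    for pid_dir in /proc/[0-9]*; do
        # Processes we may not inspect (or that just exited) are skipped.
        cwd=$(readlink "$pid_dir/cwd" 2>/dev/null) || continue
        case "$cwd" in
            "$prefix"|"$prefix"/*) echo "${pid_dir#/proc/} $cwd" ;;
        esac
    done
    return 0
}

# Example: anything listed here would keep /mnt/user from unmounting.
procs_with_cwd_under /mnt/user
```

If a telnet session shows up in the list, a plain "cd /" in that session (or logging out) should release the directory.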
Mopar_Mudder Posted December 17, 2009

Quote: From telnet or console, type these commands: [commands snipped] This will kill the webGui process and then restart it. Hopefully you will now be able to communicate with the webGui in order to Stop and Reboot. If not, I'll give more instructions.

Well, I did manage to get into the web interface, stop the array, and reboot. Now I can't get into the web interface (by IP or by name) or telnet. Sometimes I can get at the files through Windows for a little while. I can ping it all the time, no problem.

Also, these problems were happening before I jumped to 4.5, so I don't want you to think it is something with 4.5. I was just thinking maybe I would get lucky and 4.5 would fix whatever is happening.

So right now I can't even get you a syslog... Scratch that, all of a sudden I got in long enough to get a log. It took a couple of tries though.
Joe L. Posted December 17, 2009

Quote: Well, I did manage to get into the web interface, stop the array, and reboot. Now I can't get into the web interface (by IP or by name) or telnet. Sometimes I can get at the files through Windows for a little while. I can ping it all the time, no problem. [...] So right now I can't even get you a syslog...

Please describe your network...

Do you have a router? Or a switch?
Did you make the cables connecting them, or purchase them?
Are they Cat-5e, or older Cat-5?
Do you have fixed IP addresses on your LAN, or do you have a DHCP server to dynamically assign them?

If you are using fixed addresses, and two machines on your LAN were accidentally assigned the same address, you would get collisions and horrible results...

What IP addresses are you using locally? Is it 192.168.x.x or 10.1.x.x, or something else?

Your problems sound more like networking problems and not unRAID problems. It could be a bad cable, or one wired with the wrong pins in the connector. (There are two standards for wiring the connector, one for telephone use, the other for LAN use. They "pair" the conductors differently. Using the telephone "pairing" for LAN use would give very poor results, if you got a connection at all. Sound familiar?)

I have a "crimping tool" and make my own cables. Guess what the color-coded guide shows in its lid... the "telephone" standard. Following it for a LAN cable would be a huge mistake.

Joe L.
Mopar_Mudder Posted December 17, 2009

Well, I thought I might have found my problem, but no. The router DHCP was set to start at the same IP as the server. I thought maybe it was making a conflict, but I think it shouldn't give out the address when it is in use. Either way, I changed it and it didn't seem to help.

Linksys WRV200 Router
TrendNet P24C6 Switch
TrendNet P24C6 Patch Panel
All Cat6, wired A Standard
Router IP: 192.168.77.1
Computers: all static, 192.168.77.10-20
Printer: static, 192.168.77.50
Server: static IP 192.168.77.100
Router DHCP range: 192.168.77.101-110

I have tried eliminating the switch and hooking direct to the router: no change. I am going to try switching the cable to the server now. I am actually going to swap it with the one running to my PC, which has been no problem. It will also use a different port on the switch that way.

Update... Switched the cable: different port in the wall, totally different cabling, and no help.

I agree that it does seem like a network thing. But right now I can get into the UnMenu web GUI but not the normal UnRaid GUI, and that has happened before. Even if I click on "unRaid Main" in UnMenu it doesn't come up.

Got another SYSLOG
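The overlap described in this post (a static server address sitting inside the router's DHCP pool) can be checked with a little integer arithmetic. A minimal sketch, assuming plain dotted-quad IPv4 addresses without leading zeros; ip_to_int and in_dhcp_pool are illustrative names, and the "before" pool of .100-.110 is an assumption, since the post only says the pool started at the server's address.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeeds (exit 0) if address $1 lies within the pool $2..$3 inclusive.
in_dhcp_pool() {
    local ip start end
    ip=$(ip_to_int "$1")
    start=$(ip_to_int "$2")
    end=$(ip_to_int "$3")
    [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}

# Server at .100 with the pool assumed to start at .100: overlap.
if in_dhcp_pool 192.168.77.100 192.168.77.100 192.168.77.110; then
    echo "before the change: server address is inside the DHCP pool"
fi
# Pool moved to .101-.110 as in the post: no overlap.
if ! in_dhcp_pool 192.168.77.100 192.168.77.101 192.168.77.110; then
    echo "after the change: server address is outside the DHCP pool"
fi
```

This only rules out the pool overlap. Two machines given the same static address by hand would need a different check.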
damien Posted December 17, 2009

Quote: Router IP 192.168.77.100 [...] Server is Static IP 192.168.77.100

Hi, your router and your server have the same IP, and your router has a web interface on port 80. Did you try to correct this?
dvd.collector Posted December 17, 2009

Since upgrading to this version, I have problems (I assume) with Joe's powerdown script. Every morning now I have a parity check start when the server boots up.

Dec 16 23:00:00 Tower cache_dirs: killing cache_dirs process 2033
Dec 16 23:00:10 Tower logger: Powerdown initiated
Dec 16 23:00:10 Tower logger: Shutting down Samba
Dec 16 23:00:10 Tower logger: Syncing the drives
Dec 16 23:00:24 Tower logger: Killing active pids on the array drives
Dec 16 23:00:24 Tower logger: root 4327 4313 0 22:57 ? 00:00:00 find /mnt/disk3/MP3 -noleaf
Dec 16 23:00:24 Tower logger: Umounting the drives
Dec 16 23:00:27 Tower logger: Stopping the Array
Dec 16 23:00:27 Tower kernel: mdcmd (6128): stop
Dec 16 23:00:27 Tower kernel: md: 2 devices still in use.
Dec 16 23:00:28 Tower logger: cmdOper=stop
Dec 16 23:00:28 Tower logger: cmdResult=failed
Dec 16 23:00:28 Tower logger:
Dec 16 23:00:29 Tower logger: Saving current syslog to /boot/logs/syslog.txt

There was nothing accessing the drives at 23:00, other than I guess the cache_dirs script.
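The two lines that matter in a log like this are "md: N devices still in use" and "cmdResult=failed". Assuming the saved syslog has the same wording as the excerpt above, a small filter can pull them out. stop_failures is a hypothetical helper name; the file path is the one the powerdown script reports writing to.

```shell
#!/bin/sh
# Hypothetical helper: show only the lines that indicate the array failed
# to stop. Reads the named files, or standard input when none are given.
stop_failures() {
    grep -E 'md: [0-9]+ devices still in use|cmdResult=failed' "$@"
}

# Only attempt this where the saved syslog actually exists.
if [ -r /boot/logs/syslog.txt ]; then
    stop_failures /boot/logs/syslog.txt || true
fi
```

An unclean stop like this would explain the parity check on the next boot, so these are the lines to watch after changing the powerdown sequence.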
olympia Posted December 17, 2009

Tom, first of all, thank you very much for your continuous and great efforts!

I have two questions/comments regarding logging:

1. Authenticated mount requests for NFS shares are getting logged. Is this on purpose? The syslog gets heavily spammed by this.

2. It seems that folders on the cache drive beginning with "." are getting logged during the mover operation (they are not moved, but they are logged). That was an issue with 4.4.2, but I used the mover script from one of the 4.5 betas, as Joe L. suggested, and it was OK. Was this little bug reintroduced at some point?
Joe L. Posted December 17, 2009

Quote: Since upgrading to this version, I have problems (I assume) with Joe's powerdown script. Every morning now I have a parity check start when the server boots up. [log snipped] There was nothing accessing the drives at 23:00, other than I guess the cache_dirs script.

You are correct. You need to stop the cache_dirs program first... (or any other program currently accessing the disks).

Once we get a true pre-shutdown trigger in version 5.0 of unRAID, we'll be able to more cleanly shut down processes like cache_dirs that we invoke. At that time, we'll all be making changes in how we start and stop add-on processes.

Joe L.
Mopar_Mudder Posted December 17, 2009

Quote: Hi, your router and your server have the same IP, and your router has a web interface on port 80. Did you try to correct this?

Sorry, that was a typo. The router is 192.168.77.1. Too many numbers in my head...
dvd.collector Posted December 17, 2009

Quote: You are correct. You need to stop the cache_dirs program first... (or any other program currently accessing the disks).

Hi Joe,

It does kill the cache_dirs script; you can see it on the first line of the log above. I've simply added

/boot/custom/bin/cache_dirs -q

to the top of the powerdown script. Is that not enough?
Joe L. Posted December 17, 2009

Quote: It does kill the cache_dirs script; you can see it on the first line of the log above. I've simply added /boot/custom/bin/cache_dirs -q to the top of the powerdown script. Is that not enough?

Add a line to give it time to stop. It only looks for the absence of the lock file (which the -q removes) once each time through the loop, and the loop runs every few seconds.

If you are on 4.5final, you can invoke cache_dirs with the "-B" flag so it will not create the child processes that it otherwise creates to keep you from seeing the old "Unformatted" messages when attempting to stop the array.

So, after the cache_dirs -q in your powerdown sequence, add a line something like this:

sleep 10

before it continues onward.
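As an alternative to a fixed sleep, the powerdown sequence could poll until cache_dirs has actually exited, with a timeout as a safety net. This is a sketch under assumptions, not part of Joe L.'s script: wait_until and cache_dirs_gone are made-up names, and it assumes pgrep is available and that matching "cache_dirs" on the command line is enough to find the process.

```shell
#!/bin/sh
# Poll a condition once per second until it succeeds or the timeout (in
# seconds) runs out. Returns 0 if the condition became true, 1 on timeout.
wait_until() {
    local timeout="$1" elapsed=0
    shift
    while ! "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical condition: true once no process whose command line mentions
# cache_dirs remains. If pgrep is missing this also reports "gone".
cache_dirs_gone() {
    ! pgrep -f cache_dirs >/dev/null 2>&1
}

# Intended use near the top of the powerdown sequence (left commented out
# so the sketch is harmless to run elsewhere):
#   /boot/custom/bin/cache_dirs -q
#   wait_until 30 cache_dirs_gone || logger "cache_dirs still running"
```

The design choice is simply that the wait ends as soon as the process is gone, and a slow exit is logged rather than silently ignored.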
peter_sm Posted December 17, 2009

I am about to modify the powerdown script. Do you need to modify the .tgz file? Or is there a better way to add cache_dirs -q when you reboot or shut down the server?
dvd.collector Posted December 17, 2009

Quote: So, after the cache_dirs -q in your powerdown sequence, add a line something like this: sleep 10 before it continues onward.

Hi Joe,

Thanks for the info, although I already have sleep 10 in the script, and am invoking using -B:

/boot/custom/bin/cache_dirs -w -B

However, it doesn't seem to solve my issues stopping the array or using the powerdown script. Should I change to sleep 20, or longer?
NAS Posted December 17, 2009

Just a note to say the upgrade from the last beta to the stable was flawless and I have not had any issues at all with this release.

Nice work
Blade Posted December 17, 2009

I upgraded my Unraid to the 4.5 final. I was on 4.5 beta6. When I restarted the machine, the array did not restart automatically. So I went to the web page and it said upgraded disk. I started the array and it is rebuilding my disk1 drive in the array. I did not make any hardware changes. Can someone tell me why it would do this?

Thx
Joe L. Posted December 17, 2009

Quote: I started the array and it is rebuilding my disk1 drive in the array. I did not make any hardware changes. Can someone tell me why it would do this?

Post your syslog. It is the only way to learn what it thinks happened. From your description, disk1 changed in some way.

Joe L.
Blade Posted December 17, 2009

Here is my syslog. Thx for looking at it.
erikatcuse Posted December 17, 2009

Here's a link to my SAS problem and history:

http://lime-technology.com/forum/index.php?topic=3109.0

Basically, I have an LSI SAS card that works but is having some issues. Tom sent me a 4.5.1 to test some changes he made to get rid of one error, and another has shown up. The good news is I've tested all 6 of my disks on the card and received a 50MB/s sync rate. I was also able to simulate a failed disk, and the parity drive did its job: I had full access to the data.

Here are the errors now, if others want to help troubleshoot.

After start up:

Dec 16 22:00:10 Backup kernel: md5: import: scsi_inquiry (std inquiry) error: -14
Dec 16 22:00:10 Backup kernel: md5: import: scsi_inquiry (vpd: unit ser no) error: -14
disk_temperature: ioctl (smart_enable): Invalid argument

Trying to spin down:

md: disk3: ATA_OP_STANDBYNOW1 ioctl error: -22
RobJ Posted December 18, 2009

Quote: I upgraded my Unraid to the 4.5 final. I was on 4.5 beta6. When I restarted the machine, the array did not restart automatically. So I went to the web page and it said upgraded disk. I started the array and it is rebuilding my disk1 drive in the array. I did not make any hardware changes. Can someone tell me why it would do this?

A very interesting action took place here. It caused a resize of your Disk 1, which left the disk inconsistent with the disk config in unRAID, which caused a rewrite of the MBR and then a rebuild of the disk that was perhaps unnecessary.

Your Disk 1 had a Gigabyte HPA installed, and is connected to the first SATA port on the motherboard. No other drives have HPAs. Here are the relevant lines:

Dec 17 16:36:25 Tower kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 17 16:36:25 Tower kernel: ata7.00: HPA unlocked: 1953523055 -> 1953525168, native 1953525168
Dec 17 16:36:25 Tower kernel: ata7.00: ATA-8: WDC WD10EADS-00M2B0, 01.00A01, max UDMA/133
Dec 17 16:36:25 Tower kernel: md: import disk1: [8,96] (sdg) WDC WD10EADS-00M2B0 WD-WMAV50195436 offset: 63 size: 976762552
Dec 17 16:36:25 Tower kernel: md: disk1 wrong
Dec 17 16:37:16 Tower emhttp: writing mbr on disk 1 (/dev/sdg)
Dec 17 16:37:16 Tower emhttp: re-reading /dev/sdg partition table
Dec 17 16:37:16 Tower kernel: sdg: sdg1
Dec 17 16:37:17 Tower kernel: mdcmd (31): start UPGRADE_DISK
Dec 17 16:37:17 Tower kernel: md: recovery thread rebuilding disk1 ...
Dec 17 16:37:19 Tower emhttp: resized: /mnt/disk1

There is some concern here, because we really don't know yet what actually happened, and therefore don't know yet what will happen when this rebuild reaches the last megabyte of this drive. If the size change is artificial, that is, the kernel is saying that this *should* be the true size but the hard drive firmware has not truly removed the HPA, then there are going to be drive errors at the end of the rebuild, when the drive refuses writes to that area.

If this latest kernel now includes logic to actually remove the HPA *AND* make the Gigabyte board turn off this "BIOS backup in an HPA" feature, then this is a great new feature of the kernel, and the rebuild should write zeroes into that area, clearing it. I have to wonder though whether this is going to stop the Gigabyte BIOS from trying to create an HPA again on the *next* boot. It will be good to hear from other Gigabyte board owners with HPAs. What is especially interesting here is what happens at the end of this drive, and what happens on the next boot.

There is another possibility: did you perhaps find a BIOS setting that disabled this feature, and just change it now? Perhaps the new kernel detects that and tries to recover the space.

Just to be clear, there is and was nothing wrong with the drive, but the kernel has attempted to remove the HPA, which changes the size of the drive, and that makes unRAID think the drive has changed.

I feel I need to caution you here as to the action you took, especially so quickly. Any time that the Web Management indicates an action or status of a drive that is not in accord with our understanding of that drive, you really should step back and try to find out what happened first, before proceeding. When it said that the drive needed to be rebuilt, this in effect was similar to it saying that the drive needs formatting, and you would not want to proceed very quickly if you unexpectedly saw that message. A request to rebuild is effectively asking to completely overwrite a drive, in effect losing everything that was stored there (although we hope it will overwrite with what is already there).

The first step to take is to check the Device assignments, to make sure that the new kernel has not changed the order of drive detection so that a different drive is now assigned there. I don't think we have had a catastrophic case like that yet; rather, device changes have simply resulted in unassigned drives. But still, if the parity drive had somehow been assigned now as Disk 1, it could have resulted in the complete loss of Disk 1. I would want to make absolutely sure that the Disk 1 I will overwrite with the contents of Disk 1 is really the correct drive and serial number.

After that, I would want some idea of why it is trying to overwrite this disk. It could be valid, or not, and I would very much want to know if it should not be overwritten/rebuilt. In this case, after verifying the drive assignments, all you needed to do was run the Trust My Array procedure. It would have reported a number of parity errors at the very end of the drive, but that is expected.
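The "last megabyte" in this explanation can be checked against the sector counts in the "HPA unlocked" line. A quick worked calculation, assuming 512-byte sectors (the log does not state the sector size); hpa_bytes is an illustrative name.

```shell
#!/bin/sh
# Size in bytes of the area hidden by an HPA, given the visible sector count
# ($1) and the native sector count ($2). Assumes 512-byte sectors.
hpa_bytes() {
    echo $(( ($2 - $1) * 512 ))
}

# Sector counts from the "HPA unlocked: 1953523055 -> 1953525168" log line.
hpa_bytes 1953523055 1953525168   # prints 1081856
```

That is 2113 sectors, a little over 1 MiB, which is the region at the very end of the disk where the rebuild would run into errors if the HPA were still in force.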
Blade Posted December 18, 2009

Not sure I understood all that, LOL. Yes, that drive is on the first SATA port and does have the HPA on it. I have a Gigabyte motherboard, but I did not change anything on the hardware. All I did was upgrade unRAID to the 4.5 final. The rebuild is in progress and I hope it goes OK. The drive that is rebuilding is the one I use for backups.
Blade Posted December 18, 2009

The rebuild completed successfully, with no errors at all. The drive now shows the same total size as the other 1 TB drives, so the HPA BIOS area must have been removed.