
[Solved] Some Unraid Errors - Help Please


robinsj


Okay, so I have been using unRAID for a while now and have not had any issues. I just added another Supermicro AOC-SAT2-MV8, so now I have two. At the same time, I swapped my 1.5 TB parity drive for a 2 TB parity drive and added the old parity drive to the array. Everything was going well and it was rebuilding the parity data, then tonight it just locked up. I couldn't pull up the web page, I couldn't telnet into it, and even a ping got no response. I had no monitor hooked up to it, so I couldn't see anything on screen. When I did hook up a monitor, nothing came on, which is what you would expect if it was completely locked up. So I had to do a hard reset. I have left the monitor hooked up, just in case it locks up again, so I can see what is happening. In the meantime it has booted back up and has restarted building the parity drive.
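For readers unfamiliar with what the "parity build" is doing: unRAID 4.x keeps a single parity drive whose contents are the byte-wise XOR of all the data drives, so any one failed drive can be rebuilt from parity plus the surviving drives. That is also why the parity drive must be at least as large as the largest data drive. The sketch below is a toy model under that assumption, with short byte strings standing in for whole disks (shorter disks treated as zero-padded); the function names are hypothetical and this is not unRAID's own code.

```python
def compute_parity(disks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of all disks; shorter disks count as zero-padded."""
    size = max(len(d) for d in disks)  # parity must cover the largest disk
    parity = bytearray(size)
    for disk in disks:
        for i, b in enumerate(disk):
            parity[i] ^= b
    return bytes(parity)


def rebuild_disk(parity: bytes, survivors: list[bytes], lost_size: int) -> bytes:
    """Recover one lost disk by XOR-ing parity with every surviving disk."""
    data = bytearray(parity)
    for disk in survivors:
        for i, b in enumerate(disk):
            data[i] ^= b
    return bytes(data[:lost_size])  # trim to the lost disk's original size


# Three toy "disks" of different sizes (hypothetical data).
disks = [b"\x01\x02\x03\x04", b"\x10\x20", b"\xff"]
parity = compute_parity(disks)
recovered = rebuild_disk(parity, [disks[1], disks[2]], len(disks[0]))
print(parity.hex(), recovered == disks[0])  # ee220304 True
```

Losing any single toy disk works the same way: XOR the parity with the remaining disks and trim to the lost disk's size.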

 

I took a look at the system log, and it looks like there are some errors, posted below. I am running unRAID 4.7. I have also attached the whole system log to this post. Any help is very, very much appreciated; I have no idea what they mean....

 

Feb 10 23:47:11 Tower kernel: ------------[ cut here ]------------
Feb 10 23:47:11 Tower kernel: WARNING: at fs/proc/generic.c:590 proc_register+0x11c/0x14b()
Feb 10 23:47:11 Tower kernel: Hardware name: P5Q Premium
Feb 10 23:47:11 Tower kernel: proc_dir_entry 'scsi_tgt/mvst_scst' already registered
Feb 10 23:47:11 Tower kernel: Modules linked in: mvsas(+) libsas scst scsi_transport_sas
Feb 10 23:47:11 Tower kernel: Pid: 890, comm: modprobe Not tainted 2.6.32.9-unRAID #8
Feb 10 23:47:11 Tower kernel: Call Trace:
Feb 10 23:47:11 Tower kernel:  [<c102449e>] warn_slowpath_common+0x60/0x77
Feb 10 23:47:11 Tower kernel:  [<c10244e9>] warn_slowpath_fmt+0x24/0x27
Feb 10 23:47:11 Tower kernel:  [<c109cf0e>] proc_register+0x11c/0x14b
Feb 10 23:47:11 Tower kernel:  [<c109d0cc>] proc_mkdir_mode+0x2f/0x43
Feb 10 23:47:11 Tower kernel:  [<c109d0ef>] proc_mkdir+0xf/0x11
Feb 10 23:47:11 Tower kernel:  [<f83e7a36>] scst_build_proc_target_dir_entries+0x55/0xdc [scst]
Feb 10 23:47:11 Tower kernel:  [<f83ceca9>] __scst_register_target_template+0x16c/0x3af [scst]
Feb 10 23:47:11 Tower kernel:  [<f844cd5d>] mvst_init+0x3b/0x5b [mvsas]
Feb 10 23:47:11 Tower kernel:  [<f8451895>] mvs_pci_init+0xaa5/0xaf7 [mvsas]
Feb 10 23:47:11 Tower kernel:  [<c10062d9>] ? dma_generic_alloc_coherent+0x0/0xdb
Feb 10 23:47:11 Tower kernel:  [<c1142050>] local_pci_probe+0xe/0x10
Feb 10 23:47:11 Tower kernel:  [<c11426ad>] pci_device_probe+0x48/0x66
Feb 10 23:47:11 Tower kernel:  [<c1194956>] driver_probe_device+0x79/0xed
Feb 10 23:47:11 Tower kernel:  [<c1194a0d>] __driver_attach+0x43/0x5f
Feb 10 23:47:11 Tower kernel:  [<c11940a7>] bus_for_each_dev+0x39/0x5a
Feb 10 23:47:11 Tower kernel:  [<f8459000>] ? mvs_init+0x0/0x45 [mvsas]
Feb 10 23:47:11 Tower kernel:  [<c119482f>] driver_attach+0x14/0x16
Feb 10 23:47:11 Tower kernel:  [<c11949ca>] ? __driver_attach+0x0/0x5f
Feb 10 23:47:11 Tower kernel:  [<c119451c>] bus_add_driver+0x9f/0x1c5
Feb 10 23:47:11 Tower kernel:  [<f8459000>] ? mvs_init+0x0/0x45 [mvsas]
Feb 10 23:47:11 Tower kernel:  [<c1194ccf>] driver_register+0x7b/0xd7
Feb 10 23:47:11 Tower kernel:  [<f8459000>] ? mvs_init+0x0/0x45 [mvsas]
Feb 10 23:47:11 Tower kernel:  [<c1142882>] __pci_register_driver+0x39/0x8c
Feb 10 23:47:11 Tower kernel:  [<f8459000>] ? mvs_init+0x0/0x45 [mvsas]
Feb 10 23:47:11 Tower kernel:  [<f8459030>] mvs_init+0x30/0x45 [mvsas]
Feb 10 23:47:11 Tower kernel:  [<c1001139>] do_one_initcall+0x4c/0x131
Feb 10 23:47:11 Tower kernel:  [<c1042e6e>] sys_init_module+0xa7/0x1dd
Feb 10 23:47:11 Tower kernel:  [<c1002935>] syscall_call+0x7/0xb
Feb 10 23:47:11 Tower kernel: ---[ end trace 9b764bc3b4e92c4f ]---

syslog.txt
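Each frame in a kernel call trace like the one above has the form `[<address>] symbol+offset/length [module]`: the offset is how far into the function the return address lies, the length is the function's total size, a bracketed name at the end marks a function inside a loadable module, and a leading `?` means the unwinder found the address on the stack but is not sure it belongs to a real frame. The Python sketch below (the regex, `Frame` and `parse_call_trace` are my own illustrative names, not a kernel tool) splits such lines into fields, which makes it easier to see, for example, that the warning fired while `modprobe` was loading `mvsas` and `scst` was creating its `/proc` entries.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "[<f844cd5d>] mvst_init+0x3b/0x5b [mvsas]" or "[<f8459000>] ? mvs_init+0x0/0x45 [mvsas]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unsure>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int            # bytes into the function
    size: int              # total length of the function in bytes
    module: Optional[str]  # None when the function is built into the kernel
    reliable: bool         # False for frames printed with a leading '?'


def parse_call_trace(lines: list[str]) -> list[Frame]:
    """Extract one Frame per stack-frame line; other lines are ignored."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m is None:
            continue  # header, "Call Trace:", WARNING line, etc.
        frames.append(Frame(
            address=int(m["addr"], 16),
            symbol=m["symbol"],
            offset=int(m["offset"], 16),
            size=int(m["size"], 16),
            module=m["module"],
            reliable=m["unsure"] is None,
        ))
    return frames
```

Passing it the saved syslog lines and grouping the frames by `module` separates the kernel-core frames from the `scst` and `mvsas` ones.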


Thanks for the response, Joe. I just noticed it locked up again during the night. Any other recommendations?

 

On a side note, when the computer is booting up and reaches the BIOS screen for the Supermicro cards where it spins up the drives, it sits at that screen for at least 10 minutes after the drives have spun up before moving on. Before I added the second card, it moved along right away once the drives were spun up. Think that could be related?


What motherboard do you have? If it is NVIDIA-based, these are some of the "quirks" you will experience from time to time.

 

Is there any reason to have four network controllers? They are also all Marvell-based, which may not work very well.

unRAID uses only one (at least for now).

 

Disabling the unused hardware in the BIOS (not only the LANs, but also serial and parallel ports, audio, FireWire, IDE controllers, floppy...) will free some resources and may eliminate the lockups, or at least significantly reduce how often they happen.

And do not forget to update to the latest BIOS.
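One concrete resource on a board of this era is the interrupt line: when several devices share one IRQ, every interrupt on that line has to be offered to each of their drivers, and a problem in one can spill over to the others. A quick check is to look for IRQ numbers with more than one device listed in /proc/interrupts. This Python sketch assumes the 2.6.32-era layout (a CPU header row, then `IRQ: per-CPU counts, one-word controller type, comma-separated device names`); newer kernels can print the controller column differently, and the sample device names below are hypothetical.

```python
def shared_irqs(text: str) -> dict[str, list[str]]:
    """Return {irq: [devices]} for every numbered IRQ listed with 2+ devices."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared: dict[str, list[str]] = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        irq = irq.strip()
        if not irq.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        # ncpu counters, then the controller type, then the device list
        fields = rest.split(None, ncpu + 1)
        if len(fields) < ncpu + 2:
            continue  # no device name on this line
        devices = [d.strip() for d in fields[-1].split(",")]
        if len(devices) > 1:
            shared[irq] = devices
    return shared


# Hypothetical excerpt in the 2.6.32-style layout.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 16:      12345        678   IO-APIC-fasteoi   uhci_hcd:usb3, mvsas
 17:        900          0   IO-APIC-fasteoi   mvsas
NMI:          0          0   Non-maskable interrupts
"""

print(shared_irqs(SAMPLE))  # {'16': ['uhci_hcd:usb3', 'mvsas']}
```

On the server you would pass in the contents of /proc/interrupts instead of `SAMPLE`, once before and once after disabling devices in the BIOS, and compare.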


Thanks for the response. I have the Asus P5Q Premium board, which has an Intel chipset. It's not on the hardware list, but it was the motherboard I had at the time, and it has worked great for me so far. It has probably been running for a year with no problems. These problems seem to have started when I added that second Supermicro card.

 

I know I had disabled all the extra stuff in the BIOS, but I must have missed the four LAN ports, so I will get the extra three disabled. I will also check whether the BIOS needs an update.

 

I am running a telnet session on my PC with tail -f /var/log/syslog, so I can see what happens if it locks up again...

 

Also, I am pretty sure it isn't a power supply issue, as I am using a Seasonic SS-650KM, which provides 54 A on the 12 V rail; that should be plenty.
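As a rough check of the "should be plenty" reasoning: 54 A at 12 V is about 648 W available on the 12 V rail, and the usual worst case for a drive array is the moment all the drives spin up together. The sketch below does that arithmetic. The per-drive spin-up current (about 2 A at 12 V is a commonly quoted figure for 3.5" drives), the drive count and the allowance for everything else on the 12 V rail are all assumptions, to be replaced with the real numbers from the drive datasheets and the actual build.

```python
def twelve_volt_headroom(rail_amps: float, drives: int,
                         spinup_amps_per_drive: float,
                         other_amps: float) -> float:
    """Amps left on the 12 V rail if every drive spins up at the same time."""
    peak_draw = drives * spinup_amps_per_drive + other_amps
    return rail_amps - peak_draw


RAIL_AMPS = 54.0  # Seasonic SS-650KM 12 V rating quoted in the post
print(RAIL_AMPS * 12)  # 648.0 (watts available on the 12 V rail)

# Hypothetical build: 20 drives at ~2 A spin-up each, plus ~5 A for CPU, fans, etc.
print(twelve_volt_headroom(RAIL_AMPS, drives=20,
                           spinup_amps_per_drive=2.0, other_amps=5.0))  # 9.0
```

A comfortably positive result supports ruling out the supply at spin-up; a negative one would suggest staggering spin-up or a bigger rail.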


I just reread your post, bcbgboy13, and must have missed the part about the Marvell network controllers the first time. Does Marvell not work very well? I have since disabled three of the LAN controllers, so just the one is running, but if there is a better card out there that I can throw in, I would do it...


If you never had weird network problems and have acceptable performance (writing to and reading from the server), you should not worry about the Marvell LAN at all. It works.

The Asus P5Q is a solid board, and if you have disabled all the unused hardware you should have more free resources for the two SM controllers and hopefully avoid any lockups.

 


Well, unfortunately, it locked up again. I think it made it to somewhere between 55% and 65% of the parity build. But I do have everything it printed in my terminal window up to the lockup. Hopefully someone can help me decipher what is going on. Here it is:

 

Tower login: root
Password:
Linux 2.6.32.9-unRAID.
root@Tower:~# tail -f /var/log/syslog
Feb 11 11:15:49 Tower ntpd[2437]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Feb 11 11:15:49 Tower ntpd[2437]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
Feb 11 11:15:49 Tower ntpd[2437]: Listening on interface #1 lo, 127.0.0.1#123 Enabled
Feb 11 11:15:49 Tower ntpd[2437]: Listening on interface #2 eth0, 192.168.1.110#123 Enabled
Feb 11 11:15:49 Tower ntpd[2437]: kernel time sync status 0040
Feb 11 11:15:49 Tower ntpd[2437]: frequency initialized 155.777 PPM from /etc/ntp/drift
Feb 11 11:15:49 Tower ifplugd(eth0)[1708]: Program executed successfully.
Feb 11 11:15:58 Tower ntpd[2437]: synchronized to 69.164.222.108, stratum 2
Feb 11 11:21:10 Tower in.telnetd[2706]: connect from 192.168.1.11 (192.168.1.11)
Feb 11 11:21:12 Tower login[2707]: ROOT LOGIN  on `pts/0' from `192.168.1.11'
Feb 11 19:27:39 Tower kernel: sas: command 0xc3604600, task 0xf6e763c0, timed out: BLK_EH_NOT_HANDLED
Feb 11 19:27:39 Tower kernel: sas: Enter sas_scsi_recover_host
Feb 11 19:27:39 Tower kernel: sas: trying to find task 0xf6e763c0
Feb 11 19:27:39 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf6e763c0
Feb 11 19:27:39 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1701:mvs_abort_task:rc= 5
Feb 11 19:27:39 Tower kernel: sas: sas_scsi_find_task: querying task 0xf6e763c0
Feb 11 19:27:39 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1645:mvs_query_task:rc= 5
Feb 11 19:27:39 Tower kernel: sas: sas_scsi_find_task: task 0xf6e763c0 failed to abort
Feb 11 19:27:39 Tower kernel: sas: task 0xf6e763c0 is not at LU: I_T recover
Feb 11 19:27:39 Tower kernel: sas: I_T nexus reset for dev 0600000000000000
Feb 11 19:27:39 Tower kernel: sas: I_T 0600000000000000 recovered
Feb 11 19:27:39 Tower kernel: sas: --- Exit sas_scsi_recover_host
Feb 11 19:28:09 Tower kernel: sas: command 0xc3604600, task 0xf6e54000, timed out: BLK_EH_NOT_HANDLED
Feb 11 19:28:09 Tower kernel: sas: Enter sas_scsi_recover_host
Feb 11 19:28:09 Tower kernel: sas: trying to find task 0xf6e54000
Feb 11 19:28:09 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf6e54000
Feb 11 19:28:09 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1701:mvs_abort_task:rc= 5
Feb 11 19:28:09 Tower kernel: sas: sas_scsi_find_task: querying task 0xf6e54000
Feb 11 19:28:09 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1645:mvs_query_task:rc= 5
Feb 11 19:28:09 Tower kernel: sas: sas_scsi_find_task: task 0xf6e54000 failed to abort
Feb 11 19:28:09 Tower kernel: sas: task 0xf6e54000 is not at LU: I_T recover
Feb 11 19:28:09 Tower kernel: sas: I_T nexus reset for dev 0600000000000000
Feb 11 19:28:09 Tower kernel: sas: I_T 0600000000000000 recovered
Feb 11 19:28:09 Tower kernel: sas: --- Exit sas_scsi_recover_host
Feb 11 19:28:40 Tower kernel: sas: command 0xc3604600, task 0xf6e54000, timed out: BLK_EH_NOT_HANDLED
Feb 11 19:28:40 Tower kernel: sas: Enter sas_scsi_recover_host
Feb 11 19:28:40 Tower kernel: sas: trying to find task 0xf6e54000
Feb 11 19:28:40 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf6e54000
Feb 11 19:28:40 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1701:mvs_abort_task:rc= 5
Feb 11 19:28:40 Tower kernel: sas: sas_scsi_find_task: querying task 0xf6e54000
Feb 11 19:28:40 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1645:mvs_query_task:rc= 5
Feb 11 19:28:40 Tower kernel: sas: sas_scsi_find_task: task 0xf6e54000 failed to abort
Feb 11 19:28:40 Tower kernel: sas: task 0xf6e54000 is not at LU: I_T recover
Feb 11 19:28:40 Tower kernel: sas: I_T nexus reset for dev 0600000000000000
Feb 11 19:28:40 Tower kernel: sas: I_T 0600000000000000 recovered
Feb 11 19:28:40 Tower kernel: sas: --- Exit sas_scsi_recover_host

Message from syslogd@Tower at Fri Feb 11 19:28:44 2011 ...
Tower kernel: Oops: 0000 [#1] SMP

Message from syslogd@Tower at Fri Feb 11 19:28:44 2011 ...
Tower kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host3/target3:0:0/3:0:0:0/block/sdl/stat

Message from syslogd@Tower at Fri Feb 11 19:28:44 2011 ...
Tower kernel: Stack:

Message from syslogd@Tower at Fri Feb 11 19:28:44 2011 ...
Tower kernel: Process swapper (pid: 0, ti=f7498000 task=f7471ef0 task.ti=f7498000)

Message from syslogd@Tower at Fri Feb 11 19:28:44 2011 ...
Tower kernel: Call Trace:
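The telling part of this log is the rhythm of the sas errors: the same command (0xc3604600) times out at 19:27:39, 19:28:09 and 19:28:40; each time the abort fails, libsas falls back to an I_T nexus reset on device 0600000000000000, and the Oops follows a few seconds after the third round. The roughly 30-second spacing is consistent with the Linux default SCSI command timeout of 30 seconds. Below is a minimal Python sketch for pulling that pattern out of a longer syslog. The regexes only cover the message formats seen in this thread, `summarize_sas_errors` is a hypothetical helper, and because syslog lines carry no year, the year is a parameter (2011 here, taken from the syslogd banner).

```python
import re
from datetime import datetime

# Formats copied from the log above; other drivers or kernels may word these differently.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: sas: command (?P<cmd>0x[0-9a-f]+), "
    r"task (?P<task>0x[0-9a-f]+), timed out"
)
RESET_RE = re.compile(r"sas: I_T nexus reset for dev (?P<dev>[0-9a-f]+)")


def summarize_sas_errors(lines: list[str], year: int = 2011):
    """Return (timeouts, gaps_in_seconds, resets_per_device) for a list of syslog lines."""
    timeouts = []  # (timestamp, command, task)
    resets: dict[str, int] = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            timeouts.append((ts, m["cmd"], m["task"]))
            continue
        r = RESET_RE.search(line)
        if r:
            resets[r["dev"]] = resets.get(r["dev"], 0) + 1
    # Seconds between consecutive timeouts.
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(timeouts, timeouts[1:])]
    return timeouts, gaps, resets


log = [
    "Feb 11 19:27:39 Tower kernel: sas: command 0xc3604600, task 0xf6e763c0, timed out: BLK_EH_NOT_HANDLED",
    "Feb 11 19:27:39 Tower kernel: sas: I_T nexus reset for dev 0600000000000000",
    "Feb 11 19:28:09 Tower kernel: sas: command 0xc3604600, task 0xf6e54000, timed out: BLK_EH_NOT_HANDLED",
    "Feb 11 19:28:09 Tower kernel: sas: I_T nexus reset for dev 0600000000000000",
    "Feb 11 19:28:40 Tower kernel: sas: command 0xc3604600, task 0xf6e54000, timed out: BLK_EH_NOT_HANDLED",
    "Feb 11 19:28:40 Tower kernel: sas: I_T nexus reset for dev 0600000000000000",
]
timeouts, gaps, resets = summarize_sas_errors(log)
print(gaps)    # [30.0, 31.0]
print(resets)  # {'0600000000000000': 3}
```

Run over a full syslog, a long list of near-30-second gaps on one device is the pattern to look for.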

 

 

 


Well, I think I got it fixed. It kept locking up at around 60-65% of the parity build. So I tried swapping the slots of my graphics card and one of the Supermicro SATA cards, and it booted, finished the parity build, and has been up and running for over 24 hours now....

