Author Topic: Preclear.sh results - Questions about your results? Post them here.  (Read 94675 times)

Offline jbuszkie

  • Sr. Member
  • ****
  • Posts: 298
In an effort to keep the Preclear script thread focused on questions about the script itself, I've started another thread here to discuss the results.  The preclear thread is peppered with both result questions and script questions and is now 15 pages long!  So I thought a separate thread was warranted. I'll start it off...

After running 3 iterations on my new 1TB green disk I had

< 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
---
> 5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 5
64c64
< 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
---
> 196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1

Are 5 reallocated sectors anything to worry about?  I was hoping for 0! :)

This is still running on the old version of the script..  Maybe I should try the new version.. (I started my test the morning before Joe posted the new version!)   I did start a cycle again on a different controller (one cycle this time - and still the old script)

Another thought...  Should we start a new thread for preclear disk result questions and keep this thread for questions/comments about the functionality of preclear?

Jim

If it stays at 5, in my opinion, no problem.  If it increases over time, then you might want to use the RMA process.  Odds are good it will stabilize.  I have one 250Gig drive that has had 100 reallocated sectors since the first time I ran smartctl on it.  That number has never changed on that disk.

I'd say, download the new version of preclear_disk.sh and run another set of test cycles to see whether the reallocated sector count increases.  (The new version stress-tests the drive more.  The old one had a bug that prevented the random cylinders from being read in addition to the linear read, which was working properly.)  If the number stays at 5, fine; if not, another test cycle might be in order.  At that point you will have all the evidence you need to decide whether an RMA is warranted.
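The "did the count grow?" check can be sketched in a few lines of shell. This is only an illustration, not part of preclear_disk.sh: the function names `smart_raw` and `realloc_stable` are made up here, and it assumes the two reports were saved beforehand as text with `smartctl -A` (the device name below is a placeholder).

```shell
#!/bin/bash
# Print the raw value (last column) of one SMART attribute, selected by ID,
# from "smartctl -A"-style text on stdin.
smart_raw() {
    awk -v id="$1" '$1 == id { print $NF }'
}

# Compare two saved reports. Succeeds (exit status 0) only if the raw
# Reallocated_Sector_Ct (attribute 5) did not grow between them.
realloc_stable() {
    local before after
    before=$(smart_raw 5 < "$1")
    after=$(smart_raw 5 < "$2")
    echo "Reallocated_Sector_Ct: $before -> $after"
    [ "$after" -le "$before" ]
}

# Hypothetical usage (placeholder device and file names):
#   smartctl -A /dev/sdX > before.txt
#   ... run a test cycle ...
#   smartctl -A /dev/sdX > after.txt
#   realloc_stable before.txt after.txt || echo "count increased"
```

Comparing raw values rather than the normalized 200/199 columns matches the advice above: it is the raw count staying at 5 that matters.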

You might want to start a thread with your preclear experience.  It will allow the questions about the output to all be in one spot.

Joe L.
Ok..  I ran one more full cycle with the new version of the script and I got no reallocated sector changes.  Should I run once more or do you think I'm good now and can put the disk into service?

So...  first 3 cycles. - 5 reallocated sectors
4th cycle - no more reallocated sectors.

Jim
« Last Edit: January 05, 2011, 01:05:08 PM by Rajahal »

Offline bjp999

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3511
  • SuperMicro C2SEE-O (A), Asus P5B VM DO (B+)
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #1 on: July 24, 2009, 06:32:25 AM »
Experience here has been that ANY reallocated sector count is a bad sign.  I agree that if it holds stable (even at 100 or more) it is nothing to worry about, but experience here has shown that even a small number of reallocated sectors usually leads to more (and more and more ...).  You might think of it like a string hanging from your favorite shirt.  Pull on it and the entire shirt will unravel.

The fact that you've run several cycles and the number has held steady is comforting and not typical of the unraveling behavior.  I'd still recommend diligence in making sure that the count doesn't increase further.

Offline Joe L.

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 17799
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #2 on: July 24, 2009, 06:35:33 AM »
If you need the space, and need it now, go ahead and assign it to the array.  

If not in a real rush, let it run another cycle or two, or overnight.  Remember, you did 3 cycles to identify the first 5 sectors... you do not know if they all showed up in the first cycle, or the third.
It is good that no more bad sectors were identified.  

Glad it is working for you.  How long did it take to run a cycle on the 1TB drive in your server?    

Joe L.

Offline jbuszkie

  • Sr. Member
  • ****
  • Posts: 298
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #3 on: July 24, 2009, 06:43:10 AM »
Each cycle is just about 12 hours.  I'm in no immediate rush so I just kicked off another cycle.  Maybe an interesting addition to the script would be to save the SMART data after every cycle so we can see when the events happened.  When I ran the first 3 cycles I don't know if the events happened in the 1st, 2nd, or 3rd cycle..
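Until the script does this itself, the idea can be approximated by hand. A rough sketch, assuming you run one cycle at a time and capture a report between cycles; `save_snapshot` and `cycle_changes` are hypothetical helper names, and the directory and device in the usage comment are placeholders.

```shell
#!/bin/bash
# Store the SMART report read from stdin as the snapshot for one cycle.
# usage: save_snapshot <dir> <cycle-number> < report
save_snapshot() {
    mkdir -p "$1" && cat > "$1/smart_cycle_$2.txt"
}

# Print the attribute lines that differ between two saved cycles.
# Prints nothing (and returns non-zero) when the snapshots are identical.
# usage: cycle_changes <dir> <earlier-cycle> <later-cycle>
cycle_changes() {
    diff "$1/smart_cycle_$2.txt" "$1/smart_cycle_$3.txt" | grep '^[<>]'
}

# Hypothetical usage (placeholder paths):
#   smartctl -a /dev/sdX | save_snapshot /boot/smart_logs 0   # before cycle 1
#   ... run one preclear cycle ...
#   smartctl -a /dev/sdX | save_snapshot /boot/smart_logs 1
#   cycle_changes /boot/smart_logs 0 1
```

With one file per cycle, it becomes possible to say whether the 5 sectors appeared in the 1st, 2nd, or 3rd pass.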

Jim

Offline Guzzi

  • Full Member
  • ***
  • Posts: 151
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #4 on: July 24, 2009, 01:00:15 PM »
Hi, I have successfully precleared a disk, but got the SMART differences below. Is this something I have to worry about, or can I use this disk? I noticed some interface errors in the log at the very beginning, but no errors in the script.
Thanks, Guzzi

============================================================================
==
== Disk /dev/sdq has been successfully precleared
==
============================================================================
 S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
62,63c62,63
< 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31
< 193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       25344
---
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       32
> 193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       25345
============================================================================

Offline Joe L.

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 17799
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #5 on: July 24, 2009, 01:26:34 PM »

This is a new one to me... A Google search on "Power-Off_Retract_Count" turned up the following:

# Power-Off_Retract_Count = No of times drive was powered off in an emergency, called Emergency Unload.
# Load_Cycle_Count = This number is highly affected by your power management policies. For example, too aggressive power management might put the hard disk to sleep too often. This number is indicative of when your hard disk parks, unparks, spins up, spins down.

So, reading between the lines... unless you powered down the disk while it was being cleared, it either *thought* it had lost power, or it really did lose power.
It retracted the disk heads in an emergency unload, thinking it had lost power, then loaded them again once it thought power had been restored.

I'd check the system log for any other errors while the drive was being cleared.   I'd also check any power connectors or "Y" splitters.  They can be intermittent.

Joe L.

Offline Guzzi

  • Full Member
  • ***
  • Posts: 151
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #6 on: July 24, 2009, 02:39:14 PM »

Hi Joe,

Checking the power connectors is no problem, I can do that.
I checked the syslog several times during preclear, and except in the very first minutes (some "drive not ready" messages) there was nothing special.
But it seems that a lot happened during the post-read, which I do not understand. Could you have a look at the log? It covers the complete preclear process from beginning to end.

Thanks, Guzzi

Offline Joe L.

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 17799
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #7 on: July 24, 2009, 04:07:16 PM »
You have several drives with errors, not just the one you are trying to clear... and it looks like you are running out of memory too. 
Are you running any add-on packages? (other than the pre-clear)  The user-share file system is constantly reporting it cannot allocate memory.
How much RAM are you running?

I can't go into detail now...  Perhaps RobJ can take a look and provide his input.  Perhaps send him a PM and ask him to take a look.

Joe L. 

Offline Guzzi

  • Full Member
  • ***
  • Posts: 151
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #8 on: July 24, 2009, 04:21:25 PM »

I have 2 GB RAM in the box:
(from /usr/bin/top -b -n1)

top - 01:15:04 up  1:13,  0 users,  load average: 3.94, 4.00, 3.73
Tasks:  73 total,   2 running,  71 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.8%us, 60.5%sy,  0.0%ni, 22.3%id,  5.0%wa,  0.6%hi,  3.7%si,  0.0%st
Mem:   1943344k total,  1617648k used,   325696k free,    39868k buffers
Swap:        0k total,        0k used,        0k free,  1481180k cached
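As a side note on reading these numbers: in this top output "used" equals total minus free, so it includes buffers and page cache. A small arithmetic sketch using only the figures above shows how little of the "used" memory belongs to processes. The variable names are just for illustration.

```shell
#!/bin/bash
# Figures in kB, copied from the top output above.
total=1943344
used=1617648
buffers=39868
cached=1481180

# "used" here includes buffers and page cache, so subtract both
# to estimate what processes themselves are holding.
app_used=$((used - buffers - cached))
cache_pct=$((cached * 100 / total))

echo "held by processes: ${app_used} kB"     # 96600 kB
echo "page cache share of RAM: ${cache_pct}%"  # 76%
```

Note that this snapshot was taken about an hour after the reboot, so it only shows how quickly the cache fills during copying, not the state at the time of the errors.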

(Did a reboot after I saw those kernel things in syslog - never had that before, just during this specific preclear)

Add-ons: I have disabled cache_dirs to keep memory free while moving data to the box. Here is the go script:

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c

# Unraid_Notify (E-Mail Notification)
#installpkg /boot/packages/socat-1.7.0.0-i486-2bj.tgz
#installpkg /boot/packages/unraid_notify-2.30-noarch-unRAID.tgz
installpkg /boot/packages/acpitool-0.4.7-i486-1goa.tgz
#unraid_notify start

sleep 30

# enable wakeup
/usr/sbin/ethtool -s eth0 wol g

# Start UnMenu
/boot/unmenu/uu

I have to say that I was constantly moving data to the box while clearing the disk. Maybe the problems with the disk blocked the copy process?

Do I need to upgrade the RAM to 4 GB?

Offline RobJ

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2995
  • Epox MF570 (nForce570) (A-) / Biostar nForce4 (D-)
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #9 on: July 24, 2009, 04:42:17 PM »
That syslog is a mess!  And it's only the latter part too, it is missing the 600 to 900 odd lines of system setup at the beginning.

The drive with ID of sdn probably has a poor quality cable.  I would replace it if at all possible.

And Joe is right, there were page allocation failures for many subsystems, including the share file system, Samba, and possibly involving the networking and Reiser file system modules, which is worrying.  In this piece of the syslog, I don't see any kernel panics, so I don't think we can say for sure that there is any damage, such as evidence of flaky memory, or corrupted Reiser file systems, but I never fully trust a system that has crashed.  Always better to restart fresh.  I certainly would not try to run anything important, once I saw the first sign of suspicious system operation.  Those 'Call Traces' definitely qualify as suspicious system operation.  Grabbing the syslog and waiting for advice was the correct thing to do.

Even though I saw no 'panics' here, to be safe, I would reboot and run a full memory test first, then run reiserfsck on each of the data drives (see the Check Disk File systems page for instructions).  I'm sorry, it is somewhat time-consuming, but it is better to be safe.  The memory test is probably not needed, so you can postpone it if you wish, but I like to be thorough, and know whether a system is truly trustworthy, especially when I have just had extensive memory-related problems.  I would like to say test only the data drives you were actually using, but it appears that there were numerous spin downs to many drives, and the mover ran at least twice, so it looks like all or most of your drives may have been written to.

2 GB of memory should have been more than enough.  I can't see any reason so far for the problems, at least not from this syslog.

Offline Joe L.

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 17799
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #10 on: July 24, 2009, 05:02:00 PM »
I have 2 GB RAM in the box:
Addons: I have disabled cachedirs to keep memory free while moving data to the box. Here is the goscript:
Do I need to upgrade the RAM to 4 GB?
Part of the original cache_dirs script set the cache pressure to 0.  I've since learned that this value does NOT free up RAM when other processes need it.

Even if you had stopped cache_dirs, the memory Linux allocated for cache would not have been freed.
You would need to type something like:
sysctl vm.vfs_cache_pressure=10
to allow it to reuse the memory it had put into cache.   (The most recent version of cache_dirs fixed that and uses cache_pressure=5 by default.)
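For anyone who wants to check the current value before changing it, here is a minimal sketch. It reads and writes the same setting through its /proc file, which is what the sysctl command does under the hood. The function names and the `KNOB` override are invented for this example; the override exists only so the functions can be tried against a scratch file without root.

```shell
#!/bin/bash
# Path of the kernel setting; override KNOB only for dry runs on a scratch file.
KNOB=${KNOB:-/proc/sys/vm/vfs_cache_pressure}

# Print the current value.
get_cache_pressure() {
    cat "$KNOB"
}

# Set a new value. On the real file this needs root and has the same effect as
#   sysctl vm.vfs_cache_pressure=<value>
set_cache_pressure() {
    case "$1" in
        ''|*[!0-9]*)
            echo "value must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "$1" > "$KNOB"
}

# Hypothetical usage:
#   get_cache_pressure
#   set_cache_pressure 10
```

A value set this way lasts only until the next reboot, so it would have to be reissued at boot (for example from the go script) to stick.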

More memory might help, but 2 Gig should be plenty.  Your first priority should be the disk errors.

These errors are /dev/sdn
Code: [Select]
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12.00: BMDMA2 stat 0xd0009
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12.00: cmd 25/00:00:30:8a:06/00:02:00:00:00/e0 tag 0 dma 262144 in
Jul 23 03:34:19 XMS-GMI-01 kernel:          res 51/04:7f:b1:8b:06/00:00:00:00:00/f0 Emask 0x1 (device error)
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12.00: status: { DRDY ERR }
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12.00: error: { ABRT }
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12.00: configured for UDMA/100
Jul 23 03:34:19 XMS-GMI-01 kernel: ata12: EH complete
Jul 23 03:34:19 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
Jul 23 03:34:19 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] Write Protect is off
Jul 23 03:34:19 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] Mode Sense: 00 3a 00 00
Jul 23 03:34:19 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12.00: BMDMA2 stat 0xd0009
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12.00: cmd 25/00:00:30:dc:12/00:02:00:00:00/e0 tag 0 dma 262144 in
Jul 23 03:34:32 XMS-GMI-01 kernel:          res 51/04:00:2f:de:12/00:00:00:00:00/f0 Emask 0x1 (device error)
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12.00: status: { DRDY ERR }
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12.00: error: { ABRT }
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12.00: configured for UDMA/100
Jul 23 03:34:32 XMS-GMI-01 kernel: ata12: EH complete
Jul 23 03:34:32 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
Jul 23 03:34:32 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] Write Protect is off
Jul 23 03:34:32 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] Mode Sense: 00 3a 00 00
Jul 23 03:34:32 XMS-GMI-01 kernel: sd 12:0:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
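To see at a glance which ports are throwing these errors, something like the following could be run over the syslog. It is a sketch only: `ata_exception_counts` is a made-up name, it matches just the "exception Emask" line format shown above, and the syslog path in the usage comment is an assumption about where the log lives on your system.

```shell
#!/bin/bash
# Count "exception Emask" lines per ATA port in syslog text read from stdin.
# Output: one "<count> <port>" line per port, busiest port first.
ata_exception_counts() {
    grep 'kernel: ata[0-9.]*: exception Emask' |
        sed 's/.*kernel: \(ata[0-9.]*\): exception.*/\1/' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Hypothetical usage:
#   ata_exception_counts < /var/log/syslog
```

The port number can be tied back to a device through the neighbouring "sd 12:0:0:0: [sdn]" lines, which is how ata12 maps to /dev/sdn in the excerpt above.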

These are memory allocation errors:
Code: [Select]
Jul 24 01:37:13 XMS-GMI-01 kernel: shfs: page allocation failure. order:0, mode:0x4020
Jul 24 01:37:13 XMS-GMI-01 kernel: Pid: 5060, comm: shfs Not tainted 2.6.29.1-unRAID #2
Jul 24 01:37:13 XMS-GMI-01 kernel: Call Trace:
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c0146307>] __alloc_pages_internal+0x33f/0x352
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c015ec2c>] __slab_alloc+0x158/0x42b
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c015fce6>] __kmalloc_track_caller+0x75/0xbe
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c02d8535>] ? __netdev_alloc_skb+0x17/0x34
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c02d8535>] ? __netdev_alloc_skb+0x17/0x34
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c02d8217>] __alloc_skb+0x4a/0x102
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c02d8535>] __netdev_alloc_skb+0x17/0x34
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<f82512fa>] rtl8169_rx_fill+0x91/0x144 [r8169]
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<f82516cf>] rtl8169_rx_interrupt+0x322/0x379 [r8169]
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<f825276c>] rtl8169_poll+0x2f/0x124 [r8169]
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c02df24c>] net_rx_action+0x5d/0x119
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c0124a48>] __do_softirq+0x84/0x121
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c0124b1a>] do_softirq+0x35/0x3a
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c0124d97>] irq_exit+0x38/0x3a
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c0104a69>] do_IRQ+0x67/0x7e
Jul 24 01:37:13 XMS-GMI-01 kernel:  [<c01033a7>] common_interrupt+0x27/0x2c
Jul 24 01:37:13 XMS-GMI-01 kernel: Mem-Info:
Jul 24 01:37:13 XMS-GMI-01 kernel: DMA per-cpu:
Jul 24 01:37:13 XMS-GMI-01 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Jul 24 01:37:13 XMS-GMI-01 kernel: Normal per-cpu:
Jul 24 01:37:13 XMS-GMI-01 kernel: CPU    0: hi:  186, btch:  31 usd: 180
Jul 24 01:37:13 XMS-GMI-01 kernel: HighMem per-cpu:
Jul 24 01:37:13 XMS-GMI-01 kernel: CPU    0: hi:  186, btch:  31 usd: 136
Jul 24 01:37:13 XMS-GMI-01 kernel: Active_anon:1704 active_file:6958 inactive_anon:1964
Jul 24 01:37:13 XMS-GMI-01 kernel:  inactive_file:416907 unevictable:31739 dirty:16436 writeback:1553 unstable:0
Jul 24 01:37:13 XMS-GMI-01 kernel:  free:1895 slab:11856 mapped:1835 pagetables:175 bounce:0
Jul 24 01:37:13 XMS-GMI-01 kernel: DMA free:3488kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:8676kB unevictable:0kB present:15852kB pages_scanned:0 all_unreclaimable? no
Jul 24 01:37:13 XMS-GMI-01 kernel: lowmem_reserve[]: 0 867 1887 1887
Jul 24 01:37:13 XMS-GMI-01 kernel: Normal free:1320kB min:3732kB low:4664kB high:5596kB active_anon:1888kB inactive_anon:2148kB active_file:16288kB inactive_file:772872kB unevictable:40kB present:887976kB pages_scanned:0 all_unreclaimable? no
Jul 24 01:37:13 XMS-GMI-01 kernel: lowmem_reserve[]: 0 0 8158 8158
Jul 24 01:37:13 XMS-GMI-01 kernel: HighMem free:2772kB min:512kB low:1608kB high:2704kB active_anon:4928kB inactive_anon:5708kB active_file:11544kB inactive_file:886080kB unevictable:126916kB present:1044328kB pages_scanned:0 all_unreclaimable? no
Jul 24 01:37:13 XMS-GMI-01 kernel: lowmem_reserve[]: 0 0 0 0
Jul 24 01:37:13 XMS-GMI-01 kernel: DMA: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3488kB
Jul 24 01:37:13 XMS-GMI-01 kernel: Normal: 134*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1304kB
Jul 24 01:37:13 XMS-GMI-01 kernel: HighMem: 23*4kB 13*8kB 15*16kB 33*32kB 8*64kB 2*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2772kB
Jul 24 01:37:13 XMS-GMI-01 kernel: 455669 total pagecache pages
Jul 24 01:37:13 XMS-GMI-01 kernel: 0 pages in swap cache
Jul 24 01:37:13 XMS-GMI-01 kernel: Swap cache stats: add 0, delete 0, find 0/0
Jul 24 01:37:13 XMS-GMI-01 kernel: Free swap  = 0kB
Jul 24 01:37:13 XMS-GMI-01 kernel: Total swap = 0kB
Jul 24 01:37:13 XMS-GMI-01 kernel: 490976 pages RAM
Jul 24 01:37:13 XMS-GMI-01 kernel: 263138 pages HighMem
Jul 24 01:37:13 XMS-GMI-01 kernel: 5140 pages reserved
Jul 24 01:37:13 XMS-GMI-01 kernel: 318096 pages shared
Jul 24 01:37:13 XMS-GMI-01 kernel: 170962 pages non-shared
Joe L.
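One detail worth pulling out of that dump: each zone line reports its free memory next to a "min" watermark. A small filter, sketched here with an invented function name, lists the zones whose free memory has dropped below that watermark. It assumes the "<Zone> free:NkB min:NkB" layout seen in this log.

```shell
#!/bin/bash
# From Mem-Info lines in syslog text on stdin, report each memory zone
# whose free memory is below its "min" watermark.
zones_below_min() {
    sed -n 's/.*kernel: \([A-Za-z]*\) free:\([0-9]*\)kB min:\([0-9]*\)kB.*/\1 \2 \3/p' |
        awk '$2 < $3 { print $1 ": free " $2 " kB < min " $3 " kB" }'
}

# Hypothetical usage:
#   zones_below_min < /var/log/syslog
```

Applied to the lines above, only the Normal zone qualifies (1320 kB free against a 3732 kB minimum), while DMA and HighMem are above their minimums. That would fit with the failures coming from kernel-side allocations, such as the network driver in the call trace, rather than from user programs running short, though this is an interpretation and not something the log states outright.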

Offline Guzzi

  • Full Member
  • ***
  • Posts: 151
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #11 on: July 24, 2009, 06:38:38 PM »
Thanks Rob, Joe for the feedback.
sdn and sdq are the two drives that I do not yet have in the array, because both were showing those errors when I first tried setting up the empty array some weeks ago.
All other drives are in the array and were fine, showing no errors.
Because I didn't trust those 2 drives, I ran the preclear script to be safe, with the result above.
It was the very first time I encountered such memory-related errors; I never had them before. But you're right, I even had problems accessing Samba shares after this.

I restarted the box and everything is fine so far, no errors at all in the syslog (except this DMA stuff on the IDE port: "kernel: atiixp 0000:00:14.1: simplex device: DMA disabled").
BTW: starting preclear on either of those 2 unassigned drives gives me the above "ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0" errors in the log. They do NOT appear during startup.

cache_dirs was not started at all; I removed it from the go script and rebooted before I moved the files. So it definitely cannot be responsible for any memory-related stuff.

I am worried too when I see such things. I think I will remove both of the drives and test them separately to see if they need to be RMAed.
I will also perform a memory test and chkdsk on all drives as recommended, to be sure everything is fine.
And yes, there is already stuff on almost all drives, since I have been moving data over the last weeks.
Will post after running the tests.
Guzzi
« Last Edit: July 24, 2009, 06:44:57 PM by Guzzi »

Offline Joe L.

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 17799
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #12 on: July 24, 2009, 07:14:28 PM »

The preclear_disk script is very good at exercising (thrashing) a disk.  As already said, it is far easier to RMA the drives before they are loaded with your data if you find they do not test well.   The errors you saw could be because of bad SATA cables or bad power cables/splitters, or even a bad disk controller.   But...

Remember, your SMART report showed an emergency retraction of the heads to a safe landing spot when the drive thought it was losing power in the middle of the preclearing process.  That is pretty drastic, as it tries to save itself from a head crash.

Is your power supply being overloaded?  Are you using a backplane for power distribution?    Lots to check out, but at least you are more informed than most Windows users.  They just blue-screen.

Joe L.

Offline Guzzi

  • Full Member
  • ***
  • Posts: 151
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #13 on: July 25, 2009, 02:53:59 AM »


Maybe I wasn't completely clear: I have NO data on those 2 "suspicious" drives (they're unassigned and I didn't mount them except to check temporarily whether they're empty). Only the array is filled with data, and I haven't encountered problems with its drives so far.
The drives are not new. Most of them come from my former Windows box and had been running there as RAID-5 for 1-2 years (I hope the warranty is not over yet ...)
I never got BSODs on the Windows box, but I remember drives showing "yellow" once or twice, which was probably the same CRC problem as now.
Nevertheless I have to admit that unRAID and the Linux tools give much more transparency about what's "really" happening. Windows doesn't help you much with that (you just "reactivate" the drive, and the errors are corrected by the RAID layer anyway).

BTW: I ran the memory test overnight and it passed 8 times without errors. I will chkdsk the drives when I find the time (currently working with my son on his motorcycle ;-))

The biggest hassle with those "many-disk machines" (regardless of Windows, Linux, or something else) is power and cabling, and it is very difficult to diagnose.

Power might be fine for all normal operations, but if you are accessing a disk and 20 other disks spin up at the same time, it might pull the voltage down. I experienced in the past that HDs are VERY sensitive to voltages below 4.8 V on the 5 V rail. This has to be measured at the drive itself, not somewhere else, because you lose voltage along the cables.

Anyway, I thought I was safe, because I ran the Windows box and now the unRAID box with the same power supply but 8 fewer drives... so maybe I should check the cables again, since the problem seems to be focused on those two ports...
So I don't think it's overloaded power-wise, but unRAID is in a different box with different cables and no power backplane, and there might be issues. I have no choice but to check and solve this, because the plan is to add the remaining 4 disks from the Windows RAID to the unRAID array as soon as the 17+ bug is solved...
I hope to soon reach the stage where I can put the box back in the corner and forget it for the next few years ;-)

Offline Guzzi

  • Full Member
  • ***
  • Posts: 151
Re: Preclear.sh results - Questions about your results? Post them here.
« Reply #14 on: July 25, 2009, 04:49:31 PM »

... I'm done... I ran the memory test overnight and it passed 8 times without errors. I also ran reiserfsck on all data drives, and all went through without any errors reported. I checked the syslog as well: no errors, neither after boot nor after all those activities.

Is there anything else I can or should do? It seems those problems all centre on those 2 drives. If so, I would probably prefer to dispose of them and order 2 new ones. That is much cheaper than the time it took me to check the whole server ... ;-)