[SOLVED] PSU issues + Parity Swap Disable



I'm going nuts fault-finding and would appreciate some advice!

 

Running Unraid Pro 5.0RC12, with 5x WD20EARS and 1x WD20EARX. Parity is a WD20EARS. No Cache drive.

 

The WD20EARX redballs every 3-8 weeks. It did on 4.7, but after the latest data cable-wiggling, it seemed stable so I went to 5.0, and now it's started again. 5.0 otherwise seems fine and normal. I have no packages or customisations added. None of the other HDDs redball.

 

Latest redball was 25GB through a 30GB Bluray - suddenly lots of errors.

 

I've changed SATA cables, swapped power cables between HDDs and the redball stays with the drive.

I've moved the HDD from the Gigabyte SATA2 port to the Southbridge SATA3 ports - no improvement.

 

(Mobo is a Gigabyte GA870A-UD3 with AMD SB850 southbridge giving 6x Sata 6Gb/s and a Gigabyte chip with 2x Sata 3Gb/s)

 

PSU is a BeQuiet 530W with 2x 12V rails at 22A each, 35A combined, and the HDDs are split evenly across power cables.

Ok, it's not a Seasonic single rail, but looks ok for running 6x green HDDs surely?

 

I've run a Smartctl short test which looks ok too. Temps are and always have been low.

 

The EARS drives are Sata 2, and the EARX is Sata 3 - could I 'instruct' it to connect at Sata 2 somehow?

 

Any ideas, war stories or approaches gratefully received. :-\

syslog and smart test attached.

 

Thanks, Judy

smartsde.txt

syslog.txt

Link to comment

The drive has had to retract its heads on an unexpected power loss 16 times.

192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      16

 

I would suspect the power supply, or power cabling first.

I see 6 disks installed.  What power supply are you using? (exact make/model?)  Many have found issues with 6 or 7 disks on a multi-rail power supply.
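For anyone who wants to track this counter over time rather than eyeball it, here is a minimal sketch that pulls the raw value out of an attribute row like the one quoted above. It assumes the usual ten-column layout of the `smartctl -A` attribute table (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value); the `SmartAttribute` and `parse_attribute_line` names are made up for illustration.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int       # normalised value
    worst: int       # worst normalised value seen
    threshold: int   # failure threshold
    raw: str         # raw value, kept as text because some drives append extra fields


def parse_attribute_line(line: str) -> SmartAttribute:
    """Parse one row of the SMART attribute table (assumed ten-column layout)."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"not a SMART attribute row: {line!r}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        threshold=int(fields[5]),
        raw=" ".join(fields[9:]),
    )


# The row quoted in this post.
EXAMPLE = "192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      16"
attr = parse_attribute_line(EXAMPLE)
print(attr.name, "raw =", attr.raw)
```

Saving the raw value after each power cycle or redball would show whether the count rises between clean shutdowns, which is the pattern that points at power or cabling.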

 

Joe L.

Link to comment
Already stated - BeQuiet, with two 12V rails at 22A each - should be okay for six green drives ... unless the drives are connected to the same rail supplying a power-hungry gpu.  To be absolutely sure we'd need to know complete hardware configuration, and how the 12V lines are split across the two rails.

 

However, the '22Amps each, 35Amps combined' suggests, to me, that this is one of those psus with 12V rails which are split nominally, but are actually commoned at the regulator.

Link to comment

Alternatively, it may just mean that the two rails can supply a maximum of 35 amps between them, not that they are connected internally.

On 99% of power supplies, only one 12 volt rail has connectors for disks.  The others are used for the MB and PCI cards.

 

Joe L.

Link to comment
Indeed, a real possibility, but I would have thought that 35 amps total for the system, 22 amps for six green drives, ought to be more than sufficient.  Working on our, normally conservative, 2 amps per green drive, we have a maximum current draw of 12 amps for the drives and 22 amps (the other rail) for the rest of the system (total 34 amps).
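The arithmetic above can be checked with a few lines. This is only a sketch of the budget as stated in this thread: 2 amps per green drive, six drives, 22 amps per rail and 35 amps combined. It assumes the worst case raised earlier, that all six drives hang off a single rail; the `rail_budget` name is made up for illustration.

```python
def rail_budget(drive_count: int, amps_per_drive: float,
                rail_limit: float, combined_limit: float) -> dict:
    """Estimate 12V headroom when every drive is fed from one rail."""
    drive_amps = drive_count * amps_per_drive
    # The rest of the system is capped both by the other rail's own rating
    # and by whatever is left of the combined limit after the drives.
    rest_limit = min(rail_limit, combined_limit - drive_amps)
    return {
        "drive_amps": drive_amps,
        "drive_rail_headroom": rail_limit - drive_amps,
        "rest_of_system_limit": rest_limit,
        "worst_case_total": drive_amps + rest_limit,
    }


budget = rail_budget(drive_count=6, amps_per_drive=2.0,
                     rail_limit=22.0, combined_limit=35.0)
for key, amps in budget.items():
    print(f"{key}: {amps:.1f} A")
```

With these figures the drives take 12 A, leaving 10 A of headroom on their rail, and the worst-case total comes to 34 A against the 35 A combined rating.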

Link to comment

One of my drives gives me this:

Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      48

So does that mean this drive has had unexpected power losses 48 times? I have only had the drive for a month and only started using the unRAID server about a week ago. I don't think I have even turned the computer on and off that many times... thoughts??

Link to comment

Unfortunately, the second 12 volt rail is often dedicated to the PCIe connectors, so its capacity goes completely unused and is not accessible unless you hack at the wiring of the supply.  On one high-wattage supply I purchased, the 12 volt rail that powered all the disk connectors ALSO powered the 24-pin motherboard connector.  Most of its 12 volt amperage was not usable for disks... (I do expect to rewire it, but that is not a task for a beginner)

 

As far as it being a single drive... it could be the one most sensitive to power drops and voltage fluctuations.

Link to comment

Intermittent connection perhaps?  Too many drives on the same supply rail?  Intermittent splitter (I had one that nearly drove me crazy)

In any case, the drive was reacting to what it thought was a loss of power.

 

Link to comment
Well, I only have 3 hard drives, the PSU is modular with only two of the SATA connectors plugged in right now, there are no splitters, and I use a UPS. I read at http://kb.acronis.com/content/9127 that the S.M.A.R.T. attribute Power-off Retract Count increases every time the machine is powered down, put to sleep or is idle. Would the spin-down after an hour make this count rise?

Link to comment
  • 4 weeks later...

Part 2 ... I installed a Corsair CX500 psu and also replaced all the Sata signal cables with new Sata3 connectors locking at both ends.

 

No changes to the drive setup, and it's been running stably for a few weeks on 5.0RC12 with 2TB parity drive.

 

I checked parity (all good) and disconnected the parity 2TB drive.

 

I connected a 3TB drive as parity and built new parity - no problem on the parity drive, but 289 errors on an EARS data drive - not the drive that had errors before.

 

I wiggled the signal cables on the EARS drive, and rebuilt parity again on the 3TB drive - no problem on the parity drive but now 357 errors on the same EARS data drive. Both drives get power direct from the psu without any molex adapters.

 

Not sure what's going on here. Have attached syslog, current smart and smart from a month ago for the EARS data drive with errors.

 

All my drives have non-zero power off retract counts in their smart, however the retract count hasn't changed in the last month for the EARS drive - suggesting the psu was the original issue and perhaps these data errors aren't psu-related.

 

I don't believe the erroring 2TB EARS drive has suddenly gone bad. Should I now revert to the original 2TB parity drive (no data changes have been made) and rebuild the erroring 2TB data drive?

 

In which case as I'm on 5.0RC12 I'm scared of screwing up and would appreciate instructions on how to.

 

Many thanks, Judy

syslog230513.txt

smartsdb.txt

smartsdb230513.txt

Link to comment

The drive has pending sectors. The pending sectors are unreadable and must be rewritten. No drive can have any pending sectors in order for parity protection to work. The pending sectors could have resulted from power issues or the drive could be bad. The self-test section shows this drive has never passed a test.

 

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      3846         1381237139
# 2  Short offline       Completed: read failure       80%      1692         1389526210
# 3  Short offline       Completed: read failure       70%      1065         1389526209
# 4  Short offline       Completed: read failure       80%      1064         1389526210
# 5  Extended offline    Completed: read failure       90%       136         1389526208
# 6  Short offline       Completed: read failure       90%        52         1389526208

 

You're lucky that you never had a drive failure because this drive cannot be used to rebuild a failed drive; just as it's unable to rebuild parity. You'll need to replace the original parity drive and rebuild this data drive. If you have a spare 2T drive use it now and then run pre-clear on this disk and see what happens to the pending sector count. If you have no spare 2T then use the parity-swap-disabled process or rebuild the drive onto itself. Check all of your drives for pending sectors. There can be none in order for any type of rebuild to work.
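To check every drive's self-test history in one go, something like the following could be run over saved SMART reports. It is a rough sketch that assumes the log rows look like the ones quoted above (columns separated by two or more spaces) and treats any status containing "failure" as a failed test; `SelfTestEntry`, `parse_self_test_log` and `failed_tests` are illustrative names, not part of smartmontools.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTestEntry(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


# "# 1  Short offline   Completed: read failure   90%   3846   1381237139"
ENTRY_RE = re.compile(
    r"^#\s*(\d+)\s{2,}(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_self_test_log(text: str) -> List[SelfTestEntry]:
    """Return one entry per self-test row; header and other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue
        num, desc, status, remaining, hours, lba = match.groups()
        entries.append(SelfTestEntry(
            int(num), desc, status, int(remaining), int(hours),
            int(lba) if lba.isdigit() else None,
        ))
    return entries


def failed_tests(entries: List[SelfTestEntry]) -> List[SelfTestEntry]:
    """Assumption: a status containing 'failure' means the test did not pass."""
    return [e for e in entries if "failure" in e.status.lower()]


LOG = """\
# 1  Short offline       Completed: read failure       90%      3846         1381237139
# 2  Short offline       Completed: read failure       80%      1692         1389526210
# 5  Extended offline    Completed: read failure       90%       136         1389526208
"""
for entry in failed_tests(parse_self_test_log(LOG)):
    print(entry.lifetime_hours, entry.description, entry.first_error_lba)
```

Failures that keep landing on nearly the same LBA across thousands of power-on hours, as they do here, suggest a localised bad patch on the platter rather than random noise.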

Link to comment

Part 2 ... I installed a Corsair CX500 psu and also replaced all the Sata signal cables with new Sata3 connectors locking at both ends.

 

Unfortunately, the CX series are nowhere near as good as the higher-end Corsairs.  I've seen several cases where systems wouldn't boot reliably with CX series supplies ... but work fine with their better units.    Personally, I only buy HX series units, but the TX series is also quite good.  [The AX are outstanding supplies, but I can't think of any reason to spend that much.]

 

 

I don't believe the erroring 2TB EARS drive has suddenly gone bad. Should I now revert to the original 2TB parity drive (no data changes have been made) and rebuild the erroring 2TB data drive?

 

Yes, this is a good idea.  But don't try to rebuild it on the failing drive ... you can deal with that later.  You will, of course, need a new 2TB drive.    I'd suggest a new WD Red unit.

 

You'll probably have to use the "trust parity" feature to force UnRAID to use the old parity drive; but if you're CERTAIN ... I mean REALLY CERTAIN ... I mean ABSOLUTELY REALLY CERTAIN ... that nothing's been written to the array, then that's fine  :)

Link to comment

I'd also replace your new PSU with a new, BETTER PSU  :)    ==> BEFORE doing the rebuild.

I just have  :) I chose the Corsair as it's on the recommended PSU list http://lime-technology.com/wiki/index.php/PSU ?

 

Well it's easy to understand why you selected that unit, then.  It's unfortunate it's on the list .. CX units aren't necessarily "bad", but they certainly aren't in the same league as the better Corsair units.

 

Link to comment

A new 2T is not required. Use the parity-swap-disabled procedure:

 

1. Install the original 2T parity drive.

2. Replace disk1 with a 3T drive.

3. Power-on the server

4. Assign the 3T drive as parity and the (formerly) parity drive as disk 1.

5. Start the array.

 

UnRAID will copy parity from the old parity drive to the new 3T parity drive and then rebuild disk1 on the 2T drive that was formerly parity.

Link to comment

Be careful here.  The swap-DISABLED procedure will only work if the disk being replaced is already disabled.  You cannot just swap around working disks.
Link to comment

If disk1 is not disabled you can disable it by starting the array with disk 1 unassigned.

Link to comment

Thank you guys - embarrassed - I missed that disk 1 had pending sectors. I've now run self tests on all the drives and no other drives have pending sectors.

 

I'm up for a parity-swap-disable - let me check exactly how to do this in 5.0:

 

Currently parity = 3T

Disk 1 = 2T EARS with pending sectors

And I have an old (good) 2T parity - connected and unallocated

Disk allocation screen grab attached.

 

1. Disconnect the pending sectored 2T disk 1

2. Connect the 3T parity drive as disk1

3. Power-on the server

4. Assign the 3T drive as parity and the old good 2T parity drive as disk 1.

5. Start the array.

 

UnRAID will copy parity from the old 2T parity drive to the new 3T parity drive and then rebuild disk1 on the 2T drive that was formerly parity.

 

Are there any gotchas - won't unraid complain about missing disks or refuse to start ?

after_2nd_3T_parity_build_post_smart.jpg.a6108077073f760f7098ac49c9c941c5.jpg

Link to comment

You forgot the step where you START the array with disk1 unassigned or disconnected.

Unless disk1 (the one to which you will assign the current parity disk) is showing a RED indicator BEFORE you swap any other disks into the array, the swap-DISABLE procedure will not work.

Once it is showing a RED indicator, you can stop the array and swap the disks as you described.
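The ordering requirement can be written down as a small pre-flight checklist. This is purely an illustrative sketch of the conditions discussed in this thread, not anything Unraid itself runs: `SwapPlan` and `swap_disable_problems` are made-up names, and the size check is an assumption based on the fact that parity is copied from the old parity drive onto the new one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwapPlan:
    data_slot_disabled: bool    # slot already shows the red indicator
    old_parity_size_tb: float   # drive that currently holds valid parity
    new_parity_size_tb: float   # drive that will become parity


def swap_disable_problems(plan: SwapPlan) -> List[str]:
    """List the reasons the swap should not be attempted yet (empty = go ahead)."""
    problems = []
    if not plan.data_slot_disabled:
        problems.append(
            "start the array once with the data disk unassigned or "
            "disconnected so its slot is disabled (red) before swapping"
        )
    if plan.new_parity_size_tb < plan.old_parity_size_tb:
        # Assumption: the copy needs a target at least as large as the source.
        problems.append("new parity drive is smaller than the old parity drive")
    return problems


plan = SwapPlan(data_slot_disabled=False,
                old_parity_size_tb=2.0, new_parity_size_tb=3.0)
for problem in swap_disable_problems(plan):
    print("NOT READY:", problem)
```

The point of the first check is simply that the disabled state has to exist before any disks are moved; swapping first and disabling afterwards is the mistake described above.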

Link to comment

Thank you - taking this slowly ...

1. disconnected 2T disk 1 with pending sectors

2. started array - disk 1 red balled missing: good

3. powered off - disconnected 3T parity

4. started array - parity & disk 1 red balled missing: good

5. moved 3T parity to SATA connector for disk 1

6. started array - 3T still found as parity - I expected it to stay red-balled.

7. assigned old 2T good parity as disk 1 - disk1 still red-balled as wrong drive (screen grab attached)

8. about to start array, but dialogue says 'Start will bring array on-line START DATA RE-BUILD etc.' (screen attached)

 

This looks like it will use the 3T parity to reconstruct the old disk1 image on the old good 2T parity drive

As the 3T parity is affected by the read errors on the now disconnected 2T with pending sectors I don't want this.

 

Do I tick Yes and Start, please?

PSD_about_to_start_old_parity_as_disk_1.jpg.14792dcf8eb82b506529d7ead7652398.jpg

start_data-rebuild.jpg.2c7a5d2c0766e2c05a13ea336dcf8db3.jpg

Link to comment

Well, Parity Swap Disable didn't take me anywhere, so I left the 3T parity connected and the failing 2T as disk1, and added a 3T disk6. Then:

1. copied disk1 to disk 6 http://lime-technology.com/wiki/index.php/Transferring_Files_Within_the_unRAID_Server

2. Utils/New config

3. unassigned old disk1, assigned the 3T that was disk 6 as disk1

4. started array, cancelled parity build

5. removed failing 2T and installed on Windows machine

6. surface tested extensively with HD Sentinel Pro

7. discovered which files on 2T were on poor sectors

8. recopied affected files from backups to unraid

9. built parity

10. and checked parity - all ok

 

For the failing 2T drive,

11. reallocated weak and pending sectors in HD Sentinel

12. formatted drive as 2 volumes with a 3 cylinder-wide raw partition covering the poor area

13. now using the old 2T drive for off-site backups - I surface check all my backups every 3 months looking for changes, and in my experience drives with local weaknesses can be fine so long as they are carefully mapped around and regularly checked.

 

This procedure isn't optimal as there's no parity protection for a while but as I have double off-site backups, I was ok with the risk.

 

Link to comment
