Unraid generates errors constantly, I can't figure out what's wrong!



My server first failed like this about 2 months ago. Everything worked fine until tons of errors popped up on one of the disks. Interestingly, the disk appeared to be OK when tested with MHDD and passed the SMART test, so I remounted it and rebuilt the data. Since then the same behaviour has kept recurring. It happens very irregularly: sometimes a few times a week, sometimes not at all. My first suspect was the PSU (Be Quiet 450W), so I swapped it for a Corsair 450W (which has a single 12V rail), but that didn't change anything. Then I rewired the server, and that didn't help either. A couple of days ago, fed up with the problem, I took all the parts out of the case, laid them out on the table, cabled them up, updated the BIOS and ran the initial configuration of Unraid 4.5-beta6. Everything seemed to be OK for 48 hours under heavy load (parity checks, file moves, reboots), and then it failed again.

 

I have no idea what is causing the server to fail. I have used all my experience and run every test I know, and I still haven't found the answer.

 

I checked voltages, temperatures and memory. I tried changing the order of the disks. I tried various SATA configurations (AHCI, IDE Enhanced, IDE Compatible) as well as several Unraid versions (4.3.3, 4.4.2, 4.5-beta4 and 4.5-beta6). None of it changed a thing.

 

I attach the syslog from after a failure and the current SMART reports:


The problem appears to be communication issues, probably cables. The protocols and transfer modes are being stepped down until the system finally gives up and disables the drive, in this case sdd, the Seagate 750GB. Looking at the SMART reports, however, a number of drives have UDMA errors and ICRC errors reported, which are generally associated with cabling and connector issues. I would suggest replacing all of these SATA cables with better quality ones. Unfortunately, it is hard to tell which cables are the best quality, although I think those that have locking connectors and come from a good source are a good bet.
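To see which ports are affected, one option is to count the relevant kernel messages per ATA port. The following is only a rough Python sketch: the message patterns are assumptions based on typical libata wording, which varies between kernel versions, and the sample lines are illustrative, not taken from the attached syslog.

```python
import re

# Patterns for libata messages that commonly accompany link trouble.
# The exact wording differs between kernel versions; treat these as assumptions.
PATTERNS = {
    "speed_down": re.compile(r"limiting (SATA link )?speed to"),
    "link_reset": re.compile(r"(hard|soft) resetting link"),
    "icrc": re.compile(r"\bICRC\b"),
}
# Matches "ata4:" as well as "ata4.00:" and captures the port name "ata4".
ATA_PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def summarize(lines):
    """Count matching messages per ATA port."""
    counts = {}
    for line in lines:
        port = ATA_PORT.search(line)
        if not port:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                per_port = counts.setdefault(port.group(1), {})
                per_port[name] = per_port.get(name, 0) + 1
    return counts


# Illustrative lines only, not copied from the poster's syslog.
SAMPLE = """\
kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6
kernel: ata4.00: error: { ICRC ABRT }
kernel: ata4: hard resetting link
kernel: ata4.00: limiting speed to UDMA/33:PIO4
kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
""".splitlines()

if __name__ == "__main__":
    for port, hits in sorted(summarize(SAMPLE).items()):
        print(port, hits)
```

To run it against a real log, pass the lines of the syslog file to `summarize` instead of `SAMPLE`. Ports that show many resets or speed reductions are the first cables to replace.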

 

It's also possible that it is related to the disk controller or its busses, such as PCI.  You might check for motherboard chipsets getting too hot.

 

The drives seem fine, except that there are a couple of Current Pending sectors, and those are not causing the errors.
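If you want to pull these attributes out of a SMART report yourself, here is a minimal Python sketch. It assumes the report is text in the attribute table layout printed by `smartctl -A` and that the drive uses the attribute names `Current_Pending_Sector` and `UDMA_CRC_Error_Count` (names can differ between drive models). The sample report and its numbers are made up for illustration, not taken from the attached reports.

```python
# Attributes worth watching here; the names are an assumption and may vary by drive.
WATCHED = {"Current_Pending_Sector", "UDMA_CRC_Error_Count"}


def parse_raw_values(report):
    """Map attribute name -> integer raw value for each table row."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Skip rows whose raw value is not a plain integer.
        if fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def flag_attributes(report):
    """Return the watched attributes whose raw value is above zero."""
    raw = parse_raw_values(report)
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) > 0}


# Hypothetical report excerpt for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       57
"""

if __name__ == "__main__":
    print(flag_attributes(SAMPLE))
```

A non-zero `UDMA_CRC_Error_Count` points at the link between drive and controller rather than at the platters, which is why it fits the cabling theory.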


The temperatures are surely fine. I have now placed the motherboard on acoustic foam and pointed two 120mm fans at it. I will buy new cables. Do you think these will be OK: http://allegro.pl/item635734847_kabel_dla_koneserow_sata_chieftec_katowy_do_dysk.html ?

 

And what if replacing the cables doesn't help?

 

They may or may not work. It is usually best to go with cables that are of the locking type. See here for an example of what I am talking about. I buy all my cables from Monoprice and have never had a problem.


This page has some useful comments about SATA cables, and possible differences between them.

 

Ideally, you want cables that at least claim to be designed for SATA II operation and 3.0 Gbps speeds, are made or sold by a reputable company, and are recommended by someone who has tested them hard, at their highest speeds and loads.  That of course is not usually possible, but do the best you can.  What I have read is that SATA cables and their connectors are the weak spot in the whole SATA scheme.


I've bought some yellow Gigabyte cables (locking, as you recommended) and replaced the old ones with them.

 

Now I'm trying to load the server as much as possible. I'm moving 300 GB of data from various disks around my home network (downloading from and uploading to the server) and running parity checks repeatedly. We'll see the result.
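One way to judge a load test like this is to compare each drive's CRC error counter before and after. The counter is cumulative, so errors from the old cables stay in it; what matters is whether it keeps rising. A small Python sketch, with hypothetical device names and counts:

```python
def crc_increases(before, after):
    """Return {device: delta} for drives whose CRC error count rose."""
    return {
        dev: after[dev] - before[dev]
        for dev in after
        if dev in before and after[dev] > before[dev]
    }


# Hypothetical snapshots taken before and after the load test.
BEFORE = {"sdb": 12, "sdc": 0, "sdd": 57}
AFTER = {"sdb": 12, "sdc": 0, "sdd": 61}

if __name__ == "__main__":
    print(crc_increases(BEFORE, AFTER))
```

An empty result after several days of heavy load would suggest the new cables fixed the link errors; any drive still listed is worth moving to another port or cable.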
