Please advise: replacing my AOC-SASLP-MV8 cards



LSI 9210 or 9211-8i - these can be found already in IT mode; if not, it's just a normal flash. They are usually more expensive.

IBM M1015/Dell H310 - these need to be crossflashed to LSI IT mode, which is easy but more involved. Once crossflashed they are the same as a 9210/9211.

 

This one would be fine, right?

 

Dell PERC H200 PCI-e SAS Controller 047MCV / LSI 9211-8i

 

It can be delivered flashed to IT mode for 49 euro.

Link to comment

I just ordered another Dell that will arrive pre-flashed to IT mode. I still see the errors in my log, so changing the controllers will be a good check on what is broken: card, cable or drive cage. I also ordered new cables just to be sure, but I will only put them in if the issue arises again.

 

 


Link to comment

My two new Dell cards arrived today. I just installed the latest Unraid version, 6.3.1, and rebooted, which gives me a clean look at my log. At the moment I see the following SATA errors:

 

Feb  9 07:10:07 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Feb  9 07:10:07 Tower kernel: sas: ata11: end_device-1:4: cmd error handler
Feb  9 07:10:07 Tower kernel: sas: ata7: end_device-1:0: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata8: end_device-1:1: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata9: end_device-1:2: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata10: end_device-1:3: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata11: end_device-1:4: dev error handler
Feb  9 07:10:07 Tower kernel: ata11.00: request sense failed stat 50 emask 0
Feb  9 07:10:07 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Feb  9 07:10:07 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Feb  9 07:10:07 Tower kernel: sas: ata11: end_device-1:4: cmd error handler
Feb  9 07:10:07 Tower kernel: sas: ata7: end_device-1:0: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata8: end_device-1:1: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata9: end_device-1:2: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata10: end_device-1:3: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata11: end_device-1:4: dev error handler
Feb  9 07:10:07 Tower kernel: ata11.00: request sense failed stat 50 emask 0
Feb  9 07:10:07 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Feb  9 07:10:07 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Feb  9 07:10:07 Tower kernel: sas: ata7: end_device-1:0: cmd error handler
Feb  9 07:10:07 Tower kernel: sas: ata7: end_device-1:0: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata8: end_device-1:1: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata9: end_device-1:2: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata10: end_device-1:3: dev error handler
Feb  9 07:10:07 Tower kernel: sas: ata11: end_device-1:4: dev error handler
Feb  9 07:10:07 Tower kernel: ata7.00: request sense failed stat 50 emask 0
Feb  9 07:10:07 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Feb  9 07:10:08 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Feb  9 07:10:08 Tower kernel: sas: ata7: end_device-1:0: cmd error handler
Feb  9 07:10:08 Tower kernel: sas: ata7: end_device-1:0: dev error handler
Feb  9 07:10:08 Tower kernel: sas: ata8: end_device-1:1: dev error handler
Feb  9 07:10:08 Tower kernel: sas: ata9: end_device-1:2: dev error handler
Feb  9 07:10:08 Tower kernel: sas: ata10: end_device-1:3: dev error handler
Feb  9 07:10:08 Tower kernel: sas: ata11: end_device-1:4: dev error handler

 

I will now shut down the system and replace the SASLP cards with the Dell cards, reboot and check the log again.
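
For anyone who wants to compare the log before and after a swap like this, here is a rough Python sketch that counts the "cmd error handler" lines per ata port. It only assumes the line format shown in the excerpt above; the function names and the log path are my own choices for illustration, not anything Unraid ships.

```python
import re
from collections import Counter

# Matches libsas lines in the format seen in the excerpt above, e.g.
#   "... kernel: sas: ata11: end_device-1:4: cmd error handler"
# The "dev error handler" lines are skipped on purpose: in the excerpt they
# are printed for every port on the controller during each recovery pass.
CMD_ERROR = re.compile(
    r"kernel: sas: (ata\d+): (end_device-\d+:\d+): cmd error handler"
)


def count_cmd_errors(lines):
    """Count 'cmd error handler' events per (ata port, end device)."""
    counts = Counter()
    for line in lines:
        match = CMD_ERROR.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


def summarize(path):
    """Print a per-port summary for the log file at `path` (path is an assumption)."""
    with open(path, errors="replace") as log:
        counts = count_cmd_errors(log)
    for (port, device), n in counts.most_common():
        print(f"{port} ({device}): {n} cmd errors")
    return counts
```

Something like `summarize("/var/log/syslog")` run once before and once after the card swap should make the difference easy to see, assuming that is where your syslog lives. An empty result after the swap would match "no more errors in the log".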

Link to comment

The Dell cards are in.

And the errors are gone from the log!

Just to be complete: I only changed the cards and plugged the old cables into the new cards. I reseated nothing.

So if the errors stay away, that points strongly towards the SASLP cards having issues.

But the proof of the pudding is in the eating, so let's wait a few weeks.

I have started a parity check to put a bit of stress on the system.

Link to comment

After fighting endlessly with my SAS2LP during the v6 beta I finally got things stabilized. It ran fine with 6.0.x and 6.1.x, but the same type of errors that you're experiencing reappeared with 6.2.x. I'll be interested in what you find running the Dell cards; these errors on the SAS2LP make me uncomfortable even if the card recovers more often than not.

Link to comment


This is exactly why I exchanged the cards. Unraid has been rock stable for me for years. I have never had a disk drop for reasons other than errors that were also visible in SMART, and I have never had any other issues. Issues only started to arise at the beginning of this year (and I am always running the latest version of Unraid).

They actually appeared around the time that I added a few 5in3 drive cages, so those would be the most likely cause. It is only because more and more people are mentioning issues with these cards that I thought I needed to check that.

I am now running with the new cards and have kept everything else exactly the same. So if the issues stay away, it really was the cards. If they come back, it is something else.

I still think there is something going on in combination with the 5in3 cages. I have a secondary Unraid server that also uses a SASLP card (exactly the same model) and it is not showing errors. There are no 5in3 cages there.

Maybe the cards output less power, which makes the setup more vulnerable to something else not being optimal.

Thing is, I need Unraid to just work. I like tinkering, but I really hated waking up to a drive failure. In the last two months that happened more often than in the previous 5 years.

Link to comment

When adding the 5in3 cages, did you also add more drives?

That could have increased the load on the controller, generating more thermal stress.

Speaking of Dell cards: we know that those run quite hot and are designed for the minimum airflow that is usually present in server racks.

The cards are also used in SOHO servers, but then mostly with only one port populated.

Most of the cards I bought had one SAS port still covered with its plastic cap.

This means it is possible that the cards will get damaged over time when load is high and cooling is insufficient.

Another source could be the backplane of the 5in3 cage, if there is one.

You could rule this out by temporarily bypassing it.

Link to comment


Nope, not more drives. My system actually has quite a lot of airflow. Each cage is pulling in air (so 4 fans up front), and there is an exhaust at the back.

Link to comment


The cards have been running ever since. No errors in the log, and the parity check completed without any issues. Very happy with the move!

Link to comment
