[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)


TODDLT

Recommended Posts

Here are the differences between the SAS2LP 9485 cards of bkastner and johnnie.black.  The cards appear to be identical models -

01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)

        Subsystem: Marvell Technology Group Ltd. Device 9480

 

bkastner's card (works fine) - (only showing lines with differences)

                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-

                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+

                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-

                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+

 

johnnie.black's card (works very slow on parity checks) -

                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-

                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+

                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-

                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

 

Interesting!  ASPM is "Active State Power Management" - a feature of PCIe.  I always turn it off in the BIOS.  Maybe those who experience issues with their cards can check how this is set in their BIOS?
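For anyone comparing several cards, the relevant state can be pulled out of an `lspci -vv` dump programmatically. A minimal sketch (the helper name is mine; the sample strings are the LnkCtl lines quoted above):

```python
import re

def aspm_state(lspci_vv_text):
    """Return the ASPM setting from the LnkCtl line of an `lspci -vv` dump."""
    m = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_vv_text)
    return m.group(1).strip() if m else None

# LnkCtl lines quoted from the two cards in this thread
working = "LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+"
slow    = "LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+"

print(aspm_state(working))  # L0s Enabled
print(aspm_state(slow))     # Disabled
```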

Link to comment

 

Interesting!  ASPM is "Active State Power Management" - a feature of PCIe.  I always turn it off in the BIOS.  Maybe those who experience issues with their cards can check how this is set in their BIOS?

 

I found this in mine:

 

LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us

ClockPM- Surprise- LLActRep- BwNot-

LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+

 

If the card that worked has this enabled, how do you turn it on?  Nothing seemed to jump out at me in the motherboard BIOS, or for the card.

Link to comment

Output of my 2 (1015 reflashed) cards - and I am an "affected user", but not using Marvell adapters:

02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
Subsystem: LSI Logic / Symbios Logic Device 3020
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at ee00 [size=256]
Region 1: Memory at f04c0000 (64-bit, non-prefetchable) [size=16K]
Region 3: Memory at f0080000 (64-bit, non-prefetchable) [size=256K]
[virtual] Expansion ROM at f0000000 [disabled] [size=512K]
Capabilities: [50] Power Management version 3
	Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
	Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] Express (v2) Endpoint, MSI 00
	DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
		ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
	DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
		RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
		MaxPayload 128 bytes, MaxReadReq 512 bytes
	DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
	LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
		ClockPM- Surprise- LLActRep- BwNot-
	LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
	DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance De-emphasis: -6dB
	LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
		 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [d0] Vital Product Data
	Unknown small resource type 00, will not decode more.
Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
	Address: 0000000000000000  Data: 0000
Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
	Vector table: BAR=1 offset=00002000
	PBA: BAR=1 offset=00003800
Capabilities: [100 v1] Advanced Error Reporting
	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
	CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [138 v1] Power Budgeting <?>
Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
	IOVCap:	Migration-, Interrupt Message Number: 000
	IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
	IOVSta:	Migration-
	Initial VFs: 16, Total VFs: 16, Number of VFs: 16, Function Dependency Link: 00
	VF offset: 1, stride: 1, Device ID: 0072
	Supported Page Size: 00000553, System Page Size: 00000001
	Region 0: Memory at 00000000f04c4000 (64-bit, non-prefetchable)
	Region 2: Memory at 00000000f00c0000 (64-bit, non-prefetchable)
	VF Migration: offset: 00000000, BIR: 0
Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
	ARICap:	MFVC- ACS-, Next Function: 0
	ARICtl:	MFVC- ACS-, Function Group: 0
Kernel driver in use: mpt2sas
Kernel modules: mpt2sas

03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
Subsystem: LSI Logic / Symbios Logic Device 3020
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at de00 [size=256]
Region 1: Memory at f0ac0000 (64-bit, non-prefetchable) [size=16K]
Region 3: Memory at f0680000 (64-bit, non-prefetchable) [size=256K]
[virtual] Expansion ROM at f0600000 [disabled] [size=512K]
Capabilities: [50] Power Management version 3
	Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
	Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] Express (v2) Endpoint, MSI 00
	DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
		ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
	DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
		RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
		MaxPayload 128 bytes, MaxReadReq 512 bytes
	DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
	LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
		ClockPM- Surprise- LLActRep- BwNot-
	LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
	DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance De-emphasis: -6dB
	LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
		 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [d0] Vital Product Data
	Unknown small resource type 00, will not decode more.
Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
	Address: 0000000000000000  Data: 0000
Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
	Vector table: BAR=1 offset=00002000
	PBA: BAR=1 offset=00003800
Capabilities: [100 v1] Advanced Error Reporting
	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
	CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [138 v1] Power Budgeting <?>
Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
	IOVCap:	Migration-, Interrupt Message Number: 000
	IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
	IOVSta:	Migration-
	Initial VFs: 16, Total VFs: 16, Number of VFs: 16, Function Dependency Link: 00
	VF offset: 1, stride: 1, Device ID: 0072
	Supported Page Size: 00000553, System Page Size: 00000001
	Region 0: Memory at 00000000f0ac4000 (64-bit, non-prefetchable)
	Region 2: Memory at 00000000f06c0000 (64-bit, non-prefetchable)
	VF Migration: offset: 00000000, BIR: 0
Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
	ARICap:	MFVC- ACS-, Next Function: 0
	ARICtl:	MFVC- ACS-, Function Group: 0
Kernel driver in use: mpt2sas
Kernel modules: mpt2sas

Short description of what I "see":

- A parity check does not constantly run at full I/O speed (which is approx. 1500 MB/s total), but instead runs much of the time at approx. 500 MB/s with peaks up to 1500 -> this results in a much reduced overall speed

- the unRAID GUI reports a lot more "reads" on the disk page, and they are not close to the same for all disks - this was not the case when I ran unRAID 5 (with the same hardware and disks) - I remember that the reads (and writes, when doing a rebuild) were close to the same on the affected disks (depending on their size, of course!)

Because of the latter, I thought it might be interesting to see whether anything is happening at the driver level; debug output might be interesting. @Tom: from where do you get the "read counts" within the unRAID driver?
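For what it's worth, the kernel exposes raw per-device read counters in `/proc/diskstats`, independently of the unRAID GUI; per the kernel's iostats documentation, field 4 is reads completed and field 6 is sectors read. A small parser sketch (the sample line and its values are illustrative, not real counters):

```python
def parse_diskstats_line(line):
    """Extract (device, reads_completed, sectors_read) from one /proc/diskstats line.
    Field order follows the kernel's Documentation/admin-guide/iostats.rst."""
    f = line.split()
    return f[2], int(f[3]), int(f[5])

# Illustrative line in /proc/diskstats format
sample = "   8       0 sda 103854 120 8308320 46210 0 0 0 0 0 31000 46300"
dev, reads, sectors = parse_diskstats_line(sample)
print(dev, reads, sectors)  # sda 103854 8308320
```

Sampling these counters before and during a parity check would show whether the extra reads the GUI reports really reach the block layer.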

Link to comment

I found the setting in the BIOS for PCIe power management; it was off.  I set it to auto, but there was no difference in the parity check speed.

 

root@Testv6:~# lspci -vv -d 1b4b:*
01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at dfa40000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at dfa00000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at dfa60000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

 

 

bkastner's card is also limited; he is one of the luckier ones at the upper end, but based on his drives and average speed I roughly calculated his starting speed at around 90 MB/s, when it should be 150 MB/s.  He can confirm this by starting a parity check, waiting 5 minutes, and posting the speed.  So while I wouldn't complain if that were my speed, he is still being limited.

 

I'm starting to think this is more a driver issue than an unRAID issue, in which case there's probably little Tom can do about it.

Maybe we found why only some SAS2LP cards are having stability issues, those with the 9485 chipset, and something can be done about that, because it's a far more serious problem.

 

Link to comment

Short description of what I "see":

- A parity check does not constantly run at full I/O speed (which is approx. 1500 MB/s total), but instead runs much of the time at approx. 500 MB/s with peaks up to 1500 -> this results in a much reduced overall speed

 

 

This symptom is very similar to one I was having earlier, by any chance do you have any Samsung disks model HD203WI or HD153WI?

 

If so see here:

http://lime-technology.com/forum/index.php?topic=42384.0

Link to comment

 

So, back to the question I posted earlier: have you tried the tunables script at all?

 

http://lime-technology.com/forum/index.php?topic=29009.0

 

I would highly suggest you use this to refine your parameters and see where it leaves you.  Moving to v6 also means moving to 64-bit, which can change things.  I think dropping 50% is definitely extreme, but with a correctly tuned environment the drop may be substantially reduced.

 

How about this for output:

 

Tunables Report from  unRAID Tunables Tester v2.2 by Pauven

NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with unRAID,
      especially if you have any add-ons or plug-ins installed.

Test | num_stripes | write_limit | sync_window |   Speed 
--- FULLY AUTOMATIC TEST PASS 1 (Rough - 20 Sample Points @ 3min Duration)---
   1  |    1408     |     768     |     512     |   0.0 MB/s 
   2  |    1536     |     768     |     640     |   0.0 MB/s 
   3  |    1664     |     768     |     768     |   0.0 MB/s 
   4  |    1920     |     896     |     896     |   0.0 MB/s 
   5  |    2176     |    1024     |    1024     |   0.0 MB/s 
   6  |    2560     |    1152     |    1152     |   0.0 MB/s 
   7  |    2816     |    1280     |    1280     |   0.0 MB/s 
   8  |    3072     |    1408     |    1408     |   0.0 MB/s 
   9  |    3328     |    1536     |    1536     |   0.0 MB/s 
  10  |    3584     |    1664     |    1664     |   0.0 MB/s 
  11  |    3968     |    1792     |    1792     |   0.0 MB/s 
  12  |    4224     |    1920     |    1920     |   0.0 MB/s 
  13  |    4480     |    2048     |    2048     |   0.0 MB/s 
  14  |    4736     |    2176     |    2176     |   0.0 MB/s 
  15  |    5120     |    2304     |    2304     |   0.0 MB/s 
  16  |    5376     |    2432     |    2432     |   0.0 MB/s 
  17  |    5632     |    2560     |    2560     |   0.0 MB/s 
  18  |    5888     |    2688     |    2688     |   0.0 MB/s 
  19  |    6144     |    2816     |    2816     |   0.0 MB/s 
  20  |    6528     |    2944     |    2944     |   0.0 MB/s 
--- Targeting Fastest Result of md_sync_window 0 bytes for Final Pass ---
--- FULLY AUTOMATIC TEST PASS 2 (Final - 16 Sample Points @ 4min Duration)---
  21  |    720     |     768     |     -120     |   0.0 MB/s 
  22  |    728     |     768     |     -112     |   0.0 MB/s 
  23  |    736     |     768     |     -104     |   0.0 MB/s 
  24  |    744     |     768     |     -96     |   0.0 MB/s 
  25  |    752     |     768     |     -88     |   0.0 MB/s 
  26  |    760     |     768     |     -80     |   0.0 MB/s 
  27  |    768     |     768     |     -72     |   0.0 MB/s 
  28  |    776     |     768     |     -64     |   0.0 MB/s 
  29  |    784     |     768     |     -56     |   0.0 MB/s 
  30  |    800     |     768     |     -48     |   0.0 MB/s 
  31  |    808     |     768     |     -40     |   0.0 MB/s 
  32  |    816     |     768     |     -32     |   0.0 MB/s 
  33  |    824     |     768     |     -24     |   0.0 MB/s 
  34  |    832     |     768     |     -16     |   0.0 MB/s 
  35  |    840     |     768     |     -8     |   0.0 MB/s 
  36  |    848     |     768     |     0     |   0.0 MB/s 

Completed: 2 Hrs 4 Min 17 Sec.

Best Bang for the Buck: Test 0 with a speed of 1 MB/s

     Tunable (md_num_stripes): 0
     Tunable (md_write_limit): 0
     Tunable (md_sync_window): 0

These settings will consume 0MB of RAM on your hardware.


Unthrottled values for your server came from Test 0 with a speed of  MB/s

     Tunable (md_num_stripes): 0
     Tunable (md_write_limit): 0
     Tunable (md_sync_window): 0

These settings will consume 0MB of RAM on your hardware.
This is -115MB less than your current utilization of 115MB.
NOTE: Adding additional drives will increase memory consumption.

In unRAID, go to Settings > Disk Settings to set your chosen parameter values.

 

So what did I do wrong?  This was run in full auto.

Link to comment

bkastner's card is also limited; he is one of the luckier ones at the upper end, but based on his drives and average speed I roughly calculated his starting speed at around 90 MB/s, when it should be 150 MB/s.  He can confirm this by starting a parity check, waiting 5 minutes, and posting the speed.  So while I wouldn't complain if that were my speed, he is still being limited.

 

I'm starting to think this is more a driver issue than an unRAID issue, in which case there's probably little Tom can do about it.

Maybe we found why only some SAS2LP cards are having stability issues, those with the 9485 chipset, and something can be done about that, because it's a far more serious problem.

 

I started a parity check and get the following:

 

Total size:                 6 TB

Elapsed time:         53 minutes

Current position: 342 GB (5.7 %)

Estimated speed: 109.4 MB/sec

Estimated finish: 14 hours, 22 minutes

 

I agree it would be great if I was seeing 150 MB/sec, but I still don't think it's that bad.
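As a sanity check, those figures are internally consistent (treating TB/GB as decimal units, as the GUI appears to):

```python
total_bytes = 6e12     # 6 TB parity disk
done_bytes  = 342e9    # current position: 342 GB
speed       = 109.4e6  # estimated speed: 109.4 MB/s

remaining_s = (total_bytes - done_bytes) / speed
hours, rem = divmod(remaining_s, 3600)
print(f"{done_bytes / total_bytes:.1%} done, about {int(hours)} h {int(rem // 60)} min to go")
# 5.7% done, about 14 h 21 min to go
```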

Link to comment

Short description of what I "see":

- A parity check does not constantly run at full I/O speed (which is approx. 1500 MB/s total), but instead runs much of the time at approx. 500 MB/s with peaks up to 1500 -> this results in a much reduced overall speed

 

 

This symptom is very similar to one I was having earlier, by any chance do you have any Samsung disks model HD203WI or HD153WI?

 

If so see here:

http://lime-technology.com/forum/index.php?topic=42384.0

No, no Samsung disks, they're all WD (of different sizes) - and I had high parity check speeds under unRAID 5 with the same disks.

Link to comment

OK so here are the results. 

 

Tunables Report from  unRAID Tunables Tester v2.2 by Pauven

NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with unRAID,
      especially if you have any add-ons or plug-ins installed.

Test | num_stripes | write_limit | sync_window |   Speed 
--- FULLY AUTOMATIC TEST PASS 1 (Rough - 20 Sample Points @ 3min Duration)---
   1  |    1408     |     768     |     512     |  37.4 MB/s 
   2  |    1536     |     768     |     640     |  38.1 MB/s 
   3  |    1664     |     768     |     768     |  39.2 MB/s 
   4  |    1920     |     896     |     896     |  37.6 MB/s 
   5  |    2176     |    1024     |    1024     |  38.7 MB/s 
   6  |    2560     |    1152     |    1152     |  39.2 MB/s 
   7  |    2816     |    1280     |    1280     |  32.4 MB/s 
   8  |    3072     |    1408     |    1408     |  33.1 MB/s 
   9  |    3328     |    1536     |    1536     |  31.9 MB/s 
  10  |    3584     |    1664     |    1664     |  32.1 MB/s 
  11  |    3968     |    1792     |    1792     |  30.6 MB/s 
  12  |    4224     |    1920     |    1920     |  33.1 MB/s 
  13  |    4480     |    2048     |    2048     |  32.0 MB/s 
  14  |    4736     |    2176     |    2176     |  33.2 MB/s 
  15  |    5120     |    2304     |    2304     |  32.8 MB/s 
  16  |    5376     |    2432     |    2432     |  33.2 MB/s 
  17  |    5632     |    2560     |    2560     |  32.3 MB/s 
  18  |    5888     |    2688     |    2688     |  32.0 MB/s 
  19  |    6144     |    2816     |    2816     |  32.1 MB/s 
  20  |    6528     |    2944     |    2944     |  32.0 MB/s 
--- Targeting Fastest Result of md_sync_window 768 bytes for Final Pass ---
--- FULLY AUTOMATIC TEST PASS 2 (Final - 16 Sample Points @ 4min Duration)---
  21  |    1568     |     768     |     648     |  38.5 MB/s 
  22  |    1576     |     768     |     656     |  38.1 MB/s 
  23  |    1584     |     768     |     664     |  38.4 MB/s 
  24  |    1600     |     768     |     672     |  38.4 MB/s 
  25  |    1608     |     768     |     680     |  38.3 MB/s 
  26  |    1616     |     768     |     688     |  38.3 MB/s 
  27  |    1624     |     768     |     696     |  38.1 MB/s 
  28  |    1632     |     768     |     704     |  38.4 MB/s 
  29  |    1640     |     768     |     712     |  38.8 MB/s 
  30  |    1648     |     768     |     720     |  38.2 MB/s 
  31  |    1656     |     768     |     728     |  38.5 MB/s 
  32  |    1664     |     768     |     736     |  38.3 MB/s 
  33  |    1680     |     768     |     744     |  38.3 MB/s 
  34  |    1688     |     768     |     752     |  38.4 MB/s 
  35  |    1696     |     768     |     760     |  38.4 MB/s 
  36  |    1704     |     768     |     768     |  38.4 MB/s 

Completed: 2 Hrs 10 Min 59 Sec.

Best Bang for the Buck: Test 3 with a speed of 39.2 MB/s

     Tunable (md_num_stripes): 1664
     Tunable (md_write_limit): 768
     Tunable (md_sync_window): 768

These settings will consume 65MB of RAM on your hardware.


Unthrottled values for your server came from Test 29 with a speed of 38.8 MB/s

     Tunable (md_num_stripes): 1640
     Tunable (md_write_limit): 768
     Tunable (md_sync_window): 712

These settings will consume 64MB of RAM on your hardware.
This is 14MB more than your current utilization of 50MB.
NOTE: Adding additional drives will increase memory consumption.

In unRAID, go to Settings > Disk Settings to set your chosen parameter values.

 

It doesn't look to me like there is any appreciable difference.
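As an aside, the RAM figures the tester reports fit a simple model of one 4 KiB stripe buffer per drive. Both the per-stripe size and the 10-drive count below are assumptions chosen to reproduce the reported totals, not documented unRAID internals:

```python
def tunables_ram_mb(num_stripes, drives, stripe_bytes=4096):
    """Estimated RAM use: one stripe buffer per drive (assumed model)."""
    return num_stripes * stripe_bytes * drives / 1024**2

print(round(tunables_ram_mb(1664, 10)))  # 65 -> matches the "Best Bang" report line
print(round(tunables_ram_mb(1640, 10)))  # 64 -> matches the "Unthrottled" line
```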

Link to comment

Short description of what I "see":

- A parity check does not constantly run at full I/O speed (which is approx. 1500 MB/s total), but instead runs much of the time at approx. 500 MB/s with peaks up to 1500 -> this results in a much reduced overall speed

 

 

This symptom is very similar to one I was having earlier, by any chance do you have any Samsung disks model HD203WI or HD153WI?

 

If so see here:

http://lime-technology.com/forum/index.php?topic=42384.0

 

I have two HD204UI's.  But if you look at the speed test below, they don't stand out as being slower than my other older drives, and with the SASLP card my speed is substantially faster.  I don't think it's a drive issue.  Thanks though.

speed_test.JPG.a22dd2782641c55b758185738f327118.JPG

Link to comment

I have two HD204UI's.  But if you look at the speed test below, they don't stand out as being slower than my other older drives, and with the SASLP card my speed is substantially faster.  I don't think it's a drive issue.  Thanks though.

 

I also have some Samsung HD204UI and they work fine, the problem I had was only with the older Samsung HD203WI.

Link to comment

I have two HD204UI's.  But if you look at the speed test below, they don't stand out as being slower than my other older drives, and with the SASLP card my speed is substantially faster.  I don't think it's a drive issue.  Thanks though.

 

I also have some Samsung HD204UI and they work fine, the problem I had was only with the older Samsung HD203WI.

 

Before characterizing it as a "problem", be sure you're comparing the speeds appropriately.    The HD203WI is a 500GB/platter drive;  the HD204UI is 667GB/platter ... so the HD204UI's will transfer data 33% faster than the HD203WIs.    Are you seeing a greater difference than that?

 

I haven't followed this issue enough to have checked all the various drives in use and platter densities -- but I do know that folks with arrays using all the same drives -- and 1TB/platter areal density -- are definitely having speed issues, so SOMETHING is indeed wrong.    Just want to be sure we're looking at actual issues and not just drives with different areal densities having different speeds that match those density differences.
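The 33% figure above is simply the ratio of the per-platter capacities:

```python
hd203wi = 500  # GB per platter
hd204ui = 667  # GB per platter

ratio = hd204ui / hd203wi
print(f"{ratio - 1:.0%}")  # 33% expected sustained-transfer advantage for the HD204UI
```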

 

Link to comment

 

Before characterizing it as a "problem", be sure you're comparing the speeds appropriately.    The HD203WI is a 500GB/platter drive;  the HD204UI is 667GB/platter ... so the HD204UI's will transfer data 33% faster than the HD203WIs.    Are you seeing a greater difference than that?

 

I haven't followed this issue enough to have checked all the various drives in use and platter densities -- but I do know that folks with arrays using all the same drives -- and 1TB/platter areal density -- are definitely having speed issues, so SOMETHING is indeed wrong.    Just want to be sure we're looking at actual issues and not just drives with different areal densities having different speeds that match those density differences.

 

Gary, you can read more about this issue here:

 

http://lime-technology.com/forum/index.php?topic=42384.0

 

The problem is that during most or large parts of a parity check, parity sync or disk rebuild, speed will slow down to about 30-35 MB/s, with spikes or periods at normal speed, 80 to 100 MB/s depending on the disk position.

 

While I can't say this will affect everyone with these disks, I tested on my different servers with very different hardware, like a Supermicro Intel based server and an HP N54L AMD based microserver; all have the same issue, and it seems to be worse if there are 2 or more of these disks.

Link to comment

One final observation for Tom or anyone else: in the example below the parity drive is on the motherboard controller and all 8 data drives are on the SAS2LP.  I first did a parity sync and then a parity check.  As far as I understand, both operations simultaneously read all disks on the SAS2LP and nothing more, but the read numbers for most drives during the parity check are a lot higher.  Is there any difference in the way a check works that could explain this issue?

 

hxtuaEq.jpg kXK0l0t.jpg

Link to comment

Unfortunately I need to contribute to this post.

 

I finally set up my backup server and built parity yesterday.

Speeds were fine, above 100 MB/s.

Today I started the parity verification and noticed speeds around 35 MB/s at 100% CPU load (dashboard).

 

Specs:

Board: Supermicro X7SBA

RAM: 4GB nonECC

Drives: 7+1 parity (WD, Seagate, Hitachi, 2TB-4TB)

Controller: onboard + DELL Perc H310 running LSI IT firmware P19

OS: stock unRAID 6.1.0; no plugins, no dockers

 

Let me know if you need more information!

chart.jpg.69397605be5141703388db0c8814442d.jpg

Link to comment

One final observation for Tom or anyone else: in the example below the parity drive is on the motherboard controller and all 8 data drives are on the SAS2LP.  I first did a parity sync and then a parity check.  As far as I understand, both operations simultaneously read all disks on the SAS2LP and nothing more, but the read numbers for most drives during the parity check are a lot higher.  Is there any difference in the way a check works that could explain this issue?

 

hxtuaEq.jpg kXK0l0t.jpg

What I see there is what I noticed myself: the read counts are a lot different - e.g. 75k vs. 103k.  I know they are not always "exactly" the same, but with my build on unRAID 5 they were usually very close (unless there was any trouble with the respective disk).

@Tom: from where does unRAID get those counts?  Does it make sense to activate debug logging of the SAS2LP driver to see whether something strange is happening that causes re-reads - which could of course slow down performance and reduce overall throughput?  Obviously we do not end up with read errors, as no one so far has reported that reads actually fail ...

Link to comment

It's normal for the read counts to be very different, due to differences in the size of the various I/O requests.

 

I know it's normal for read counts to be different between disks; what I noticed is that they are also very different between a parity sync and a parity check - for example, in the example above the counter was reset before the operation.

 

Parity sync reads for disk 2 @ 23.5% = 103,854

Parity check reads for disk 2 @ 17.9% = 1,559,532

 

This led me to think that maybe there’s a difference in the way both operations work that could explain the performance issue.
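To put those two counters on a comparable footing, a rough sketch that scales each to a full pass (this assumes read counts grow roughly linearly with position):

```python
sync_reads,  sync_pct  = 103_854,   23.5  # parity sync,  disk 2 @ 23.5%
check_reads, check_pct = 1_559_532, 17.9  # parity check, disk 2 @ 17.9%

# Project both counters to 100% progress for an apples-to-apples comparison
sync_full  = sync_reads  / (sync_pct  / 100)
check_full = check_reads / (check_pct / 100)
print(f"the check issues roughly {check_full / sync_full:.0f}x as many reads as the sync")
```

With the numbers above that comes out to roughly 20x, far more than any difference in progress could account for.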

 

Link to comment

Yes, it is a Celeron 450.

Is there no other way as of switching the CPU?

 

I believe if you are seeing 100% CPU usage it's your only option; it doesn't have to be an expensive CPU - any dual core close to 2 GHz or above should be enough.

 

The CPU is not limiting I/O operations.  I have a Celeron G1840 and it handles them just fine.  There is an underlying issue with reads during a parity check.

Link to comment

The CPU is not limiting I/O operations.  I have a Celeron G1840 and it handles them just fine.  There is an underlying issue with reads during a parity check.

 

Celeron G1840 is dual core; if you read my post, I only mentioned single-core Celerons, usually socket 775.  Those do limit parity checks, easily confirmed by checking whether the CPU utilization is pinned at 100% during one.

Link to comment
