[SOLVED] Performance terrible since V5 upgrade



I upgraded to V5 at the weekend and that process went smoothly enough. But now I'm noticing absolutely terrible delays, stuttering and playback problems when running my media centre software (MediaPortal) off the user shares. This has never been a problem before. MP simply references the SMB shares, and the performance problem is certainly not at the client end, as the files all play back fine if copied locally. When starting playback it takes ages (30s or so) for anything to start at all, and then there is generally stuttering and sound and picture breakup. If I'm extremely lucky it might actually kick in and play the files back properly, but generally it eventually dies completely and I have to kill off MePo. I thought it might be playback from the cache, but that isn't it: I made sure the files were on the array and the same thing happens there. It doesn't work consistently even if I simply use something like graphstudio to play back directly from the /mnt/diskX where the file is actually located rather than through the user share. It doesn't seem to be related to any one disk in the array either. I'm stumped.

 

I'm not sure what to do or what to run to investigate. Downgrading back to 4.7 will be a pain, but looks like the only option at present. Syslog and SMART reports are all attached but I can't see anything in there. What kind of diagnostics can be run? Is there any particular hardware that is known to cause problems? All my 7 drives in the array and the cache are attached to the SM AOC-SAS2LP-MV8, only the cache is connected to a Mobo SATA port.

 

I upgraded to V5 all ready to order some 4TB drives to start replacing the old 1TB ones, but it looks like that is on permanent hold as going back to 4.7 will preclude that upgrade. Any ideas most welcome. 4.7 had been rock solid for me, and nothing else has been done to the rig, only V5.

syslog.txt

smart.all.txt

Link to comment

On quick look, three possible things you may want to look into:

 

1. "ondemand governor failed, too long transition latency of HW".  Look into the BIOS settings.

1b. ...so it went on to load the "p4-clockmod" module instead.  p4-clockmod has always caused problems. (It can be blacklisted with a boot option.)

2. You have a Realtek NIC. Some people reported problems with v5 and Realtek.

3. You're running a crapload of plugins with only 2GB RAM. Reboot without the plugins and see if your problems persist.

Thanks, I'll start with #3, though I only actually installed the Sabnzbd and SSH plugins. I can't really do without either in the long term (especially SSH, unless I want to use telnet instead of PuTTY - urgh). This hasn't been a problem before, as I had a whole load more packages installed on 4.7 than I do now. But it's an easy one to test.

 

Re #1 - a) what should I look at in the BIOS, and b) p4-clockmod: how would one blacklist it and what would be the effect of doing so, given I have no clue what it does :)

 

Link to comment

Look in the syslog, Sabnzbd installs a crapload of packages.  If you prove that without it your problems go away, then your cheapest option may be to upgrade from 2GB to 4GB of RAM. (I strongly discourage using more than 4GB on a 32-bit distro. That causes a whole new set of problems.)

 

In the 'append' line in your syslinux.cfg you can add:  module_name.blacklist=yes

But leave that for last.

 

I am most suspicious of your Realtek NIC.

Many thanks for the suggestions, I will try with no plugins first. Replacing the 2x1GB memory with a 4GB stick or 2x2GB shouldn't be too expensive, though DDR2-667 isn't that easy to come by these days.

I think I have an Intel 1GigE PCI card lying around somewhere, that I could try instead of the onboard Realtek.

 

Will all have to be tomorrow now...a bit late in the UK now on a work night to be mucking around with my server - once I get going I'd be here at 6am I'm sure ;)

 

Link to comment

Look in the syslog, Sabnzbd installs a crapload of packages.  If you prove that without it your problems go away, then your cheapest option may be to upgrade from 2GB to 4GB of RAM. (I strongly discourage using more than 4GB on a 32-bit distro. That causes a whole new set of problems.)

Or use nzbget instead of SABnzbd. It has a much smaller RAM footprint and has now overtaken SAB in terms of functionality IMO.

Link to comment

A quick update. It looks like it is the lack of memory. I have had a stutter-free evening of viewing; fingers crossed this is the only issue and it isn't Realtek NIC related. 2x2GB is on order. Before rebooting without all the plugins I did notice that with just SSH and SAB running it was bumping along with only 50-60MB free, which isn't a lot. That is probably mostly SAB, but I do prefer it over NZBget and know it better. Presumably V5 has a larger starting memory footprint than 4.7, or the current SAB package is much bigger than it used to be.

 

I'll leave it a day or so (and the new memory) before marking as SOLVED  :) Thanks for the help. I'll also leave it a few weeks before thinking about 4TB drives just in case a downgrade becomes necessary.

 

 

Link to comment

Unfortunately one of my old WD 1TB drives red balled yesterday, haven't had this happen for quite some time.  :'(

 

Tried to do JoeL's unassign/reassign procedure to get it to rebuild itself, still bad. There were lots of errors in the syslog, and maybe it was also contributing to the original performance problems. So, I'm currently unprotected and preclearing another 1TB spare I was lucky to have lying around. I did actually start the preclear while the bad drive was still in its slot, and the performance was terrible (~12MB/s) and the server hung up completely. I switched off, pulled out the bad disk5 and restarted the preclear, and it was immediately up over 60MB/s (very old drives, remember). It is now on the postclear steps, so it will be ready to go into the disk5 slot this evening, and then I wait for it to rebuild. I wouldn't do anything else while unprotected (especially as there are other quite old drives in there; I don't want to lose two at once).

 

I hope to be able to go back to the original performance problem testing this weekend, and also the extra memory should turn up today. I've found my old Intel 1GigE PCI card as well in case the Realtek is the problem. Looks like a full UnRAID weekend is on the cards.

 

I guess once the current setup has finally stabilised again I really should swap out the old 1TB's for shiny new 4TB drives.  ;D

Link to comment

Staggering from one disaster to the next  >:(

 

Got everything up and running, rebuilt disk5. All operational all night, had SAB running too, and the mem usage didn't seem to be a problem.

 

This AM I decided to try the new memory (2x2GB), and it booted up fine. 2 hours later it was inoperable. I turned it off and plugged it into a monitor, and the VGA out wasn't even working, and in one instance the machine wouldn't even power up or POST, though it is tricky to tell why without a screen!! Reverted to the old memory - VGA and boot all working again.

 

So now I'm back to the original configuration, simply with a different 1TB drive in place of the one throwing errors. Fingers crossed that the failing HDD was the actual cause of the performance problems.

 

Mem Free doesn't look too bad right now, even when SAB was busy downloading last night it had lots free in the cache.

 

             total       used       free     shared    buffers     cached
Mem:       2064888     480536    1584352          0      43592     331780
Low:        889648     113124     776524
High:      1175240     367412     807828
-/+ buffers/cache:     105164    1959724
Swap:            0          0          0
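
For anyone reading the figures above: the "-/+ buffers/cache" row is just the "Mem:" row with buffers and page cache counted as reclaimable rather than used. A minimal Python sketch of that arithmetic, using the numbers from the output above (values are in KiB, as `free` reports by default; the function name is only illustrative):

```python
# Minimal sketch: recompute the "-/+ buffers/cache" row from the "Mem:" row.
# All figures are copied from the `free` output in the post (KiB).

def effective_memory(used, free, buffers, cached):
    """Return (really_used, really_free), treating buffers and
    page cache as memory the kernel can hand back on demand."""
    really_used = used - buffers - cached
    really_free = free + buffers + cached
    return really_used, really_free

mem = {"total": 2064888, "used": 480536, "free": 1584352,
       "buffers": 43592, "cached": 331780}

really_used, really_free = effective_memory(
    mem["used"], mem["free"], mem["buffers"], mem["cached"])

print(f"used by applications:      {really_used} KiB")
print(f"available to applications: {really_free} KiB")
print(f"share of RAM available:    {really_free / mem['total']:.1%}")
```

The two results match the 105164 and 1959724 shown in the "-/+ buffers/cache" row, so at the moment of this snapshot nearly all of the 2GB was available.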

 

Currently in the middle of a parity check, due to the last crash I really need it to be fine before going any further, which will take many hours.

 

If I still get problems, the next item to try is NIC replacement; after that it's total replacement of Mobo/CPU/RAM time. I have another less important machine I could butcher to get its CPU, and have a spare C2D mobo for it lying around. I guess I may as well stick a lower wattage 64-bit CPU in there while I'm at it, in prep for a 64-bit UnRAID that could use all the memory in any new rig.

 

Does anyone know why bad or incorrect memory would cause the onboard VGA to stop working and failure to start-up? I've got 2x2GB DDR2 667MHz PC2-5300 memory just lying around now. It might be fine so RMA might not be possible.

 

Link to comment

The memory could very well be bad. Bad memory can cause anything to not work.

Put one stick in the DIMM0 slot and run memtest for 24hrs or until first sign of error. Repeat for both sticks.

See, you can have sticks that are each good by itself, but not good together.  Ultimately, the memtest should be run with all sticks installed.  If that passes without an error, then you know that your problems are somewhere else.

 

Adspence, you should ALWAYS go through a memtest when installing new memory.

 

The OP bought them at the same time. I assumed he got a matched pair or the same brand which should work together just fine.

 

Link to comment

Running a memtest is certainly a good idea, but not really possible when you can't actually get to the machine at all. Anyway, I don't believe the cause of my problems is the lack of memory. Once I've sorted out what is really going on I'll test out the new memory again 1 stick at a time.

 

For the time being I'm doing yet another parity check after replacing disk2 (it was disk5 the first time). I'm now out of options for replacement 1TB drives. My main worry at the moment isn't the duff memory I might have bought, but the risk of losing two drives at once, having lost 2 separately over the last 3 days. I have some new 2TB (and 4TB for testing) coming now, but either there is something seriously wrong with my rig or I'm just unlucky. It seems far too much of a coincidence that all this started happening 2 days after the V5 install when it had been fine for 1-2 years running without any changes at all.

 

Until the new 2TB drives arrive I'm not touching anything else. If I'd lost another one when rebuilding disk2 I'd have been screwed.

Link to comment

 

Why are you doing this to yourself?  You are creating a mess of the whole situation.

 

Attack the problem the proper way: First, prove for a fact whether the memory is good or not.  Without that, you can only be making a bigger mess of your disks.  So, put your new 2x2GB sticks in the server, and boot it with the memtest menu.  Let's see if it can pass at least 12 hours of memory tests without an error.  Once we know the result of that, then we can worry about other things.

Good suggestion. I'll do this overnight tonight. I have to ship the server into my lounge to attach a monitor, and boy is it noisy with all the fans whirring away. That's if I see anything out of the VGA out when the 2x2GB is attached, then it would be down to each one separately to see which is causing the problem.

 

Hopefully I won't lose any drives in the interim. The overnight parity check fixed 75 errors; is there any way to figure out which drive(s) were fixed?

 

Link to comment

There won't be any memtesting tonight. I tried to boot the server with just one of the sticks, worked with one and not with the other. So clearly I've a duff 2GB stick and will need to RMA it. :(

 

It will mean that I won't be able to bump the server up to 4GB to see if that is the root cause of the performance issues (I still think this is unlikely given drives are now causing problems intermittently). Not sure there is much point testing the single working 2GB stick. I've been running with the old 2x1GB happily for years, is there any point testing that?

 

I do have some 2x2GB DDR2 in my other Desktop but that is DDR2-800 not DDR2-667. Should I try that while I wait for the RMA RAM? I'll only butcher that machine if we think it will help find the root cause of the overall performance problems.

Link to comment

I've been running with the old 2x1GB happily for years, is there any point testing that?

Yes, if you want to clearly prove that the ram was not what caused your problems.

 

I do have some 2x2GB DDR2 in my other Desktop but that is DDR2-800 not DDR2-667.

DDR2-800 is good to run at 667 speed.  The '800' means good for speeds up to 800.  The BIOS will know how to run it.
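
As a rough illustration of the point above, the DDR2 speed grade and the "PC2" module name are two labels for the same thing: the transfer rate multiplied by the 8-byte (64-bit) width of the DIMM's data bus. A small Python sketch of that relationship follows; the function names are only illustrative, and the rule "a module rated at or above the bus speed can be run at the bus speed" is the reply's claim restated, not a guarantee for every board.

```python
# Sketch: relate DDR2 speed grades to peak module bandwidth.
BUS_WIDTH_BYTES = 8  # a DDR2 DIMM has a 64-bit data bus

def peak_bandwidth_mb_s(mega_transfers_per_s):
    """Theoretical peak bandwidth of one module in MB/s."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES

def rated_fast_enough(module_rating, bus_speed):
    """True if the module is rated at or above the speed the board runs at."""
    return module_rating >= bus_speed

for grade in (667, 800):
    print(f"DDR2-{grade}: about {peak_bandwidth_mb_s(grade)} MB/s peak")

print("DDR2-800 in a DDR2-667 board:", rated_fast_enough(800, 667))
```

The figure for DDR2-667 comes out slightly above 5300 because the marketing name PC2-5300 is rounded down; DDR2-800 corresponds to PC2-6400.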

 

Should I try that while I wait for the RMA RAM? I'll only butcher that machine if we think it will help find the root cause of the overall performance problems.

That's totally up to you.  Do you plan to keep on using that other machine?  There's always the risk of breaking things when pulling parts out.

Thanks for the info. So, I'll put the previous 2x1GB sticks (which also happen to be 800, I noticed) back in the server and do an overnight memtest on that. I actually did take the 2x2GB out of the other PC and quickly checked that out, and it is fine to run the server on (and did a single memtest pass). I could survive without the other PC for the time being, or with it on a reduced amount of memory, if we think that the server needs 4GB.

 

I don't want to do too much actual usage on the server right now anyway until I've some new spare drives. So an extended memory test is fine. I've copied some video locally to watch tonight anyway :)

 

Link to comment

Test with no add-ons.

Sure, and that is what I was doing last week when two 1TB drives redballed one after the other. One of those drives had to go back in the array, as I'd run out of 1TB spares. It doesn't seem to have any problems that I can see in smart reports, and did 2x preclears on it prior to reintroduction and rebuild.

 

I will hopefully receive two new 2TB drives tomorrow so if this old 1TB is really suspect then I have replacements ready at hand, so won't be doing much real testing until after they come.

Link to comment

Finally everything is now back and running

- one disk swapped out permanently (was disk2)

- one 1TB put back into rotation as disk2 (was disk 5)

- disk 5 (the first redball last week, now disk2) is now a replacement "old" 1TB drive which wasn't previously part of the array when I started this journey

- No plugins

- Parity checked

- original 2x1GB that has a successful memtest done for 12+ hours

- playback stutters from a video file on disk5.

- Bags of free memory.

- Copied the file onto cache disk, same thing, stuttering playback which eventually hangs completely

- not a client problem as the file plays without problem when local

 

Really no closer to figuring out why. What are my options for figuring out these performance problems? Or am I faced with downgrading to 4.7 to see if it gets better, if so how simple is this to do?

smart.sdd.disk5.20130911.2050.txt

syslog.20130911.2040.zip

Link to comment

Make several large transfers and then collect the syslog.

What would be your definition of large? And presumably you mean from the server to my clients? Nothing ended up in the log whatsoever during simple video playback, and copying 12 files of ~1GB each didn't show anything in the log either.
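
One way to put a number on "large transfers" is to time a plain sequential read of the same file from each location: the user share over SMB, the /mnt/diskX path on the server, and a local copy. The Python sketch below is only an illustration of that idea, not a tool anyone in this thread used; the function name and the 1 MiB chunk size are assumptions. Note that a second read of the same file may be served from the OS cache and look much faster than the disk or network really is.

```python
import time

def read_throughput(path, chunk_size=1024 * 1024):
    """Read a file sequentially and return (bytes_read, seconds, MB_per_s)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on tiny files.
    mb_per_s = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s

# Example call (the path is a placeholder, substitute a real video file):
# size, secs, rate = read_throughput("/path/to/some/video.mkv")
# print(f"{size} bytes in {secs:.1f}s = {rate:.1f} MB/s")
```

If the direct disk read on the server is fast but the same file over the share is slow, that points towards the network side; if both are slow, the disk or controller is the more likely suspect.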

Argh, it turns out that the spare NIC I had was a D-Link DGE-528T, not an Intel as I first thought. As per http://lime-technology.com/forum/index.php?topic=25008.msg217232#msg217232 it is actually a Realtek 8169 and it doesn't work...at all. Bl**dy Realtek, all my mobos have them and so now do these cards...I even have TWO of these D-Links. It seems like another order is required for a kosher Intel NIC. Any other options?

Link to comment

On a quick search of eBay, it looks like some decent Intel gigabit PCI-e NICs can be had for around $10-$15 US shipped.

I've already ordered a new Intel PRO/1000GT NIC, and more replacement 2x2GB memory after having to RMA the last lot.

 

While I had the back off the server I realised that maybe my PSU might be a problem. While it is nominally 400W, it has 2 12V rails, both rated at 15A. It isn't altogether clear to me how the two rails are arranged, but it seems likely that I'm trying to run 8-10 drives on 15A (depending on cache and if I have my "spare preclear slot" busy). A few of the 2TB drives are 7200rpm and others non-green so they'll be drawing a bit more than the others when busy. That said, if this is a problem I should have been having it almost every time a parity check ran with all the drives fired up (though I get that is just 9 because 1 is my cache).
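
To make the 12V rail worry concrete, here is a back-of-the-envelope Python sketch. Only the 15A rail rating and the 8-10 drive count come from the post; the per-drive currents are assumed typical values for 3.5" drives and should be replaced with the figures printed on the actual drive labels or datasheets.

```python
# ASSUMED figures, not from the thread: typical 3.5" drive draw on the 12 V line.
SPINUP_AMPS_12V = 2.0   # assumption: peak per drive while spinning up
ACTIVE_AMPS_12V = 0.6   # assumption: per drive while reading or writing

RAIL_LIMIT_AMPS = 15.0  # from the post: each 12 V rail is rated at 15 A

def rail_load(drive_count, amps_per_drive):
    """Total 12 V current if every drive hangs off one rail."""
    return drive_count * amps_per_drive

def fits_on_rail(drive_count, amps_per_drive, limit=RAIL_LIMIT_AMPS):
    """True if the combined draw stays within the rail rating."""
    return rail_load(drive_count, amps_per_drive) <= limit

for drives in (8, 9, 10):
    spinup = rail_load(drives, SPINUP_AMPS_12V)
    active = rail_load(drives, ACTIVE_AMPS_12V)
    print(f"{drives} drives: spin-up {spinup:.1f} A "
          f"(ok={fits_on_rail(drives, SPINUP_AMPS_12V)}), "
          f"active {active:.1f} A "
          f"(ok={fits_on_rail(drives, ACTIVE_AMPS_12V)})")
```

Under these assumed numbers the steady load during a parity check sits well inside 15A, while all drives spinning up at once on a single rail would not, which would be consistent with parity checks running fine on the existing PSU.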

 

Given I plan to upgrade all the 1TB/2TB drives to 4TB soon I thought it prudent to beef this up. So I've ordered a Seasonic X650 as well. The likelihood is that I'll be adding another 5x3 drive cage at some point anyway :)

Link to comment
