Pauven

Members
  • Content count

    548
  • Days Won

    3

Pauven last won the day on July 5 2017

Pauven had the most liked content!

Community Reputation

39 Good

About Pauven

  • Rank
    Advanced Member

  • Gender
    Male
  • Location
    Atlanta Metro Area
  1. Is that a thing? All I know is I needed the kvm_intel.nested=1 statement to allow VMware images to play inside a Linux VM, and it works even on my Ryzen box. Without this option, the passed-through CPU reports no VM capabilities. But I'll look into it further, maybe the AMD version would work even better...
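One way to look into "the AMD version" is to check which KVM module actually carries the setting. The sketch below is only an illustration, not something from the original post: it assumes a Linux host that exposes module parameters under /sys/module, and it relies on kvm_amd (the KVM module for AMD processors) also having a `nested` parameter. The function name `nested_enabled` is made up for this example.

```python
from pathlib import Path
from typing import Optional


def nested_enabled(module: str, sysfs_root: str = "/sys/module") -> Optional[bool]:
    """Report the 'nested' parameter of a KVM module.

    Returns True or False when the parameter file can be read, and None
    when it cannot (for example because the module is not loaded).
    """
    param = Path(sysfs_root) / module / "parameters" / "nested"
    try:
        value = param.read_text().strip()
    except OSError:
        return None
    # Depending on kernel version the value is printed as 1/0 or Y/N.
    return value in ("1", "Y", "y")


if __name__ == "__main__":
    for module in ("kvm_intel", "kvm_amd"):
        print(module, nested_enabled(module))
```

If kvm_intel reports None and kvm_amd reports True on a Ryzen box, the nested support is coming from the AMD module rather than from the kvm_intel option.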
  2. Change is made. I'll test and report back. Full disclosure, my append line has some extra baggage:

       label unRAID OS
         menu default
         kernel /bzimage
         append initrd=/bzroot acpi_enforce_resources=lax kvm_intel.nested=1 rcu_nocbs=0-15
  3. Technically, I think the m/b manufacturers are just programming to AMD spec, so the problem is still getting AMD to say what this feature does on their processor.

     No, I have never tried the rcu_nocbs kernel option. Since I have a Ryzen R7-1800X, I would need 'rcu_nocbs=0-15', correct? I don't know how to apply this, can you point me in the right direction please?

     To the best of my ability, yes same kernel. But this was 9 months ago, when Ryzen was first released. I think my testing was on the 4.10 or 4.11 branches, and after rc7a I never tested again with any new kernels, as I thought this was a resolved issue.

     I'll do ya one better: Point me to a distro that does hang, and I will test it. If it hangs, I'll shut up about it, but if it doesn't...

     Is this information public somewhere? I would appreciate a pointer if it is. I am a major AMD shareholder, and don't mind pursuing Investor Relations for an answer to this issue, but I need to know what I'm talking about. Thanks, Paul
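For readers with the same "how do I apply this" question, here is a minimal sketch of the idea, not an official procedure. It assumes the option goes on the `append` line of the syslinux boot entry (as in the append line shown in the later post), and that the R7-1800X exposes 16 logical CPUs (8 cores, 16 threads). The helper names `rcu_nocbs_param` and `add_to_append_lines` are hypothetical.

```python
def rcu_nocbs_param(logical_cpus: int) -> str:
    """Build an rcu_nocbs option covering CPUs 0..logical_cpus-1."""
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    if logical_cpus == 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{logical_cpus - 1}"


def add_to_append_lines(config_text: str, param: str) -> str:
    """Add a kernel option to every 'append' line that does not already set it."""
    key = param.split("=", 1)[0]
    out = []
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].lower() == "append":
            already_set = any(tok.split("=", 1)[0] == key for tok in tokens[1:])
            if not already_set:
                line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    cfg = (
        "label unRAID OS\n"
        "  menu default\n"
        "  kernel /bzimage\n"
        "  append initrd=/bzroot\n"
    )
    print(add_to_append_lines(cfg, rcu_nocbs_param(16)))
```

The function only edits text in memory, so the result can be reviewed before anything is written back to the flash drive. Running it twice leaves the line unchanged, which avoids duplicate options.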
  4. I'm a little confused by your post. Without C6 disabled, either via bios or zenstates, and no special kernel options, I would expect Ryzen to hang eventually. Are you saying that, with using zenstates to disable C6 it still crashes? And now you are trying "Disable Global c-state Control"? (which I presume is a bios option?)

     Sorry, I was not clear. 'Global C-state Control' is the BIOS setting that must be disabled to prevent Ryzen hangs. To be even more clear, you will also see in the Ryzen BIOS various options to Disable C6 or other C-states - these do not help. Only the 'Global C-state Control' setting has an impact. It is not intuitive that the other C-state related settings don't accomplish the same thing. So what I was writing in my post above was that I have now disabled 'Global C-state Control' in my BIOS for the first time since pre-rc7a. And yes, using only ZenStates to disable C6 it still crashes. Perhaps by researching what 'Global C-state Control' does, you can better understand the nature of the problem and solution.

     Seems like that was the magic sauce. Is it possible to see what exactly this option accomplished by reviewing the old source code for 4.12.3?

     That seems like a big leap of logic. I would imagine that AMD would have eventually provided a mechanism for microcode updates from Linux, and Spectre probably forced their hand to get this done sooner than planned. While it would be nice if AMD provided a Ryzen hang fix via microcode, to me there is not enough info to even speculate on this. Besides, from my testing, Windows and other Linux distributions did not suffer from this issue, plus the CONFIG_RCU_NOCB_CPU_ALL option fixed this issue on unRAID without a microcode update.

     I also want to point out that I tried extensively to get other Linux distros to crash, including running the same Slackware distro and kernel version that you were utilizing for unRAID. I never had a crash outside unRAID.

     I know that "one data-point does not a fact make", and I'm not trying to point fingers, just sharing my experiences with my particular Ryzen system, which has attained 'Canary in a Coal-Mine' status - if there is a problem, my system hangs quicker than the rest, and has only ever hung while running unRAID, and never hung with rc7a or with 'Global C-state Control' disabled. Paul
  5. I'm experiencing hard crashes with 6.4.0 on my Ryzen build. For those that don't know, I'm the original discoverer of the Global C-state Control fix for Ryzens, and my unRAID server is extremely susceptible to this issue for whatever reason. I typically get crashes within a few hours if the problem exists.

     Until today, I was running 6.4-rc7a for several months, with Global C-state control ENABLED, and no other manual fixes. I had uptime exceeding 50 days, and never experienced a crash under rc7a.

     This morning I updated to 6.4.0 stable, and applied the ZenStates fix to the config/go file (my file shown here):

       #!/bin/bash
       # Start the Management Utility
       zenstates --c6-disable
       /usr/local/sbin/emhttp

     Within 3 hours, I have already experienced my first hard crash. Console was not responsive, no output anywhere. I have disabled Global C-state Control again, so hopefully I can be stable on 6.4.0 while this is being addressed.

     Here's what I don't understand: If this is an AMD bug, and is AMD's responsibility to fix, why did the fixes that were put into rc7a work so well? And why can't those same fixes be brought forward to 6.4.0? Thanks, Paul
  6. Anybody planning a Ryzen build?

    I was the original discoverer of the Ryzen stability issue and C-state solution. My server is extremely susceptible to the C-state issue, typically crashing in 4-8 hours when the issue is present. I'm running 6.4.0-rc7a, with C-states enabled, and my uptime is 52 days. I have avoided all of the recent 'Really Close' releases since 7a, as the changes just seemed too scary for me to be a guinea pig. I think it was the introduction of the block level device encryption. I don't plan to use it, but I have nightmares thinking that a beta version could somehow misbehave and accidentally encrypt my precious data, so that I never get it back. I know the odds of that happening are pretty much zilch, though if it could happen it would likely happen to me. I'm waiting for the next stable public release. Anyway, perhaps something has changed since 7a that lost the fix for the Ryzen C-state issue. Paul
  7. I've had lots of similar issues lately, also on -rc7a. The only solution I've found is to boot in safe mode. I think a plugin is somehow breaking this functionality. I'm currently running in safe mode for this very reason, and a parity rebuild is in progress, so at the moment I can't tell you what plugins I normally run. I'll check later.
  8. Are your freezing issues occurring with 6.4.0-rc7a, or a different version? Does disabling "Global C-state Control" in your BIOS fix the issue?
  9. Anybody planning a Ryzen build?

    Thanks Greygoose! I just reached 50+ hours uptime on 6.4.0-rc7a with C-states enabled. Looks like Lime-Tech may have solved the stability issue, good job guys!

    With C-states enabled, idle wattage has dropped 10+ watts. My UPS only reports in 10.5w increments (which is 1% of the 1050w power rating), so actual savings are likely somewhere between 10.5w and 21w. From earlier testing with a more accurate Kill-A-Watt, the actual delta between C-states enabled & disabled was between 12w and 18w.

    Idle temps have dropped 2-3 degrees C on both CPU (41C) and System (36C). Not as much as I had hoped, but I think my expectations were off. I did a lot of initial testing with the case cover off, and temps have unsurprisingly increased simply from closing the case, as case fans are on lowest speed (three 120mm fans, 1000 RPM @ 35% PWM), and they have to suck air past the HDs, so very little airflow at idle. The CPU fan speed profile is set to 'Standard' in the BIOS.

    At max case fan speeds (2750 RPM), idle CPU temp easily drops to 35C and System to 30C, but the higher fan speeds consume an extra 10+ watts and make lots of noise. As a compromise, I just changed my minimum case fan speed to 1400 RPM @ 50% PWM, which is much quieter and more energy efficient than full blast, but still improves my idle temps a couple degrees over the slowest fan speeds: 39C CPU, 34C System.

    I'll probably change the CPU fan profile from Standard to Performance in the BIOS to see if that drops the 5C delta over ambient a bit, but other than that I think I'm done. I'm happy to have idle temps back in the 30's, at reasonable fan speeds/noise, and with idle watts back to a more reasonable level. Paul
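The UPS arithmetic in this post can be written out as a small sketch. The 1% resolution and 1050w rating are from the post; treating the savings as "between one and two reporting steps" simply restates the post's own estimate and is not a formal error analysis. The function names are invented for this example.

```python
from typing import Tuple


def reporting_step(rated_watts: float, percent_resolution: float = 1.0) -> float:
    """Size of one UPS reporting increment in watts."""
    return rated_watts * percent_resolution / 100


def savings_bounds(rated_watts: float) -> Tuple[float, float]:
    """Post's estimate: the real savings lie between one and two reporting steps."""
    step = reporting_step(rated_watts)
    return step, 2 * step


if __name__ == "__main__":
    low, high = savings_bounds(1050)
    print(f"step = {reporting_step(1050)} W, estimated savings {low} to {high} W")
    # The earlier Kill-A-Watt readings (12w to 18w) fall inside that window.
    print(all(low <= watts <= high for watts in (12, 18)))
```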
  10. That again aligns with my experience. Under -rc7a, assigning the drive and starting the array once weren't enough. I had to stop/start multiple times, and the last reboot really helped sync things up. I also had Mover issues, as the share.cfg file had lost some settings related to the cache drive. Be sure to check Mover on your server, and if you're having problems, the solution was documented in the forum link in my previous post above. I can't tell if it is the new GUI misreporting what the system is doing (possibly caching old info), or the system misbehaving. Either way, this is the first time I've experienced this type of behavior in 8+ years of using unRAID. Paul
  11. Anybody planning a Ryzen build?

    That's not a bad idea. I've been using the same USB stick for 8+ years, since the beginning when I started with 4.5 beta4 (with its brand new 20-disk limit). How's that for a flashback. Though I've certainly wiped it on occasion over the years. Most recently I think for the 6.1 branch. Now that I've finally got things settled, I'm gonna let it chill as-is. If more problems crop up, this will be high on my trouble-shooting list. As far as dealing with the potential corruption, I might just have to start from scratch and rebuild my configuration if I wipe the drive. Otherwise, I'm simply restoring potentially corrupted files. Thanks. Paul
  12. I'm running this now on my Ryzen build. New drivers are greatly appreciated, as is the fix for assigning the cache drive. I did experience at least 1 Kernel Panic, snapped a pic you can see here (plus a write-up on my general experiences with -rc6/-rc7a):

      Huh, I had kinda noticed that the temps weren't displaying on my NVMe drive, but I was attributing that to my general cache drive assignment problems. Now that those are resolved, I can confirm that temps are not displaying.

      A very important configuration change has been applied that may eliminate the need for disabling Global C-state Control. My system typically crashes in a matter of hours with C-states enabled. I'm testing this now. Early reports are promising, though absolute confirmation may take some time. Here's the change (quoted from the release notes above):

      If you are looking for better VM performance, that will probably have to wait on fixes destined for a newer Linux kernel, though there may be some general improvements in 4.12. This is out of Lime-Tech's control.

      I had similar experiences, though technically mine started on v6.3 with my new Ryzen build when I swapped out my cache drive - I could never assign the new one until -rc7a. Thank you Lime-Tech. Paul
  13. unRAID OS version 6.4.0-rc6 available

    So ultimately -rc7a did resolve the issue, though not without some headaches along the way (including a Kernel Panic). Posted details here: Thanks for all your help! Paul
  14. Anybody planning a Ryzen build?

    Okay, multiple findings. First, when I checked on my server this morning, I found a Kernel Panic on the console screen, and the system was fully hung. Here's a pic:

    I restarted in Safe Mode again, started the array, and checked the share.cfg file. shareCacheEnabled was still missing. I stopped the array and went to the Settings/Global Shares panel. I couldn't directly apply "Yes" to "Use cache disk:", as it was already on "Yes" and wouldn't let me Apply it. I set it to "No", Applied, then set back to "Yes" and Applied. Now the share.cfg file got updated with the shareCacheEnabled="yes" line, plus what appears to be several additional lines that must have also been missing. Here's the new file contents:

      # Generated settings:
      shareDisk="e"
      shareUser="e"
      shareUserInclude=""
      shareUserExclude=""
      shareSMBEnabled="yes"
      shareNFSEnabled="no"
      shareNFSFsid="100"
      shareAFPEnabled="no"
      shareInitialOwner="Administrator"
      shareInitialGroup="Domain Users"
      shareCacheEnabled="yes"
      shareCacheFloor="2000000"
      shareMoverSchedule="40 3 * * *"
      shareMoverLogging="yes"
      fuse_remember="330"
      fuse_directio="auto"
      shareAvahiEnabled="yes"
      shareAvahiSMBName="%h"
      shareAvahiSMBModel="Xserve"
      shareAvahiAFPName="%h-AFP"
      shareAvahiAFPModel="Xserve"

    Expecting Mover to now work again, I restarted into normal mode, as I wanted my temperature and fan plugins to keep my drives cool while the Mover got busy. On reboot, I confirmed that shareCacheEnabled="yes" was still in the share.cfg file. I then manually started Mover. This time the logged message was "root: mover: started", and I can see disk activity so it appears that Mover really is working. So it appears that my Cache drive and Mover troubles are finally over - thank you Tom and all who helped.
    That said, assuming Lime-Tech is already here reading this, I'd like to take a moment to recount my experiences with the -rc6/-rc7a releases:

      • Experienced 1 Kernel Panic while in Safe Mode on -rc7a (above), and possibly another in Safe Mode on -rc6 (speculation)
      • The upgrade from 6.3.latest to 6.4.0-rc6 coincided with whacking some cache related configuration file parameters (can't rule out plug-ins as a contributing factor)
      • Could not assign the cache drive under -rc6, though -rc7a fixed this
      • Several -rc7a anomalies (cache drive showing unassigned even though it was assigned, multiple Stop/Starts/Restarts required to get system synced up & behaving correctly)
      • Currently, Mover is working but only at about 36 MB/s peak. Never paid attention before, because Mover is normally running in the middle of the night, but this seems rather slow. Possibly because data is being written to a drive that is 96% full, so this may be nothing.
      • Odd caching in new GUI under -rc7a (didn't notice on -rc6) in which sometimes I have to Shift-F5/Forced Refresh to get current data presented. An easy example is the UPS Summary on the Dashboard, which kept reporting 157 watts for 30 minutes after I spun the drives down. I finally forced a screen refresh, and the status updated to 84 watts. Another example (plugin related) is the Dynamix System Temps ticker at the bottom of the screen doesn't seem to be updating. I've got both Firefox and IE open on the Main screen, and the ticker has been frozen on both for 10+ minutes, and they don't match each other. If I click around the menus, sometimes the ticker updates, and sometimes it just disappears. The behavior seems worse on IE than Firefox.

    On the plus side, the new 4.12 kernel includes some drivers that were missing, so that's pretty nice. It will take a while to determine if the C-state issue is resolved. Thanks, Paul
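The manual share.cfg check described in this post (looking for a missing shareCacheEnabled line) could be scripted roughly as follows. This is only a sketch under assumptions: it treats the file as simple key="value" lines as in the listing in the post, and the set of keys in `REQUIRED` is a hypothetical choice for illustration, not a list published by Lime-Tech.

```python
from typing import Dict, Iterable, List

# Hypothetical selection of settings that Mover appears to depend on.
REQUIRED = ("shareCacheEnabled", "shareMoverSchedule")


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse key="value" lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_keys(settings: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from the parsed settings."""
    return [key for key in required if key not in settings]


if __name__ == "__main__":
    sample = (
        "# Generated settings:\n"
        'shareUser="e"\n'
        'shareCacheEnabled="yes"\n'
        'shareMoverSchedule="40 3 * * *"\n'
    )
    print(missing_keys(parse_cfg(sample), REQUIRED))
```

An empty result means every key in the chosen list is present; any names returned are candidates for the toggle-and-Apply workaround described above.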
  15. Anybody planning a Ryzen build?

    Thanks for chiming in Tom. Yes, 7a. Okay, so I booted up in Safe Mode. With array stopped, first I check the Settings/Global Shares and see that Cache Enabled is showing 'Yes'. I don't touch anything here for the moment. I then go start the array. Whole GUI immediately becomes unresponsive, web pages returning "Server not found" errors. Telnet is unresponsive too. Can't find server by name or IP. This is the second time server has become unresponsive tonight, once on rc6, and now again on rc7a, both times in safe mode. Seems very odd that I have more problems in safe mode than in frivolous mode. There is a chance the server has hung from the C-state being re-enabled on my Ryzen, not sure at this time and don't want to rule anything out yet. One last thought: Mover worked great until I upgraded from 6.3.latest to 6.4.0-rc6, at which point it appears to have stopped working at the same time. Just thinking that might be a clue as to how the shareCacheEnabled var got whacked. That's enough for one night, gonna hit the sack.

Copyright © 2005-2017 Lime Technology, Inc. unRAID® is a registered trademark of Lime Technology, Inc.