greg_gorrell

Members
  • Posts

    172
  • Joined

  • Last visited

About greg_gorrell

  • Birthday 07/02/1989

  • Gender
    Male
  • URL
    https://gorrell.tech
  • Location
    Bradford, PA

greg_gorrell's Achievements

  • Rank: Apprentice (3/14)
  • Reputation: 1

  1. Just a quick look at the logs and your drive config shows some sort of BTRFS corruption. Without digging in further I cannot say for sure, but it is possible that the memory problems led to a corrupt filesystem, or that you have a bad drive/controller. I say this because of the high number of errors in the SMART stats and system log entries like this one: I would start by using a known good drive for cache, or by testing this drive on the array to rule it out.
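     If it helps, this is roughly what I would run from the console first. A quick sketch, assuming the pool is mounted at /mnt/cache; substitute your actual device for sdX:

        # show per-device BTRFS error counters for the cache pool
        btrfs device stats /mnt/cache

        # kick off a scrub, then check what it found
        btrfs scrub start /mnt/cache
        btrfs scrub status /mnt/cache

        # full SMART report for the suspect drive
        smartctl -a /dev/sdX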
  2. Yes, that is the name of the zpool. I did notice that everything works fine with a fresh docker.img file created on the cache or array via the settings, with the appdata folders on the zpool, so it is definitely some weird little bug with using ZFS. It works for now; I'll see what happens with the official ZFS implementation when it comes around. Apologies for the delay in responding, I had a drive go on the other server that kinda took priority lately, but thank you for taking the time to check out the diagnostics and offer input.
  3. Perhaps I am not asking in the correct way. Could somebody please explain to me how the web interface interacts with the underlying services? When I click "reboot server" on the main page, what has to happen for the "shutdown -r" command to be executed by the system? Is it possible that the web server component of Unraid sends a command to the system and will not move on until that command is completed? Is it possible I have an issue with Docker or ZFS, and that issue is why the timeout is occurring, making the timeout more of a symptom than the cause of the problem? Thanks again.
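     In case it helps anyone answer, here is how I have been trying to trace it myself; a sketch that assumes the stock webgui lives under /usr/local/emhttp, as it appears to on my boxes:

        # work backwards from the button: find where the webgui code mentions reboot
        grep -rn 'reboot' /usr/local/emhttp/plugins/dynamix/ | head

        # watch process activity over SSH while clicking "reboot server" in the GUI
        watch -n1 'ps -ef --forest | grep -E "php|emhttp|shutdown" | grep -v grep'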
  4. Thanks for the reply Squid. I honestly am not sure what is causing this issue, but the "upstream timeout" appears in the log every time this happens. To clarify, I don't suspect CA has anything to do with it; it just happens to be related to the job I was performing when this occurred most recently. Generally, the timeout occurs when I am accessing the Docker page, not the CA Apps page, as if it times out when querying the service. Since I am getting no other information from the logs, I have no clue where to start, but it seems that the issue lies in the query between the Web GUI and Docker. Maybe not, but when I start the array with Docker disabled completely, or with Docker enabled and no containers running, the issue does not manifest. After a couple of containers have been running for a while, at some point this timeout occurs, and any subsequent commands to control a service will not execute, whether sent over SSH or through the web interface. I am going to attempt to move all of the Docker-related data off the zpool and onto a cache drive managed by Unraid, but any assistance on how to better troubleshoot this would be greatly appreciated.
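     For what it's worth, this is the test I am using to tell whether the Docker daemon itself stops responding or only the Web GUI's query does; a sketch using Docker's standard unix socket:

        # ask the daemon directly, bypassing the Web GUI entirely
        time docker ps

        # even lower level: ping the daemon over its unix socket (prints OK when healthy)
        curl --unix-socket /var/run/docker.sock http://localhost/_ping; echo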
  5. Hello, I have an HP ML350P with plenty of resources and have been attempting to upgrade from 6.8.2 to 6.9.2 for quite some time now. Each time I do, I am unable to start more than a few docker containers before the Web GUI starts acting erratically. I am not sure what the cause is, and the only thing I see in the logs in common with each occurrence is the following "upstream timed out" error message:

        Jan 29 09:49:19 ML350P nginx: 2022/01/29 09:49:19 [error] 9325#9325: *902 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.51, server: , request: "POST /plugins/community.applications/scripts/notices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "10.0.0.101", referrer: "http://10.0.0.101/Main"

     After this error shows in the log, no commands to the server will work. While I can still navigate the web interface, aside from the Docker page, which just tries to load endlessly, I am unable to send a command to stop or restart the docker service. The machine will not reboot either, whether from the GUI button or the command line; the syslog indicates the system is going down for reboot, but nothing happens after that. I have not been able to pin this down to a particular container, and everything seems fine when no containers are running. As soon as I fire up three or so, I can expect the issue to occur again. I should also note that I am using ZFS, and that is where my docker config and containers are located. I have also tried deleting the docker image file; now I cannot even get the containers to run from the templates. I am thinking this may be a ZFS issue, but is there anywhere else I can look for clues? Here is what happens when I try to add my dockers back:

        Jan 29 10:20:44 ML350P nginx: 2022/01/29 10:20:44 [error] 7556#7556: *2249 upstream timed out (110: Connection timed out) while reading upstream, client: 10.0.0.51, server: , request: "POST /Docker/AddContainer?xmlTemplate=user%3A%2Fboot%2Fconfig%2Fplugins%2FdockerMan%2Ftemplates-user%2Fmy-UniFi-Video.xml&rmTemplate= HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "10.0.0.101", referrer: "http://10.0.0.101/Docker/AddContainer?xmlTemplate=user%3A%2Fboot%2Fconfig%2Fplugins%2FdockerMan%2Ftemplates-user%2Fmy-UniFi-Video.xml&rmTemplate="

     Thanks in advance! ml350p-diagnostics-20220129-1040.zip
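     Since posting, these are the extra places I have been checking for clues, in case anyone wants to compare notes; standard tools, nothing Unraid-specific:

        # check pool health, since docker.img and appdata live on ZFS here
        zpool status -v

        # look for kernel reports of tasks stuck in uninterruptible sleep
        dmesg | grep -i 'blocked for more than'

        # dump stack traces of all blocked tasks to the syslog (needs sysrq enabled)
        echo w > /proc/sysrq-trigger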
  6. No, pihole is on the same untagged, native 10.0.0.0/24 VLAN as the Unraid eth0 interface. There are no firewall rules or other networking issues at play, either. I just tested on my second Unraid server, which we will say has the IP 10.0.0.3. I created the VLAN bridges exactly as I did on the first server, and I can successfully ping the pihole. This leads me to believe that this isn't normal behavior and there is a configuration issue with the first Unraid box.
  7. Hey all, I was hoping someone would be able to explain some behavior that seems a little odd to me. I am currently on the latest production release of Unraid, and this particular server runs a pfSense VM and a Manjaro VM. The pfSense VM is, say, 10.0.0.1 and has a dual-port NIC passed through, connected to my modem on the WAN side and to a managed L3 Cisco switch on the LAN side. The other day, I created some VLANs on my network to segment some traffic, like most do. In Unraid, I set up the VLANs as well, each having its own br0.vlan00 interface, and I moved the dockers which are exposed to the internet onto their own VLAN. I have a pihole running at, say, 10.0.0.80, which currently provides DNS for the whole network. Before creating the VLANs, my Unraid server and all the docker containers would resolve DNS through the pihole. After creating the VLANs, though, nothing on the Unraid box can reach the pihole. Keep in mind that although I have created VLANs, I have not moved the pihole yet; both Unraid and the pihole are on the same 10.0.0.0/24 LAN network, and I have only added additional br0.vlan00 interfaces. Is there a reason that I am unable to even ping the pihole IP, either from the Unraid host at 10.0.0.2 or from the dockers or VMs utilizing br0? After moving the dockers to the "DMZ" VLAN, obviously a different subnet, they are able to resolve requests from the pihole and ping it as well. Perhaps this is more a Linux behavior than an Unraid one, but I have not encountered it before; this is my first foray into VLANs on a Linux box, so could someone confirm this is typical? Thanks in advance!
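     To show what I am seeing, these are the checks I ran from the Unraid console; a sketch with my example addressing, so adjust interfaces and IPs to match:

        # confirm br0 still has its address and the expected routes
        ip addr show br0
        ip route

        # inspect the vlan sub-interfaces that were created
        ip -d link show | grep -E 'br0|vlan'

        # if tcpdump is available, watch whether the echo request even leaves br0
        tcpdump -ni br0 icmp and host 10.0.0.80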
  8. What a moron; I skimmed the whole thread before posting and still missed that. Sorry guys! Edit: I wouldn't say it is "ignored," just that it is not reflected in the GUI in my case.
  9. I began testing this build on my HP ML350 Gen8 in hopes that the smartctl changes I noticed in the code would fix the temperature values in the GUI. Unfortunately, it hasn't done anything to resolve the problem of the default "Automatic" setting not pulling the SMART data (incorrect syntax error), and when the controller is set manually I am still not getting the temperature data on the Main or Dashboard tabs. I have also noticed a new problem in this build that wasn't present on 6.8.x: if I go to an individual disk and set the SMART controller manually, after clicking "apply" and reloading the page, the SMART data will update and reflect the change, but the GUI still shows "default." Just playing around with these settings, I have noticed they can be somewhat buggy as well. On two occasions now I have hit apply and refreshed the page, only to have the settings revert back to the default. Perhaps someone could try to reproduce this.
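     For reference, these are the manual equivalents I am comparing the GUI settings against; device names are examples from my box:

        # what the GUI's "SAT" setting maps to on the command line
        smartctl -a -d sat /dev/sdb

        # what the HP cciss setting would run instead (0 = drive number on the controller)
        smartctl -a -d cciss,0 /dev/sdb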
  10. Does anyone know why I would be unable to access Heimdall all of a sudden? It worked fine for a year; then, over the past week since we had a power outage, every login attempt takes me to a 419 page informing me that my session expired. I just access this container inside my network via IP, so there are no reverse proxying or DNS issues. I also tried deleting the keys and still had no luck. What might be causing this, and how can I force a new session?
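     Partially answering my own question after some digging: Heimdall is a Laravel app, and a 419 is Laravel's CSRF/session-expired response, so I am going to clear the browser cookies for the host and then try dropping the server-side session files. A sketch only; the container name is mine and the session path has to come from the find, since I am not sure where this image keeps it:

        # locate the Laravel session store inside the container
        docker exec heimdall find / -type d -name sessions 2>/dev/null

        # clear it (substitute the path the find returned) and restart
        docker exec heimdall sh -c 'rm -f /path/from/above/*'
        docker restart heimdall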
  11. Yes, I was getting all SMART data when running manually. Looking at the files in /var/local/emhttp/smart, it is clear that the underlying command hard-coded into the dynamix webgui is not running properly, nor is it affected by the settings entered for each disk or globally. See my last post though; I explored the code a little last night and believe this is currently being resolved, and I would expect to see it in the next beta release. Lots of lines were removed and only one added, and I am not even sure how to interpret it correctly:

        if (file_exists("$file") && exec("grep -Pom1 '^SMART.*: \K[A-Z]+' ".escapeshellarg($file)." |tr -d '\n' 2>/dev/null", $ssa) && in_array("$ssa",$failed)) {

     as referenced in this commit on July 12: https://github.com/limetech/webgui/commit/6f8507e5474e9b77fef836ee7379a1bee25a7a5b
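     For anyone else reading that commit, my understanding of what the grep extracts, tested against plain smartctl output:

        # smartctl -H prints a line like:
        #   SMART overall-health self-assessment test result: PASSED
        # the \K makes grep keep only the uppercase verdict word:
        smartctl -H /dev/sdb | grep -Pom1 '^SMART.*: \K[A-Z]+'
        # -> PASSED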
  12. I just spent the whole evening trying to figure out what is going on here. After playing around in the Global Disk Settings, trying various SMART controller types and testing, I think I have some answers. I set the SMART controller to HP cciss globally and it wouldn't work. I tried a few others, eventually landing on "SAT," whatever the hell that may mean. To my surprise, it returned all of the data I was looking for: every one of the fields is populated and the temperature data works, although, as in the case of OP, it doesn't transfer to the Main page.

     As mentioned before, the Dynamix webGUI pulls the data from the /var/local/emhttp/disks.ini file. You would think this information would be related to the data in the files contained in the /var/local/emhttp/smart/ directory, which appear to hold the SMART data queried from each disk, but it seems the files in the smart folder are not connected in any way. In my case, the directory contains a basic text file for each disk holding what would be the smartctl output for that device, as well as a file of the same name with a .ssa extension. No matter what SMART controller I select in the disk settings, the information in these files does not change and just reports the following:

     After doing some more searching, none of these files have anything to do with the Dashboard or Main pages. It seems that others have had issues with Areca controllers, where it has been stated that the SMART reporting on the Dashboard and Main pages is hard-coded in emhttp and its parameters cannot be defined by the user. I checked out the webgui code, and I want to say they are currently working on a fix for this, as there was a commit last month removing the smartctl command from the monitor script. The extent of my PHP knowledge is reading a couple of chapters of PHP in 24 Hours twenty years ago when I was in middle school, so I could be full of bs here. I just spent way too much time on this last night when I could have been implementing a script to alert me to issues via another method. Hopefully some devs can chime in on what is going on, or anyone here familiar with the codebase can check it out. It definitely seems like something too simple to leave broken, especially when that's kind of an important feature for this OS.
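     If anyone wants to reproduce the comparison I am describing, it comes down to these two spots; paths straight from my server:

        # what the Main/Dashboard pages actually read from (temps live in here)
        grep -n 'temp' /var/local/emhttp/disks.ini

        # the per-disk smartctl captures that, as far as I can tell, nothing reads
        ls -l /var/local/emhttp/smart/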
  13. I just picked up an ML350P Gen8 and am going through the same issue currently. I put the P420i controller into HBA mode after getting all the firmware up to date, and I have just been doing some testing before I try to migrate everything over. I have a hodgepodge of SATA drives installed, different brands, sizes, etc., and haven't noticed any issues with the fan speeds since running Unraid, so I am taking that as a good sign. Unraid lists no temps or SMART data for any of the drives, but I am able to retrieve them via the smartctl command as you mention. One of my ideas was to use iLO and set an alarm for this, but of course there is no easy way to install the agent that reports this data into Unraid, although I am considering trying to convert the rpm into a txz if there is no easier way to obtain the data in the GUI. If you are still working on this, I would be glad to exchange notes here and see if we can't figure out a way to solve this. It's so strange to me that Unraid can be so polished in many aspects and fall flat in others, especially for paid software this mature. Regardless, I am going to try some things tonight and will share what I come up with if it's noteworthy.
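     Re: converting the rpm, the route I am going to attempt first. Unraid is Slackware-based, so the stock converters should apply; a sketch, with the package filename made up since I have not settled on an agent version:

        # convert the HP agent rpm to a Slackware package and install it
        rpm2txz hp-health-example.rpm
        installpkg hp-health-example.txz

        # expect missing dependencies; sanity-check the installed binaries
        # with ldd before trusting anything the agent reports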
  14. That is odd; it does not work in Chrome either, and I simply get ERR_SSL_PROTOCOL_ERROR. My configuration is pretty much the same, although some of the directives are defined in the ssl.conf and proxy.conf files. Just to verify, I removed the proxy conf file I created for mediawiki and added the config you shared above. I now get the exact same results in Firefox and Chrome, though without the ability to connect via IP internally. Any thoughts there? Could it be an issue with letsencrypt and/or the cert, maybe?
  15. I am using this container on Unraid behind the Linuxserver.io letsencrypt container. I see that you recommended that in your documentation, which is very good, I might add; I have learned a lot from your notes, so thanks for that. I am still new to Nginx, though, and am having some issues getting it to work properly with mediawiki. I have tried using the dokuwiki proxy config from the letsencrypt container and changing the proto, IP, and port as needed, but still have no luck. Currently, I am able to access the mediawiki container via IP internally, but when attempting to use the domain name I end up with an error:

        SSL received a record that exceeded the maximum permissible length. Error code: SSL_ERROR_RX_RECORD_TOO_LONG

     Can you share a configuration that works, please? I would assume I am just directing it to the container IP:PORT with proxy_pass, but I can't seem to figure out the issue. I will note that I have a password enabled as well, just in case it is relevant. Thanks!
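     In case someone can spot my mistake, this is the shape of the proxy conf I am testing now. A sketch modeled on the other linuxserver samples, with example names and addresses from my setup; from what I have read, SSL_ERROR_RX_RECORD_TOO_LONG usually means a plain-HTTP answer is coming back on a port the browser expects TLS on, which is why I keep double-checking the proto and port:

        server {
            listen 443 ssl;
            server_name wiki.*;

            # stock includes shipped with the letsencrypt container
            include /config/nginx/ssl.conf;

            location / {
                include /config/nginx/proxy.conf;
                # plain http to the container; example IP and port
                proxy_pass http://10.0.0.90:80;
            }
        }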