
Mosquitto 1.5.1 becomes unresponsive after max open files is exceeded #948

Closed
wiebeytec opened this issue Sep 4, 2018 · 18 comments
Labels: Component: mosquitto-broker, Status: Completed, Type: Bug
Milestone: 1.5.2

Comments

@wiebeytec

Mosquitto 1.4.15 would log something like 'access denied by tcpd' when you exceeded the maximum number of open files (ulimit -n). Mosquitto 1.5(.1) seems to hit the same limit, but no such log message appears. Instead, it jumps to 100% CPU and messages stop getting through. The threshold is quite sudden: from almost no CPU to a lot. Is it related to the epoll change?

I used this test script:

#!/bin/bash

# Spawn many idle subscribers to exhaust the broker's open file limit.
SUBSCRIBERS=1500

[ -z "$wiebepassword" ] && echo "set variable wiebepassword" && exit 1

set -u

echo "PID: $$"

declare -a pids

kill_all_children()
{
  for p in "${pids[@]}"; do
    kill -SIGTERM "$p"
  done
}

trap kill_all_children SIGINT SIGTERM

for i in $(seq 1 $SUBSCRIBERS); do
  echo "Starting subscriber $i"
  timeout 300 mosquitto_sub -v -h mqtt.myserver.com -u [email protected] -P "$wiebepassword" --port 18883 --cafile /home/wiebe/mqtt/ca.crt -t "N/$RANDOM/#" > /dev/null & pid=$!
  pids+=($pid)
  #sleep 0.03
done

echo "$SUBSCRIBERS mosquittos started. Waiting for ctrl-c..."
wait

Even without publishing anything, you can see the jump to 100% CPU. If you start only 950 subscribers instead (staying under the default file limit of 1024), CPU usage stays very low.

To still test publishing (just an example):

mosquitto_pub -h mqtt.myserver.com -u '[email protected]' -P "$mqttpassword" --port 18883 --cafile ca.crt -t 'N/905932684246/in' -m "message"

(my user is admin, so I see all publishes)

Open file limit exhaustion may need to be dealt with more gracefully, but also the ulimit may need to be set better. Perhaps the process can increase it (based on a config) before switching to user mosquitto? I currently had to hack the init script (also for 1.4.15), and set it in Supervisor (I run two daemons).

@toast-uz
Contributor

toast-uz commented Sep 6, 2018

Try compiling with the option WITH_EPOLL=no to help narrow down the cause.
Or does the max_connections directive in the config work as a remedy for you?
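
For reference, something along these lines (the build option and config directive are the standard ones, but the values here are purely illustrative):

# rebuild from source without epoll support
make WITH_EPOLL=no
sudo make install

# in mosquitto.conf: keep the client count below the process fd limit
max_connections 900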

@wiebeytec
Author

I expected to get the old behaviour back without EPOLL ('access denied by tcpd'), but the behaviour is actually unchanged: it still goes to 100% CPU when max open files is exceeded. Probably error handling is missing when accepting a socket, so it retries over and over?

My point about a config directive is that Mosquitto needs to be aware of max open files, so it can give useful log messages and point the user to instructions/documentation.

Take Nginx, which really has been engineered for this. It has worker_rlimit_nofile, and distro packaging has files like /etc/default/nginx to set the ulimit -n before the daemon is started.
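
For example, roughly like this (values illustrative; the exact variable name in /etc/default/nginx may differ per distro):

# /etc/nginx/nginx.conf
worker_rlimit_nofile 4096;

# /etc/default/nginx, sourced by the init script before the daemon starts
ULIMIT="-n 4096"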

@toast-uz
Contributor

toast-uz commented Sep 8, 2018

Thanks. Understood that EPOLL was not the cause.
But I still don't understand why max_connections isn't a remedy for you. I think max_connections would keep your system safe under your test scenario. Have I misunderstood?

@toast-uz
Contributor

toast-uz commented Sep 8, 2018

P.S.
Nginx has both worker_rlimit_nofile and worker_connections, perhaps because of the more complicated features of nginx and the HTTP protocol. The usual recommendation is to set worker_rlimit_nofile to two or three times worker_connections. Because the features of mosquitto and the MQTT protocol are simpler, I guess max_connections on its own is enough of a solution.

@wiebeytec
Author

Adding max_connections doesn't solve anything in and of itself. If you set it to 10000, you'll still have the 100% CPU issue if you don't also raise ulimit -n. It needs some additional logic: failure to start if you set max_connections higher than ulimit -n, useful log messages, etc.

@karlp
Contributor

karlp commented Sep 8, 2018

I agree that mosquitto should give you a better error message if it can detect that it has run out of files, but I don't think it's any of mosquitto's business to go around validating ulimits against connection settings of its own. Mosquitto should limit itself to what it has been configured to do; what the external container rules are is none of its concern.

Certainly, simply disabling ulimits wholesale in init scripts goes against the entire purpose of ulimits.

@toast-uz
Contributor

toast-uz commented Sep 8, 2018

I'm on @karlp's side. Mosquitto's behaviour after hitting max open files is an issue, but a minor one, because setting max_connections properly is a remedy for @wiebeytec's concern.

@wiebeytec
Author

> Certainly, simply disabling ulimits wholesale in init scripts goes against the entire purpose of ulimits.

I disagree. What if you want to service 2000 clients with the default Mosquitto init script? You just can't without hacking it. Setting the ulimit is next to impossible for a daemon that way, because limits are set at login. That's why Nginx packaging in Ubuntu/Debian provides /etc/default/nginx with a way to make the init script set the ulimit.

toast-uz added the Status: Blocked label on Sep 9, 2018
@toast-uz
Contributor

toast-uz commented Sep 9, 2018

@wiebeytec, could you clarify which of these you are now asking for in this issue?

  1. A bug report about the behaviour after hitting max open files. This was your original point; the subject of this issue stays as it is.
  2. A feature request for adding an option such as max_openfiles. This is your current point; the subject of this issue would change.

Based on that I will add labels to this issue, although I'm afraid I cannot promise a fix.

@karlp
Contributor

karlp commented Sep 9, 2018

@wiebeytec methods of customizing init scripts are simply out of scope. By all means distro packaging should make it easy to set ulimits, but mosquitto shouldn't be trying to circumvent distro mechanisms for setting ulimits (and cgroups and all the other system-wide limits and restrictions).

@wiebeytec
Author

@toast-uz the bug is that Mosquitto jumps to 100% CPU when you exceed your open file limit. It should detect this and present the user with log messages; otherwise it's mysterious behaviour. Whether or not to also implement max_connections, or whether that is a separate feature request, I'll leave to you, but to me it's part of the same issue.

As for the ulimits: the Mosquitto debs do provide an init script, which is where ulimit needs to be set based on something configurable (like /etc/default/nginx). This is how the Nginx init scripts do it too. This is not circumventing distro mechanisms, this is the only way, because the init script is provided by the deb. As I said before, you simply cannot run more than 1024 (the default in most cases) clients if the Mosquitto init script doesn't facilitate this (aside from hacking it, or using Supervisor to run it instead, or something).
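
Something along these lines in the packaged init script would be enough (the file name and variable are hypothetical, mirroring the nginx approach):

# /etc/default/mosquitto (hypothetical)
ULIMIT="-n 4096"

# in the init script, before the daemon is started
[ -r /etc/default/mosquitto ] && . /etc/default/mosquitto
[ -n "${ULIMIT:-}" ] && ulimit $ULIMIT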

@karlp
Contributor

karlp commented Sep 12, 2018

@wiebeytec what I mean is: take that up with Debian packaging, it's out of scope for mosquitto itself. Of course your distro should let you change these settings; I'm certainly not saying otherwise.

@wiebeytec
Author

But the deb, and its init scripts, are provided by the Mosquitto authors here, albeit outdated at the moment.

toast-uz added the Type: Bug and Component: mosquitto-broker labels and removed the Status: Blocked label on Sep 13, 2018
@toast-uz
Contributor

@wiebeytec, thanks. I've added the labels.

ralight added this to the 1.5.2 milestone on Sep 18, 2018
@ralight
Contributor

ralight commented Sep 18, 2018

It's interesting that there is this different behaviour. The difference you're seeing comes down to compiling with WITH_WRAP=yes or no; the default straight from the repo/tarball is no. With WITH_WRAP=no, a new incoming connection proceeds like this:

  • The socket is accepted with accept(); this succeeds unless there are no more fds available, in which case it fails.
  • If it fails, we return an error, but the new connection is still on the listening socket queue, so we very quickly loop back round to try again - which fails again and again, hence the high CPU usage.

If we have WITH_WRAP=yes, connections look like this:

  • The socket is accepted with accept(); this succeeds (all other errors are ignored).
  • We then attempt the tcp-wrappers check - this fails if the last accept() call took the last available fd. If it fails, we close the new socket and return with an error. The same situation as above, but with the important difference that the incoming connection has been removed from the listener queue.

I've replicated this behaviour by keeping a spare socket around in case of fd exhaustion. If I get EMFILE when trying to accept(), then I close the spare socket, accept the incoming connection, close the incoming connection, then reopen my spare socket ready for the next time.
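
In case it helps, the spare-fd trick looks roughly like this (a standalone C sketch, not the actual mosquitto code):

/* Reserve one fd up front so there is headroom when the limit is hit. */
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

static int spare_fd = -1;

void reserve_spare_fd(void)
{
    spare_fd = open("/dev/null", O_RDONLY);
}

/* Accept a connection. On fd exhaustion, temporarily give up the spare fd,
 * accept and immediately close the pending connection so it is removed from
 * the listen queue, then re-reserve the spare fd for next time. */
int accept_with_spare(int listen_sock)
{
    int client = accept(listen_sock, NULL, NULL);
    if(client == -1 && (errno == EMFILE || errno == ENFILE) && spare_fd != -1){
        close(spare_fd);
        client = accept(listen_sock, NULL, NULL);
        if(client != -1){
            close(client); /* drop the connection instead of spinning on it */
        }
        reserve_spare_fd();
        return -1; /* caller treats this as "no new connection" */
    }
    return client;
}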

@ralight
Contributor

ralight commented Sep 18, 2018

Could you take a look and close this if you're happy?

ralight added the Status: Completed label on Sep 18, 2018
ralight added a commit that referenced this issue Sep 18, 2018
@wiebeytec
Author

So the builds provided by the Mosquitto deb repo do have tcp wrapping? What is the reasoning behind it?

I do know that 1.5.1 (built from source) has significantly higher performance than the pre-built 1.4 release. Connecting thousands of idle clients took 50% CPU with Mosquitto 1.4; with 1.5.1 it's about 5%. epoll is faster, obviously, but is TCP wrapping likely also detrimental to performance?

@ralight
Contributor

ralight commented Sep 20, 2018

The tcp wrappers library is just a different mechanism for allowing/denying access with /etc/hosts.allow and /etc/hosts.deny. It only comes into play when a new connection is made, so unless you have a high reconnection rate it shouldn't have any impact on performance.
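
For anyone unfamiliar with tcp wrappers: the rules live in /etc/hosts.allow and /etc/hosts.deny. An illustrative pair of entries might look like this (the daemon name to match is assumed here to be mosquitto, and the address pattern is just an example):

# /etc/hosts.deny
mosquitto: ALL

# /etc/hosts.allow
mosquitto: 192.168.1. LOCAL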

ralight added a commit that referenced this issue Nov 8, 2018
ralight added a commit that referenced this issue Nov 8, 2018
lock bot locked as resolved and limited conversation to collaborators on Aug 7, 2019