Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosquitto thread does not exit if currently reconnecting #2545

Closed
p-luke opened this issue May 23, 2022 · 2 comments
Closed

mosquitto thread does not exit if currently reconnecting #2545

p-luke opened this issue May 23, 2022 · 2 comments

Comments

@p-luke
Copy link

p-luke commented May 23, 2022

Hi, I found a scenario in which the mosquitto_loop_stop(), called with force=false, never returns, leaving the hosting application stuck forever: mosquitto_loop_stop() remains blocked on pthread_join() call, because the network thread does not exit even if mosquitto_disconnect() has been called prior to mosquitto_loop_stop().

Issue seen in lib 1.6.12 in threaded mode on Linux - but, by inspecting the code, I would say that this issue affects current master's head, too.

I analyzed the runtime behavior by adding logs to the lib and found out why this is happening.
Please consider this sequence:
1.. network thread is blocked (for some seconds) inside net__socket_connect(blocking=true) - called by mosquitto__reconnect() (which is in turn called by mosquitto_loop_forever()).
This happens e.g. if broker is currently not reachable anymore, so the thread is trying to reconnect - thus net__socket_connect() waits for connection establishment.
2.. at the very same time, application calls mosquitto_disconnect(), which sets client state to mosq_cs_disconnected (which is the condition checked by the network thread to break its loop and exit). A good way to call mosquitto_disconnect() at the right moment to trigger the issue is to add logs before and after the net__socket_connect() (see step above).
3.. then, network thread returns from the net__socket_connect() call, with rc != 0 (broker is still unreachable) - and so client state is reset to mosq_cs_connect_pending:

  	{
		rc = net__socket_connect(mosq, mosq->host, mosq->port, mosq->bind_address, blocking);
	}
	if(rc>0){
		mosquitto__set_state(mosq, mosq_cs_connect_pending);
		return rc;
	}

this irreparably overrides the exit request by API of step 2.
4.. as a consequence, network thread does not exit (state is not mosq_cs_disconnected anymore)
5.. application now calls mosquitto_loop_stop(force=false), which remains stuck forever on the pthread_join() call - thread is not breaking its loop.

As a real world example, steps 2+5 are a very normal application shutdown sequence, which then may cause application hang while exiting with broker currently unreachable.

Personal thought: I think that the tecnique of setting the state from API to make the thread exit is very risky, given the continuos re-set of the state performed by the network thread on itself. Indeed, I suspect that the above sequence may not be the only problematic one.
A different method would be required to be sure that the network thread always exits when requested. Since mosquitto has already a socktpair for application thread to network thread communication, an idea would be to use that to signal the exit request (e.g. by writing a "special value" byte - currently the lib writes always a zero byte value). Then, when reading from mosq->sockpairR in mosquitto_loop() and interruptible_sleep(), the network thread may check the read byte value and signal "exit from API request" condition to mosquitto_loop_forever() with a new specific rc value - to break its loop.
Alternatively, a global "thread exit requested" flag may be set by API and read by the thread, in a mutexed context.

@ralight
Copy link
Contributor

ralight commented Aug 15, 2022

Thank you for the description, I believe this is now fixed.

@p-luke
Copy link
Author

p-luke commented Nov 14, 2022

Hi Roger,
I see the fix in 2.0.15, thanks. Just an additional, probably minor, note: if the network thread is sleeping inside interruptible_sleep() (e.g. because of reconnection retry delay), mosquitto_loop_stop() will be blocked on the pthread_join() call until interruptible_sleep() returns on its own; this would cause the application to be blocked for various seconds, depending on the selected reconnection backoff interval.
Sending the usual dummy byte into the socketpair in mosquitto_disconnect_v5() would awake the network thread and let it see the exit request immediately.
Just a thought.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants