New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mosquitto thread does not exit if currently reconnecting #2545
Comments
Thank you for the description, I believe this is now fixed. |
Hi Roger, |
Hi, I found a scenario in which the mosquitto_loop_stop(), called with force=false, never returns, leaving the hosting application stuck forever: mosquitto_loop_stop() remains blocked on pthread_join() call, because the network thread does not exit even if mosquitto_disconnect() has been called prior to mosquitto_loop_stop().
Issue seen in lib 1.6.12 in threaded mode on Linux - but, by inspecting the code, I would say that this issue affects current master's head, too.
I analyzed the runtime behavior by adding logs to the lib and found out why this is happening.
Please consider this sequence:
1.. network thread is blocked (for some seconds) inside net__socket_connect(blocking=true) - called by mosquitto__reconnect() (which is in turn called by mosquitto_loop_forever()).
This happens e.g. if broker is currently not reachable anymore, so the thread is trying to reconnect - thus net__socket_connect() waits for connection establishment.
2.. at the very same time, application calls mosquitto_disconnect(), which sets client state to mosq_cs_disconnected (which is the condition checked by the network thread to break its loop and exit). A good way to call mosquitto_disconnect() at the right moment to trigger the issue is to add logs before and after the net__socket_connect() (see step above).
3.. then, network thread returns from the net__socket_connect() call, with rc != 0 (broker is still unreachable) - and so client state is reset to mosq_cs_connect_pending:
this irreparably overrides the exit request by API of step 2.
4.. as a consequence, network thread does not exit (state is not mosq_cs_disconnected anymore)
5.. application now calls mosquitto_loop_stop(force=false), which remains stuck forever on the pthread_join() call - thread is not breaking its loop.
As a real world example, steps 2+5 are a very normal application shutdown sequence, which then may cause application hang while exiting with broker currently unreachable.
Personal thought: I think that the tecnique of setting the state from API to make the thread exit is very risky, given the continuos re-set of the state performed by the network thread on itself. Indeed, I suspect that the above sequence may not be the only problematic one.
A different method would be required to be sure that the network thread always exits when requested. Since mosquitto has already a socktpair for application thread to network thread communication, an idea would be to use that to signal the exit request (e.g. by writing a "special value" byte - currently the lib writes always a zero byte value). Then, when reading from mosq->sockpairR in mosquitto_loop() and interruptible_sleep(), the network thread may check the read byte value and signal "exit from API request" condition to mosquitto_loop_forever() with a new specific rc value - to break its loop.
Alternatively, a global "thread exit requested" flag may be set by API and read by the thread, in a mutexed context.
The text was updated successfully, but these errors were encountered: