Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosquitto loop exits on TCP handshake failure for TLS #2825

Open
p-luke opened this issue May 31, 2023 · 3 comments
Open

mosquitto loop exits on TCP handshake failure for TLS #2825

p-luke opened this issue May 31, 2023 · 3 comments

Comments

@p-luke
Copy link

p-luke commented May 31, 2023

mosquitto v2.0.15
platform: linux
Using client library in threaded async mode with TLS connection to broker.

If, during the initial TCP connection setup, the server responds with a TCP RST (e.g. broker application down),
then quite often the mosquitto thread exits, thus requiring a manual re-start of the client,
which is unfeasible because the lib does not even notify the host application about this premature exit.

The exit point is in mosquitto_loop_forever() when dealing with mosquitto_loop() retcode:
https://github.com/eclipse/mosquitto/blob/master/lib/loop.c#L276
(rc is MOSQ_ERR_ERRNO and errno is EPROTO).

It seems to me that everything is originated from net__handle_ssl() (called by net__read()):
the SSL error SSL_ERROR_SYSCALL (which is returned in this scenario, since the underlying TCP socket got a ECONNREFUSED) is not handled anymore (it was, before the commit mentioned here below),
so the function returns -1 and sets errno to EPROTO, which triggers the above loop exit condition.

This bug could be possibly introduced by this commit: e979a46

@p-luke
Copy link
Author

p-luke commented Jun 5, 2023

I can confirm that this commit e979a46 introduces this issue (because of SSL_ERROR_SYSCALL not handled anymore during connection setup).
I successfully tested the same scenario with v2.0.14, no problems there.

@acode-x
Copy link

acode-x commented Jun 12, 2023

Yes, we also face the same issue. Currently we downgraded to 2.0.14.
Waiting for issue to be fixed.

@rickvargas
Copy link

On SSL_get_error man page, it says:

On an unexpected EOF, versions before OpenSSL 3.0 returned SSL_ERROR_SYSCALL, nothing was added to the error stack, and errno was 0. Since OpenSSL 3.0 the returned error is SSL_ERROR_SSL with a meaningful error on the error stack (SSL_R_UNEXPECTED_EOF_WHILE_READING). This error reason code may be used for control flow decisions (see the man page for ERR_GET_REASON(3) for further details on this).

As stated on other issues, such as #2767 , seems like the library has broken the reconnection for EOF on e979a46 . It is broken from v2.0.15 to v2.0.18.

Taking a look at the library implementation, on v2.0.18 under lib/net_mosq.c at net__handle_ssl, there could exist some checks for non fatal SSL server errors (SSL_ERROR_ZERO_RETURN, SSL_ERROR_NONE, SSL_ERROR_SSL - SSL_ERROR_SYSCALL and errno 0 for OpenSSL < 3.0.0 - and possible others?) that the client could return MOSQ_ERR_CONN_LOST, what would trigger the automatic reconnection. Currently if a SSL error happens on server side, like proxy down, EOF, empty reply, firewall rule added, the SSL goes to errno EPROTO, exiting the mosquitto_loop_forever as fatal error.

Treating these non fatal SSL errors as MOSQ_CONN_LOST and considering the fatal SSL errors as MOSQ_ERR_TLS could potentially allow to remove the check by 'errno == EPROTO' on mosquitto_loop_forever, what I find a trick to handle on async implementations, but unsure if it is used for other purpose that I'm not aware.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants