Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform ACK completion after the callback to avoid message loss #2906

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

CIPop
Copy link

@CIPop CIPop commented Sep 25, 2023

Then please check the following list of things we ask for in your pull request:

  • Have you signed the Eclipse Contributor Agreement, using the same email address as you used in your commits?
  • Do each of your commits have a "Signed-off-by" line, with the correct email address? Use "git commit -s" to generate this line for you.
  • If you are contributing a new feature, is your work based off the develop branch?
  • If you are contributing a bugfix, is your work based off the fixes branch?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you successfully run make test with your changes locally? (Ran ctest -E broker as broker baseline testing for develop hangs.)

Abstract

The current Mosquitto client implementation might lose QoS1 and QoS2 messages if the application crashes during the hand-off callback.

Based on MQTT section 4.4 Message retry delivery, the application (not the MQTT library) takes ownership of the message:

  • For QoS 1: before sending the PUBACK.
  • For QoS 2: before sending the PUBREC

Behavior changes

The current implementation, on receiving a PUB:

  • QoS 1: sends the PUBACK, calls the callback
  • QoS 2: sends PUBREC, receives PUBREL, sends PUBCOMP, calls the callback

With this change, on receiving a PUB:

  • QoS 1: calls the callback, sends the PUBACK
  • QoS 2: calls the callback, sends PUBREC, receives PUBREL, sends PUBCOMP

Notes

  • The reordering of ACKs might be a breaking change for certain applications. If the callback crashes the client, the callback will be called more than once, with the same message (even for QoS2), until both callback execution and ACK sending is successful.
  • The reordering allows applications to crash while processing the callback without losing the message: the broker is supposed to re-send a QoS1 message after the subscriber reconnects, if a PUBACK was not received. Similarly, the broker is supposed to re-send a QoS2 message after the subscriber reconnects if a PUBREC was not received.
  • Because the library does not expose a receive ACK callback (PUBACK, PUBREC, etc), the application would not know when these acknowledgement messages are sent. The message loop must run for a while to ensure all ACKs have been drained from the queue. (See test changes for details.)
     
    Fixes Client PUB completion (PUBACK, PUBREC/PUBCOMP) should be performed after the application callback. #2884

lib/handle_publish.c Outdated Show resolved Hide resolved
lib/handle_publish.c Outdated Show resolved Hide resolved
lib/handle_publish.c Outdated Show resolved Hide resolved
lib/handle_publish.c Outdated Show resolved Hide resolved
lib/handle_publish.c Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants