Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add MOSQ_OPT_DELAYED_ACK option #1932

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

linuxbasic
Copy link

Signed-off-by: Linus Basig [email protected]
Signed-off-by: Fabrizio Lazzaretti [email protected]

Thank you for contributing your time to the Mosquitto project!

Before you go any further, please note that we cannot accept contributions if
you haven't signed the Eclipse Contributor Agreement.
If you aren't able to do that, or just don't want to, please describe your bug
fix/feature change in an issue. For simple bug fixes it is can be just as easy
for us to be told about the problem and then go fix it directly.

Then please check the following list of things we ask for in your pull request:

  • Have you signed the Eclipse Contributor Agreement, using the same email address as you used in your commits?
  • Do each of your commits have a "Signed-off-by" line, with the correct email address? Use "git commit -s" to generate this line for you.
  • If you are contributing a new feature, is your work based off the develop branch?
  • If you are contributing a bugfix, is your work based off the fixes branch?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you successfully run make test with your changes locally?
    • 01-keepalive-pingreq.py fails, but it also fails on master

This PR adds a method mosquitto_delay_puback. After calling mosquitto_delay_puback the library will delay the PUBACK message required for QoS 1 message transfers until after the on_message callback returns.

This change ensures the message does not get lost if the client crashes during the processing of the message.

If the mosquitto_delay_puback is not called, the previous behavior (PUBACK before calling on_message) applies.

related issues:

Our use-case is a messaging middleware that forwards the messages to other systems and we need to maintain the "At Least Once" delivery guarantee of QoS 1.

Signed-off-by: Linus Basig <[email protected]>
Co-authored-by: Fabrizio Lazzaretti <[email protected]>
Signed-off-by: Fabrizio Lazzaretti <[email protected]>
@karlp
Copy link
Contributor

karlp commented Dec 6, 2020

When did this change? I thought the library had always been pubacking after teh callback?

@ralight
Copy link
Contributor

ralight commented Dec 6, 2020

@karlp I've checked, and it's been like this forever.

I'm mostly inclined to say that we should just change the behaviour (and for QoS 2) without an option. For people who have quick callbacks it will make no difference, and those with long callbacks there If would only potentially be a problem with non-compliant brokers.

If I did add it as an option, it would be with mosquitto_int_option() though, there is no need for a new function I don't think.

@linuxbasic linuxbasic changed the base branch from master to develop December 7, 2020 08:40
@linuxbasic
Copy link
Author

When did this change? I thought the library had always been pubacking after teh callback?

It did not surprise me that the library acknowledges before the message is handed over to the application; most libraries do that. I think the reason for that is that the specification is not very explicit about this situation.

The MQTT specifications says:

In the QoS 1 delivery protocol, the receiver MUST respond with a PUBACK packet containing the Packet Identifier from the incoming PUBLISH packet, having accepted ownership of the Application Message

Figure 4.2 – QoS 1 protocol flow diagram, non-normative example goes into more detail:

The receiver does not need to complete delivery of the Application Message before sending the PUBACK. When its original sender receives the PUBACK packet, ownership of the Application Message is transferred to the receiver.

Personally, I'm also in favor of changing the default behavior. As @karlp's reaction shows the "acknowledging before processing" behavior is very counter-intuitive and the specification does not provide any good reason to use it.

Should I change the PR to send PUBACK and PUBCOMP after on_message was called?

@karlp
Copy link
Contributor

karlp commented Dec 7, 2020

Hrm, I thought the paho python libs didn't ack until the callback returned, or something I was working with at least, I was explicitly demoing delays in the handlers. I'd be perfectly fine with switching the default, but this will likely lead to beahviour changes as broken clients will now cause unacked packets to build up in varous places, whereas right now, a client that crashes doesn't matter.

@linuxbasic
Copy link
Author

linuxbasic commented Dec 7, 2020

We were using the paho.mqtt.c library before and it has the same "issue": https://github.com/eclipse/paho.mqtt.c/blob/master/src/MQTTProtocolClient.c#L326-L332

The same for the patho.mqtt.python: eclipse/paho.mqtt.python#348

There are libraries that implement a delayable acknowledgment option like paho.mqtt.java (https://github.com/eclipse/paho.mqtt.java/blob/master/org.eclipse.paho.mqttv5.client/src/main/java/org/eclipse/paho/mqttv5/client/IMqttAsyncClient.java#L822-L833) and some that delay it per default like mqtt.js (mqttjs/MQTT.js#697).

I think pending acknowledgments shouldn't be a big problem. They can already occur if there is an issue with the network layer or the client crashes before he reads the PUBLISH message from the network buffer. In any way, as soon as the connection's Keep Alive period runs out, these pending acknowledgments should be cleaned up.

@ralight
Copy link
Contributor

ralight commented Dec 8, 2020

I think on balance the best thing to do is add it as an option otherwise if it does cause problems there is nothing users can do other than downgrade, if it does cause a problem. So please update this to use mosquitto_int_option() with say MOSQ_OPT_DELAYED_ACK as the option. Set it to off by default, and extend it to cover QoS 2 as well.

Then we go:

2.1: Option available, but off by default
2.2: Option enabled by default
3.0: Option removed?

@linuxbasic
Copy link
Author

linuxbasic commented Dec 9, 2020

I implemented the MOSQ_OPT_DELAYED_ACK option for QoS 1.

Regarding QoS 2:

(for the following thoughts I assumed message__queue() stores the message in memory and that the message will be lost in the case of the process crashing. Please correct me if I'm wrong.)

With the current implementation of QoS 2, delaying the sending of the PUBCOMP until after the on_message exits would not have any effect because there is a mechanism in place that prevents on_message to be called more than once (see https://github.com/eclipse/mosquitto/blob/develop/lib/handle_pubrel.c#L101-L104).

As far as I see we have two options:

  1. leave it like it is ("At Most Once" processing guarantee if the process crashes after the initial PUBLISH is received and on_message is called)
  2. move the on_message callback call into the handler of the PUBLISH message (like the MQTT specs suggest) and implement the delayed ack logic there.
  • with delayed ack: this would result in an "At Least Once" processing guarantee if we delay the acknowledgment (if the process crashes and loses the msg-id before PUBREC is sent).
  • without delayed ack: this would result in a "0+" processing guarantee if we send PUBREC before calling on_message (0 in case the process crashes after PUBREC is sent and before on_message called and "1+" if the process crashes after on_message is called and at the same time there is a network issue that prevents the arrival of the PUBREC).

I personally am in favor of option 1 because QoS 2 sucks. the MQTT specification assumes that the sender and the receiver never crash or lose their state. Even if we go with option 2 we can't fulfill the promise of an "Exactly Once" processing guarantee in real-world conditions and option 1 is already as good as it gets if we ignore the possibility of a process crashing at the wrong moment.

@linuxbasic
Copy link
Author

@karlp @ralight What do you think?

@linuxbasic linuxbasic changed the title add mosquitto_delay_puback option add MOSQ_OPT_DELAYED_ACK option Dec 21, 2020
@linuxbasic linuxbasic force-pushed the feat/delayed-puback branch 2 times, most recently from 9f834df to feeca2b Compare December 27, 2020 12:34
Signed-off-by: Linus Basig <[email protected]>
Co-authored-by: Fabrizio Lazzaretti <[email protected]>
Signed-off-by: Fabrizio Lazzaretti <[email protected]>
@Type1J
Copy link

Type1J commented Jan 11, 2023

Any news on when this might get reviewed/merged?

@Lazzaretti
Copy link

Any news on when this might get reviewed/merged?

@ralight any estimations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants