Bridge connection enters a connect-disconnect loop when incomplete QoS 2 publish, and local broker fails to persist for any reason. #57

Closed
ralight opened this issue Mar 15, 2016 · 2 comments

Comments

@ralight
Contributor

ralight commented Mar 15, 2016

migrated from Bugzilla #467304
status UNCONFIRMED severity normal in component Mosquitto for 1.4
Reported in version 1.4 on platform PC
Assigned to: Roger Light

On 2015-05-14 02:46:43 -0400, jsaak jsaak wrote:

The bridge connection enters a connect-disconnect loop when a QoS 2 publish is left incomplete and the local broker fails to persist it for any reason.

Scenario:

  1. local mosq publishes QoS2 to remote mosq
  2. local mosq dies (fails to persist)
  3. local mosq restarts with "clean_session false"
  4. local mosq re-establishes bridge connection to remote mosq
  5. remote mosq replies with PUBREC
  6. local mosq does not find the corresponding message in the DB, gives error
  7. local mosq disconnects bridge connection
  8. goto 4.

My proposed solution is to change step 6:
if the mid is not found in the DB, reply anyway and log a WARNING.

--- a/mosquitto/lib/read_handle_shared.c
+++ b/mosquitto/lib/read_handle_shared.c
@@ -103,6 +103,10 @@ int _mosquitto_handle_pubrec(struct mosquitto *mosq)
 	_mosquitto_log_printf(NULL, MOSQ_LOG_DEBUG, "Received PUBREC from %s (Mid: %d)", mosq->id, mid);
 
 	rc = mqtt3_db_message_update(mosq, mid, mosq_md_out, mosq_ms_wait_for_pubcomp);
+	if (rc) {
+		rc = 0;
+		_mosquitto_log_printf(NULL, MOSQ_LOG_WARNING, "Received PUBREC is not in the DB, replying anyway");
+	}
 #else
 	_mosquitto_log_printf(mosq, MOSQ_LOG_DEBUG, "Client %s received PUBREC (Mid: %d)", mosq->id, mid);
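
To illustrate the idea behind the patch outside the mosquitto code base, here is a minimal standalone sketch. All names in it (`inflight_store`, `store_update`, `handle_pubrec`, the `RC_*` codes and the `lenient` flag) are hypothetical stand-ins, not mosquitto's real types or API. It only shows the difference between treating an unknown mid as a fatal error, which would lead to the disconnect in step 7, and logging a warning and carrying on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; these are NOT mosquitto's real types. */
enum msg_state { MS_WAIT_FOR_PUBREC, MS_WAIT_FOR_PUBCOMP };

struct inflight_msg {
    uint16_t mid;
    enum msg_state state;
};

struct inflight_store {
    struct inflight_msg msgs[8];
    size_t count;
};

enum { RC_SUCCESS = 0, RC_NOT_FOUND = 1 };

/* Move the outgoing message with the given mid to new_state.
 * Returns RC_NOT_FOUND if no such message is stored, e.g. because
 * the broker restarted without having persisted it. */
int store_update(struct inflight_store *store, uint16_t mid,
                 enum msg_state new_state)
{
    assert(store != NULL);
    for (size_t i = 0; i < store->count; i++) {
        if (store->msgs[i].mid == mid) {
            store->msgs[i].state = new_state;
            return RC_SUCCESS;
        }
    }
    return RC_NOT_FOUND;
}

/* Handle an incoming PUBREC.
 * RC_SUCCESS means the caller should go on and send PUBREL.
 * Any other value means the caller treats it as an error and drops
 * the connection. With lenient set, an unknown mid is downgraded
 * to a warning so the flow can still complete. */
int handle_pubrec(struct inflight_store *store, uint16_t mid, bool lenient)
{
    int rc = store_update(store, mid, MS_WAIT_FOR_PUBCOMP);
    if (rc == RC_NOT_FOUND && lenient) {
        fprintf(stderr,
                "Warning: PUBREC for unknown mid %u, replying anyway\n",
                (unsigned)mid);
        rc = RC_SUCCESS;
    }
    return rc;
}
```

With a store that holds only mid 7, `handle_pubrec(&store, 42, false)` returns `RC_NOT_FOUND`, which corresponds to the error in step 6, while `handle_pubrec(&store, 42, true)` returns `RC_SUCCESS` so the PUBREL can still be sent and the remote side can finish the exchange.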
@hmvp

hmvp commented May 19, 2016

We see the same in our production environment with the latest version.

Unfortunately this seems endemic in MQTT implementations: see also eclipse/paho.mqtt.java#27 for the same bug in the Java client.

Under the right circumstances this happens for most or all acknowledgements, so PUBCOMP and PUBACK can probably trigger the same behaviour (possibly SUBACKs as well).

ralight added a commit that referenced this issue May 19, 2016
Allows message flow to complete where e.g. the broker didn't persist a
partially complete flow.

Thanks to jsaak jsaak and Hiram van Paassen.

Bug: #57
@ralight
Contributor Author

ralight commented May 19, 2016

Thanks for the nudge, I believe this is now fixed.

@ralight ralight closed this as completed May 19, 2016
@ralight ralight added this to the 1.4.9 milestone May 19, 2016
ralight added a commit that referenced this issue May 19, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Aug 8, 2019