New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No messages delivered after websocket client takeover #1489
Comments
Thanks for the detailed description. I haven't checked this yet, but I'm sure you're right. Websockets is a right pain. I don't think what you've proposed is correct. Adding the client to the disused list (using I'll take a proper look at it soon. |
Good point, hadn't considered that. That makes sense and I could definitely see it causing issues. Not sure if it's of any help, but I've done some more testing myself and it seems that I can only reproduce this problem when the web socket listener is using SSL: If I test with regular web sockets then everything works as expected. I can only speculate as to why this is, but it may be related to the fact that it takes a while for a SSL connection to be properly closed? I've created a small test script (Python 3) that can always reproduce the issue for me, when I have SSL enabled: https://gist.github.com/bverhoeven/00b638487df9f0a1b24a58d6954ffaaa#file-issue-1489-py, I've also included the test configuration I've used. Example output:
Quick update: I can also easily reproduce it using the public test broker at test.mosquitto.org, so the issue doesn't seem to be limited to just my setup: server_host = 'test.mosquitto.org'
server_port = 8081
server_use_tls = True |
I've also reproduced it thanks to your test file, that's very much appreciated. |
Experimental fix (I'm clocking off for this evening, it works but I haven't tested it further):
|
At first glance that does seem to fix the issue, but by the looks of it only when connecting with 'clean session' set? A quick test on my side confirms that if I set Edit: Would it be bad if this logic was applied to all clients regardless of the "clean_start" flag? I can confirm that the issue is that the first web socket connection doesn't get closed (and thereby cleaned up/freed) until after the second client has managed to subscribe:
I've thought of ignoring clients marked for disconnect in the subscription logic (where I figured it probably checks if the subscription already exists), but that all seems a bit too far-fetched and feels hacky. If we'd always call The fact that the old connection still lingers on for a few more seconds doesn't seem to be harmful. It also seems that, as long as we don't call Again, not familiar with mosquitto internals, or too familiar with the specs, so pardon my ignorance. |
Interesting. For me, with your patch applied, I'm still seeing the issue with clean_start == false: clean_session=True
clean_session=False
It does work fine if I apply the |
What version of libwebsockets are you using? |
I'm using the |
Hmm, very odd. I have tried on a few different systems and can't reproduce it. I am really surprised that you're seeing the problem, it would mean that the whole concept of taking over an existing clean session=false session was broken. |
This can occurs after one client takes over a previous connection. Closes #1489. Thanks to Bas Verhoeven.
I know you're not seeing it completely fixed, but that is the commit with the above patch in for now. |
Okay, never mind any of that... I thought my source tree was clean, but for some reason my helpful editor decided to use an older version of mosquitto/src/handle_connect.c Lines 128 to 161 in 05171b2
Also, my apologies for the earlier confusion: I for some reason thought that subscriptions would not be part of a 'session', and thus would not be affected by 'clean session'. After reading the v3.1.1 standard, it turns out that subscriptions are in fact very much part of the session and everything should indeed be working exactly as you described. With a clean source tree and your patch applied all issues appear to be resolved. Thanks! |
I've been running a version with this patch applied in production for the past week and all seems to be working as expected. Thanks again for your quick response and again my apologies for the confusion I caused by accidentally stripping out pieces of code. |
Using Mosquitto 1.6.7 (checked both 'master' and 'fixes' branches), I'm seeing an issue with the web socket 'take over' behavior where, once a client has taken over an old session, it doesn't get any messages if it subscribes to the same topics as the old session.
Situation
A publisher is writing data to
example/example1
every second.A websocket client is connected with client_id
example
, subscribed toexample/example1
(qos 1), receiving all those messages just fine. After a while that connection is somehow lost without a clean shutdown (network issues, client crash, bad link, etc.) and the client reconnects.Mosquitto then correctly prints "Client example already connected, closing old connection".
The new client happily subscribes to the
example/example1
topic again, which is acknowledged withSUBACK
, all seemingly fine. As of this point, however, the client never gets the messages that are published toexample/example
until it reconnects again.There are no errors in mosquitto's logs.
Reproducing
I think I've found an easy way to reproduce this by having:
example
,(while true; do mosquitto_pub -h localhost -t example/example1 -m pong; sleep 1; done) &
And then connecting my own client using
mosquitto_sub
with theexample
client id:(No data is received).
The old client is disconnected. No data is ever received for the new session. If I then, however, close this connection, and try to connect again, it'll immediately work:
It'll also immediately work if:
example/+
This makes it seem as if the old session isn't cleaned up properly and mosquitto still "sees" the old session, and all it's subscriptions. It's almost as if it feels that "the client with id 'example'" is already subscribed to "example/example1" and thus doesn't feel a need to actually subscribe to it "again".
Possible fix
After some debugging I've tracked this down to the way that websocket clients are cleaned up in a takeover situation, as last changed by 5088202.
This issue seems similar to #1227, but the fix there doesn't seem to have helped and I'm wondering if it's doing what it should be doing:
mosquitto/src/loop.c
Lines 678 to 686 in b622aae
If I replace this call to
context__add_to_disused(db, context);
this seems to be resolved: the old client is still disconnected, but, on new connection, re-subscribing works and the client gets the messages as expected.The rationale behind this change is the fact that
context__disconnect
also calls this function, which then calls thecontext__remove_from_by_id
referenced in the code above, but also seems to free up some more structures.I've just started with mosquitto so I'm not knowledgeable enough to see if this change would cause any issues. I'd appreciate it if someone with more in-depth knowledge of mosquitto could take a peek.
The text was updated successfully, but these errors were encountered: