
Message delivery over the websockets is too slow #411

Closed
arminbaljic opened this issue Mar 30, 2017 · 15 comments
Labels
Status: Blocked Another issue needs to be resolved first

Comments

@arminbaljic

I am having an issue with MQTT message delivery over websockets. On the server side, I have been using mosquitto broker version 1.4.8, installed on an AWS t2.medium server instance, with the following configuration:

# basic
pid_file /var/run/mosquitto.pid
listener 1883
listener 9883
protocol websockets
max_inflight_messages 0
max_queued_messages 0
allow_anonymous false

# persistence
persistence true
persistence_file mosquitto.db
persistence_location /var/lib/mosquitto/
persistent_client_expiration 14d
autosave_interval 300

# logging
log_type all
log_timestamp true
log_dest file /var/log/mosquitto/mosquitto.log

# auth
auth_plugin /usr/lib/mosquitto-auth-plugin/auth-plugin.so
auth_opt_backends mysql
auth_opt_host 127.0.0.1
auth_opt_port 3306
auth_opt_dbname mydbname
auth_opt_user myusername
auth_opt_pass mypassword
auth_opt_userquery SELECT tbl_mqtt_users.password FROM tbl_mqtt_users WHERE tbl_mqtt_users.username = '%s'
auth_opt_superquery SELECT COUNT(*) FROM tbl_mqtt_users WHERE tbl_mqtt_users.username = '%s' AND tbl_mqtt_users.super = 1
auth_opt_aclquery SELECT tbl_mqtt_acls.topic FROM tbl_mqtt_acls, tbl_mqtt_users WHERE (tbl_mqtt_users.id = tbl_mqtt_acls.mqtt_user_id) AND (tbl_mqtt_users.username = '%s') AND (tbl_mqtt_acls.read_write >= %d)

After the data processing server publishes about 900 messages to a single subscriber (with QoS 1), I notice large latency (about 30 seconds) in message delivery.
This only happens with clients that are using websockets. If I subscribe with the MQTT.fx client (port 1883), message delivery is almost instant. Please see the timings below:

[9:59:01] - Server published the last message to broker
[9:59:01] - MQTT.fx client received last message
[9:59:32] - Browser (Paho client library) received the last message

Additionally, please note that I have tried the following: a different client library (MQTT.js from https://github.com/mqttjs), the HiveMQ web client, replacing all processing in the onMessageReceived client method with console output, and upgrading the mosquitto broker to version 1.4.11. All with the same results. Any thoughts about this?

@ibrahimss

+1

@gdraynz

gdraynz commented Apr 26, 2017

I encountered this using Mosquitto on Alpine Linux. It was fixed when I upgraded libwebsockets to a more recent version; you should try the same.

@jacmet
Contributor

jacmet commented Jun 1, 2017

@gdraynz can you add a bit more detail about which libwebsockets version fixed the issue for you? I still see it with v2.1.1 here.

@jacmet
Contributor

jacmet commented Jun 1, 2017

I reported a similar issue to the mailing list last month, so far without any feedback:
https://dev.eclipse.org/mhonarc/lists/mosquitto-dev/msg01505.html

@gdraynz

gdraynz commented Jun 1, 2017

I just tried to reproduce the environment on which I encountered this issue but could not reproduce it again...
Maybe it was completely unrelated, sorry if my comment was misleading.

@ralight
Contributor

ralight commented Jun 19, 2017

Could you please give the latest fixes branch a try with regards to websockets performance?

@rrichardson

rrichardson commented Jun 24, 2017

OK. I did some benchmarking. I started with the fixes branch, and I'm glad I did; otherwise I'd be pretty sure there was an error in my benchmarking code.

The tl;dr :

  • Current master is unusable: about 5 publishes per second per client
  • The fixes branch is really freaking fast: about 3,400 publish round-trips per second and 23,800 receives per second across 7 concurrent processes

Overview of the test: I stood up 7 processes that each subscribed to '+' and published to a topic consisting of their pid. Every time a process received a message on the topic of its own pid, it would publish again; each process counts all received messages as well as its number of publishes.

Test machine

A 12-core Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz with 32 GB of RAM.

mosquitto build

A docker container running Debian stretch, with dependencies provided by apt. I ran the docker container with --net=host to avoid that annoying docker-proxy.

Versions:

        linux-vdso.so.1 
        libdl.so.2 
        libm.so.6 
        librt.so.1 
        libssl.so.1.1
        libcrypto.so.1.1 
        libuuid.so.1 
        libwebsockets.so.8 
        libc.so.6 
        /lib64/ld-linux-x86-64.so.2
        libpthread.so.0 

Node.js test code

var mqtt = require('mqtt')
var now = require('performance-now')
var pubcount = 0
var recvcount = 0
var recvstart
var recvend
var pubstart
var pubend
var topic = '' + process.pid

console.log("starting on topic ", topic)

var client = mqtt.connect('ws://localhost:1883')

client.on('connect', function () {
  client.subscribe(topic)
  client.subscribe('+')
  pubstart = now();
  recvstart = pubstart;
  client.publish(topic, 'Hello mqtt')
})

client.on('message', function (intopic, message) {
  // message is Buffer
  // console.log(message.toString())
  if (intopic === topic) {
    if (++pubcount >= 10000) {
      pubcount = 0
      //client.end()
      pubend = now()
      console.log('pubs per sec :', 1000 * 10000 / (pubend - pubstart))
      pubstart = pubend
    }
    client.publish(topic, 'Hello mqtt')
  }
  if (++recvcount >= 50000) {
    recvcount = 0
    recvend = now()
    console.log('receives per sec :', 1000 * 50000 / (recvend - recvstart) );
    recvstart = recvend
  }
})

sample output (single process)


 pubs per sec : 3106.4102248359923
receives per sec : 22454.7461081153
pubs per sec : 4459.682871612058
receives per sec : 22417.661698455944
receives per sec : 21027.4445251877
pubs per sec : 3661.641929219182
receives per sec : 19916.397257892866
pubs per sec : 3024.9927115144997
receives per sec : 21557.757819827882
pubs per sec : 3071.0019611949774
receives per sec : 23176.401898833457
receives per sec : 23386.12239555303
pubs per sec : 4142.496938604661
receives per sec : 21331.130904875154
pubs per sec : 3326.2016775357574
receives per sec : 21621.166637478076
pubs per sec : 3450.3092089998786
receives per sec : 23888.589706333347
pubs per sec : 3802.1589219557436
receives per sec : 20501.7997041529
receives per sec : 22080.52001398151
pubs per sec : 3300.583826113492
receives per sec : 24548.979773147486
pubs per sec : 3933.692096804666
receives per sec : 23061.25912112394

top showing mem and cpu

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15328 rick      20   0   52264  21920   4904 R 100.0  0.1  13:27.09 mosquitto
 4124 rick      20   0 1022796  83316  16768 R  95.3  0.3   1:19.27 node
 3991 rick      20   0 1023788  81468  16672 R  92.4  0.2   1:26.98 node
 4054 rick      20   0 1023948  83732  16968 R  92.4  0.3   1:22.85 node
 4233 rick      20   0 1022716  81464  16868 R  91.9  0.2   1:12.69 node
 4194 rick      20   0 1022820  81336  16788 R  90.1  0.2   1:15.04 node
 4155 rick      20   0 1022996  81632  16768 R  89.5  0.2   1:18.42 node
 4093 rick      20   0 1023360  80908  16768 R  89.0  0.2   1:20.72 node

@ralight
Contributor

ralight commented Jun 27, 2017

@rrichardson awesome, thanks for confirming.

@ralight
Contributor

ralight commented Jul 18, 2017

This is fixed in 1.4.13.

@rrichardson as you've hopefully still got your test set up, I don't suppose you'd mind trying the test again with the version at #495 ?

@PierreF PierreF added the Status: Blocked Another issue needs to be resolved first label Jan 5, 2018
@MRazvan

MRazvan commented Jul 15, 2018

I can provide more information on this. I had the same problem: large packets (> 4 kB) had a huge delay, and 1.5 MB took 40 seconds to arrive on the client over websockets.

I finally found the problems, and there are multiple:

  1. Sending the packet to libwebsockets: mosquitto just sends the full packet using libwebsocket_write. This is known to be a performance issue for libwebsockets, because of how it does the send: internally it creates a new buffer and splits it into multiple chunks, so it takes multiple calls to libwebsocket_service to actually send the packet.
  2. It takes on average ~10-90 ms for libwebsocket_service to be called.

Meaning a packet of 1.5 MB will be split into multiple chunks by libwebsockets, and those will be sent on average every 90 ms, each time the mosquitto loop runs.

I solved these in two ways.

  • First, change websockets.c to split the packet beforehand and send small buffers to libwebsockets
    https://gist.github.com/overlordtm/28566cfe818cc2b2e8c9
    This at least avoids a new buffer allocation for the message; it does not remove the need for multiple calls to service.
  • Second, call libwebsocket_service multiple times on each mosquitto loop. I don't know how to actually make sure the full packet was sent, so this is just a configurable value. Ideally it would loop until the packet was sent.

The performance improvement was huge.

-------------------------------------------------------------------------------
-> Multiple calls to libwebsocket_service per mosquitto loop with packet split
1531637911: LWSSTATS_C_CONNECTIONS: 1
1531637911: LWSSTATS_C_API_CLOSE: 2
1531637911: LWSSTATS_C_API_READ: 3
1531637911: LWSSTATS_C_API_LWS_WRITE: 267
1531637911: LWSSTATS_C_API_WRITE: 269
1531637911: LWSSTATS_C_WRITE_PARTIALS: 2
1531637911: LWSSTATS_C_WRITEABLE_CB_REQ: 270
1531637911: LWSSTATS_C_WRITEABLE_CB_EFF_REQ: 267
1531637911: LWSSTATS_C_WRITEABLE_CB: 267
1531637911: LWSSTATS_B_READ: 492
1531637911: LWSSTATS_B_WRITE: 1048872
1531637911: LWSSTATS_B_PARTIALS_ACCEPTED_PARTS: 0
1531637911: LWSSTATS_MS_SSL_CONNECTIONS_ACCEPTED_DELAY: 0ms
1531637911: LWSSTATS_MS_SSL_RX_DELAY: 0ms
1531637911: LWSSTATS_MS_WRITABLE_DELAY: 16980us
1531637911: LWSSTATS_MS_WORST_WRITABLE_DELAY: 2998us
1531637911: Avg writable delay: 63us


-> Single call to libwebsocket_service per mosquitto loop
1531638011: LWS internal statistics dump ----->
1531638011: LWSSTATS_C_CONNECTIONS: 2
1531638011: LWSSTATS_C_API_CLOSE: 3
1531638011: LWSSTATS_C_API_READ: 9
1531638011: LWSSTATS_C_API_LWS_WRITE: 167
1531638011: LWSSTATS_C_API_WRITE: 167
1531638011: LWSSTATS_C_WRITE_PARTIALS: 0
1531638011: LWSSTATS_C_WRITEABLE_CB_REQ: 175
1531638011: LWSSTATS_C_WRITEABLE_CB_EFF_REQ: 161
1531638011: LWSSTATS_C_WRITEABLE_CB: 160
1531638011: LWSSTATS_B_READ: 1001
1531638011: LWSSTATS_B_WRITE: 635283
1531638011: LWSSTATS_B_PARTIALS_ACCEPTED_PARTS: 0
1531638011: LWSSTATS_MS_SSL_CONNECTIONS_ACCEPTED_DELAY: 0ms
1531638011: LWSSTATS_MS_SSL_RX_DELAY: 0ms
1531638011: LWSSTATS_MS_WRITABLE_DELAY: 14993056us
1531638011: LWSSTATS_MS_WORST_WRITABLE_DELAY: 156580us
1531638011: Avg writable delay: 93706us
To aggregate the above data

  • With the fix -> 1 MB -> 16 ms
  • Original -> 600 kB -> 14 seconds

I'm on windows and the results are from
mosquitto 1.4.7
libwebsockets 3.0.0

I can do a pull request for this; however, I am not sure what impact it has on anything else, since I don't have tests set up. Also, point 2 of the fix should ideally loop until the packet is sent, and I don't know how to do that for now.

The new configuration options are

# the number of times to run libwebsockets_service during each mosquitto loop
websocket_loop_count 100

# the size of the chunk (in bytes) to send to libwebsockets
websocket_packet_size 4096

PS: For anyone not wanting to do the full fix who wants a quick test: in loop.c, find the call to libwebsocket_service and wrap it in a for loop.

@MRazvan

MRazvan commented Jul 15, 2018

@arminbaljic I don't know if you ever found the issue, or if you are still using mosquitto. If you can, please do a quick test with the findings above, i.e. the quick fix at the end of the post.

@arminbaljic
Author

Hi @MRazvan, unfortunately, due to the performance issues and some other bugs, we had to change MQTT brokers. The client was becoming increasingly agitated by these issues, so we decided to switch to RabbitMQ with the web MQTT plugin.

@aalibasic

Has anyone found a fix to this issue?

@ralight
Contributor

ralight commented Aug 28, 2019

@aalibasic this is a separate issue to the one you are seeing. This one was due to the websockets listeners not being serviced properly and should be closed. I believe the issue with large messages is entirely different.

@ralight
Contributor

ralight commented Sep 24, 2019

Closing because the original issue is addressed.

@ralight ralight closed this as completed Sep 24, 2019