Throughput drops disproportionally as number of consumers increases #216

Closed
uvzubovs opened this issue Jul 17, 2016 · 21 comments
Labels
Type: Enhancement A new feature for a minor or major release.

Comments

@uvzubovs

I ran a test against Mosquitto 1.4.9 on a modern RedHat VM with 2 cores (I know it uses only 1) and 2GB RAM using the Java Paho client with 1 producer and N consumers. Each consumer had a unique number and subscribed to /c followed by that number, i.e. /c1, /c2, etc. The producer published a 1000-byte message round robin to these topics with qos=0. Under this setup any one message was delivered to only one consumer and all consumers got the same number of messages.
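For anyone who wants to reproduce the publishing pattern, here is a minimal Python sketch of the round-robin distribution described above. It is not the Java Paho test code: the function names are hypothetical, and the publish call is injected so the sketch runs without a broker. With a real client you would pass something like the client's publish method instead of the counting stand-in.

```python
from itertools import cycle, islice
from typing import Callable, Dict


def publish_round_robin(publish: Callable[[str, bytes], None],
                        consumers: int, messages: int,
                        payload_size: int = 1000) -> None:
    """Publish `messages` payloads to /c1 .. /cN in round-robin order."""
    payload = b"x" * payload_size
    topics = [f"/c{i}" for i in range(1, consumers + 1)]
    for topic in islice(cycle(topics), messages):
        publish(topic, payload)


# Stand-in for a real MQTT client (an assumption for illustration):
# it only counts how many messages each topic would receive.
counts: Dict[str, int] = {}


def fake_publish(topic: str, payload: bytes) -> None:
    counts[topic] = counts.get(topic, 0) + 1


publish_round_robin(fake_publish, consumers=4, messages=20)
print(counts)  # each of the 4 topics receives 5 messages
```

Because each topic has exactly one subscriber, every message reaches exactly one consumer and all consumers receive the same share.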

I collected the following throughput numbers:

50,000 messages per second when N=1
25,000 messages per second when N=100
15,000 messages per second when N=200
7,000 messages per second when N=500
3,500 messages per second when N=1000

The messages per second above is the total throughput from the producer perspective, not per consumer. The broker was 100% CPU bound under the test.

This seems like an awfully undesirable degradation. I then published to only 1 topic with 100 subscriptions present (1 consumer got all messages and 99 other consumers did not get anything), and the performance was equally degraded -- 25,000 messages per second. So, it's not the distribution that seems to be the problem but topic to subscription matching?

By comparison, developer edition of IBM MessageSight running on identical VM under exactly the same test setup produced the following numbers:

60,000 messages per second when N=1
42,000 messages per second when N=100
38,000 messages per second when N=200
35,000 messages per second when N=500
35,000 messages per second when N=1000

As the number of subscriptions grew, throughput degraded only slightly. Some degradation is expected, as every subscription must be checked against the publication topic string.
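To make the comparison easier to read, the two sets of figures can be expressed as a percentage of each broker's own N=1 baseline. The short Python sketch below only restates the numbers reported above; the helper name `retained` is made up for illustration.

```python
# Total throughput (msgs/s) by number of consumers N, as reported above.
mosquitto = {1: 50_000, 100: 25_000, 200: 15_000, 500: 7_000, 1000: 3_500}
messagesight = {1: 60_000, 100: 42_000, 200: 38_000, 500: 35_000, 1000: 35_000}


def retained(series: dict) -> dict:
    """Throughput at each N as a percentage of the N=1 baseline."""
    base = series[1]
    return {n: round(100 * v / base, 1) for n, v in series.items()}


for n in mosquitto:
    print(f"N={n:>4}: Mosquitto {retained(mosquitto)[n]:>5}%  "
          f"MessageSight {retained(messagesight)[n]:>5}%")
```

At N=1000 Mosquitto retains 7.0% of its single-consumer throughput, while MessageSight retains 58.3%.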

@ralight
Contributor

ralight commented Jul 19, 2016

Yes, it's definitely undesirable, although I would point out that MessageSight maybe has got just a touch more resource behind it :)

Could you try the first test using the develop branch of https://github.com/ralight/mosquitto please?

@uvzubovs
Author

uvzubovs commented Jul 26, 2016

MessageSight had only 1 more CPU, not 10 to explain 10x better throughput, and there was only 1 producer and 1 consumer in both tests. We will try the 'develop' branch. Did something change recently?

@uvzubovs
Author

Tried 'develop' branch (reporting 'mosquitto version 1.4.90') with no difference. Anything else I should check?

@ralight
Contributor

ralight commented Jul 27, 2016

Please check that you specifically used the develop branch from the ralight repo, not from the eclipse repo.

By 'resource' I mean that IBM employs probably lots and lots of people to work on MessageSight full time, compared to mosquitto which has one person working on it in limited free time.

@icraggs

icraggs commented Jul 27, 2016

In addition to having many more people working on it, MessageSight's pretty much only goal is to be fast, whereas Mosquitto is intended to be small, too. As well as open source, and free :-)

@uvzubovs
Author

Ah, I get it now (about 'resources').

How do I check the repo? We went to https://github.com/ralight/mosquitto, selected branch 'develop' from the dropdown, and clicked the download button. The downloaded file was named 'mosquitto-develop.zip'. Is there anything else to check?

@icraggs

icraggs commented Jul 27, 2016

It could help to identify where most processing time is being used by using a profiler like gprof.

@uvzubovs
Author

Indeed. Last time I used this stuff was early 90's (college). Is that something you could do? I can give you the producer/consumer code, although it's rather trivial -- just have 1 producer publish to 1 topic first with 1 subscription present and then with 100 subscriptions present (99 would get nothing). If not, could you please advise how to get started with the profiler? Do we need to make a debug build?

@icraggs

icraggs commented Jul 27, 2016

I was hoping that you would be able to run the profiler and post the information back. It's not hard to do, at least for basic programs: https://www.thegeekstuff.com/2012/08/gprof-tutorial

@uvzubovs
Author

We'll get going with that. How would we post back the information?

@icraggs

icraggs commented Jul 27, 2016

Zip the reports up and attach them as a file, I suggest. You'll need to specify exactly which code level you are using.

@ralight
Contributor

ralight commented Jul 27, 2016

@uvzubovs I'm pretty certain where the problem is - the source zip you downloaded has at least a start of a fix for it. If you can compile that and try it with your existing test then hopefully you should already see a big difference without having to worry about the profiler. If you don't see the difference, then that's when it would be worth profiling. I'm afraid I'm not going to be able to look at anything until the end of next week though.

@uvzubovs
Author

@icraggs Attached (gprof.zip) please find the profiler output for 2 runs. First run is 1 consumer/subscription/publisher (throughput was 60,000 msgs/sec). Second run was 100 consumers, each with own subscription, so 100 subscriptions, but producer was publishing to only 1 topic -- 99 consumers got nothing (throughput was 25,000 msgs/sec). In both cases 1,000,000 messages were published. Mosquitto was restarted before every run.

Zip contains:

  • mosquitto.out -- whatever Mosquitto output to the console
  • gmon.out -- raw profiler output
  • gmon.txt -- readable profiler output

Please let me know if this is what you were looking for and whether there is anything else I could do.

@ralight We did not see any improvement with the 'develop' version. I am attaching what we installed in case you could verify that we installed what you thought we should have installed.

gprof.zip

mosquitto-develop-20160726.zip

@ralight
Contributor

ralight commented Jul 27, 2016

Ah, yes ok. Like I said, the changes were only a start :)

I think the HASH_ITER() call in sub__search() needs to be replaced with multiple HASH_FIND() calls - one for the topic element being sought, and one each for +, #.
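The idea of replacing a full iteration with a few keyed lookups can be sketched in Python as follows. This is not Mosquitto's C code and says nothing about how sub__search() is actually structured: the `Node` and `SubscriptionTree` names are hypothetical, dictionaries stand in for the hash tables, and the sketch ignores special rules such as topics beginning with `$`. At each topic level it does at most three lookups (the literal level, `+`, and `#`) instead of walking every child.

```python
from typing import Dict, List, Set


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.subscribers: Set[str] = set()


class SubscriptionTree:
    def __init__(self) -> None:
        self.root = Node()

    def subscribe(self, client_id: str, topic_filter: str) -> None:
        node = self.root
        for level in topic_filter.split("/"):
            node = node.children.setdefault(level, Node())
        node.subscribers.add(client_id)

    def match(self, topic: str) -> Set[str]:
        found: Set[str] = set()
        self._search(self.root, topic.split("/"), 0, found)
        return found

    def _search(self, node: Node, levels: List[str], depth: int,
                found: Set[str]) -> None:
        # Lookup 1: '#' matches all remaining levels, including none.
        multi = node.children.get("#")
        if multi is not None:
            found |= multi.subscribers
        if depth == len(levels):
            found |= node.subscribers
            return
        # Lookups 2 and 3: the literal level and the '+' wildcard.
        for key in (levels[depth], "+"):
            child = node.children.get(key)
            if child is not None:
                self._search(child, levels, depth + 1, found)


tree = SubscriptionTree()
tree.subscribe("a", "/c1")
tree.subscribe("b", "/c2")
tree.subscribe("c", "/+")
tree.subscribe("d", "#")
print(sorted(tree.match("/c1")))  # ['a', 'c', 'd']
print(sorted(tree.match("/c2")))  # ['b', 'c', 'd']
```

With this shape, the cost of matching one publication depends on the number of topic levels and matching branches, not on how many sibling subscriptions exist at each level.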

@uvzubovs
Author

HASH_ITER() in sub__search() or mosquitto_main_loop()? mosquitto_main_loop() seems to be taking a much bigger slice of the time. If you want to break up mosquitto_main_loop() into functions, we could profile it again to confirm which part is taking the most time.

@ralight
Contributor

ralight commented Aug 8, 2016

Could you please try the new code in the develop branch of https://github.com/ralight/mosquitto ?

If you could post your client code it would be useful.

@uvzubovs
Author

Thanks. Initial runs did not show improvement. We will capture profiler output tomorrow.

A colleague of mine was studying your code, and pointed out that it may not be the number of subscriptions that was degrading performance, but the number of connections (sockets). And, in fact, I restructured the test to have a single consumer create 100 subscriptions, and performance hardly degraded compared to having a single subscription (having 100 consumers each with a single subscription degraded performance by 50%). Unfortunately, in our case we would have thousands of clients/connections/sockets each with one or a few subscriptions, so something would need to be optimized in that area if possible.

The client code is a bit involved. I cut out a leaner version. It will not compile in a couple of places, but you should get an idea for what is going on. Please ask if anything's confusing.

MqttConsumer.java.txt
MqttProducer.java.txt
PahoMqttBroker.java.txt

@uvzubovs uvzubovs changed the title from "Throughput drops disproportionally as number of subscriptions increases" to "Throughput drops disproportionally as number of consumers increases" Aug 10, 2016
@uvzubovs
Author

uvzubovs commented Aug 10, 2016

Attached is the profiler output for the latest develop branch. There are 2 sets of files as before -- 1 consumer and 100 consumers. The throughput of having 100 consumers was 50% of the throughput of having 1 consumer.
gprof-20160810.zip

@uvzubovs
Author

@ralight Was this information what you were looking for? Was it useful? Would you like us to conduct more tests? Thanks!

@PierreF
Contributor

PierreF commented Jan 2, 2018

There has been improvement in the develop branch. I no longer see a fast drop in throughput, but there is probably still room for more improvement.


To reproduce the issue, I do the following:

  • Run broker: mosquitto -p 1885
  • Run ONE subscriber: mosquitto_sub -p 1885 -t /c0 > /dev/null
  • Run one producer (target to ~30k msgs/s): (while true; do head -n 1000 src/conf.c; sleep 0.03;done)| ./client/mosquitto_pub -p 1885 -t /c0 -l  
  • Watch for $SYS/ to see throughput: mosquitto_sub -t '$SYS/broker/load/messages/received/1min' -p 1885

The nominal throughput on my machine is ~1.6M msgs/minute.
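As a sanity check on these figures, the producer loop's target rate and the observed $SYS figure can be put on the same per-second scale. This is only arithmetic on the values quoted above; it assumes src/conf.c has at least 1000 lines and that each line is sent as one message, which is what mosquitto_pub's -l option does.

```python
lines_per_burst = 1000         # head -n 1000
sleep_seconds = 0.03           # sleep 0.03 between bursts

# Upper bound on the producer rate: it ignores the time taken by
# head and by the pipe itself, so the real rate is somewhat lower.
target_per_second = lines_per_burst / sleep_seconds

observed_per_minute = 1_600_000   # nominal $SYS reading quoted above
observed_per_second = observed_per_minute / 60

print(round(target_per_second))    # 33333
print(round(observed_per_second))  # 26667
```

Both values are in the neighbourhood of the stated ~30k msgs/s target.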

Then I create new connections that subscribe to another topic, using:

cat > sub.py << EOF
import time
import paho.mqtt.client as mqtt
# Can't go beyond 340 connections with Python paho; run multiple processes
# to simulate more than 340 connections.
# https://github.com/eclipse/paho.mqtt.python/issues/238
for i in range(340):
    client = mqtt.Client()
    client.connect('localhost', 1885)
    client.subscribe('topic/nothing')
    client.loop_start()

while True:
    time.sleep(1000)
EOF
python sub.py

The program is run multiple times in multiple terminals to create more than 340 connections.
On the fixes branch, the drop is quite fast: ~1M msgs/minute with 340 connections and ~250 msgs/minute with 680 connections.
On the develop branch, I don't see any drop before 1020 (3 * 340) connections, but I do see an increase in mosquitto's CPU usage (~20% nominal usage, +25% per 340 connections).
On the develop branch, at 1700 connections the throughput dropped to 1.3M msgs/minute.
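One way to read the CPU figures is as a rough linear model. The sketch below extrapolates from the approximate numbers above (~20% baseline, +25% per 340 connections). Treating the growth as linear is an assumption made here for illustration, not something measured in this thread.

```python
BASE_CPU = 20.0        # % CPU with only the one real subscriber
CPU_PER_BATCH = 25.0   # additional % CPU per batch of idle connections
BATCH = 340            # connections created by one run of the script


def estimated_cpu(connections: int) -> float:
    """Linear extrapolation of broker CPU usage (an assumption)."""
    return BASE_CPU + CPU_PER_BATCH * connections / BATCH


for n in (340, 680, 1020, 1360, 1700):
    print(n, estimated_cpu(n))
```

Under this model the estimate is 95% at 1020 connections and exceeds 100% of one core at 1360, which would be consistent with throughput only starting to drop beyond 1020 connections.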

@PierreF PierreF added the Type: Enhancement A new feature for a minor or major release. label Jan 2, 2018
@ralight
Contributor

ralight commented Dec 2, 2020

I believe this is now addressed in version 2.0.

@ralight ralight closed this as completed Dec 2, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 11, 2022
4 participants