Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clients not receiving subscribed messages at volume #16

Closed
ralight opened this issue Mar 15, 2016 · 1 comment
Closed

Clients not receiving subscribed messages at volume #16

ralight opened this issue Mar 15, 2016 · 1 comment

Comments

@ralight
Copy link
Contributor

ralight commented Mar 15, 2016

migrated from Bugzilla #485952
status UNCONFIRMED severity normal in component Mosquitto for 1.4
Reported in version 1.4 on platform PC
Assigned to: Roger Light

On 2016-01-15 12:05:29 -0500, Gary Gershon wrote:

Hi Roger,

We are experiencing behavior problems on 1.4.7 with message subscription volumes on the order of 20,000 messages per minute of 60 byte messages.

Since we're seeing no issues in publishing messages to the server at these rates, and the server utilization and network volumes are low, it appears that the issue is related to the subscribing message code path.

The first connected client seems to work without errors, but subsequent clients on the same topic keep failing. But this may be just the effects of additional load.

The problem appears to occur with both Qos 0 and 1 with mosquitto_sub and Paho Java clients.

From the trace (below), the server is logging PUBLISH (and PINGRESP) messages to these stalled clients, but the clients aren't receiving the messages.

From a browse of the code, I see there are a number of steps and a packet queue between logging (using debug) the PUBLISH and when the real socket write occurs.

The _mosquitto_packet_queue and and _mosquitto_packet_write routines are daunting to follow given their mutex lock, linked list, callback, and state tests.

Roger, we'd appreciate your assistance to resolve this. We've spent a few days tracing the behavior and browsing the code.

Perhaps we should we try a version that predates the Web Socket support modifications?

Thanks,
Gary

There are no observed problems at lower volumes on the order of 1,000 messages/second.

If I start multiple clients on the same workstation to the server, each will ultimately fail at different times subscribing to the same topic. Watching the terminal windows side-by-side, I will see one window stall while others continue to scroll by.

The server is running at a low utilization (<10%) with mosquitto running in a Docker container at 1-3% average utilization. The server is hosted in an Amazon Linux image and is assigned a single core of a Intel CPU. All the workload on this Linux AMI is in containers running Java apps for publishing and subscribing. One would expect that my AMI is sharing CPU resources with many others.

After the client stalls, it sends a PINGREQ after some number of seconds, but this does not restart the stream. A minute later, the client disconnects (and then reconnects).

There are currently only two topics, with Paho clients publishing to each. One device and topic is about 300 messages/second. The other topic is fed from a message aggregator at 20,000 messages/second. There are normally only three or four subscribers doing logging and analytics. The problem is when a subscriber includes the high-volume topic (it uses # to combine the two topics, but also fails using just the higher-volume topic name).

Server is 1.4.7 using Dockerfile slightly modified from https://github.com/toke/docker-mosquitto (debian:jessie on top of ubuntu 14.04 AMI)

Java clients are Paho 1.0.2 with Java 8 on AWS Linux and Apple MB El Capitan.

mosquitto_sub version 1.4.5 running on libmosquitto 1.4.5 on Apple MB.

Client trace with q0 - Client stopped receiving PUBLISH messages, client sent PINGREQ, and a PINGRESP was logged by server.

But no PUBLISH stream was received. After 60 seconds there was a disconnect and reconnect

Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (62 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,A,8h3OwjQKP@4GCJPPP121IoCol54cd0AwwwwwwWraTP0,2_66
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (46 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,Dh3OwjPflnfpMaF9HNAF9HNqF9H,2_63
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,A,133nbePP00PD>i8MDF7h0?w02><,0*0D Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes)) Source/AIS/AISHuB !AIVDM,1,1,,B,13aOBk00000BbIbMfR0Q31W20><,0_36

Client mosqsub/58985-GMG-MB-Pr sending PINGREQ

Client mosqsub/58985-GMG-MB-Pr sending CONNECT
Client mosqsub/58985-GMG-MB-Pr received CONNACK
Client mosqsub/58985-GMG-MB-Pr sending SUBSCRIBE (Mid: 2, Topic: Source/AIS/#, QoS: 0)
Client mosqsub/58985-GMG-MB-Pr received SUBACK Subscribed (mid: 2): 0

Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,36:Bd7Q000T97NFkTDd<GH>2><,0_1A
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,16:=HgPP00d<:0AtiFRCOv<0><,0_64
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (71 bytes))
Source/AIS/AISHuB !AIVDM,1,1,,B,85NoHR1Kf1HgQVqnP?uKWerkmrKi3J9ptKFg@1urjDiKqn0N@a58,0_57
Client mosqsub/58985-GMG-MB-Pr received PUBLISH (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (46 bytes))

Server Log for this client:

cat server-58985.log | grep "mosqsub/58985" | grep -v "PUBLISH"

1452806532: New client connected from 100.35.218.11 as mosqsub/58985-GMG-MB-Pr (c1, k60).
1452806532: Sending CONNACK to mosqsub/58985-GMG-MB-Pr (0, 0)
1452806532: Received SUBSCRIBE from mosqsub/58985-GMG-MB-Pr
1452806532: Sending SUBACK to mosqsub/58985-GMG-MB-Pr
1452806592: Received PINGREQ from mosqsub/58985-GMG-MB-Pr
1452806592: Sending PINGRESP to mosqsub/58985-GMG-MB-Pr
1452806652: Socket error on client mosqsub/58985-GMG-MB-Pr, disconnecting.
1452806653: New client connected from 100.35.218.11 as mosqsub/58985-GMG-MB-Pr (c1, k60).
1452806653: Sending CONNACK to mosqsub/58985-GMG-MB-Pr (0, 0)
1452806653: Received SUBSCRIBE from mosqsub/58985-GMG-MB-Pr
1452806653: Sending SUBACK to mosqsub/58985-GMG-MB-Pr

Note on above: The PINGREQ was exactly 60 seconds after SUBACK, although the PUBLISH messages stopped a number of seconds earlier.

Note on above: The disconnect was similarly exactly 60 seconds after the PINGREQ/PINGRESP sequence. No PUBLISH messages displayed durng this interval.

Note on below: The server was logging PUBLISH messages in the same second -- before and after -- PINGREQ/PINGRESP, but nothing was displayed on mosquitto_sub terminal:

1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (47 bytes))
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (113 bytes))
1452806592: Received PINGREQ from mosqsub/58985-GMG-MB-Pr
1452806592: Sending PINGRESP to mosqsub/58985-GMG-MB-Pr
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/JC', ... (47 bytes))
1452806592: Sending PUBLISH to mosqsub/58985-GMG-MB-Pr (d0, q0, r0, m0, 'Source/AIS/AISHuB', ... (34 bytes))

Thus we conclude there is an issue after logging outbound messages that is causing them not to reach the session socket correctly.

Similar situation with q=1
Display of messages stops, subsequent PINGREQ, nothing displayed for 60 seconds, then disconnect

Trace of mosquitto_sub client that stalled and subsequently sent a PINGREQ

mosquitto_sub -h 54.172.21.249 -t "Source/AIS/AISHuB" -v -d -q 1

Client mosqsub/57156-GMG-MB-Pr received PUBLISH (d0, q1, r0, m16680, 'Source/AIS/AISHuB', ... (47 bytes)) Client mosqsub/57156-GMG-MB-Pr sending PUBACK (Mid: 16680) Source/AIS/AISHuB !AIVDM,1,1,,A,15NDTLgP1Gre39hG@hhWo?wR2>`<,0_71 Client mosqsub/57156-GMG-MB-Pr received PUBLISH (d0, q1, r0, m16681, 'Source/AIS/AISHuB', ... (65 bytes)) Client mosqsub/57156-GMG-MB-Pr sending PUBACK (Mid: 16681) Source/AIS/AISHuB !AIVDM,1,1,,A,E02E340W6@1WPab3bPa200000000:usB?9TV@00003v010,4_1F Client mosqsub/57156-GMG-MB-Pr received PUBLISH (d0, q1, r0, m16682, 'Source/AIS/AISHuB', ... (46 bytes)) Client mosqsub/57156-GMG-MB-Pr sending PUBACK (Mid: 16682) Source/AIS/AISHuB !AIVDM,1,1,,B,Dh3OwjPflnfpMaF9HNAF9HNqF9H,2*63 Client mosqsub/57156-GMG-MB-Pr sending PINGREQ ^C ---

Server Config:

Place your local configuration in /etc/mosquitto/conf.d/

pid_file /mqtt/data/mosquitto.pid

persistence true
persistence_location /mqtt/data/

max_inflight_messages 1000

max_queued_messages 40000

queue_qos0_messages true

message_size_limit 8192

persistent_client_expiration 2d
autosave_interval 1800

user ubuntu

log_dest file /mqtt/log/mosquitto.log
log_type debug
log_type error
log_type warning
log_type notice
connection_messages true
log_timestamp true

include_dir /mqtt/config/conf.d


Stats:

$SYS/broker/version mosquitto version 1.4.7
$SYS/broker/timestamp Tue, 22 Dec 2015 12:47:28 +0000
$SYS/broker/uptime 2959 seconds
$SYS/broker/clients/total 6
$SYS/broker/clients/inactive 0
$SYS/broker/clients/disconnected 0
$SYS/broker/clients/active 6
$SYS/broker/clients/connected 6
$SYS/broker/clients/expired 0
$SYS/broker/messages/stored 137
$SYS/broker/messages/received 2116842
$SYS/broker/messages/sent 2151495
$SYS/broker/subscriptions/count 4
$SYS/broker/retained messages/count 47
$SYS/broker/heap/current 42224
$SYS/broker/heap/maximum 3738960
$SYS/broker/publish/messages/dropped 0
$SYS/broker/publish/messages/received 1021941
$SYS/broker/publish/messages/sent 34633
$SYS/broker/publish/bytes/received 58142385
$SYS/broker/publish/bytes/sent 62518744
$SYS/broker/bytes/received 86347945
$SYS/broker/bytes/sent 90662285
$SYS/broker/load/messages/received/1min 42973.26
$SYS/broker/load/messages/received/5min 42390.26
$SYS/broker/load/messages/received/15min 40997.44
$SYS/broker/load/messages/sent/1min 43135.45
$SYS/broker/load/messages/sent/5min 42651.92
$SYS/broker/load/messages/sent/15min 41628.45
$SYS/broker/load/publish/received/1min 20687.61
$SYS/broker/load/publish/received/5min 20424.22
$SYS/broker/load/publish/received/15min 19793.77
$SYS/broker/load/publish/sent/1min 157.93
$SYS/broker/load/publish/sent/5min 261.30
$SYS/broker/load/publish/sent/15min 630.87
$SYS/broker/load/bytes/received/1min 1747734.36
$SYS/broker/load/bytes/received/5min 1728984.96
$SYS/broker/load/bytes/received/15min 1674080.57
$SYS/broker/load/bytes/sent/1min 1795965.65
$SYS/broker/load/bytes/sent/5min 1783664.37
$SYS/broker/load/bytes/sent/15min 1754350.49
$SYS/broker/load/sockets/1min 0.06
$SYS/broker/load/sockets/5min 0.20
$SYS/broker/load/sockets/15min 0.14
$SYS/broker/load/connections/1min 0.06
$SYS/broker/load/connections/5min 0.20
$SYS/broker/load/connections/15min 0.14

-- end --

On 2016-01-18 16:45:06 -0500, Roger Light wrote:

Hi Gary,

Thanks for the extensive information, it's much appreciated. I can imagine that there might be a problem, but can't point you somewhere to look quite yet. One thing I'll say is that all of the pthread* calls do nothing in the broker, so that reduces the complexity a little bit.

I've not yet been able to reproduce this, although I've also not had much time to try it - I've an important work deadline approaching. Is there a simple way that I can see the problem, ideally using mosquitto_sub/pub or the Paho Python client or a client using libmosquitto or the Paho C library? Basically I'm happiest working with C/C++ or Python.

Thanks,

Roger

On 2016-01-19 12:39:22 -0500, Gary Gershon wrote:

Hi Roger,

The server has a public IP. I will email you the address and topic string.

I can reproduce the problem by starting two mosquitto_sub clients using OS X Terminal.

After a minute they stall and subsequently timeout and reconnect.

It's possible you might simulate my publisher using a Python or C test harness and thus drive your own broker instance. My publisher is a Paho Java client sending messages from about 30 to 120 bytes average 60, and averaging 20,000/minute. My publisher is in a Docker container adjacent to the broker, so there would be close to zero network latency.

Another option would be for me to redirect my publisher to your 1.4.7 test broker instance, but this would add network delays. I have no idea if that would matter, although it might be interesting to find out.

Best,
Gary

On 2016-01-19 18:23:52 -0500, Roger Light wrote:

Thanks for the server details, but I think I really need to fiddle with the server to see what is up.

Is your publisher just publishing to that single topic? I'll see about knocking up a test client.

@PierreF
Copy link
Contributor

PierreF commented Dec 28, 2017

I don't reproduce the behavior described (i.e. publish that works at 20k msg/s) but I do reproduce issue at high throughput.
However this issue does not happen with develop branch, therefor I believe it may be fixed in develop branch.

I'm closing this issue since I can't reproduce it on develop branch.


Steps to reproduce the issue on fixes branch (a4e9120):

  • Generate an msgs file with 100 lines of ~60 bytes each line
  • Start a mosquitto broker
  • Generate high throughput using (here it target 40k msgs/s : 4 times 100 lines every 10ms):
(while true; do cat msgs msgs msgs msgs ; sleep 0.01;done) | ./client/mosquitto_pub -h 127.0.0.1 -p 1883 -t 'Source/AIS/AISHuB' -l

At this point, mosquitto will use 100% CPU (no subscribed yet, just publishing is enough).


On develop, to try to reproduce original issue, I've also done:

  • Generate low throughput using:
(while true; do cat msgs msgs msgs ; sleep 1;done) | ./client/mosquitto_pub -h 127.0.0.1 -p 1883 -t 'Source/AIS/AISHuB-slow' -l
  • Subscribe (and estimate messages/s) using:
time ./client/mosquitto_sub -h 127.0.0.1 -p 1883 -t '#' -C 100000 | wc -l

My subscriber get 100k messages in 4.5s which means ~22k messages/s. Unable to reproduce an issue when generate 20k message/s on one topic.

@PierreF PierreF closed this as completed Dec 28, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Aug 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants