TX delays several seconds #1915
Well that's certainly interesting. I don't see any behaviour like that myself, but I'll see if I can reproduce your setup (without the PICs though). How did you install Mosquitto, out of interest? |
The RPi4 repository only has an old version, so I downloaded the ZIP from GitHub and ran make, then make install. You do not need PICs; find some other source generating similar traffic with Mosquitto.
This also helped a lot in debugging occasionally occurring bugs. |
Are you seeing these problems between your clients, or are you seeing these blockages in openhab2 not getting connections to mosquitto? My first thought would be the network stacks on the PICs, not mosquitto. I've run into problems before with naive networking assumptions in ultra embedded. |
If I'm reading it correctly, the 4 second delay wireshark log shows some ICMP traffic from 222 (broker) to 232 (PIC) at the time of the delay, and that ICMP request shows destination unreachable. So it's not clear that there is a network connection at that point in time. |
I have now been able to see this behaviour when connecting to a pi, but it is so uncommon that it is very difficult to debug. From what I've seen so far, there are very occasional "TCP spurious retransmission" events for PUBACK packets, and these are always present when a delay happens. I've modified the Mosquitto source to print the parameters and result of the
Something that may be related: when I've noticed the spurious retransmission messages in wireshark and seen a large delay, that has coincided with my ssh connection to the pi also having problems. I know that the location where my pi is situated doesn't have great wifi signal, so for me at least that may be a contributing factor. I should also note that the spurious retransmission occurs at both the pi and my laptop. |
Everything is connected via Ethernet cable: the PIC IoT communicates at 10 Mb/s HDX, on average with a load of about
The smart switch has
errors for the PIC IoT LAN port, so no packets are lost on the physical layer. Should I download the latest sources to make more detailed logs?
|
@ralight any updates? I tried mosquitto version 2.0.4, but the behavior seems even worse: more TCP retransmissions and more frequent timeouts.
1.1) collision of PUBLISHing on both sides
1.2) TCP trace: PUBACK of 04/eth/PktsRxOK after IoT DISCONNECT:
1.3) mosquitto.log:
===
2.2) TCP trace: PUBACK of 04/BME680/P_pa after IoT DISCONNECT:
2.3) mosquitto.log:
===
3.2) TCP trace: PUBACK of 04/eth/PktsTxOK after IoT DISCONNECT:
3.3) mosquitto.log:
What I can see in the IoT logs is that sys/date_DD_MM_ is successfully delivered to the IoT, but the connection times out on the PUBACK of messages published at the same time to the mosquitto broker.
while mosquitto sees lines 1 & 2 in the opposite order (and this is where the problem with TCP/IP sync seems to occur). In such conflict cases I am considering re-publishing the last message (in this case |
Sorry, I haven't had the chance to look at this further. Could you try the following patch? It adds nanoseconds to the log messages when you already have a timestamp format defined (which I see you have) and prints log messages when packets have been passed to the TCP stack. I'm convinced this is happening at a level below Mosquitto; this will show whether I'm right or not.
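For readers without the attachment, the idea of appending nanoseconds to an already formatted timestamp can be illustrated with a minimal Python sketch. This is not the patch itself (the patch modifies the Mosquitto source); the function names and the default format string here are hypothetical.

```python
import time


def format_timestamp_ns(ns_since_epoch, fmt="%Y-%m-%dT%H:%M:%S"):
    """Format an epoch time in nanoseconds as '<fmt>.<9-digit nanoseconds>' (UTC)."""
    # Split into whole seconds (for strftime) and the sub-second remainder.
    seconds, nanoseconds = divmod(ns_since_epoch, 1_000_000_000)
    return f"{time.strftime(fmt, time.gmtime(seconds))}.{nanoseconds:09d}"


def log(message):
    """Print a message prefixed with a nanosecond-resolution timestamp."""
    print(f"{format_timestamp_ns(time.time_ns())}: {message}")
```

With sub-second resolution in the log, the time at which a packet is handed to the TCP stack can be compared directly against the capture timestamps in a network trace.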
|
Hi, it seems the patch file is missing:
Did you commit that file to the correct branch? Anyhow, I expect the same as you wrote (socket library level), based on this:
By the way, in the meantime I identified the problem of |
No, I wouldn't have committed the temporary patch. It's attached to my previous comment. |
Here's result:
/var/log/mosquitto/mosquitto.log
=> Seems to be a socket layer issue, as discussed 😢. In the logs I see other issues:
|
Well I was expecting that, but I'm still a bit surprised by it. Your point 1 - the PUBLISH/PUBACK ordering - is fine. This is allowed by the spec. Point 2 - if the client |
Hi, I am using:
Several times a day, communication in mosquitto gets stuck for several seconds (there are also cases with 10 seconds of no communication from mosquitto).
In the system log I cannot find any issue that could cause this within the time window where the issue occurs. I am also not facing HW problems on the RPi4.
Currently I am running 3 (PIC based) IoT nodes communicating with QoS 1. Data are published to OpenHab2. On the RPi4 eth0 adapter there is a continuous flow of packets from the IoT devices. After about 4 seconds the PIC nodes terminate their connections on PUBACK timeout and try to disconnect and reconnect (this can also be seen in tshark.log). But mosquitto does not TX answer packets in time. The mosquitto.log also shows significant delays in message processing.
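Delays like these can be located in a log by scanning for long silences between consecutive lines. The following is a minimal Python sketch, assuming each log line starts with a timestamp in the form `%Y-%m-%dT%H:%M:%S` (that is an assumption; adjust `TS_FORMAT` and `TS_LENGTH` to whatever `log_timestamp_format` is actually configured). The function name and threshold are hypothetical.

```python
from datetime import datetime

# Assumption: every log line begins with a timestamp such as
# "2020-11-24T10:15:02". Change both constants for another format.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S"
TS_LENGTH = 19  # number of characters a timestamp in TS_FORMAT occupies


def find_gaps(lines, threshold_s=2.0):
    """Return (previous_ts, current_ts, gap_seconds) for every pause
    between consecutive timestamped lines longer than threshold_s."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta > threshold_s:
                gaps.append((previous, current, delta))
        previous = current
    return gaps
```

Calling `find_gaps(open("/var/log/mosquitto/mosquitto.log"))` would list the silent periods. A gap only shows that nothing was logged, so it should be cross-checked against the network trace before concluding that the broker was stalled.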
Attaching
logs for:
20201124 mosquitto stuck 4 sec.txt
20201125 mosquitto stuck 10 sec.txt
and mosquitto broker configuration /etc/mosquitto/mosquitto.conf
mosquitto.conf.txt
There you can find 3 sections:
(here you can see all nodes are detaching because communication with mosquitto at IP 192.168.1.222 is stuck)
(in 10 seconds all 3 devices try to connect 3 times)
This generates a huge number of reconnects: 180 in 4 days (mostly CONNACK errors, 150; the remainder are mainly PUBACK/SUBACK errors):
The RPi4 load is small:
Seems to be a BUG.