1.6 bridge won't reconnect on bridge connection loss #1334
Comments
Hi there, I am having the same issue with the broker running Mosquitto 1.6.3 and the bridge on 1.6.3 or 1.5.8, after a client lost its connection. Logs from the broker are very interesting, as they show that messages are actually received but no
Expected logs (from a working state):
Observed logs (No
|
Thanks for the second report @minjatJ, I haven't yet reproduced this. Are you also running on Docker? |
Thanks for the different reports both of you, I'm struggling to reproduce this. Do you have any more hints on what you're seeing? |
Hi @ralight thanks for looking into this. |
I can reproduce it:
With an existing, working bridge in place, restarting the broker on "bim" kills the connection from "denise", and it cannot recover without restarting the Mosquitto instance on "denise". Logs from denise around the time of the restart of "bim":
Logs from bim after having it restarted:
|
@m0wlheld I've just tried that with a fresh remote ubuntu 20.04 VM and can't reproduce it. Is there anything special in your configuration files I should know about? |
Nothing special (I think), I'll share. /etc/mosquitto/mosquitto.conf on Ubuntu host "bim.fritz.box":
/etc/mosquitto/conf.d/auth.conf on Ubuntu host "bim.fritz.box":
/mosquitto/config/mosquitto.conf for container instance on Raspbian host "denise.fritz.box":
|
Thanks - I agree it doesn't look very special, but it's always nice to have exactly the same conditions to work with. |
I observe the same problem here:
Procedure:
logs B:
logs B:
A:
B:
|
I see exactly the same problem as Toschoch with Mosquitto 2.0.7 installed on Windows 10. Does anybody know a quick workaround? |
On Ubuntu 20.04, I've found a workaround using a docker-compose healthcheck. It takes a couple of steps.
Compose file:
version: "3.8"
services:
  autoheal:
    image: "willfarrell/autoheal:latest"
    tty: true
    container_name: "autoheal"
    restart: always
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
  mosquitto:
    image: "eclipse-mosquitto"
    ports:
      - "1883:1883/tcp"
      - "9001:9001/tcp"
    volumes:
      - "/host/mosquitto/config:/mosquitto/config"
      - "/host/mosquitto/data:/mosquitto/data"
      - "/host/mosquitto/log:/mosquitto/log"
    labels:
      - "autoheal=true"
    healthcheck:
      test: ["CMD", "/mosquitto/config/hc.sh"]
      interval: 5s
      timeout: 10s
      retries: 3
    restart: unless-stopped
Health-check script (/mosquitto/config/hc.sh inside the container):
#!/bin/sh
# Subscribe on the local broker, publish to the broker at 10.0.0.1,
# and fail unless the message makes the round trip.
function_to_fork() {
    # Single quotes keep the shell from expanding $$ (its PID) inside the password.
    HC=$( mosquitto_sub -h 127.0.0.1 -p 1883 -t 'test/topic_hc' -u hass -P 'MeGaPa$$wD' -C 1 )
    if test "$HC" != 'alive'; then
        echo "> $HC result"
        exit 1
    fi
}
function_to_fork &
mosquitto_pub -h 10.0.0.1 -p 1883 -t 'test/topic_hc' -m 'alive' -u hass -P 'MeGaPa$$wD'
# Wait for the background subscriber; its exit status becomes the health-check result
# (portable replacement for the bash-only `wait -n`).
wait "$!"
|
I had this issue in 2.0.10, but in 2.0.11 it seems fine. (using docker) |
I'm still having the issue on 2.0.14, where I've got three brokers. When restarting everything (docker compose), two brokers start up at nearly the same time, which means that one isn't ready when the other is trying to connect. The other then never tries again. A workaround shouldn't be needed, right? Here's how I've set it up: If broker A restarts, it'll reconnect to broker B. If broker B restarts, it'll reconnect to broker C, but broker A will state In all mosquitto.conf I have the lines:
Why does it not reconnect? |
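One way to sidestep the start-up race described above is to delay a broker's start until the broker it bridges to accepts TCP connections. The following is a minimal sketch, not a fix for the missing retry: `wait_for_port` is a hypothetical helper, and the host name and port in the usage note are placeholders.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect only shows that the listener is up,
            # not that the broker would accept the bridge's credentials.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A container entrypoint could call something like `wait_for_port("broker-b.example", 1883)` and only then launch Mosquitto, so the first bridge connection attempt is made against a listening broker.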
Just noticed something odd. I actually have a fourth broker, which I don't really use. This broker is also connected as a bridge, and this one DOES reconnect no matter what. Brokers A, B, and C are all running as docker containers on RPi's (one on a RPi Zero W), and none of them attempt to reconnect. Broker D is running as a docker container on Debian. Can this explain the issue? |
Hi, I have a similar problem. I created a broker on a Raspberry Pi and connected it to my main MQTT broker. The device connects to the Raspberry Pi broker and works without problems. However, the broker disconnects from the main MQTT broker after some time and no longer reconnects. When I restart it, it works again until it disconnects again. Does anyone know why this is happening? This is very strange behavior: communication between the MQTT servers works smoothly in both directions until the disconnect.
Is it necessary to set an interval for the check or turn it off completely or something similar? Thank you in advance for any answer, |
@michaelfanta; the timeout should be caught by |
I just tried that and it still disconnects from the main MQTT server. Here I attach my config file. Is there no need to set the protocol type or something similar? It's weird that it always disconnects after a while.
I'm testing it on a Raspberry Pi. I have one Sonoff MINI (Tasmota) connected to MQTT there. If I change the state of the switch, the connection seems to be stable. However, when the state stops changing, the MQTT bridge disconnects from the main MQTT broker. Shouldn't the MQTT bridge somehow send regular data to the main MQTT broker to avoid disconnection? I also tried setting "start_type" to "lazy", but without effect. EDIT: I also noticed that the date and time were not displayed correctly in the log file. So I set the value "privileged" to "true" in "docker-compose.yml", and the time and date are now displayed correctly. |
Privileged containers are almost never a good idea, especially not when used to circumvent another bug. For system time in the container, the better approach is this: However, the fact that it doesn't reconnect after (wrongly) losing the connection is the same bug I am experiencing. |
Thank you so much, it works without problems this way :) As for MQTT, the lack of communication was probably the cause: now that sensor states are being sent, the bridge does not disconnect. But as you say, when a connection is lost, it doesn't reconnect, which is weird. I'm trying to solve something similar with the Tasmota firmware: it connects to the MQTT broker, and when the broker is offline for some time, the Tasmota device no longer logs on. Each time a connection attempt fails, the retry delay grows until the device finally stops trying to connect, which is incorrect. I'm still trying to find out why "restart_timeout" has no effect. I was hoping it would be enough to set "start_type" to "lazy", but even that makes no difference. I even tried setting "threshold = 1" to respond to each message, but it never reconnects. For now, I'm using a Python script: when the MQTT connection to the broker is lost, it restarts Mosquitto on the RPi so that it reconnects, but that's not really a permanent solution. |
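A watchdog of the kind mentioned above could look roughly like the sketch below. It is not the commenter's script. It assumes bridge notifications are enabled, so that the retained bridge state (1 for active, 0 for failed) is available on `$SYS/broker/connection/<remote_clientid>/state`, and that `mosquitto_sub` is on the PATH. The names `state_command`, `bridge_is_up` and `check_once` are hypothetical, and the restart action is left to the caller.

```python
import subprocess


def state_command(remote_clientid, host="127.0.0.1", port=1883, timeout_s=5):
    """Build a mosquitto_sub call that reads one bridge-state message."""
    topic = f"$SYS/broker/connection/{remote_clientid}/state"
    # -C 1: exit after one message; -W: give up after timeout_s seconds.
    return ["mosquitto_sub", "-h", host, "-p", str(port),
            "-t", topic, "-C", "1", "-W", str(timeout_s)]


def bridge_is_up(payload):
    """The notification payload is '1' while the bridge is connected."""
    return payload.strip() == "1"


def check_once(remote_clientid, run=subprocess.run, restart=None):
    """Read the bridge state once; call restart() if the bridge looks down."""
    result = run(state_command(remote_clientid), capture_output=True, text=True)
    up = result.returncode == 0 and bridge_is_up(result.stdout)
    if not up and restart is not None:
        restart()
    return up
```

Called periodically, for example from cron, with `restart` set to whatever restarts the local broker (such as a `systemctl restart mosquitto` call on a systemd host, which is an assumption about the setup). If the bridge itself still believes it is connected, the state may keep reading 1, so this check would not catch every failure reported in this thread.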
I have the same issue with 2.0.11 on both bridge and broker (Docker). I notice that sometimes when the bridge starts up and tries to connect to the broker, it doesn't receive the CONNACK (even though I can see the broker sending the CONNACK in its logs, another problem I'm looking into). After that, the keepalive fires every 30 seconds and the ping requests work fine on both ends, but the bridge never publishes a message to the broker and never tries to reconnect. When it does connect OK, everything works perfectly. It's almost as if it's waiting on the CONNACK forever. How can I get the bridge to retry the connection when it doesn't receive a CONNACK? |
Reading the mosquitto.conf manual page, I see that the default for the restart_timeout setting is to retry after 5 seconds initially and then increase up to a maximum retry period of 30 seconds. So in theory I should see a reconnect attempt at least every 30 seconds (without configuring restart_timeout), but I don't. I do see a keepalive (which is successful), and it also happens to be set to 30 seconds. Is it possible that the successful keepalive overrides the reconnect attempt and leaves the bridge in a permanent "happily disconnected" state? |
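To make the expected retry schedule concrete, here is a minimal sketch of a capped backoff using the defaults quoted above (5 seconds initially, 30 seconds at most). The doubling between attempts is an assumption chosen for illustration, and any randomization (jitter) the broker may apply is left out. `backoff_delays` is a hypothetical helper, not Mosquitto's implementation.

```python
def backoff_delays(base_s=5, cap_s=30, attempts=6):
    """Return the wait before each reconnect attempt: base, doubling, capped."""
    delays = []
    delay = base_s
    for _ in range(attempts):
        # Never wait longer than the cap, however many attempts have failed.
        delays.append(min(delay, cap_s))
        delay *= 2
    return delays


print(backoff_delays())  # [5, 10, 20, 30, 30, 30]
```

Under these assumptions a working bridge would log a connection attempt no more than 30 seconds after the previous failure, which is what the logs in this thread fail to show.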
"Happily disconnected" does seem appropriate in this case, as the broker never appears to retry connecting. I'm starting to suspect there are multiple problems here:
Let's say it does reconnect after five seconds, and it is indeed due to the I experienced the problem when restarting brokers, where the order in which they are restarted is dependent on which broker initiates the connection, and then only on a RPi Zero W. |
I've recently found this to be an issue with a friend's setup as well. He's also using an RPi Zero W, which doesn't reconnect. |
I believe we may have a similar issue, but it's intermittent: when we restart Mosquitto, it happens roughly 50% of the time. I've noticed that, for us, this log statement appears to be present every time it happens:
|
Confirming that 1.5.8 does not exhibit the issue (just tested 5 times and would have expected to see it once). |
1.6.15 did not exhibit the issue either. Actually I'm not sure this is the same issue for us. I think this broke for us between 2.0.14 and 2.0.15 when this (guessing):
was replaced by this:
Sorry for the noise on this issue. |
I'd say it's likely that the problem originates from something like this. Could we somehow get this issue 'bumped'? |
I'm not sure if this is related or not, but on 2.0.15, we're experiencing the same issue that @everactivetim described. We have a number of bridged Mosquitto brokers, and in many cases (maybe close to 50%) the message
appears when Mosquitto is starting up. There is no actual connection timeout at play here, since the message appears within a second of service start. The bridge is then never actually created. We're reverting to 2.0.14 to see if the issue is present there also; it may be a regression due to c99502a, as @everactivetim suggested. The most relevant parts of the configuration are (probably)
|
This may be a different problem, just like the problem mentioned by @everactivetim , but they all appear related. What device/architecture are you using? I'm able to reproduce the problem on a(ny) RPi Zero W, but not on any other device. |
Yes after further investigation I do think it's a separate issue, caused by a regression in 2.0.15. Reverting to 2.0.14 appears to fix the |
5 years ago! Problem still here! |
Should be fixed with 2.0.16, actually. This issue simply wasn't closed automatically. |
This issue is NOT fixed. We're seeing lost bridge connections on 2.0.18.
mosquitto.conf:
|
That's true! We use 2.0.18 & the issue is still present:
|
After upgrading to 1.6.3 from 1.5.x I found that when the bridge connection is lost the bridge won't recover when the connection is re-established.
The only solution has been restarting Mosquitto.
Mosquitto is running on Docker.
mosquitto.conf
Bridge name: bridge-01
Connection lost at: 1562929130
Connection re-established at: 1562929523
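The two values above are Unix epoch timestamps. For correlating them with broker logs, a small sketch converts them to UTC and computes the length of the outage (`to_utc` is a hypothetical helper name):

```python
from datetime import datetime, timezone

lost = 1562929130      # "Connection lost at" from the report
restored = 1562929523  # "Connection re-established at" from the report


def to_utc(epoch_s):
    """Format a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_s, tz=timezone.utc).isoformat()


outage_s = restored - lost
print(to_utc(lost))      # 2019-07-12T10:58:50+00:00
print(to_utc(restored))  # 2019-07-12T11:05:23+00:00
print(outage_s)          # 393
```

The network was therefore down for about six and a half minutes, and any reconnect attempts would be expected in the logs after 11:05:23 UTC.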