healthcheck #1270

Open
frank3427 opened this issue May 9, 2019 · 25 comments

Comments

@frank3427

Does anyone have a healthcheck for the container, or is one already included in the container? How do I use it in docker-compose?

@Helloworld-zyt

i want to know

@mozzhead164

bump

@kaizensparc

Hello,
You can use mosquitto_sub to try to connect to the broker, with the -E option to exit immediately if it works. To avoid automatic reconnection and make the probe fail if it does not work after some time, you can run it inside a timeout command.

@genieai-vikas

@didjcodt Could you please paste a sample? I am getting an error and have tried multiple ways.

@kaizensparc

Example:

timeout 1 mosquitto_sub -h localhost -p 1883 -t 'topic' -E -i probe

What kind of error do you have? Can you paste the logs?

@genieai-vikas

@didjcodt This is the error
"Output": "Connection error: Connection Refused: not authorised.\n"

@kaizensparc

Did you set up any authentication method (like username/password), or are you maybe filtering based on a client ID?
You can check for that in the mosquitto configuration file (look for options like password_file, acl_file, allow_anonymous, or plugin, for instance).

@genieai-vikas

Yes. I have created a password file. Below is my conf file:

allow_anonymous false
password_file /mosquitto/config/pwfile
port 1883
listener 9001
persistence true
persistence_location /mosquitto/data/
log_dest file /mosquitto/log/mosquitto.log

@kaizensparc

kaizensparc commented Nov 15, 2021

So that means your probe also needs a username/password :)
If you have created a user named probe_user with password probe_password you can add the following flags in the command: -u probe_user -P "probe_password"

@genieai-vikas

So this is the problem: I have created the pwfile, which contains a username and password. If I pass those in the healthcheck, it will expose them. There should have been a healthcheck that does not require authentication.
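
One way to keep the probe's credentials out of the Compose file is to have the healthcheck read them from files at run time. What follows is only a minimal sketch. It assumes a dedicated probe user exists in the password file and that its name and password are mounted at the hypothetical paths /run/secrets/mqtt_probe_user and /run/secrets/mqtt_probe_password. The function name and the PROBE_USER_FILE / PROBE_PASS_FILE variables are also made up for illustration. Note that the password still appears on the mosquitto_sub command line inside the container while the probe runs.

```shell
# Hypothetical helper: run the probe with credentials read from files,
# so the username and password never appear in docker-compose.yml.
probe_with_credentials() {
  # Assumed locations; override via the (hypothetical) PROBE_*_FILE variables.
  user_file=${PROBE_USER_FILE:-/run/secrets/mqtt_probe_user}
  pass_file=${PROBE_PASS_FILE:-/run/secrets/mqtt_probe_password}

  # Fail the probe outright if either file is missing or unreadable.
  [ -r "$user_file" ] && [ -r "$pass_file" ] || return 1

  # Same probe as discussed in the thread, plus -u/-P for authentication.
  mosquitto_sub -h localhost -p 1883 -t '$SYS/#' -C 1 -W 3 -i probe \
    -u "$(cat "$user_file")" -P "$(cat "$pass_file")" >/dev/null
}
```

Saved as a script inside the image, this could be called from the healthcheck test in place of the bare mosquitto_sub command.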

@Daedaluz
Contributor

I think it should be possible to configure another listener that only listens on localhost and has allow_anonymous true.
This way you don't need a username/password for the probes, but authentication is still required for remote connections.

Something like this (very untested config)

persistence true
persistence_location /mosquitto/data
log_dest file /mosquitto/log/mosquitto.log

per_listener_settings true

listener 1883 0.0.0.0
allow_anonymous false
password_file /mosquitto/config/pwfile

# why have this listener?
# listener 9001

listener 1880 127.0.0.1
allow_anonymous true

Then you could use mosquitto_sub as a probe without a password:
mosquitto_sub -p 1880 -t 'topic' -C 1 -E -i probe -W 3
(Also untested)

@LostOnTheLine

Still nothing for this?

I don't need great security (though I'm not really sure why that's an issue as a healthcheck runs on the container's CLI ) but I have found a few things online... none of which work

version: "3"
services:
  mosquitto:
    image: eclipse-mosquitto
    container_name: mosquitto
    user: 1000:1000
    environment:
      - PUID=1000 #optional
      - PGID=1000 #optional
      - TZ=America/Phoenix
    ports:
      - 1883:1883
      #- 9001:9001
    volumes:
      - /docker/homeassistant/mqtt/mosquitto/config:/mosquitto/config
      - /docker/homeassistant/mqtt/mosquitto/data:/mosquitto/data
      - /docker/log/var/log/mosquitto:/mosquitto/log
      - /docker/log/var/log:/var/log:rw
      - /etc/localtime:/etc/localtime:ro
    restart: always
    healthcheck:
      #test: ["mosquitto_sub", "-h", "localhost", "-p", "1883", "-t", "test", "-C", "1"] #Stuck [Running]
    #  test: ["CMD-SHELL", "timeout -t 5 mosquitto_sub -t '$$SYS/#' -C 1 | grep -v Error || exit 1"] #Stuck [Starting] but runs. Becomes [Unhealthy] after 7-8 minutes
      test: ["CMD-SHELL", "mosquitto_sub -h localhost -t test -C 1"] #stuck [starting] but runs & logs active. Becomes [Unhealthy] after 5-12 minutes (7-8 typical)
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s
    #security_opt:
    #  - no-new-privileges:true
    labels:
      - "com.centurylinklabs.watchtower.scope=dockerhub"

Looking through things, I think the best solution is going to be a healthcheck topic published with the -r (retain last message) flag. Ideally I was trying to publish the time to it every 5 minutes, the idea being that the healthcheck could read that topic and check whether the time was within the last 10 minutes; if not, the container would be unhealthy. I was trying with this

sh -c date | mosquitto_pub -h localhost -t healthcheck -l -r --quiet --repeat 999999 --repeat-delay 60 

hoping that it'd update the timecheck every minute, but that sadly didn't work.
But I think this is the direction that will serve best for a healthcheck

    healthcheck:
      #test: ["mosquitto_sub", "-h", "localhost", "-p", "1883", "-t", "healthcheck", "-C", "1"] #Stuck [Running] with no healthcheck status
    #  test: ["CMD-SHELL", "timeout -t 5 mosquitto_sub -t '$$SYS/#' -C 1 | grep -v Error || exit 1"] #Stuck [Starting] but runs. Becomes [Unhealthy] after 7-8 minutes
  #    test: ["CMD-SHELL", "mosquitto_sub -h localhost -t healthcheck -C 1"] #stuck [starting] but runs & logs active. Becomes [Unhealthy] after 5-12 minutes (7-8 typical)
      #test: ["sh", "-c", "date | mosquitto_pub -h localhost -t healthcheck -l"] # Publishes date & time to "healthcheck" topic
      #test: ["mosquitto_sub", "-h", "localhost", "-t", "healthcheck", "-C", "1"]
      #test: ["mosquitto_sub", "-h", "localhost", "-p", "1883", "-t", "healthcheck", "-C", "1", "-W", "5"]
      test: ["sh", "-c", "mosquitto_sub -h localhost -C 1 -t healthcheck | grep ."] #Stuck [Running] with no healthcheck status
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s
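
The retained-timestamp idea described above could look roughly like the following. This is only a sketch under assumptions: the function names are made up, the topic name healthcheck is taken from the attempts above, the broker is assumed to accept anonymous connections on localhost, and something has to keep publish_heartbeat running in the background. The timestamp is published as Unix seconds so the age comparison is plain integer arithmetic.

```shell
# Publish the current Unix time as a retained message, once per interval.
# Meant to run in the background next to the broker (assumption).
publish_heartbeat() {
  topic=${1:-healthcheck}
  interval=${2:-60}
  while :; do
    mosquitto_pub -h localhost -t "$topic" -r -m "$(date +%s)"
    sleep "$interval"
  done
}

# Succeed only if the given Unix timestamp is at most max_age seconds old.
# The optional third argument overrides "now" (useful for testing).
is_fresh() {
  stamp=$1
  max_age=$2
  now=${3:-$(date +%s)}
  # Reject empty or non-numeric payloads.
  case "$stamp" in ''|*[!0-9]*) return 1 ;; esac
  [ $((now - stamp)) -le "$max_age" ]
}

# Healthcheck: read the retained heartbeat and require it to be
# less than 10 minutes (600 seconds) old.
check_heartbeat() {
  stamp=$(mosquitto_sub -h localhost -t "${1:-healthcheck}" -C 1 -W 3) || return 1
  is_fresh "$stamp" 600
}
```

Because the message is retained, mosquitto_sub -C 1 receives it as soon as the subscription is made, and -W 3 makes the probe give up instead of hanging when nothing has been published yet.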

@Daedaluz
Contributor

This seems to work for me...

    healthcheck:
      test: ["CMD", "mosquitto_sub", "-t", "$$SYS/#", "-C", "1", "-i", "healthcheck", "-W", "3"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s

@LostOnTheLine

LostOnTheLine commented Mar 20, 2023

Thanks. This seems to work for me too. I don't know why I didn't find this when I searched, found a bunch of things similar, but not this

EDIT: Although when I try it in the CLI I get

/ $ mosquitto_sub -t $$SYS/# -C 1 -i healthcheck -W 3
Timed out
/ $ 

@Daedaluz
Contributor

Note that $$ in bash expands to the current shell's PID:
echo mosquitto_sub -t $$SYS/# -C 1 -i healthcheck1 -W 3 => mosquitto_sub -t 107926SYS/# -C 1 -i healthcheck1 -W 3
which is not the topic you want to listen on.

When running locally, try mosquitto_sub -t '$SYS/#' -C 1 -i healthcheck1 -W 3 instead.
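
The difference is easy to see without a broker. In a Compose file, $$ is the escape for a single literal $, which is why the YAML version of the healthcheck uses $$SYS/#. In an interactive shell the same text is expanded to the shell's PID. A small demonstration (the variable names are only for illustration):

```shell
# In double quotes (or unquoted), $$ is replaced by the shell's process ID.
mangled_topic="$$SYS/#"
# Single quotes keep the dollar sign literal, which is the real topic name.
literal_topic='$SYS/#'

printf 'double-quoted: %s\n' "$mangled_topic"
printf 'single-quoted: %s\n' "$literal_topic"
```

The first line prints a number followed by SYS/#, a topic that nothing publishes to, which matches the "Timed out" result above.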

@LostOnTheLine

Ah, so it's escaped with the 2nd $ or something like that. That makes sense

/ $ mosquitto_sub -t '$SYS/#' -C 1 -i healthcheck1 -W 3
mosquitto version 2.0.15
/ $ 

So it's just a test if it shows a non-null value when version is requested? I mean, it works & shows as Healthy... but I'm not confident that it won't show as Healthy even when it isn't working, which defeats the whole point of the healthcheck. Is there not a way to have it check to see that a topic can be subscribed to? Ideally I'd want a topic that outputs the time every 10 minutes & then a check that sees if the last message in that topic is less than 30 minutes old

@Daedaluz
Contributor

Daedaluz commented Mar 21, 2023

  • -t '$SYS/#' topic to subscribe to
  • -C 1 receive one message and exit
  • -i healthcheck set client-id to "healthcheck"
  • -W 3 time out after 3 seconds if no message has been received

If you turn on verbose logging in the broker, you should see some log entries about healthcheck making subscriptions.

If you add the -v flag to the mosquitto_sub command, you'll also see that the version is actually sent over a topic, and that's what you see there:
$SYS/broker/version

If you still aren't convinced, you could try changing the topic to one that nothing publishes to. Does it still show as healthy?

If you really want to go the extra step, you could write a simple program that connects, subscribes to some topic, publishes to the same topic, then waits for the message to arrive and measures the time difference. This way you test the whole chain and get an idea of how much work the broker is doing. This obviously involves creating your own container with the supplied test program.
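
A rough approximation of that round-trip test can be done with the stock clients alone. Note that this sketch swaps in a different technique from the one described above: instead of subscribing first and then publishing, it publishes a retained message and reads it back. That exercises both publish and subscribe, but the timing only has one-second resolution. The function name and the topic healthcheck/roundtrip are hypothetical, and the broker is assumed to allow the probe to publish and subscribe on that topic.

```shell
# Hypothetical round-trip probe: publish a unique retained payload,
# read it back, and compare.
roundtrip_check() {
  host=${1:-localhost}
  port=${2:-1883}
  topic="healthcheck/roundtrip"            # assumed topic name
  payload="probe-$$-$(date +%s)"           # unique per run
  start=$(date +%s)

  # Publish with -r so the broker stores the message for the next subscriber.
  mosquitto_pub -h "$host" -p "$port" -t "$topic" -r -m "$payload" || return 1

  # Subscribe, take exactly one message, give up after 3 seconds.
  received=$(mosquitto_sub -h "$host" -p "$port" -t "$topic" -C 1 -W 3) || return 1

  # The probe passes only if the payload came back unchanged.
  [ "$received" = "$payload" ] || return 1
  echo "round trip took $(( $(date +%s) - start ))s"
}
```

Called as roundtrip_check localhost 1880, it would use the anonymous localhost listener suggested earlier in the thread.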

@LostOnTheLine

Ah. So it's subscribing to a topic, alright, that should be good then. I've just seen too many homemade "healthchecks" that don't actually check the health of the container that I'm always skeptical until I know what it's doing

@ingoratsdorf

If you have a separate listener on a non-standard port, you also have to specify the port in your healthcheck, i.e.:

    healthcheck:
      test: ["CMD", "mosquitto_sub", "-p", "1880", "-t", "$$SYS/#", "-C", "1", "-i", "healthcheck", "-W", "3"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s

@s0170071

s0170071 commented Oct 6, 2023

I am wondering: mosquitto_sub is probably part of the mosquitto package. If I want to test the mosquitto container, I have to install mosquitto on the host system as well ?

@ingoratsdorf

No, because the test is executing within the container, not on the host. Would not make sense otherwise.

@LostOnTheLine

The way Docker healthchecks work is by running a test, essentially in a CLI inside the container. You can have a healthcheck check whether a page is reachable, whether a link goes to an actual page, or, more complex still, whether a link goes to a page that contains certain words, but you can only do those things if the tools to do so are installed inside the container. Oftentimes checks do things like check whether a page is reachable and larger than some number of KB.

What this healthcheck is doing is, in the CLI of the container, subscribing to a topic. If no message arrives on that topic before the timeout, the check fails and the container is reported as not healthy. But everything is happening inside the OS of the container.

There are certain containers that are designed to test whether a thing on your local machine is present, but that is usually done via bind mounts or by pinging the machine over the network. In either case your machine only needs to be running Docker, which it must be anyway to run the container, and to have the variables set for the container in the Docker Compose file or in the command used to start the container.
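
From the host, the only tool needed is the docker CLI itself. As a small sketch (the function names are made up, and the container name passed in is whatever you set as container_name), these two helpers read the status Docker has recorded and run the same probe by hand inside the container:

```shell
# Print the health status Docker has recorded for a container:
# starting, healthy, or unhealthy.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Run the probe manually inside the container, the same way Docker does.
# Single quotes keep $SYS literal in the host shell.
probe_in_container() {
  docker exec "$1" mosquitto_sub -t '$SYS/#' -C 1 -i healthcheck -W 3
}
```

For example, health_status mosquitto followed by probe_in_container mosquitto shows both what Docker thinks and what the probe actually prints.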

@Guiorgy

Guiorgy commented Oct 23, 2023

The only annoyance with this is that the logs get filled by:

[timestamp]: New connection from 127.0.0.1:[port] on port 1880.
[timestamp]: New client connected from 127.0.0.1:[port] as healthcheck (p2, c1, k60).
[timestamp]: Client healthcheck disconnected.

Unfortunately, Mosquitto doesn't support per-listener logging configuration, otherwise I would've disabled logging for the localhost listener.

I tried using grep to filter out those logs:

command: ['/bin/sh', '-c', '/usr/sbin/mosquitto -c /mosquitto/config/mosquitto.conf 2>&1 | grep -v -E "^.*:[ ]New connection from 127\\.0\\.0\\.1:[0-9]+ on port 1880\\.$"']

However, grep would buffer the output, and when the container was stopped, SIGTERM would not be propagated to mosquitto (since it was started through a shell), so the container was killed instead and the buffered logs were lost.
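
Both problems can, in principle, be worked around in a small wrapper script. The following is an untested-in-production sketch, not a drop-in fix: it filters lines with a plain shell read loop (which writes each line as soon as it is read, so nothing sits in a buffer) and forwards SIGTERM/SIGINT to the wrapped process through a trap. The function names are hypothetical, the patterns assume the log lines shown above (client ID healthcheck, port 1880), and the broker is assumed to log to stderr or stdout rather than to a file.

```shell
# Drop log lines produced by the healthcheck client; pass everything else.
filter_probe_logs() {
  while IFS= read -r line; do
    case "$line" in
      *"New connection from 127.0.0.1:"*" on port 1880."*) ;;
      *" as healthcheck ("*) ;;
      *"Client healthcheck disconnected"*) ;;
      *) printf '%s\n' "$line" ;;
    esac
  done
}

# Run a command with filtered output, forwarding SIGTERM/SIGINT to it
# and returning its exit status.
run_filtered() {
  fifo_dir=$(mktemp -d) || return 1
  fifo="$fifo_dir/log.fifo"
  mkfifo "$fifo" || return 1

  # Reader first, then the real process writing into the FIFO.
  filter_probe_logs < "$fifo" &
  filter_pid=$!
  "$@" > "$fifo" 2>&1 &
  child_pid=$!

  trap 'kill -TERM "$child_pid" 2>/dev/null' TERM INT

  rc=0
  wait "$child_pid" || rc=$?
  # A trapped signal interrupts wait; keep waiting until the child is gone.
  while kill -0 "$child_pid" 2>/dev/null; do
    rc=0
    wait "$child_pid" || rc=$?
  done

  wait "$filter_pid"          # let the filter drain the remaining lines
  trap - TERM INT
  rm -rf "$fifo_dir"
  return "$rc"
}
```

As the container's command, this would be invoked along the lines of run_filtered /usr/sbin/mosquitto -c /mosquitto/config/mosquitto.conf, with the script itself started in exec form so that it receives the signal from Docker.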

@LostOnTheLine

In that case, setting the healthcheck interval to an hour could be an option. Problems would be noticed less quickly, but having an hourly log entry doesn't seem bad to me.

@Guiorgy

Guiorgy commented Feb 15, 2024

Just an FYI, made a docker image that adds a check-health.sh script, and filters out the healthcheck client messages from the logs. You can grab the source and build it yourself too.

PS. Not yet well tested.
