Docker compose health check #167

Open
alfreddatakillen opened this issue Feb 9, 2017 · 13 comments

@alfreddatakillen

alfreddatakillen commented Feb 9, 2017

docker-compose now supports health checks (since version 1.10.0), and delaying start-up of containers until their dependencies are up and healthy. See https://docs.docker.com/compose/startup-order/ for docs on this.

It would be great to have an example of how to configure such health checks on a Kafka container!

@ddewaele

ddewaele commented Feb 25, 2017

Did you find a solution for this?

Doing something like this could work in the container:

healthcheck:
   test: ["CMD", "bash", "-c", "unset JMX_PORT; kafka-topics.sh --zookeeper zookeeper:2181 --list"]

I don't know if there is a better way. The unset is needed when you've enabled JMX (otherwise the command-line tool tries to bind the JMX port that the broker already holds).

@knordstrom

We're facing this issue with Kubernetes. Our solution was to include a health-check.sh in our Dockerfile with the following, which goes to zookeeper and checks to see if the active brokers list returned contains the broker id of the node. It's not perfect - I'd rather directly query the node if it's up - but I don't see a way to do this and (at least in our case) Kubernetes is watching the Zookeeper cluster as well in a StatefulSet so it seems relatively safe to me.

If there's general interest and @wurstmeister is into it I'd be willing to make a PR for this.

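#! /bin/bash

# List the broker ids registered in ZooKeeper: take the last output line
# (the id list) and have jq emit one id per line.
r=`$KAFKA_HOME/bin/zookeeper-shell.sh zk-headless:2181 <<< "ls /brokers/ids" | tail -1 | jq '.[]'`
ids=( $r )
# contains <item>... <value>: print "y" and return 0 if the last argument
# equals any of the preceding arguments, otherwise print "n" and return 1.
function contains() {
     local n=$#
     local value=${!n}
     for ((i=1;i < $#;i++)) {
         if [ "${!i}" == "${value}" ]; then
             echo "y"
             return 0
         fi
     }
     echo "n"
     return 1
}

# Read this node's broker id from the static server configuration.
x=`cat $KAFKA_HOME/config/server.properties | awk 'BEGIN{FS="="}/^broker.id=/{print $2}'`
if [ "$(contains "${ids[@]}" "$x")" == "y" ]; then echo "ok"; exit 0; else echo "doh"; exit 1; fi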
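
The two steps in the script above (read the local broker id, then test whether it is in the list ZooKeeper returned) can also be sketched in plain POSIX sh, which may help on images that do not ship bash. The function names `broker_id_from` and `id_in_list` are hypothetical, and the properties file is assumed to use simple `key=value` lines with no surrounding whitespace.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of any image.

# broker_id_from <properties-file>: print the value of broker.id.
# Assumes plain key=value lines without spaces around "=".
broker_id_from() {
    awk -F= '$1 == "broker.id" { print $2; exit }' "$1"
}

# id_in_list <wanted> <id>...: return 0 if <wanted> equals one of the ids.
id_in_list() {
    wanted=$1
    shift
    for candidate in "$@"; do
        if [ "$candidate" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

A health check built on these would call `id_in_list "$(broker_id_from "$KAFKA_HOME/config/server.properties")" $ids` with the id list obtained from ZooKeeper as in the script above.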

@sscaling
Collaborator

@alfreddatakillen - The document you reference just explains how to wrap your startup process in an external script / shell call (https://docs.docker.com/v1.10/compose/startup-order/)

I assume you meant to reference https://docs.docker.com/compose/compose-file/#healthcheck - this was added in 1.12? However, this only reports the status of the container and does not block downstream dependencies. I believe this was mainly added for swarm support / restart policies

e.g.

version: '2.1'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    healthcheck:
      test: ["CMD-SHELL", "echo ruok | nc -w 2 zookeeper 4444"]
      interval: 5s
      timeout: 10s
      retries: 3
  kafka:
    build: .
    depends_on:
      - zookeeper
    ports:
      - "9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

With this configuration the health check only reports the status of the container. Kafka is started immediately after ZooKeeper, regardless of its health state.

$ docker ps
CONTAINER ID        IMAGE                      STATUS
6c6dee3e5aae        kafkadocker_kafka          Up About a minute
c2d8a40b785e        wurstmeister/zookeeper     Up About a minute (unhealthy)

@knordstrom - the healthcheck case may work OK for k8s if it was added to the docker image.

@Lukkie

Lukkie commented Nov 19, 2018

For future visitors who cannot get @knordstrom's healthcheck to work, this is how I solved it:

Make sure that your Dockerfile adds the shell script to a directory in the image.

We're using dynamic IDs generated by ZooKeeper, so I had to use the following healthcheck.sh:

#! /bin/bash

unset JMX_PORT # https://github.com/wurstmeister/kafka-docker/issues/171#issuecomment-327097497

r=`$KAFKA_HOME/bin/zookeeper-shell.sh zookeeper:2181 <<< "ls /brokers/ids" | tail -1 | jq '.[]'`   
ids=( $r )                                                                                         
function contains() {
     local n=$#
     local value=${!n}
     for ((i=1;i < $#;i++)) {
         if [ "${!i}" == "${value}" ]; then
             echo "y"
             return 0
         fi
     }
     echo "n"
     return 1
}

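# With generated broker ids the id is not in server.properties; Kafka writes it
# to meta.properties inside the log directory, so look it up there.
LOG_DIR=$(awk -F= -v x="log.dirs" '$1==x{print $2}' /opt/kafka/config/server.properties)
x=`cat ${LOG_DIR}/meta.properties | awk 'BEGIN{FS="="}/^broker.id=/{print $2}'`
if [ "$(contains "${ids[@]}" "$x")" == "y" ]; then echo "ok"; exit 0; else echo "doh"; exit 1; fi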

Finally, add the following to your docker-compose file:

    healthcheck:
      test: ["CMD-SHELL", "/bin/healthcheck.sh"]
      interval: 5s
      timeout: 10s
      retries: 5

@dobesv

dobesv commented Jan 9, 2020

There's a very nice solution here: confluentinc/cp-docker-images#358 (comment)

@matthew-d-jones

> There's a very nice solution here: confluentinc/cp-docker-images#358 (comment)

That is only for ZooKeeper; it doesn't help with checking when Kafka is up and ready.

@dobesv

dobesv commented Jan 9, 2020

Ah, you are right, sorry. I must have mixed this issue up with another one.

@KabDeveloper

@Lukkie @dobesv @matthew-d-jones @sscaling @knordstrom @ddewaele @alfreddatakillen

Hi,

Your solution seems good, but unfortunately I am unable to get it working.

My kafka-setup service does not wait for the status to be OK before launching its commands!

  kafka-server:
    image: 'wurstmeister/kafka:2.12_2.5.0'
    container_name: kafka-server
    hostname: kafka-server
    ports:
      - '9092:9092'
      - '29092:29092'
      - '1099:1099'
    volumes:
      - '/c/Users/PHP/Desktop/imm/ping/healthcheck.sh:/bin/healthcheck.sh'
    environment:
    .....
    depends_on:
      - zookeeper-server
    healthcheck:
      test: ["CMD-SHELL", "/bin/healthcheck.sh"]
      interval: 5s
      timeout: 10s
      retries: 5

  kafka-setup:
    image: 'wurstmeister/kafka:2.12_2.5.0'
    hostname: kafka-setup
    container_name: kafka-setup
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
                       ./opt/kafka/bin/kafka-topics.sh --create --if-not-exists --zookeeper zookeeper-server:2181 --partitions 1 --replication-factor 1 --topic test1 && \
                       ./opt/kafka/bin/kafka-topics.sh --create --if-not-exists --zookeeper zookeeper-server:2181 --partitions 1 --replication-factor 1 --topic test2'"
    environment:
      KAFKA_BROKER_ID: ignored
      KAFKA_ZOOKEEPER_CONNECT: ignored
    depends_on:
      - kafka-server

When I check the health logs:

First time:

{"Status":"starting","FailingStreak":1,"Log":[{"Start":"2020-06-02T14:29:38.838704407Z","End":"2020-06-02T14:29:43.530036206Z","ExitCode":1,"Output":"cat: can't open '/kafka/kafka-logs-kafka-server/meta.properties': No such file or directory\ndoh\n"}]}

And then:

{"Status":"healthy","FailingStreak":0,"Log":[
{"Start":"2020-06-02T14:35:24.372947922Z","End":"2020-06-02T14:35:26.26390467Z","ExitCode":0,"Output":"ok\n"},
{"Start":"2020-06-02T14:35:31.271064187Z","End":"2020-06-02T14:35:33.22431355Z","ExitCode":0,"Output":"ok\n"},
{"Start":"2020-06-02T14:35:38.230154493Z","End":"2020-06-02T14:35:40.243810443Z","ExitCode":0,"Output":"ok\n"},
{"Start":"2020-06-02T14:35:45.252980835Z","End":"2020-06-02T14:35:47.20696075Z","ExitCode":0,"Output":"ok\n"},
{"Start":"2020-06-02T14:35:52.211854112Z","End":"2020-06-02T14:35:54.159074005Z","ExitCode":0,"Output":"ok\n"}
]}

And the errors returned from my kafka-setup service:

Waiting for Kafka to be ready...
Error while executing topic command : Replication factor: 1 larger than available brokers: 0.
[2020-06-02 14:29:40,482] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 1 larger than available brokers: 0.
 (kafka.admin.TopicCommand$)

This means that the commands were launched before the healthcheck reported an OK status.

How were you able to get it working, please?

Thank you all!
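
In the compose file above, kafka-setup lists kafka-server under a short-form depends_on, which only waits for the container to be started, not for its healthcheck to pass. That matches the timestamps: the topic command failed at 14:29:40, while the first health probe was still running. Where the Compose version supports it, the long form with `condition: service_healthy` (used in a later comment in this thread) is the declarative fix. As an alternative, here is a minimal sketch of a retry loop that kafka-setup could run before creating topics. `wait_until` is a hypothetical helper name, and the probe command in the usage comment is an assumption about the listener setup, which is elided above.

```shell
#!/bin/sh
# wait_until <attempts> <delay_seconds> <command...>
# Hypothetical helper: re-run the command until it succeeds or the
# attempts are used up. Returns 0 on success, 1 on giving up.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Discard the probe's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed usage inside kafka-setup (host and port are assumptions):
# wait_until 30 2 /opt/kafka/bin/kafka-broker-api-versions.sh \
#     --bootstrap-server kafka-server:9092 || exit 1
```

Probing the broker itself rather than ZooKeeper matters here, because ZooKeeper can answer before any broker has registered, which is the "available brokers: 0" situation in the error above.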

@TaylorSMarks

This seems to work ok for me for checking whether Kafka is up or not:

nc -z localhost 9091 || exit 1

(Note that I personally have Kafka running on port 9091, not 9092.)

It's basic but it seems to be good enough to keep my other containers from starting up before Kafka is actually ready to start receiving traffic.

@pprishchepa

This healthcheck works perfectly for me:

    image: bitnami/kafka:3.4.0
    ...
    healthcheck:
      test: kafka-cluster.sh cluster-id --bootstrap-server localhost:9092 || exit 1
      interval: 1s
      timeout: 60s
      retries: 60

@jagatsingh

    healthcheck:
      test: ["CMD-SHELL", "pgrep -f 'kafka.*9101' || exit 1"]
      interval: 2m
      timeout: 10s
      retries: 3

@anvaari

anvaari commented Feb 12, 2024

Another, better option is to use:

test: ["CMD-SHELL", "(echo > /dev/tcp/kafka1/9092) &>/dev/null && exit 0 || exit 1"]
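
One caveat with the line above: `/dev/tcp` is a bash feature, while CMD-SHELL runs the command through the image's default shell (`/bin/sh`), which may be dash or BusyBox ash. A minimal sketch that invokes bash explicitly is shown below. `tcp_open` is a hypothetical helper name, and it assumes bash is installed in the image.

```shell
#!/bin/sh
# tcp_open <host> <port>: succeed if a TCP connection can be opened.
# Hypothetical helper; it calls bash explicitly because /dev/tcp is
# handled by bash itself and is not available in plain sh.
tcp_open() {
    bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Assumed usage as a healthcheck command (host and port from the line above):
# tcp_open kafka1 9092 || exit 1
```

Like the nc check earlier in the thread, this only shows that the port accepts connections, not that the broker has finished starting up.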

@alberttwong

alberttwong commented Apr 17, 2024

Here is the solution for quay.io/debezium/kafka:2.5. Debezium won't install nc, so this uses kafka-cluster.sh instead.

Tracking https://debezium.zulipchat.com/#narrow/stream/302529-community-general/topic/Kafka.20health.20check/near/433819156

  kafka:
    image: quay.io/debezium/kafka:2.5
    ports:
      - 9092:9092
    depends_on:
      zookeeper:
        condition: service_healthy
    environment:
     - ZOOKEEPER_CONNECT=zookeeper:2181
    healthcheck:
      test: /kafka/bin/kafka-cluster.sh cluster-id --bootstrap-server kafka:9092 || exit 1
      interval: 1s
      timeout: 60s
      retries: 60
