
Extract entrypoint command from bin scripts, try native-image zookeeper #311

Merged
merged 41 commits into master
Jul 12, 2020

Conversation

solsson
Contributor

@solsson solsson commented Mar 8, 2020

Use the command generated by Kafka's ./bin/*.sh as entrypoints. Given the same env, these scripts simply produce the same command every time and append args. See #309 and https://github.com/solsson/dockerfiles/blob/master/hooks/build.

Now things like #306 should be done with --override.
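The extraction idea can be sketched roughly like this: put a stub `java` first on PATH so the bin script prints the command it would have exec'ed. This sketch uses a minimal stand-in for Kafka's kafka-server-start.sh (the real script builds a much longer java command) and hypothetical /tmp paths:

```shell
# Stand-in for a Kafka bin script; real ones assemble classpath, JVM opts, etc.
mkdir -p /tmp/demo/bin /tmp/stubbin
cat > /tmp/demo/bin/kafka-server-start.sh <<'EOF'
#!/bin/sh
exec java -Xmx1G kafka.Kafka "$@"
EOF
chmod +x /tmp/demo/bin/kafka-server-start.sh

# Stub java that just echoes its arguments instead of starting a JVM
cat > /tmp/stubbin/java <<'EOF'
#!/bin/sh
echo "java $*"
EOF
chmod +x /tmp/stubbin/java

# With the stub first on PATH, running the script reveals the entrypoint command
PATH=/tmp/stubbin:$PATH /tmp/demo/bin/kafka-server-start.sh config/server.properties
```

The printed command can then be baked into the image's entrypoint, with args appended at deploy time.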

Experimental, based on the assumption that things like broker.rack
and listener addresses can be set using --override flags with env expansion.

If we had a statefulset pod-ordinal label we could set those things too,
but for now we need to rely on the pod name (+ ID_OFFSET for zookeeper).
The pod name, or statefulset.kubernetes.io/pod-name, can be used via
the downward API in args to do things like
--override listeners=PLAINTEXT://$(POD_NAME).kafka:9092

Once again it's unfortunate that the statefulset label is pod name,
not pod index.
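Lacking an index label, the ordinal can be derived from the pod name itself. A minimal sketch (POD_NAME is hardcoded here but would come from the downward API in a real pod; ID_OFFSET=1 is an assumed value):

```shell
POD_NAME="zoo-1"                  # in a real pod: injected via the downward API
ORDINAL="${POD_NAME##*-}"         # strip everything up to the last dash -> "1"
ID_OFFSET=1                       # assumed offset applied to zookeeper IDs
export ZOOKEEPER_SERVER_ID=$((ORDINAL + ID_OFFSET))
echo "$ZOOKEEPER_SERVER_ID"
```

This matches the init trace below, where pod zoo-1 ends up with ZOOKEEPER_SERVER_ID=2.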

Also makes sure that DNS entries are published prior to readiness
so clusters don't get into loops of not being able to find each other.
…rs-support

Rename the statefulset's headless service from broker to kafka
client sessionid 0x0, likely client has closed socket"
this value should be increased or set to 0 (infinite retry) to overcome issues related to DNS name resolution.

But I'm not sure if "Java system property only" means that this config entry has no effect.
@solsson
Contributor Author

solsson commented Apr 27, 2020

I had a very strange init error that isn't reproducible: the symlink operation failed, complaining that the target path already exists.

+ '[' '!' -d /var/lib/zookeeper/data ']'
+ '[' -z 1 ']'
+ export ZOOKEEPER_SERVER_ID=2
+ ZOOKEEPER_SERVER_ID=2
+ echo 2
+ tee /var/lib/zookeeper/data/myid
2
+ cp -Lur /etc/kafka-configmap/init.sh /etc/kafka-configmap/log4j.properties /etc/kafka-configmap/zookeeper.properties /etc/kafka/
+ '[' '!' -z 0 ']'
+ '[' '!' -z 3 ']'
+ sed -i 's/^server\./#server./' /etc/kafka/zookeeper.properties
++ seq 0
++ seq 3
+ for N in $(seq $(( $REPLICAS - $PZOO_REPLICAS )))
+ echo server.1=zoo-0.zoo:2888:3888:participant
+ for N in $(seq $(( $REPLICAS - $PZOO_REPLICAS )))
+ echo server.2=zoo-1.zoo:2888:3888:participant
+ for N in $(seq $(( $REPLICAS - $PZOO_REPLICAS )))
+ echo server.3=zoo-2.zoo:2888:3888:participant
+ sed -i 's/server\.2\=[a-z0-9.-]*/server.2=0.0.0.0/' /etc/kafka/zookeeper.properties
+ ln -s /etc/kafka/zookeeper.properties /etc/kafka/zookeeper.properties.scale-3.zoo-1
ln: failed to create symbolic link '/etc/kafka/zookeeper.properties.scale-3.zoo-1': File exists

After deleting the pod, the next pod came up as usual.
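One way an init script can be made idempotent against this kind of one-off failure is to force-replace the link rather than create it. A sketch with hypothetical paths under /tmp instead of the actual /etc/kafka layout:

```shell
mkdir -p /tmp/etc-kafka
touch /tmp/etc-kafka/zookeeper.properties

# -sfn replaces an existing link instead of failing with "File exists"
ln -sfn /tmp/etc-kafka/zookeeper.properties \
        /tmp/etc-kafka/zookeeper.properties.scale-3.zoo-1
# a second run (e.g. after a pod restart) still succeeds
ln -sfn /tmp/etc-kafka/zookeeper.properties \
        /tmp/etc-kafka/zookeeper.properties.scale-3.zoo-1

readlink /tmp/etc-kafka/zookeeper.properties.scale-3.zoo-1
```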

because in a resource-strapped dev environment kafka will often crashloop
several times while waiting for zookeeper, and JVM starts are heavy.
while GKE had no such issues, probably because of fsGroup
@solsson solsson changed the base branch from zookeeper-cpurequest to master May 18, 2020 18:44
@solsson solsson changed the title Explore java as the container process, not wrapper scripts Extract entrypoint command from bin scripts, try native-image zookeeper May 18, 2020
solsson added a commit to solsson/dockerfiles that referenced this pull request May 18, 2020
@solsson
Contributor Author

solsson commented Jul 12, 2020

To summarize changes from v6.0.4:

The ambition was to keep the base folders kafka and zookeeper unchanged, but we had to make some changes to init scripts and container args in order to support an updated nonroot base and the new native base.

  • Kafka 2.5.0
  • Zookeeper 3.5.8
  • Experimental GraalVM native-image binary build of Zookeeper 3.5.8
  • Variants scale-1 and dev-small use native zookeeper because faster zk startup time means fewer broker crashloops and thus reduced resource waste when provisioning dev/CI clusters.
  • The REPLICAS environment variable is new. It replaces ZOO_REPLICAS while PZOO_REPLICAS remains.
  • The .properties file paths written by init-config scripts now contain pod names so config can, if desired, vary per pod. There are symlinks to the original path to avoid breaking existing container command/args.
  • Zookeeper .properties files also contain the replica count because they must enumerate all peers and thus need to change when zookeeper scales up or down.
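The peer enumeration that makes the file depend on the replica count can be sketched as follows (same shape as the init trace above, simplified to a single loop with REPLICAS assumed to be 3):

```shell
REPLICAS=3
for N in $(seq "$REPLICAS"); do
  # server IDs are 1-based while statefulset pod ordinals are 0-based
  echo "server.$N=zoo-$((N - 1)).zoo:2888:3888:participant"
done
```

Scaling the ensemble changes REPLICAS and therefore the generated server.N lines, which is why the file must be regenerated on scale changes.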

to simplify troubleshooting in cases like #310 with ephemeral pods

Because of the relation between replica count and container args
we're only doing this on fixed scale variants, not on bases.
@solsson solsson merged commit e201ea3 into master Jul 12, 2020
solsson added a commit that referenced this pull request Jul 13, 2020
so that it matches the zookeeper.connect property, which now points at actual pods