zookeeper stops working after injecting istio-proxy #19280
zookeeper pod log
Notice the pods are rebooted a few dozen times.
/assign
Found a workaround for this... zookeeper has 3 ports: 2181/TCP, 3888/TCP, 2888/TCP. 2181 is for client connections, and 3888/2888 are both used internally for leader election and followers. I went ahead and excluded 3888/2888 for inbound ports, e.g.
and redeployed the StatefulSet. After that, all my zookeeper pods came up fine and the quorum was established.
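A sketch of that workaround as a pod-template annotation on the StatefulSet (the annotation name is Istio's documented traffic.sidecar.istio.io/excludeInboundPorts; the surrounding fields are abbreviated and illustrative):

```yaml
# StatefulSet pod template (abbreviated): tell the injected sidecar not
# to intercept inbound traffic on ZooKeeper's quorum/election ports.
spec:
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "2888,3888"
```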
If you exclude these ports then, when you have mTLS enabled, the pods will not communicate with each other using mTLS... correct?
I cannot start up with https://github.com/helm/charts/tree/master/incubator/zookeeper; the error is not related to istio. But I tried another deployment, https://blog.csdn.net/wslyk606/article/details/90720424, and it works well in my env. My env is installed from a recent
@linsun So maybe the TLS setting is not right; can you show your config dump of the zookeeper?
Installing following this guide https://istio.io/docs/tasks/security/authentication/auto-mtls/#before-you-begin also works for me.
@dcberg I don't think mTLS will be an issue on the other ports that are excluded; basically I'm asking istio to do nothing with the inter-member communication within the zookeeper cluster by excluding the 2 ports.
@hzxuzhonghu great you were able to recreate it, maybe? I think the issue you had was caused by the configuration of the storage provider: you had to configure it when installing the helm chart, which I configured for IKS. I used a fresh istio 1.4.0 installation with the default profile (NO mTLS enabled). Let me know if you need anything.
@linsun I wasn't able to get my environment working with just the annotation that you used to excludeInboundPorts. I had to excludeOutboundPorts as well.
Once I did this, establishing quorum did work for me.
The question I have is: should we have to use the annotations to exclude both inbound and outbound ports that are used for inter-pod communication within the StatefulSet when Istio is enabled?
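Combining the two comments above, a sketch of the pair of annotations that made quorum establishment work (both are standard Istio sidecar annotations; the port numbers are ZooKeeper's defaults and the surrounding fields are abbreviated):

```yaml
# Exclude ZooKeeper's inter-member ports from sidecar interception in
# both directions, so peer connections bypass Envoy entirely.
spec:
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "2888,3888"
        traffic.sidecar.istio.io/excludeOutboundPorts: "2888,3888"
```

Note that traffic excluded this way is not carried over mTLS, which is the trade-off raised earlier in the thread.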
The root cause is that zookeeper listens on the pod IP only.
Ref: https://istio.io/faq/applications/#cassandra This is really bad UX. @rshriram @howardjohn @lambdai any idea how we can solve this?
@hzxuzhonghu thank you so much for looking at this! Could you elaborate on why "listens on the pod IP only" caused the problem? @mbanikazemi have you ever tried zookeeper + istio? I noticed your https://istio.io/faq/applications/#cassandra entry is only for cassandra. @dcberg I think users should not need to exclude these ports unless they specifically don't want istio to intercept traffic on these ports. In my case, I did it simply because istio can't handle it right now.
The inbound cluster address is set to 127.0.0.1 for ipv4
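A minimal, standalone illustration (plain Python sockets on Linux, not Istio code) of why this breaks: a server that binds one specific local address refuses connections addressed to a different local address on the same port, which is what happens when Envoy forwards inbound traffic to 127.0.0.1 but ZooKeeper's quorum ports are bound only to the pod IP.

```python
import socket

# The "server" binds only the 127.0.0.2 loopback alias (Linux routes all
# of 127/8 to lo), standing in for ZooKeeper bound to the pod IP.
srv = socket.socket()
srv.bind(("127.0.0.2", 0))
srv.listen(1)
port = srv.getsockname()[1]

# Dialing the bound address succeeds...
socket.create_connection(("127.0.0.2", port), timeout=1).close()

# ...but dialing 127.0.0.1 on the same port is refused, just like the
# sidecar's forwarded connection when the app listens on the pod IP only.
try:
    socket.create_connection(("127.0.0.1", port), timeout=1)
    refused = False
except ConnectionRefusedError:
    refused = True
srv.close()
print(refused)  # True on Linux
```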
Thank you @hzxuzhonghu, I see the difference now.
This looks like the issues we have with apps that do not listen on localhost. This can be changed by updating one or more configuration parameters for a given app. Looking at the zookeeper docs (https://zookeeper.apache.org/doc/r3.3.5/zookeeperAdmin.html#sc_configuration) I see:
Looking into it.
As noted earlier, the issue is that the servers listen on their IP address for ports used for communication between the servers. There is a config option
There is a PR to the Zookeeper helm chart that fixes this. See helm/charts#17183 and helm/charts#17258. The PR is stuck because of a screwup with a contributor license bot.
Works pretty well after I removed the annotation! @banix @Snible thank you so much for the suggestion of using quorumListenOnAllIPs. Here is what I did.
echo "quorumListenOnAllIPs=true" >> $ZK_CONFIG_FILE
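For context, after that append the generated ZooKeeper config contains the quorum option alongside the usual cluster entries; a sketch of the relevant zoo.cfg fragment (hostnames are illustrative):

```
# zoo.cfg fragment (illustrative hostnames). quorumListenOnAllIPs makes
# ZooKeeper bind its election/follower ports (3888/2888) on 0.0.0.0
# instead of only the address resolved from its own server.N entry.
clientPort=2181
server.1=zookeeper-0.zookeeper-headless:2888:3888
server.2=zookeeper-1.zookeeper-headless:2888:3888
server.3=zookeeper-2.zookeeper-headless:2888:3888
quorumListenOnAllIPs=true
```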
Added an entry to the FAQ: istio/istio.io#5951
Thank you @mbanikazemi! With the FAQ in place, I will close the issue.
As said in istio/istio#19280 (comment), zookeeper doesn't listen on 0.0.0.0 by default and so is not "service mesh friendly". Adding this option makes it listen on all IP addresses instead of just one. Signed-off-by: Sylvain Desbureaux <[email protected]>
@mbanikazemi @linsun I tried deploying zookeeper with the suggested fixes to use 0.0.0.0 on the host server and also enabled quorumListenOnAllIPs=true on my zookeeper servers. I am still getting the UnknownHostException. Any idea on how to debug this?
istio-proxy@tst-zk-istio-zookeeper-0:/$ nc -v tst-zk-istio-zookeeper-headless-mirror-main.tst-zk-istio.svc.cluster.local 2181
We see that with quorumListenOnAllIPs=true things work fine. But if we scale the zookeeper cluster, say from 3 pods to 5 pods, we see issues: the zk quorum gets disrupted and never comes up until the pods are restarted. Has anybody seen such issues and a probable root cause? This is very consistent for us.
Thanks @linsun. Adding echo "quorumListenOnAllIPs=true" >> conf/zookeeper.conf worked for me.
Chiming in here. I am running the Strimzi operator to bring up Kafka and Zookeeper. I also found that I had the same "zookeeper is listening on the Pod IP, not 0.0.0.0, for 2888/3888" issue. While I am aware that double TLS is probably not a great idea, I am unclear on why it broke. If there is mTLS happening automatically on the outbound traffic for Zookeeper1 -> Zookeeper2, and automatic mTLS happening on the inbound traffic to Zookeeper2, then I don't understand why the applications running in each of those pods would see any difference with regard to the traffic.
To make it work, I had to go ahead and add the
@diranged this sounds like your zookeeper is calling
The double TLS should work
I have a 3 node zookeeper (bitnami zookeeper helm chart, zk version 3.6.1) with istio proxy. All pods are full of
But if I delete all pods at the same time instead of doing a rollout restart, then it works properly. Any idea, anyone? Should I change
Not a zk expert, but my understanding: this is required on 1.9, optional on 1.10. See https://istio.io/latest/blog/2021/upcoming-networking-changes/ for details.
Yes, you should not set
@diranged Is it possible to share your Strimzi Kafka configuration? I am hitting the same mTLS issues for zookeeper when istio is injected.
We gave up trying to run Strimzi-on-Istio, and instead explicitly run it off the mesh. |
Bug description
All worked fine, and I validated that each of the 3 pods within the StatefulSet is good and the quorum is established.
Chatted with @hzxuzhonghu briefly via the #networking channel on Slack; would like to open an issue to track this.
Expected behavior
zookeeper continues to work, at least in permissive mode.
Steps to reproduce the bug
see above
Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)
$ istioctl version
client version: 1.4.0
control plane version: 1.4.0
data plane version: 1.3.2 (3 proxies), 1.4.0 (4 proxies)
How was Istio installed?
istioctl manifest apply
Environment where bug was observed (cloud vendor, OS, etc)
IBM Cloud K8s 1.14 cluster