Running Crossplane on hostNetwork #5520
Comments
Making sure I follow correctly: the control plane (i.e. the API server) needs to reach Crossplane components to hit its webhooks? How do other projects address this?
Yes, this is an issue in EKS setups where the pod network uses a custom CNI in overlay mode (Calico overlay, Cilium overlay, etc.) that gives pods non-VPC-routable IPs. Pod-to-pod communication works over the overlay network provided by the respective CNI, so Crossplane pod -> Function pod gRPC endpoints work seamlessly (with an appropriate NetworkPolicy, if one is enforced). But the EKS API server doesn't have direct connectivity (a route) to the pods on the overlay network. And AWS don't
For any K8s API server -> webhook pod communication, the easiest workaround is to run any webhook pod that requires ingress from the EKS K8s API server in hostNetwork mode, which gives such pods VPC-routable IPs. Most CNCF projects therefore allow configuring the ports for their webhook pods: for example
With most of the upjet-based official family providers enabling a conversion webhook in the provider MR CRDs for API versioning support, this problem is getting worse in such EKS setups, since the provider package reconciler in Crossplane core seems to hardcode the webhook and metrics ports to 9443 and 8080 respectively, and running such provider family pods in
And setting/overriding the ports in runtimeDeploymentConfig per provider doesn't seem to work as expected.
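To make the per-provider override mentioned above concrete, here is a minimal sketch of what such an attempt might look like with Crossplane's DeploymentRuntimeConfig resource. The resource names, the package reference, and the port numbers 19443 and 18080 are hypothetical, and the port names webhook and metrics are assumptions. Per the comment above, port overrides of this kind reportedly do not take effect as expected, so this illustrates the attempt rather than a working fix.

```yaml
# Sketch only: names, package reference, and port numbers are hypothetical.
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: host-network-example
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          # Gives the provider pod a VPC-routable (node) IP.
          hostNetwork: true
          containers:
            # Name of the runtime container that Crossplane merges overrides into.
            - name: package-runtime
              ports:
                # Attempted overrides of the defaults 9443 and 8080.
                # Port names are assumptions.
                - name: webhook
                  containerPort: 19443
                - name: metrics
                  containerPort: 18080
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: example-provider
spec:
  package: example.org/example-provider:v0.0.0   # placeholder package
  runtimeConfigRef:
    name: host-network-example
```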
Also, just to add for any future onlookers: in #5521, the error manifests as a timeout error
In this issue, where the pods are running on a non-VPC-routable overlay network, the error manifests as
Adding a flag to providers (and core) to make the webhook port configurable sounds reasonable to me.
I think I could start working on this next week, if you want me to. We would also need to introduce flags for the other ports, though (gRPC and metrics are the ones I'm currently aware of).
Please do! Thank you.
We noticed that even if we disable metrics via Helm values, the manager container will still try to bind to the metrics port:
hostNetwork: true
metrics:
  # -- Enable Prometheus path, port and scrape annotations and expose port 8080 for both the Crossplane and RBAC Manager pods.
  enabled: false
manager init: https://github.com/crossplane/crossplane/blob/master/internal/controller/pkg/revision/runtime.go#L221
error:
I think this also causes some problems with Cluster Autoscaler, since it doesn't know the pod wants the hostPort and therefore will not try to bring up a new node to alleviate the conflict. I haven't had time to go deep on this, so I still need to confirm. But in general we saw that when metrics was disabled, the pod just crashloops and Cluster Autoscaler doesn't react at all, where typically it would call out that host ports are not available and scale up.
Please note that it's not only setting
Another thing that should be configurable is the dnsPolicy of the pod; it should be
The port should be configurable per provider. As I am using Flux CD for GitOps, I am patching the deployments using the kustomization functionality, but it would be good if a proper solution were provided here. Otherwise people on EKS with other CNI network plugins cannot have the webhook functionality at all. Thanks.
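For readers unfamiliar with the dnsPolicy setting, the fragment below sketches how it typically appears next to hostNetwork in a pod template. The value the commenter had in mind is not preserved in the text above. ClusterFirstWithHostNet is shown as an assumption, because it is the value Kubernetes documents for host-network pods that should still resolve names through the cluster DNS service.

```yaml
# Fragment of a pod template, not a complete manifest.
spec:
  hostNetwork: true
  # Assumed value: without it, a host-network pod falls back to the
  # node's DNS configuration and may not resolve in-cluster Service names.
  dnsPolicy: ClusterFirstWithHostNet
```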
@marianobilli if you haven't gotten a chance to look at #5540, it may be useful to have your opinion there too 🙇‍♂️
@marianobilli I was curious if you could share more about how you ended up addressing this issue in your setup? We are finding that the Crossplane controller instantly reconciles back any changes we attempt to patch in (for the deployment or service).
What problem are you facing?
Running Crossplane on an AWS EKS cluster with the Calico CNI leads to multiple problems:
hostNetwork: true. This includes:
hostNetwork: true
hostNetwork: true is not easily possible, since the ports are hard-coded and will conflict with each other and with other components running on the same node.
Why is this necessary?
When running Calico on an EKS cluster, the Kubernetes control plane will still run on the AWS CNI (the control plane is completely managed by AWS and cannot be modified). Thus everything that needs to be reached from the control plane needs to run on the host network itself.
How could Crossplane help solve your problem?
A potential solution could be to make the ports configurable, but this would also need to be implemented on the function/provider side. To my knowledge, the hostPort and containerPort values have to be the same when running on the host network (since it's the same network namespace as the host).
Maybe services of type NodePort could be explored as well.
Any ideas and feedback are greatly appreciated!
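The two ideas in this section can be sketched with plain Kubernetes manifests. This is a rough illustration, not a tested setup: the names, image, and port numbers are hypothetical. The first document shows the hostPort/containerPort constraint on a host-network pod. The second shows what a NodePort Service in front of a webhook might look like. Whether the API server could actually be pointed at such a NodePort for Crossplane's webhooks is not established in this issue.

```yaml
# Hypothetical pod on the host network. Because the pod shares the node's
# network namespace, Kubernetes requires hostPort to equal containerPort.
apiVersion: v1
kind: Pod
metadata:
  name: example-webhook
  labels:
    app: example-webhook
spec:
  hostNetwork: true
  containers:
    - name: webhook
      image: example.org/webhook:latest   # placeholder image
      ports:
        - name: webhook
          containerPort: 19443
          hostPort: 19443   # must match containerPort when hostNetwork is true
---
# Alternative to explore: expose the webhook on every node's IP through a
# NodePort Service instead of placing the pod on the host network.
apiVersion: v1
kind: Service
metadata:
  name: example-webhook
spec:
  type: NodePort
  selector:
    app: example-webhook
  ports:
    - name: webhook
      port: 9443
      targetPort: webhook
      nodePort: 30443   # must fall in the cluster's NodePort range (default 30000-32767)
```

With the hostNetwork variant, two such pods scheduled on the same node would need distinct port numbers, which is why hard-coded ports are a problem.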