ClusterIP addresses for Ingress services no longer work when bpf masquerading is enabled in native routing mode #32525
Okay, to reproduce the problem: first I edit the Cilium helm values, then I provision the kind cluster with variations of the Death Star tutorial service, extended to include Ingress and Gateway API services fronting the deathstar service.
The tiefighter pod, running on a different node than the deathstar backend pod, works when using the ingress ClusterIP.
The xwing pod, running on the same node as the deathstar backend pod, doesn't work.
Interesting note: both the xwing and tiefighter pods are able to access the deathstar service directly using its ClusterIP. It appears that only the Ingress services (and, I'm assuming, also Gateway API services; I need to test) fronting the actual deathstar service are impacted.
Quick check: Gateway has the same issue; I can't access it via ClusterIP from the same node. I'm able to see a difference between my tiefighter and xwing pods only because I have a replica count of 1 set up for my deathstar backend in my environment. If I had backends on both worker nodes by scaling the deathstar deployment up, I'd get somewhat stochastic behavior on connection attempts from both the xwing and tiefighter pods, depending on which backend is chosen to service the HTTP request. So for diagnostic purposes it's easier to keep the target service deployment at 1 backend.
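For readers reproducing this, the Ingress fronting the deathstar Service might look like the following sketch. The Service name and port come from the standard Star Wars demo manifests; the Ingress name and path are hypothetical:

```yaml
# Hypothetical Ingress fronting the deathstar Service from the Star Wars demo.
# Cilium's ingress controller will create a separate Service for this Ingress;
# it is that Service's ClusterIP that fails from same-node pods in this report.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: deathstar-ingress
spec:
  ingressClassName: cilium
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: deathstar
                port:
                  number: 80
```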
Thanks for your issue. Can you give it a try with bpf.hostLegacyRouting enabled?
This could be similar to #31653
Enabling bpf.hostLegacyRouting did not help the situation. Trying to curl the ClusterIP of the deathstar ingress service from a pod on the same node as the deathstar backend pod still results in the timeout error.
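For those following along, a hedged sketch of the helm values being toggled in this thread (key names taken from the Cilium helm chart; the CIDR is kind's default pod subnet and should be adjusted to your cluster):

```yaml
# Sketch of the relevant Cilium helm values: native routing + BPF masquerading.
routingMode: native
autoDirectNodeRoutes: true
ipv4NativeRoutingCIDR: 10.244.0.0/16   # kind's default pod CIDR; adjust as needed
bpf:
  masquerade: true          # the setting that triggers the reported breakage
  hostLegacyRouting: true   # fallback suggested above; did not help in this case
```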
@sayboras It definitely looks similar to the error in the "new" connectivity test. Running baseline native routing without bpf.masquerade enabled, the pod-to-ingress-service test passes.
Note: for those following along, disregard the known spurious warning about enableDefaultDeny; it's added in the 1.16.0 Cilium prereleases as an extension to the network policy spec, and the CLI tool is just being overly verbose about it since I'm running Cilium 1.15.4.
Enabling bpf.masquerade results in errors. See attached sysdump corresponding to the first of six failed actions.
Enabling bpf.masquerade and bpf.hostLegacyRouting results in errors. See attached sysdump corresponding to the first of six failed actions.
Just tested with the 1.15.5 release, out today, and it's still not working for me, even with the hostLegacyRouting option enabled.
kind config
Passing Cilium config
Failing config
See attached sysdump for the first action failure using 1.15.5
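The author's actual kind config is attached above. For readers reproducing from scratch, a minimal sketch with two workers (so same-node and cross-node pods can be compared) would look something like this, assuming a kube-proxy-replacement setup as is common for these Cilium configurations:

```yaml
# Hypothetical minimal kind config for reproducing: one control-plane,
# two workers, default CNI and kube-proxy disabled so Cilium handles both.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
networking:
  disableDefaultCNI: true   # Cilium provides the CNI
  kubeProxyMode: none       # assuming Cilium kube-proxy replacement
```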
As discussed in Cilium slack, the mentioned changes were merged after 1.15.5, so you might need to check out the main branch.
@sayboras I used the images you suggested and didn't see a fix. Can you tell me which images I should be using, if these weren't the correct ones? https://cilium.slack.com/archives/C1MATJ5U5/p1715887169176509?thread_ts=1715382762.010149&cid=C1MATJ5U5
Is there an existing issue for this?
What happened?
I cannot access Ingress ClusterIP addresses from a pod on the same node as the service backend pod when bpf masquerade is enabled in native routing mode. I've got a documented kind cluster environment in a GitHub repo that can be used to reproduce.
In fact, I can no longer access either the externalIP or the ClusterIP. The loss of externalIP access from inside the cluster isn't necessarily something I would expect to always work, but I do expect to be able to use the ClusterIP from inside the cluster.
There seem to be several bpf masquerade issues floating around; I didn't read any of them as being specific to this situation, though they are probably all related.
I was able to isolate the symptoms to just the bpf.masquerade boolean in native routing mode.
I can't trigger this at all in the default tunneling routing mode.
Helm values used:
Cluster operates as expected when baseline helm values are used.
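The exact values files are in the repo linked below. As a rough sketch (key names assumed from the Cilium helm chart), the delta between the passing baseline and the failing configuration is just the BPF masquerading flag:

```yaml
# Baseline (passing): native routing, with masquerading left to iptables.
routingMode: native
autoDirectNodeRoutes: true
bpf:
  masquerade: false   # flipping this to true reproduces the failure
```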
Cilium Version
cilium v1.15.4
Kernel Version
Linux carbon 6.7.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 22:21:14 UTC 2024 x86_64 GNU/Linux
Kubernetes Version
using kind cluster
kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
Regression
I did not check for regression
Sysdump
cilium-sysdump-20240513-165256.zip
Relevant log output
No response
Anything else?
I've documented the baseline native routing Kind cluster environment I'm using here:
https://github.com/jspaleta/scale21x-demos/tree/main/environments/cilium-l2lb/imperial-gateway-native-routing
I'll update the issue with additional info on how to use this environment to reproduce the symptoms.
Cilium Users Document
Code of Conduct