Support multiple addresses for IstioEndpoint #51279
Conversation
Skipping CI for Draft Pull Request.
/test all
/test all
/test integ-security-multicluster
A common pattern (in K8S and other places) is to leave the Address as is, and add a second AdditionalAddresses - this would avoid changing all the code so much. Or even better use a structure to represent the additional addresses - with the network name included for example since typically a pod will have multiple interfaces (there is a WG and CNIs supporting this). And of course the multiple IPv6, which may also belong to separate networks.
It's just a suggestion - I can see some benefit in not keeping Address and identifying all the places we assume it is only one and is used as a key, which this PR does.
I believe we have all the new ambient and MDS code that has a map actually keyed on IP(s) - that's where the IP (and network) are real keys, not sure how this interacts.
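To make the trade-off concrete, here is a minimal sketch of the two shapes being discussed; the type and field names are illustrative, not the PR's actual definitions:

```go
// Sketch only: names are illustrative, not the PR's actual definitions.
package model

// Option A: keep Address untouched and add a second field, mirroring the
// common Kubernetes pattern (e.g. Pod.Status.PodIP vs Pod.Status.PodIPs).
type EndpointWithAdditional struct {
	Address             string   // primary address, still usable as a key
	AdditionalAddresses []string // extra IPv4/IPv6 addresses, if any
}

// Option B (what this PR does): replace Address with a slice, forcing every
// caller that assumed a single address to handle the multi-address case.
type EndpointWithAddresses struct {
	Addresses []string // Addresses[0] plays the role of the old Address
}
```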
pilot/pkg/model/context.go
Outdated
@@ -492,6 +492,19 @@ func compareVersion(ov, nv int) int {

var NodeTypes = [...]NodeType{SidecarProxy, Router, Waypoint, Ztunnel}

// Key returns the key of Proxy based on its addresses, using the same logic as Key() of IstioEndpoint
func (node *Proxy) Key() string {
How about "FirstIPOrNil" or something less generic?
The IP is one of the worst things to use as a key...
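For illustration, a small sketch of the two accessors being debated; FirstIP is a hypothetical name (riffing on the suggested FirstIPOrNil), and Proxy is reduced here to the single field the sketch needs:

```go
package model

import "strings"

// Proxy is reduced to the one field this sketch needs; the real Proxy in
// pilot/pkg/model has many more fields.
type Proxy struct {
	IPAddresses []string
}

// Key flattens all addresses into a single string; it is only a stable key
// if the address ordering is stable.
func (node *Proxy) Key() string {
	return strings.Join(node.IPAddresses, ",")
}

// FirstIP makes the "first address only" assumption explicit, which is what
// a narrower name like FirstIPOrNil would signal to callers.
func (node *Proxy) FirstIP() string {
	if len(node.IPAddresses) == 0 {
		return ""
	}
	return node.IPAddresses[0]
}
```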
Yes, I understand the point about having both Address and AdditionalAddresses/Addresses. Just a note, if the Istio DS env is not enabled then Addresses of endpoints will still have only the first IP inside.
On Wed, Jun 5, 2024 at 7:09 AM Leonardo Sarra wrote:

> It would for sure make the code changes here a bit easier and allow us to not change an existing field of the core data structures. Although my concern is that it would also open up the need for future contributions to always keep in mind the existence of both fields and to have proper code paths that make use of both.

I'm sure at least one reviewer or dev will be familiar with K8S Service, Pod and all the others that follow the same pattern :-) Some comments will also help... We also have one address in the node ID - and additional IPs sent separately, and other places.

But the second part of my comment was that if we add an "Addresses" field, we should be aware that it's not only for dual stack, but also for VMs and Pods with multiple networks attached - and a simple IP is not sufficient; it needs to include the network, and in future we may need more info (I am not very up to date with the multi-network CNI efforts). If we do use the Addresses field to store more info about the address, it would need to include info about the 'primary' address too, and I think in time we'll eventually end up using that, leaving the "Address" as a kind of key or identifier - to match the node ID.

> The advantage of only having Addresses is that we are sort-of-forcing people to deal with a scenario that otherwise could be missed.

I don't think forcing people is the right approach :-), and we have quite a few other places where IPs are used - like the new ambient structures. Comments and docs are likely a better way to help contributors understand. Which reminds me, it would help to have more details in the PR description and comments on the code - I am not sure I understand all the use cases you have in mind and how the additional IPs are to be used...
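A rough sketch of the richer shape suggested above, where each address carries its network and a primary flag; none of these names exist in the PR, they only illustrate the idea:

```go
// Purely illustrative; none of these names exist in the PR.
package model

// NetworkAddress couples an IP with the network it is reachable on, so an
// "Addresses" field can describe multi-interface pods and VMs, not just a
// second IP family.
type NetworkAddress struct {
	IP      string
	Network string // e.g. an Istio network name or a CNI-attached network
	Primary bool   // the address that doubles as the endpoint's identifier/key
}

type MultiNetworkEndpoint struct {
	Addresses []NetworkAddress
}

// PrimaryAddress returns the address marked primary, falling back to the
// first entry, mirroring "leave Address as a kind of key" from the thread.
func (e *MultiNetworkEndpoint) PrimaryAddress() (NetworkAddress, bool) {
	for _, a := range e.Addresses {
		if a.Primary {
			return a, true
		}
	}
	if len(e.Addresses) > 0 {
		return e.Addresses[0], true
	}
	return NetworkAddress{}, false
}
```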
t.Errorf("got unexpected instances : %v", gotInstances)
}

// 4. test update instances
what does this comment mean?
The purpose of the test is to verify that existing instances can be updated and that more addresses can be added if needed.
I updated the comment.
Thank you @MorrisLaw and @costinm for the comments. @costinm I would like to hear @howardjohn's opinion on going from addresses to address+additionalAddresses; I recall you discussed additionalAddresses in the past, although it was probably more envoy-related.
Sure, I will add that information to the PR description. I will carry over some information from the original design document https://docs.google.com/document/d/1pw-WaZOnpSVuZwNuv4VNXEgrFvDI2zbxvR9NonDwkI4/edit#heading=h.se6wjxq9mtpk
addr := util.BuildAddress(e.Address, e.EndpointPort)
addr := util.BuildAddress(e.Addresses[0], e.EndpointPort)

// This change can support multiple addresses for an endpoint, then there are some use cases for it, such as
I am seeing very different behavior here with Istio vs non-Istio. I have 4 services, v4 only, v6 only, v4>v6, and v6>v4.
With istio:
/ $for i in {0..100}; do curl -s echo | grep IP=; done | sort | uniq -c
101 IP=127.0.0.6
/ $for i in {0..100}; do curl -s echo-v4v6 | grep IP=; done | sort | uniq -c
101 IP=127.0.0.6
/ $for i in {0..100}; do curl -s echo-v6v4 | grep IP=; done | sort | uniq -c
101 IP=127.0.0.6
/ $for i in {0..100}; do curl -s echo-v6 | grep IP=; done | sort | uniq -c
101 IP=127.0.0.6
ALL use v4
Without istio:
/ $for i in {0..100}; do curl -s echo-v6 | grep IP=; done | uniq | sort | uniq -c
1 IP=fd00:10:244::b
/ $for i in {0..100}; do curl -s echo-v6 | grep IP=; done | sort | uniq -c
101 IP=fd00:10:244::b
/ $for i in {0..100}; do curl -s echo-v6v4 | grep IP=; done | sort | uniq -c
101 IP=fd00:10:244::b
/ $for i in {0..100}; do curl -s echo-v4v6 | grep IP=; done | sort | uniq -c
101 IP=fd00:10:244::b
/ $for i in {0..100}; do curl -s echo | grep IP=; done | sort | uniq -c
101 IP=10.244.0.11
All use v6, except the ipv4 only service.
I think it's incorrect to use IPv6 for an IPv4-only service and vice versa. Kubernetes doesn't: those addresses are not in the EndpointSlices, and DNS will not even resolve the wrong IP family in kube-dns.
Are we sure WE should be doing happy eyeballs?
Consider this case:
apiVersion: v1
kind: Service
metadata:
  name: echo-only-v4-pod
spec:
  ipFamilies:
  - IPv6
  - IPv4
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
---
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.244.0.16
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: dual-control-plane
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: echo-only-v4-pod
  name: echo-only-v4-pod
ports:
- name: http
  port: 80
  protocol: TCP
This is a dualstack service with only a v4 IP.
In the other Kubernetes networking products I have looked at, the client does happy eyeballs itself, and the network just forwards v4->v4 and v6->v6.
Edit: I was testing linkerd wrong. Looks like they are similar to us but prefer v6 instead of v4. I don't think they actually do happy eyeballs either; I think they only pick a single IP, preferring v6. Not 100% sure though.
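As a small illustration of "the client should be doing happy eyeballs itself": Go's net.Dialer races both families per RFC 6555 when a hostname resolves to IPv4 and IPv6 addresses; the hostname below is just an example, not one of the test services:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	d := &net.Dialer{
		Timeout: 5 * time.Second,
		// How long IPv6 gets a head start before IPv4 is raced (Happy Eyeballs).
		FallbackDelay: 300 * time.Millisecond,
	}
	conn, err := d.Dial("tcp", "echo.example.svc.cluster.local:80")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	// RemoteAddr shows which address family actually won the race.
	fmt.Println("connected via", conn.RemoteAddr())
}
```

With a sidecar in the picture, the local proxy accepts either family immediately, so this race always "succeeds" on the first family tried, which is exactly the problem described below.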
This does get trickier with SE though
Did some testing. Not complete but:
| Source | Service | Destination Pod | Kubernetes Behavior | PR Behavior |
|---|---|---|---|---|
| v4 | Dual | Dual | Success, v4 | Success, v4 |
| v4 | Dual | V4 only | Success, v4 | Success, v4 |
| v4 | Dual | V6 only | FAIL | FAIL |
| v6 | Dual | Dual | Success, v6 | Success, v4 |
| v6 | Dual | V4 only | FAIL | 50% success v4, 50% fail |
| v6 | Dual | V6 only | Success, v6 | FAIL |
| v6 | v4 | | NXDOMAIN | Didn't test |
| v4 | v6 | | NXDOMAIN | Didn't test |
I think happy eyeballs is fundamentally broken with how we handle proxies. Maybe in a way we cannot really prevent, either.
There are kind of two issues I see.
Happy eyeballs occurs at (up to) 2 places: the client app and the client proxy.
However, both of these are flawed. On client app -> client proxy, the proxy is ALWAYS going to accept the connection. Literally any IP, v4 or v6, it will accept it. So this will trigger the app to assume that is the right family to use.
Next, we have proxy -> proxy. Same thing -- the destination will accept anything. Since we always set up v4>v6, this means that when we are connecting to something in the mesh, it will ALWAYS use v4. Then server proxy --> app always uses the original IP family, so it will always use v4.
That is why we see the cases where we are not listening on v4 fail.
You're correct; I pointed out the same in the previous iteration of the PR. Prioritizing one address family over the other isn't ideal, especially considering K8s behavior. With this PR we go from load-balancing across all addresses via LEAST_REQUEST/RR/etc on master (which isn't ideal either) to prioritizing a single family based on the cluster's default. Ideally we should try to stick to the family the request was made with.
Keep in mind that if we deploy on a dual-stack, IPv6-default cluster we would end up with IPv6 being prioritized in happy eyeballs, as the cluster would indirectly affect the ordering of addresses in IstioEndpoint. Your result table would end up flipped.
See the Istio endpoint in a KinD dual-stack, IPv6-first cluster:
leosarra@est-workstation:~/Desktop/istio$ go run istioctl/cmd/istioctl/main.go x internal-debug endpointShardz | jq '."httpbin.istio-system.svc.cluster.local"'
{
  "istio-system": {
    "Shards": {
      "Kubernetes/Kubernetes": [
        {
          "Labels": {
            "app": "httpbin",
            "kubernetes.io/hostname": "kind-control-plane",
            ...
            "version": "v1"
          },
          "Addresses": [
            "fd00:100:96::8",
            "100.96.0.8"
          ],
          ...
At the time of the previous PR I thought this was something I could address in the Envoy project by adding some knob to change the happy-eyeballs LB behavior.
But I did not consider the double happy-eyeballs you mentioned; I was only thinking about the proxy and ignoring the client (thank you for making me aware of that).
Due to client happy eyeballs plus the client proxy accepting all connections, I feel Envoy will never be able to smartly choose the initial IP family to check, as the IP version of the original request could be misleading.
To be honest I don't know if we can do better given these constraints :/
On a side note, I'm surprised by the 50% failure/success behavior you got in one scenario; I'm looking into that and trying to reproduce it.
> Keep in mind that in case we deploy on a DS on a IPv6-default cluster we would end up with IPv6 being prioritized in

How do you mark it IPv6-default in k8s?
This got complex, so I started thinking it through in a doc: https://docs.google.com/document/d/1iUlgvmybCAa6QNwKA0-ju3EWGnrpqhBZHL4A0S6c9fE/edit.
> Keep in mind that in case we deploy on a DS on a IPv6-default cluster we would end up with IPv6 being prioritized in
>
> How do you mark it IPv6-default in k8s?
You can use this config in KinD to deploy it.
The order of the families in the subnet will affect the order of addresses in the k8s resources.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual
  podSubnet: "fd00:100:96::/48,100.96.0.0/11"
  serviceSubnet: "fd00:100:64::/108,100.64.0.0/13"
// 1. An endpoint can have both an IPv4 and an IPv6 address
// 2. An endpoint can represent a serviceentry/workload instance with multiple IP addresses
// When the additional_addresses field is populated for EDS in the Envoy configuration, Envoy applies the Happy Eyeballs
// algorithm for the Endpoint, first attempting to connect to the IP address in the address field.
Shouldn't we first attempt to connect with the IP family they used? It seems very odd that a user sends an IPv6 request but then always ends up picking IPv4 (the first address).
If we are going to prefer one family over the other, maybe it would be more appropriate to order them based on the ipFamilies order... though I doubt that is a good idea, since K8s doesn't.
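For context, a rough sketch (not the PR's actual code) of how an EDS endpoint with additional_addresses can be assembled from go-control-plane types; whichever family sits first in the slice is the one Envoy's Happy Eyeballs tries first:

```go
package example

import (
	core "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	endpoint "github.com/envoyproxy/go-control-plane/envoy/config/endpoint/v3"
)

// buildAddress wraps an IP and port into an Envoy socket address.
func buildAddress(ip string, port uint32) *core.Address {
	return &core.Address{
		Address: &core.Address_SocketAddress{
			SocketAddress: &core.SocketAddress{
				Address:       ip,
				PortSpecifier: &core.SocketAddress_PortValue{PortValue: port},
			},
		},
	}
}

// buildEndpoint keeps addrs in the given order: addrs[0] becomes the
// Endpoint's address field, the rest go into additional_addresses.
func buildEndpoint(addrs []string, port uint32) *endpoint.Endpoint {
	ep := &endpoint.Endpoint{Address: buildAddress(addrs[0], port)}
	for _, extra := range addrs[1:] {
		ep.AdditionalAddresses = append(ep.AdditionalAddresses,
			&endpoint.Endpoint_AdditionalAddress{Address: buildAddress(extra, port)})
	}
	return ep
}
```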
Co-authored-by: zhlsunshine <[email protected]>
@@ -276,6 +273,14 @@ func (esc *endpointSliceController) updateEndpointCacheForSlice(hostName host.Na
}

istioEndpoint := builder.buildIstioEndpoint(a, portNum, portName, discoverabilityPolicy, healthStatus)
if features.EnableDualStack && (pod != nil) && len(pod.Status.PodIPs) > 1 {
This seems wrong. This will make all services dual stack, even if they are explicitly single stack
Makes sense to disable that logic for DS Svcs, but I have one doubt about how to perform the check given the current service.go APIs.
I can't use GetExtraAddressesForProxy (as the update is not proxy-specific); shall I add a new hasExtraAddresses() function in our service model to verify whether any of the clusters have an extra address for the service?
This code is single cluster, so I am not sure I understand why we need this info. Is it because we do not have access to the corev1.Service in this part of the code?
Currently that part of the code obtains the model.Service, but I think I can infer the k8s svc from the slice, let me try and test it 👍
Oh we do have the model.Service, I missed that.
If that doesn't work well we can put model.Service.Attributes.K8sAttributes.IpFamily or similar
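A hypothetical sketch of the check being discussed, filtering pod IPs down to the families the Service actually supports; IPFamily and filterPodIPs are stand-ins for illustration, not existing Istio APIs:

```go
package example

import "net/netip"

// IPFamily is a stand-in for whatever the service model ends up exposing
// (e.g. something derived from the Service's ipFamilies).
type IPFamily int

const (
	IPv4 IPFamily = iota
	IPv6
)

// family classifies a raw IP string as IPv4 or IPv6.
func family(ip string) (IPFamily, bool) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return 0, false
	}
	if addr.Is4() || addr.Is4In6() {
		return IPv4, true
	}
	return IPv6, true
}

// filterPodIPs keeps only the pod IPs whose family the Service supports, so
// a single-stack Service never advertises the other family's address.
func filterPodIPs(podIPs []string, supportedFamilies []IPFamily) []string {
	allowed := map[IPFamily]bool{}
	for _, f := range supportedFamilies {
		allowed[f] = true
	}
	var out []string
	for _, ip := range podIPs {
		if f, ok := family(ip); ok && allowed[f] {
			out = append(out, ip)
		}
	}
	return out
}
```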
I think the last commit should do the trick; unit tests for istio endpoints with multiple addresses are also passing. Tomorrow I will deploy it and verify that it works in a live env.
It is working fine after the last commit.
I have added two test cases for single-stack svc on DS deployments to have some coverage of this.
0.244.0.28:18080,fd00:10:244::1c:18080 HEALTHY outbound|80||echo-dual.echo-ds-3-3070.svc.cluster.local
10.244.0.22:18080,fd00:10:244::16:18080 HEALTHY outbound|80||echo-v4-naked.echo-ds-3-3070.svc.cluster.local
10.244.0.25:18080,fd00:10:244::19:18080 HEALTHY outbound|80||echo-v4.echo-ds-3-3070.svc.cluster.local
10.244.0.27:18080 HEALTHY outbound|80||echo-v4onlysvc.echo-ds-3-3070.svc.cluster.local
10.244.0.21:18080,fd00:10:244::15:18080 HEALTHY outbound|80||echo-v6-naked.echo-ds-3-3070.svc.cluster.local
10.244.0.23:18080,fd00:10:244::17:18080 HEALTHY outbound|80||echo-v6.echo-ds-3-3070.svc.cluster.local
fd00:10:244::18:18080 HEALTHY outbound|80||echo-v6onlysvc.echo-ds-3-3070.svc.cluster.local
Thank you so much @howardjohn for all the help on getting this across the finish line!
Please provide a description of this PR:
This is based upon the #47273 PR. I have taken over since the original author is moving to a different assignment inside his company. I will take care of further development.
Fixes #44043
Related to #40394
I'm keeping this in draft mode as I need to verify some things first.