
Support multiple addresses for IstioEndpoint #51279

Merged
merged 15 commits into istio:master
Jul 16, 2024

Conversation

Contributor

@leosarra leosarra commented May 29, 2024

Please provide a description of this PR:

This PR is based on #47273. I have taken it over since the original author is moving to a different assignment within his company; I will take care of further development.
Fixes #44043
Related to #40394

I'm keeping this in draft mode as I need to verify some things first

@istio-testing
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label May 29, 2024
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 29, 2024
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label May 31, 2024
@leosarra
Contributor Author

leosarra commented Jun 4, 2024

/test all

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 4, 2024
@leosarra
Contributor Author

leosarra commented Jun 4, 2024

/test all

@leosarra
Contributor Author

leosarra commented Jun 4, 2024

/test integ-security-multicluster

@leosarra leosarra marked this pull request as ready for review June 5, 2024 08:40
@leosarra leosarra requested review from a team as code owners June 5, 2024 08:40
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 5, 2024
@leosarra
Contributor Author

leosarra commented Jun 5, 2024

@kfaseela @howardjohn

Contributor

@costinm costinm left a comment


A common pattern (in K8s and other places) is to leave Address as-is and add a second AdditionalAddresses field - this would avoid changing so much code. Or, even better, use a structure to represent the additional addresses - with the network name included, for example, since a pod will typically have multiple interfaces (there is a WG and CNIs supporting this). And of course the multiple IPv6 addresses, which may also belong to separate networks.

It's just a suggestion - I can see some benefit in not keeping Address and in identifying all the places where we assume there is only one address and it is used as a key, which this PR does.

I believe all the new ambient and MDS code has maps actually keyed on IP(s) - that's where the IP (and network) are real keys; not sure how this interacts.
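For illustration, a minimal sketch of the structured alternative suggested here; the type and field names are hypothetical, not from the PR:

// Keep the primary Address untouched and attach extra addresses as
// structured entries that also record which network they belong to.
type AdditionalAddress struct {
    Address string // the extra IP, v4 or v6
    Network string // network/interface this IP belongs to (multi-NIC pods)
}

type IstioEndpoint struct {
    Address             string              // primary address, unchanged
    AdditionalAddresses []AdditionalAddress // extras, possibly on other networks
    // ... remaining fields unchanged
}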

@@ -492,6 +492,19 @@ func compareVersion(ov, nv int) int {

var NodeTypes = [...]NodeType{SidecarProxy, Router, Waypoint, Ztunnel}

// Key returns the key of Proxy based on its addresses, using the same logic as IstioEndpoint's Key()
func (node *Proxy) Key() string {
Contributor


How about "FirstIPOrNil" or something less generic?

The IP is one of the worst things to use as a key...
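A minimal sketch of the accessor rename suggested above; the method name and the use of Proxy.IPAddresses are assumptions for illustration, not the merged code:

// Return the first known address explicitly rather than a Key() that
// implies the address is a stable, unique identifier.
func (node *Proxy) FirstIPOrEmpty() string {
    if node == nil || len(node.IPAddresses) == 0 {
        return ""
    }
    return node.IPAddresses[0]
}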

@leosarra
Contributor Author

leosarra commented Jun 5, 2024

> A common pattern (in K8s and other places) is to leave Address as-is and add a second AdditionalAddresses field - this would avoid changing so much code. Or, even better, use a structure to represent the additional addresses - with the network name included, for example, since a pod will typically have multiple interfaces (there is a WG and CNIs supporting this). And of course the multiple IPv6 addresses, which may also belong to separate networks.

> It's just a suggestion - I can see some benefit in not keeping Address and in identifying all the places where we assume there is only one address and it is used as a key, which this PR does.

> I believe all the new ambient and MDS code has maps actually keyed on IP(s) - that's where the IP (and network) are real keys; not sure how this interacts.

Yes, I understand the point about having both Address and AdditionalAddresses/Addresses.
It would certainly make the code changes here a bit easier and allow us to leave an existing field of the core data structures unchanged.
My concern is that it would also require future contributions to always keep in mind the existence of both fields and to have proper code paths that make use of both.
The advantage of only having Addresses is that we are sort-of-forcing people to deal with a scenario that could otherwise be missed.

Just a note: if the Istio dual-stack env is not enabled, then Addresses of endpoints will still contain only the first IP.
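A sketch of the gating described in that note, assuming the flag is the features.EnableDualStack variable that appears later in this PR's diff:

// With the dual-stack flag off, Addresses keeps only the first IP,
// preserving the old single-address behavior; with it on, all pod IPs
// are carried over.
addresses := []string{pod.Status.PodIP}
if features.EnableDualStack && len(pod.Status.PodIPs) > 1 {
    for _, podIP := range pod.Status.PodIPs[1:] {
        addresses = append(addresses, podIP.IP)
    }
}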

@costinm
Contributor

costinm commented Jun 6, 2024 via email

pilot/pkg/model/context.go (outdated review thread, resolved)
pilot/pkg/model/service.go (outdated review thread, resolved)
t.Errorf("got unexpected instances : %v", gotInstances)
}

// 4. test update instances
Contributor


what does this comment mean?

Contributor Author


The purpose of the test is to verify that existing instances can be updated and that more addresses can be added if needed.
I've updated the comment.

pilot/pkg/xds/endpoints/endpoint_builder.go (outdated review thread, resolved)
@leosarra
Contributor Author

leosarra commented Jun 12, 2024

Thank you @MorrisLaw and @costinm for the comments.
I have updated the comments and done some refactoring.

@costinm
I was checking how the additionalAddresses structure is used in K8s to take inspiration before starting the changes, but I'm having difficulty finding examples in the K8s APIs. The Node/Pod/Svc K8s objects define both address/IP and Addresses/IPs, where Addresses/IPs is simply [primary_address, extra_addresses...].
On the other hand, I know that Envoy uses additionalAddresses for listeners and endpoints, where each additional-address entry is usually a pair (address, socket_options).
Can you provide an example of additionalAddresses in K8s so that I can try to do something similar?

I would like to hear @howardjohn's opinion on going from addresses to address+additionalAddresses; I recall you discussed additionalAddresses in the past, although it was probably more Envoy-related.
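For reference, a paraphrase of the primary-plus-list shape described above, as it appears in k8s.io/api/core/v1 (comments here are illustrative):

type PodStatus struct {
    // PodIP is the primary address assigned to the pod.
    PodIP string
    // PodIPs, when set, must list the primary IP first (PodIPs[0] == PodIP);
    // extra addresses, at most one per IP family, follow it.
    PodIPs []PodIP
}

type PodIP struct {
    IP string
}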

> Which reminds me, it would help to have more details in the PR description and comments in the code - I am not sure I understand all the use cases you have in mind and how the additional IPs are to be used...

Sure, I will add that information to the PR description. I will carry over some details from the original design document: https://docs.google.com/document/d/1pw-WaZOnpSVuZwNuv4VNXEgrFvDI2zbxvR9NonDwkI4/edit#heading=h.se6wjxq9mtpk

pilot/pkg/model/context.go (outdated review thread, resolved)
-	addr := util.BuildAddress(e.Address, e.EndpointPort)
+	addr := util.BuildAddress(e.Addresses[0], e.EndpointPort)

// This change can support multiple addresses for an endpoint, then there are some use cases for it, such as
Member


I am seeing very different behavior here with Istio vs non-Istio. I have 4 services: v4 only, v6 only, v4>v6, and v6>v4.

With istio:

/ $ for i in {0..100}; do curl -s echo | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6
/ $ for i in {0..100}; do curl -s echo-v4v6 | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6
/ $ for i in {0..100}; do curl -s echo-v6v4 | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6
/ $ for i in {0..100}; do curl -s echo-v6 | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6

ALL use v4

Without istio:

/ $ for i in {0..100}; do curl -s echo-v6 | grep IP=; done | uniq | sort | uniq -c
      1 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo-v6 | grep IP=; done | sort | uniq -c
    101 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo-v6v4 | grep IP=; done | sort | uniq -c
    101 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo-v4v6 | grep IP=; done | sort | uniq -c
    101 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo | grep IP=; done | sort | uniq -c
    101 IP=10.244.0.11

All use v6, except the IPv4-only service.

I think it's incorrect to use IPv6 for an IPv4-only service and vice versa. Kubernetes doesn't: the wrong-family addresses are not in the EndpointSlices, and DNS will not even resolve the wrong IP family in kube-dns.

Member

@howardjohn howardjohn Jun 13, 2024


Are we sure WE should be doing happy eyeballs?

Consider this case:

apiVersion: v1
kind: Service
metadata:
  name: echo-only-v4-pod
spec:
  ipFamilies:
  - IPv6
  - IPv4
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
---
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.244.0.16
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: dual-control-plane
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: echo-only-v4-pod
  name: echo-only-v4-pod
ports:
- name: http
  port: 80
  protocol: TCP

This is a dual-stack service with only a v4 IP.

From what I have found with other Kubernetes networking products, the client is expected to do happy eyeballs itself, and the network just forwards v4->v4 and v6->v6.

Edit: I was testing linkerd wrong. Looks like they are similar to us but prefer v6 instead of v4. I don't think they actually do happy eyeballs either; I think they just pick a single IP, preferring v6. Not 100% sure though.
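For context, client-side happy eyeballs is already built into Go's standard dialer (RFC 6555); a small illustrative client, using one of the test service names from above:

package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    // FallbackDelay is the head start the first address family gets
    // before the dialer also races the other family (happy eyeballs).
    d := &net.Dialer{FallbackDelay: 300 * time.Millisecond}
    conn, err := d.DialContext(context.Background(), "tcp", "echo-v4v6:80")
    if err != nil {
        fmt.Println("dial failed:", err)
        return
    }
    defer conn.Close()
    fmt.Println("connected via", conn.RemoteAddr())
}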

Member


This does get trickier with SE (ServiceEntry) though.

Member


Did some testing. Not complete but:

Source | Service | Destination Pod | Kubernetes Behavior | PR Behavior
v4     | Dual    | Dual            | Success, v4         | Success, v4
v4     | Dual    | V4 only         | Success, v4         | Success, v4
v4     | Dual    | V6 only         | FAIL                | FAIL
v6     | Dual    | Dual            | Success, v6         | Success, v4
v6     | Dual    | V4 only         | FAIL                | 50% success v4, 50% fail
v6     | Dual    | V6 only         | Success, v6         | FAIL
v6     | v4      | -               | NXDOMAIN            | Didn't test
v4     | v6      | -               | NXDOMAIN            | Didn't test

I think happy eyeballs is fundamentally broken with how we handle proxies. Maybe in a way we cannot really prevent, either.

There are kind of two issues I see.

Happy eyeballs occurs at (up to) 2 places: the client app and the client proxy.

However, both of these are flawed. On client app -> client proxy, the proxy is ALWAYS going to accept the connection. Literally any IP, v4 or v6 - it will accept it. So this will trigger the app to assume that is the right family to use.

Next, we have proxy -> proxy. Same thing - the destination will accept anything. Since we always set up v4>v6, this means that when we are connecting to something in the mesh, it will ALWAYS use v4. Then server proxy -> app always uses the original IP family, so it will always use v4.

That is why we see the cases where we are not listening on v4 fail.

Contributor Author

@leosarra leosarra Jun 17, 2024


You're correct - I pointed out the same thing in the previous iteration of the PR; prioritizing one address family over the other isn't ideal, especially considering K8s behavior. With this PR we go from load-balancing across all addresses via LEAST_REQUEST/RR/etc. on master (which isn't ideal either) to prioritizing a single family based on the cluster's default. Ideally we should stick to the family the request was made with.

Keep in mind that if we deploy DS on an IPv6-default cluster, we would end up with IPv6 being prioritized in happy eyeballs, as the cluster indirectly affects the ordering of addresses in IstioEndpoint. Your result table would end up flipped.

See Istio endpoint in KinD Dual IPv6-first cluster:

leosarra@est-workstation:~/Desktop/istio$ go run istioctl/cmd/istioctl/main.go  x internal-debug endpointShardz | jq '."httpbin.istio-system.svc.cluster.local"'
{
  "istio-system": {
    "Shards": {
      "Kubernetes/Kubernetes": [
        {
          "Labels": {
            "app": "httpbin",
            "kubernetes.io/hostname": "kind-control-plane",
	    ...
            "version": "v1"
          },
          "Addresses": [
            "fd00:100:96::8",
            "100.96.0.8"
          ],
          ...

At the time of the previous PR I thought this was something I could address in the Envoy project by adding a knob to change the happy-eyeballs LB behavior.
But I did not consider the double happy-eyeballs you mentioned - I was only thinking about the proxy and ignoring the client (thank you for making me aware of that).
Due to client happy-eyeballs plus the client proxy accepting all connections, I feel Envoy will never be able to smartly choose the initial IP family to try, as the IP version of the original request could be misleading.

To be honest, I don't know if we can do better given these constraints :/

On a side note, I'm surprised by the 50% failure/success behavior you got in one scenario; I'm looking into that and trying to reproduce it.

Member


> Keep in mind that if we deploy DS on an IPv6-default cluster, we would end up with IPv6 being prioritized in

How do you mark it IPv6-default in k8s?

Member


This got complex so started thinking it through in a doc: https://docs.google.com/document/d/1iUlgvmybCAa6QNwKA0-ju3EWGnrpqhBZHL4A0S6c9fE/edit.

Contributor Author


> Keep in mind that if we deploy DS on an IPv6-default cluster, we would end up with IPv6 being prioritized in

> How do you mark it IPv6-default in k8s?

You can use this config in KinD to deploy it.
The order of the families in the subnet will affect the order of addresses in the k8s resources.

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual
  podSubnet: "fd00:100:96::/48,100.96.0.0/11"
  serviceSubnet: "fd00:100:64::/108,100.64.0.0/13"
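Saved to a file, this config can be applied with kind create cluster --config <file>.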

// 1. An endpoint can have both an IPv4 and an IPv6 address
// 2. An endpoint can represent a serviceentry/workload instance with multiple IP addresses
// When the additional_addresses field is populated for EDS in the Envoy configuration, Envoy instantiates a
// Happy Eyeballs algorithm for the endpoint, first attempting to connect to the IP address in the address field.
Member


Shouldn't we first attempt to connect with the IP family the caller used? It seems very odd that a user sends an IPv6 request but then we always end up picking IPv4 (the first address).

If we are going to prefer one family over the other, maybe it would be more appropriate to order them based on the ipFamilies order... though I doubt that is a good idea, since K8s doesn't.
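A sketch of the family-preserving ordering this comment argues for - purely illustrative, not what the PR implements:

import "strings"

// orderByRequestFamily puts the addresses matching the original request's
// IP family first, so a happy-eyeballs dialer tries that family before
// falling back to the other one.
func orderByRequestFamily(addrs []string, requestIsV6 bool) []string {
    preferred := make([]string, 0, len(addrs))
    fallback := make([]string, 0, len(addrs))
    for _, a := range addrs {
        if strings.Contains(a, ":") == requestIsV6 { // ":" marks an IPv6 literal
            preferred = append(preferred, a)
        } else {
            fallback = append(fallback, a)
        }
    }
    return append(preferred, fallback...)
}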

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 14, 2024
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 17, 2024
@leosarra leosarra requested a review from a team as a code owner July 5, 2024 09:15
@leosarra leosarra changed the title Support mutiple addresses for IstioEndpoint Support multiple addresses for IstioEndpoint Jul 5, 2024
pilot/pkg/networking/core/listener.go (review thread, resolved)
@@ -276,6 +273,14 @@ func (esc *endpointSliceController) updateEndpointCacheForSlice(hostName host.Na
}

istioEndpoint := builder.buildIstioEndpoint(a, portNum, portName, discoverabilityPolicy, healthStatus)
if features.EnableDualStack && (pod != nil) && len(pod.Status.PodIPs) > 1 {
Member


This seems wrong. This will make all services dual stack, even if they are explicitly single stack.

Contributor Author


Makes sense to gate that logic on DS Svcs, but I have one doubt about how to perform the check given the current service.go APIs.
I can't use GetExtraAddressesForProxy (as the update is not proxy-specific); shall I add a new hasExtraAddresses() function to our service model to verify whether any of the clusters have an extra address for the service?

Member


This code is single cluster, so I am not sure I understand why we need this info. Is it because we do not have access to the corev1.Service in this part of the code?

Contributor Author


Currently that part of the code obtains the model.Service, but I think I can infer the k8s svc from the slice; let me try and test it 👍

Member


Oh we do have the model.Service, I missed that.

If that doesn't work well, we can add model.Service.Attributes.K8sAttributes.IpFamily or similar.
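Putting the thread together, a hypothetical version of the gate from the diff above that also respects single-stack Services (the IPFamilies-based condition is an assumption, not necessarily the merged code):

// Expand to multiple addresses only when the dual-stack feature is on,
// the pod actually has extra IPs, and the Service itself is dual-stack.
if features.EnableDualStack && pod != nil &&
    len(pod.Status.PodIPs) > 1 && len(svc.Spec.IPFamilies) > 1 {
    // populate istioEndpoint.Addresses with all pod IPs
}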

Contributor Author


I think the last commit should do the trick; unit tests for istio endpoints with multiple addresses are also passing. Tomorrow I will deploy it and verify that it works in a live env.

Contributor Author


It is working fine after the last commit.
I have added two test cases for single-stack svc on DS deployments to have some coverage of this.

10.244.0.28:18080,fd00:10:244::1c:18080          HEALTHY                             outbound|80||echo-dual.echo-ds-3-3070.svc.cluster.local
10.244.0.22:18080,fd00:10:244::16:18080          HEALTHY                             outbound|80||echo-v4-naked.echo-ds-3-3070.svc.cluster.local
10.244.0.25:18080,fd00:10:244::19:18080          HEALTHY                             outbound|80||echo-v4.echo-ds-3-3070.svc.cluster.local
10.244.0.27:18080                                HEALTHY                             outbound|80||echo-v4onlysvc.echo-ds-3-3070.svc.cluster.local
10.244.0.21:18080,fd00:10:244::15:18080          HEALTHY                             outbound|80||echo-v6-naked.echo-ds-3-3070.svc.cluster.local
10.244.0.23:18080,fd00:10:244::17:18080          HEALTHY                             outbound|80||echo-v6.echo-ds-3-3070.svc.cluster.local
fd00:10:244::18:18080                            HEALTHY                             outbound|80||echo-v6onlysvc.echo-ds-3-3070.svc.cluster.local

@howardjohn howardjohn added do-not-merge/hold Block automatic merging of a PR. and removed do-not-merge/hold Block automatic merging of a PR. labels Jul 16, 2024
@leosarra
Contributor Author

Thank you so much @howardjohn for all the help getting this across the finish line!
And of course a big thank you to @zhlsunshine for the initial PR and @kfaseela for helping with the test coverage :)

@istio-testing istio-testing merged commit b512295 into istio:master Jul 16, 2024
28 checks passed
Labels
area/networking, size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

Missing endpoints when large number of pods is created at once
6 participants