
Support multiple addresses for IstioEndpoint #51279

Merged
merged 15 commits into istio:master
Jul 16, 2024

Conversation

Contributor

@leosarra leosarra commented May 29, 2024

Please provide a description of this PR:

This PR is based on #47273. I have taken it over since the original author is moving to a different assignment within his company; I will take care of further development.
Fixes #44043
Related to #40394

I'm keeping this in draft mode as I need to verify some things first

@istio-testing
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label May 29, 2024
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 29, 2024
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label May 31, 2024
@leosarra
Contributor Author

leosarra commented Jun 4, 2024

/test all

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 4, 2024
@leosarra
Contributor Author

leosarra commented Jun 4, 2024

/test all

@leosarra
Contributor Author

leosarra commented Jun 4, 2024

/test integ-security-multicluster

@leosarra leosarra marked this pull request as ready for review June 5, 2024 08:40
@leosarra leosarra requested review from a team as code owners June 5, 2024 08:40
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 5, 2024
@leosarra
Contributor Author

leosarra commented Jun 5, 2024

@kfaseela @howardjohn

Contributor

@costinm costinm left a comment


A common pattern (in K8s and other places) is to leave Address as-is and add a second AdditionalAddresses field - this would avoid changing so much code. Or, even better, use a structure to represent the additional addresses - with the network name included, for example, since a pod will typically have multiple interfaces (there is a WG and CNIs supporting this). And of course the multiple IPv6 addresses, which may also belong to separate networks.

It's just a suggestion - I can see some benefit in not keeping Address and in identifying all the places where we assume there is only one address and it is used as a key, which this PR does.

I believe all the new ambient and MDS code has maps actually keyed on IP(s) - that's where the IP (and network) are real keys; not sure how this interacts.
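For illustration, a minimal sketch of the structured alternative suggested here; the type and field names are hypothetical, not from the PR:

// Keep the primary Address untouched and attach extra addresses as
// structured entries that also record which network they belong to.
type AdditionalAddress struct {
    Address string // the extra IP, v4 or v6
    Network string // network/interface this IP belongs to (multi-NIC pods)
}

type IstioEndpoint struct {
    Address             string              // primary address, unchanged
    AdditionalAddresses []AdditionalAddress // extras, possibly on other networks
    // ... remaining fields unchanged
}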

@@ -492,6 +492,19 @@ func compareVersion(ov, nv int) int {

var NodeTypes = [...]NodeType{SidecarProxy, Router, Waypoint, Ztunnel}

// Key returns the key of Proxy based on its addresses, using the same logic as IstioEndpoint's Key()
func (node *Proxy) Key() string {
Contributor


How about "FirstIPOrNil" or something less generic?

The IP is one of the worst things to use as a key...
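A minimal sketch of the accessor rename suggested above; the method name and the use of Proxy.IPAddresses are assumptions for illustration, not the merged code:

// Return the first known address explicitly rather than a Key() that
// implies the address is a stable, unique identifier.
func (node *Proxy) FirstIPOrEmpty() string {
    if node == nil || len(node.IPAddresses) == 0 {
        return ""
    }
    return node.IPAddresses[0]
}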

@leosarra
Contributor Author

leosarra commented Jun 5, 2024

> A common pattern (in K8s and other places) is to leave Address as-is and add a second AdditionalAddresses field - this would avoid changing so much code. Or, even better, use a structure to represent the additional addresses - with the network name included, for example, since a pod will typically have multiple interfaces (there is a WG and CNIs supporting this). And of course the multiple IPv6 addresses, which may also belong to separate networks.

> It's just a suggestion - I can see some benefit in not keeping Address and in identifying all the places where we assume there is only one address and it is used as a key, which this PR does.

> I believe all the new ambient and MDS code has maps actually keyed on IP(s) - that's where the IP (and network) are real keys; not sure how this interacts.

Yes, I understand the point about having both Address and AdditionalAddresses/Addresses.
It would certainly make the code changes here a bit easier and allow us to leave an existing field of the core data structures unchanged.
My concern is that it would also require future contributions to always keep in mind the existence of both fields and to have proper code paths that make use of both.
The advantage of only having Addresses is that we are sort-of-forcing people to deal with a scenario that could otherwise be missed.

Just a note: if the Istio dual-stack env is not enabled, then Addresses of endpoints will still contain only the first IP.
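A sketch of the gating described in that note, assuming the flag is the features.EnableDualStack variable that appears later in this PR's diff:

// With the dual-stack flag off, Addresses keeps only the first IP,
// preserving the old single-address behavior; with it on, all pod IPs
// are carried over.
addresses := []string{pod.Status.PodIP}
if features.EnableDualStack && len(pod.Status.PodIPs) > 1 {
    for _, podIP := range pod.Status.PodIPs[1:] {
        addresses = append(addresses, podIP.IP)
    }
}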

@costinm
Contributor

costinm commented Jun 6, 2024 via email

pilot/pkg/model/context.go (outdated review thread, resolved)
pilot/pkg/model/service.go (outdated review thread, resolved)
t.Errorf("got unexpected instances : %v", gotInstances)
}

// 4. test update instances
Contributor


what does this comment mean?

Contributor Author


The purpose of the test is to verify that existing instances can be updated and that more addresses can be added if needed.
I've updated the comment.

pilot/pkg/xds/endpoints/endpoint_builder.go (outdated review thread, resolved)
@leosarra
Contributor Author

leosarra commented Jun 12, 2024

Thank you @MorrisLaw and @costinm for the comments.
I have updated the comments and done some refactoring.

@costinm
I was checking how the additionalAddresses structure is used in K8s to take inspiration before starting the changes, but I'm having difficulty finding examples in the K8s APIs. The Node/Pod/Svc K8s objects define both address/IP and Addresses/IPs, where Addresses/IPs is simply [primary_address, extra_addresses...].
On the other hand, I know that Envoy uses additionalAddresses for listeners and endpoints, where each additional-address entry is usually a pair (address, socket_options).
Can you provide an example of additionalAddresses in K8s so that I can try to do something similar?

I would like to hear @howardjohn's opinion on going from addresses to address+additionalAddresses; I recall you discussed additionalAddresses in the past, although it was probably more Envoy-related.
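For reference, a paraphrase of the primary-plus-list shape described above, as it appears in k8s.io/api/core/v1 (comments here are illustrative):

type PodStatus struct {
    // PodIP is the primary address assigned to the pod.
    PodIP string
    // PodIPs, when set, must list the primary IP first (PodIPs[0] == PodIP);
    // extra addresses, at most one per IP family, follow it.
    PodIPs []PodIP
}

type PodIP struct {
    IP string
}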

> Which reminds me, it would help to have more details in the PR description and comments in the code - I am not sure I understand all the use cases you have in mind and how the additional IPs are to be used...

Sure, I will add that information to the PR description. I will carry over some details from the original design document: https://docs.google.com/document/d/1pw-WaZOnpSVuZwNuv4VNXEgrFvDI2zbxvR9NonDwkI4/edit#heading=h.se6wjxq9mtpk

pilot/pkg/model/context.go (outdated review thread, resolved)
-	addr := util.BuildAddress(e.Address, e.EndpointPort)
+	addr := util.BuildAddress(e.Addresses[0], e.EndpointPort)

// This change can support multiple addresses for an endpoint, then there are some use cases for it, such as
Member


I am seeing very different behavior here with Istio vs non-Istio. I have 4 services: v4 only, v6 only, v4>v6, and v6>v4.

With istio:

/ $ for i in {0..100}; do curl -s echo | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6
/ $ for i in {0..100}; do curl -s echo-v4v6 | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6
/ $ for i in {0..100}; do curl -s echo-v6v4 | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6
/ $ for i in {0..100}; do curl -s echo-v6 | grep IP=; done | sort | uniq -c
    101 IP=127.0.0.6

ALL use v4

Without istio:

/ $ for i in {0..100}; do curl -s echo-v6 | grep IP=; done | uniq | sort | uniq -c
      1 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo-v6 | grep IP=; done | sort | uniq -c
    101 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo-v6v4 | grep IP=; done | sort | uniq -c
    101 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo-v4v6 | grep IP=; done | sort | uniq -c
    101 IP=fd00:10:244::b
/ $ for i in {0..100}; do curl -s echo | grep IP=; done | sort | uniq -c
    101 IP=10.244.0.11

All use v6, except the IPv4-only service.

I think it's incorrect to use IPv6 for an IPv4-only service and vice versa. Kubernetes doesn't: the wrong-family addresses are not in the EndpointSlices, and DNS will not even resolve the wrong IP family in kube-dns.

Member

@howardjohn howardjohn Jun 13, 2024


Are we sure WE should be doing happy eyeballs?

Consider this case:

apiVersion: v1
kind: Service
metadata:
  name: echo-only-v4-pod
spec:
  ipFamilies:
  - IPv6
  - IPv4
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
---
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.244.0.16
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: dual-control-plane
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: echo-only-v4-pod
  name: echo-only-v4-pod
ports:
- name: http
  port: 80
  protocol: TCP

This is a dual-stack service with only a v4 IP.

From what I have found with other Kubernetes networking products, the client is expected to do happy eyeballs itself, and the network just forwards v4->v4 and v6->v6.

Edit: I was testing linkerd wrong. Looks like they are similar to us but prefer v6 instead of v4. I don't think they actually do happy eyeballs either; I think they just pick a single IP, preferring v6. Not 100% sure though.
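For context, client-side happy eyeballs is already built into Go's standard dialer (RFC 6555); a small illustrative client, using one of the test service names from above:

package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    // FallbackDelay is the head start the first address family gets
    // before the dialer also races the other family (happy eyeballs).
    d := &net.Dialer{FallbackDelay: 300 * time.Millisecond}
    conn, err := d.DialContext(context.Background(), "tcp", "echo-v4v6:80")
    if err != nil {
        fmt.Println("dial failed:", err)
        return
    }
    defer conn.Close()
    fmt.Println("connected via", conn.RemoteAddr())
}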

Member


This does get trickier with SE (ServiceEntry) though.

Member


Did some testing. Not complete but:

Source | Service | Destination Pod | Kubernetes Behavior | PR Behavior
v4     | Dual    | Dual            | Success, v4         | Success, v4
v4     | Dual    | V4 only         | Success, v4         | Success, v4
v4     | Dual    | V6 only         | FAIL                | FAIL
v6     | Dual    | Dual            | Success, v6         | Success, v4
v6     | Dual    | V4 only         | FAIL                | 50% success v4, 50% fail
v6     | Dual    | V6 only         | Success, v6         | FAIL
v6     | v4      | -               | NXDOMAIN            | Didn't test
v4     | v6      | -               | NXDOMAIN            | Didn't test

I think happy eyeballs is fundamentally broken with how we handle proxies. Maybe in a way we cannot really prevent, either.

There are kind of two issues I see.

Happy eyeballs occurs at (up to) 2 places: the client app and the client proxy.

However, both of these are flawed. On client app -> client proxy, the proxy is ALWAYS going to accept the connection. Literally any IP, v4 or v6 - it will accept it. So this will trigger the app to assume that is the right family to use.

Next, we have proxy -> proxy. Same thing - the destination will accept anything. Since we always set up v4>v6, this means that when we are connecting to something in the mesh, it will ALWAYS use v4. Then server proxy -> app always uses the original IP family, so it will always use v4.

That is why we see the cases where we are not listening on v4 fail.

Contributor Author

@leosarra leosarra Jun 17, 2024


You're correct - I pointed out the same thing in the previous iteration of the PR; prioritizing one address family over the other isn't ideal, especially considering K8s behavior. With this PR we go from load-balancing across all addresses via LEAST_REQUEST/RR/etc. on master (which isn't ideal either) to prioritizing a single family based on the cluster's default. Ideally we should stick to the family the request was made with.

Keep in mind that if we deploy DS on an IPv6-default cluster, we would end up with IPv6 being prioritized in happy eyeballs, as the cluster indirectly affects the ordering of addresses in IstioEndpoint. Your result table would end up flipped.

See Istio endpoint in KinD Dual IPv6-first cluster:

leosarra@est-workstation:~/Desktop/istio$ go run istioctl/cmd/istioctl/main.go  x internal-debug endpointShardz | jq '."httpbin.istio-system.svc.cluster.local"'
{
  "istio-system": {
    "Shards": {
      "Kubernetes/Kubernetes": [
        {
          "Labels": {
            "app": "httpbin",
            "kubernetes.io/hostname": "kind-control-plane",
	    ...
            "version": "v1"
          },
          "Addresses": [
            "fd00:100:96::8",
            "100.96.0.8"
          ],
          ...

At the time of the previous PR I thought this was something I could address in the Envoy project by adding a knob to change the happy-eyeballs LB behavior.
But I did not consider the double happy-eyeballs you mentioned - I was only thinking about the proxy and ignoring the client (thank you for making me aware of that).
Due to client happy-eyeballs plus the client proxy accepting all connections, I feel Envoy will never be able to smartly choose the initial IP family to try, as the IP version of the original request could be misleading.

To be honest, I don't know if we can do better given these constraints :/

On a side note, I'm surprised by the 50% failure/success behavior you got in one scenario; I'm looking into that and trying to reproduce it.

Member


> Keep in mind that if we deploy DS on an IPv6-default cluster, we would end up with IPv6 being prioritized in

How do you mark it IPv6-default in k8s?

Member


This got complex so started thinking it through in a doc: https://docs.google.com/document/d/1iUlgvmybCAa6QNwKA0-ju3EWGnrpqhBZHL4A0S6c9fE/edit.

Contributor Author


> Keep in mind that if we deploy DS on an IPv6-default cluster, we would end up with IPv6 being prioritized in

> How do you mark it IPv6-default in k8s?

You can use this config in KinD to deploy it.
The order of the families in the subnet will affect the order of addresses in the k8s resources.

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual
  podSubnet: "fd00:100:96::/48,100.96.0.0/11"
  serviceSubnet: "fd00:100:64::/108,100.64.0.0/13"
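Saved to a file, this config can be applied with kind create cluster --config <file>.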

// 1. An endpoint can have both an IPv4 and an IPv6 address
// 2. An endpoint can represent a serviceentry/workload instance with multiple IP addresses
// When the additional_addresses field is populated for EDS in the Envoy configuration, Envoy instantiates a
// Happy Eyeballs algorithm for the endpoint, first attempting to connect to the IP address in the address field.
Member


Shouldn't we first attempt to connect with the IP family the caller used? It seems very odd that a user sends an IPv6 request but then we always end up picking IPv4 (the first address).

If we are going to prefer one family over the other, maybe it would be more appropriate to order them based on the ipFamilies order... though I doubt that is a good idea, since K8s doesn't.
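A sketch of the family-preserving ordering this comment argues for - purely illustrative, not what the PR implements:

import "strings"

// orderByRequestFamily puts the addresses matching the original request's
// IP family first, so a happy-eyeballs dialer tries that family before
// falling back to the other one.
func orderByRequestFamily(addrs []string, requestIsV6 bool) []string {
    preferred := make([]string, 0, len(addrs))
    fallback := make([]string, 0, len(addrs))
    for _, a := range addrs {
        if strings.Contains(a, ":") == requestIsV6 { // ":" marks an IPv6 literal
            preferred = append(preferred, a)
        } else {
            fallback = append(fallback, a)
        }
    }
    return append(preferred, fallback...)
}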

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 14, 2024
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 17, 2024
@leosarra leosarra requested a review from a team as a code owner July 5, 2024 09:15
@leosarra leosarra changed the title Support mutiple addresses for IstioEndpoint Support multiple addresses for IstioEndpoint Jul 5, 2024
pilot/pkg/networking/core/listener.go (review thread, resolved)
@@ -276,6 +273,14 @@ func (esc *endpointSliceController) updateEndpointCacheForSlice(hostName host.Na
}

istioEndpoint := builder.buildIstioEndpoint(a, portNum, portName, discoverabilityPolicy, healthStatus)
if features.EnableDualStack && (pod != nil) && len(pod.Status.PodIPs) > 1 {
Member


This seems wrong. This will make all services dual stack, even if they are explicitly single stack.

Contributor Author


Makes sense to gate that logic on DS Svcs, but I have one doubt about how to perform the check given the current service.go APIs.
I can't use GetExtraAddressesForProxy (as the update is not proxy-specific); shall I add a new hasExtraAddresses() function to our service model to verify whether any of the clusters have an extra address for the service?

Member


This code is single cluster, so I am not sure I understand why we need this info. Is it because we do not have access to the corev1.Service in this part of the code?

Contributor Author


Currently that part of the code obtains the model.Service, but I think I can infer the k8s svc from the slice; let me try and test it 👍

Member


Oh we do have the model.Service, I missed that.

If that doesn't work well, we can add model.Service.Attributes.K8sAttributes.IpFamily or similar.
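Putting the thread together, a hypothetical version of the gate from the diff above that also respects single-stack Services (the IPFamilies-based condition is an assumption, not necessarily the merged code):

// Expand to multiple addresses only when the dual-stack feature is on,
// the pod actually has extra IPs, and the Service itself is dual-stack.
if features.EnableDualStack && pod != nil &&
    len(pod.Status.PodIPs) > 1 && len(svc.Spec.IPFamilies) > 1 {
    // populate istioEndpoint.Addresses with all pod IPs
}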

Contributor Author


I think the last commit should do the trick; unit tests for istio endpoints with multiple addresses are also passing. Tomorrow I will deploy it and verify that it works in a live env.

Contributor Author


It is working fine after the last commit.
I have added two test cases for single-stack svc on DS deployments to have some coverage of this.

10.244.0.28:18080,fd00:10:244::1c:18080          HEALTHY                             outbound|80||echo-dual.echo-ds-3-3070.svc.cluster.local
10.244.0.22:18080,fd00:10:244::16:18080          HEALTHY                             outbound|80||echo-v4-naked.echo-ds-3-3070.svc.cluster.local
10.244.0.25:18080,fd00:10:244::19:18080          HEALTHY                             outbound|80||echo-v4.echo-ds-3-3070.svc.cluster.local
10.244.0.27:18080                                HEALTHY                             outbound|80||echo-v4onlysvc.echo-ds-3-3070.svc.cluster.local
10.244.0.21:18080,fd00:10:244::15:18080          HEALTHY                             outbound|80||echo-v6-naked.echo-ds-3-3070.svc.cluster.local
10.244.0.23:18080,fd00:10:244::17:18080          HEALTHY                             outbound|80||echo-v6.echo-ds-3-3070.svc.cluster.local
fd00:10:244::18:18080                            HEALTHY                             outbound|80||echo-v6onlysvc.echo-ds-3-3070.svc.cluster.local

@howardjohn howardjohn added do-not-merge/hold Block automatic merging of a PR. and removed do-not-merge/hold Block automatic merging of a PR. labels Jul 16, 2024
@leosarra
Contributor Author

Thank you so much @howardjohn for all the help getting this across the finish line!
And of course a big thank you to @zhlsunshine for the initial PR and @kfaseela for helping with the test coverage :)

@istio-testing istio-testing merged commit b512295 into istio:master Jul 16, 2024
28 checks passed
Labels
area/networking, size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

Missing endpoints when large number of pods is created at once
6 participants