feat(DestinationRules): Adding aggression and min_weight_percent to DestinationRules API #3216

Open
wants to merge 3 commits into
base: master
from

Conversation

frgaudet

@frgaudet frgaudet commented May 23, 2024

Adds Envoy's slow start mode parameters aggression and min_weight_percent to the DestinationRules API.

Fixes #3215

A follow-up PR will cover the cluster_traffic_policy side.

First time I contribute here, hope this is good :)

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label May 23, 2024
@istio-policy-bot

😊 Welcome @frgaudet! This is either your first contribution to the Istio api repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.


linux-foundation-easycla bot commented May 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@istio-testing istio-testing added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. needs-ok-to-test labels May 23, 2024
@istio-testing
Collaborator

Hi @frgaudet. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@istio-policy-bot

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

@frgaudet frgaudet marked this pull request as ready for review May 23, 2024 16:15
@frgaudet frgaudet requested a review from a team as a code owner May 23, 2024 16:15
Member

@hzxuzhonghu hzxuzhonghu left a comment

Can we wrap all the slow start config together?

// By tuning aggression parameter, one could achieve polynomial or exponential speed for traffic increase.
message aggression {
  uint32 default_value = 5;
  string runtime_key = 6;
Member

What do they mean, and can you provide a demo?

Contributor

I do not think we should expose a runtime key here. Also for how many services do you have to configure this? And does it differ from service to service?

Author

I do not think we should expose a runtime key here. Also for how many services do you have to configure this? And does it differ from service to service?

As far as I remember, the runtime_key parameter is mandatory if we want to use the aggression parameter (which is the one we really need).

Author

What do they mean, and can you provide a demo?

We're using Java microservices, and the pods need a warmup phase before they reach full performance.

In practice, the goal is to avoid giving 100% of the traffic to a newly READY pod. A slow start lets us first send a certain percentage of the traffic, then ramp up progressively to 100%.

For this first attempt we tried the LoadBalancerSettings feature from Istio. This allows us to specify the duration of the warmup. However, in this config we can’t configure 2 important options because they are not exposed by the Istio API:

min_weight_percent: specifies the initial percentage of the original load; if not set, it defaults to 10%.

aggression: defines how the percentage of traffic sent to the pods evolves from min_weight_percent to 100%. By default the ramp-up curve is linear, but by customising it we can achieve an exponential type of curve.
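
The interaction of these two options can be sketched numerically. The following is a minimal Python sketch, assuming the slow start formula given in Envoy's documentation: new_weight = weight * max(min_weight_percent, time_factor ^ (1 / aggression)), with time_factor = max(seconds_since_start, 1) / slow_start_window. The function name and the sample values are illustrative only and are not part of Istio or Envoy.

```python
def slow_start_weight(weight, seconds_since_start, window_seconds,
                      aggression=1.0, min_weight_percent=10.0):
    """Effective load-balancing weight of an endpoint during slow start.

    Assumes the formula from Envoy's slow start documentation;
    min_weight_percent is given in percent (10.0 means 10%).
    """
    if seconds_since_start >= window_seconds:
        return weight  # the window is over: the endpoint gets its full weight
    time_factor = max(seconds_since_start, 1.0) / window_seconds
    ramp = time_factor ** (1.0 / aggression)
    # The floor keeps the weight from dropping below min_weight_percent.
    return weight * max(min_weight_percent / 100.0, ramp)


if __name__ == "__main__":
    # Compare three aggression values over a 3-minute window with a 1% floor.
    for t in (1, 30, 90, 150, 180):
        row = [slow_start_weight(100.0, t, 180, a, 1.0) for a in (0.5, 1.0, 2.0)]
        print(t, [round(w, 1) for w in row])
```

With aggression left at 1.0 the ramp is linear (half the weight at half the window). Under this formula, values below 1.0 keep the weight low for longer, and values above 1.0 raise it faster early on.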

The result (sorry, I don't have a picture to illustrate it) is that 10% of the traffic is still too much: our latency increases a lot and impacts our users.

To check if the 2 parameters mentioned above impact our traffic, we used an EnvoyFilter that we applied to 3 clients of our app.

image

Deploying this config for only a portion of our traffic (roughly 75%), with a slow_start_window of 3 minutes and a min_weight_percent of 1%, we were able to observe an impact: we can see the progressive ramp-up of the traffic.

Contributor

Author

It is actually mandatory if we want to use the aggression parameter. If I try this EnvoyFilter:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: h2-control
spec:
  configPatches:
    - applyTo: CLUSTER
      match:
        cluster:
          name: "outbound|8080||http-echo.infra.svc.cluster.local"
      patch:
        operation: MERGE
        value:
          name: "outbound|8080||http-echo.infra.svc.cluster.local"
          lbPolicy: LEAST_REQUEST
          leastRequestLbConfig:
            slowStartConfig:
              min_weight_percent: { value: 99 }
              slow_start_window: "12s"
              aggression: { default_value: 2  }
  workloadSelector:
    labels:
      app: landing-f.gaudet

Then I get this warning in the logs:

landing-f.gaudet istio-proxy {"level":"warning","time":"2024-06-04T04:41:49.406026Z","scope":"envoy config","msg":"gRPC config for type.googleapis.com/envoy.config.cluster.v3.Cluster rejected: Proto constraint validation failed (ClusterValidationError.LeastRequestLbConfig: embedded message failed validation | caused by LeastRequestLbConfigValidationError.SlowStartConfig: embedded message failed validation | caused by SlowStartConfigValidationError.Aggression: embedded message failed validation | caused by RuntimeDoubleValidationError.RuntimeKey: value length must be at least 1 characters):

and the config is not applied. However, if I set the runtime key:

          leastRequestLbConfig:
            slowStartConfig:
              min_weight_percent: { value: 99 }
              slow_start_window: "12s"
              aggression: { default_value: 2, runtime_key: "(" }

Then the config is successfully applied:

istioctl pc cluster landing-f.gaudet.infra --fqdn http-echo.infra.svc.cluster.local -ojson | jq ".[].leastRequestLbConfig"


{
  "slowStartConfig": {
    "slowStartWindow": "12s",
    "aggression": {
      "defaultValue": 2,
      "runtimeKey": "("
    },
    "minWeightPercent": {
      "value": 99
    }
  }
}
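
The behaviour shown above (an aggression block is rejected unless runtime_key is non-empty) could be checked before a patch is generated. Below is a minimal Python sketch of such a pre-check. The helper name `slow_start_config` and the runtime key string in the usage example are hypothetical; only the field names are taken from the EnvoyFilter above.

```python
import json


def slow_start_config(window, min_weight_percent, aggression, runtime_key):
    """Build the slowStartConfig fragment used in the EnvoyFilter patch.

    Hypothetical helper: it only mirrors the constraint seen in the proxy
    log (runtime_key must be at least 1 character long).
    """
    if len(runtime_key) < 1:
        raise ValueError(
            "aggression.runtime_key: value length must be at least 1 characters"
        )
    return {
        "slow_start_window": window,
        "min_weight_percent": {"value": min_weight_percent},
        "aggression": {"default_value": aggression, "runtime_key": runtime_key},
    }


if __name__ == "__main__":
    # The key name below is a placeholder chosen for illustration.
    cfg = slow_start_config("12s", 99, 2, "example.slow_start.aggression")
    print(json.dumps({"slowStartConfig": cfg}, indent=2))
```

Failing early in tooling like this surfaces the problem before the sidecar rejects the whole cluster update.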

@frgaudet
Author

Can we wrap all the slow start config together?

Just to be sure I understand your request: do you mean wrapping all the slowStart fields into a new message struct? What best practice would you recommend for dealing with such a proto change?

  message slowStart {
    google.protobuf.Duration warmup_duration_secs = 1;
    message aggression {
      uint32 default_value = 2;
      string runtime_key = 3;
    }
    uint32 min_weight_percent = 4;
  };

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label May 31, 2024
@frgaudet frgaudet force-pushed the fred/istio/adding-slow-start-parameters branch from 1ac690d to 78a2ea5 Compare June 5, 2024 10:02
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 5, 2024
@hzxuzhonghu
Member

Yes @frgaudet, I mean something like this.

@ramaraochavali
Contributor

Do you need a separate value of aggression for each service?

@frgaudet
Author

frgaudet commented Jun 6, 2024

Do you need a separate value of aggression for each service?

Potentially yes: depending on the Java code, the warmup could need to be tuned differently.

@frgaudet
Author

@hzxuzhonghu @ramaraochavali do you need anything else?

@frgaudet frgaudet changed the title [WIP] feat(DestinationRules): Adding aggression and min_weight_percent to DestinationRules API feat(DestinationRules): Adding aggression and min_weight_percent to DestinationRules API Jun 18, 2024
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 18, 2024
@frgaudet
Author

@howardjohn do you think this could be added in the next release?

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 28, 2024
@frgaudet frgaudet force-pushed the fred/istio/adding-slow-start-parameters branch from 21db8c8 to 9442f42 Compare July 5, 2024 08:41
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jul 5, 2024
Labels
needs-ok-to-test size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support agression and min_weight_percent in DestinationRule
5 participants