Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose retry budget configuration #27419

Open
christian-posta opened this issue Sep 18, 2020 · 14 comments
Open

Expose retry budget configuration #27419

christian-posta opened this issue Sep 18, 2020 · 14 comments

Comments

@christian-posta
Copy link
Contributor

christian-posta commented Sep 18, 2020

Describe the feature request
I'd like to expose Envoy's retry budget settings for clusters:

https://www.envoyproxy.io/docs/envoy/v1.15.0/api-v3/config/cluster/v3/circuit_breaker.proto#config-cluster-v3-circuitbreakers-thresholds-retrybudget

Describe alternatives you've considered
Trying to configure by EnvoyFilter but since the retry budget is part of the Envoy threshold/circuit breaking config, and it's a list, I cannot update it with EnvoyFilter (list merge semantics are Additive)

[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Additional context

@howardjohn howardjohn added this to > P2 in Prioritization Oct 22, 2020
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Mar 18, 2021
Prioritization automation moved this from > P2 to Done Apr 2, 2021
@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Apr 2, 2021
@ramaraochavali ramaraochavali removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 18, 2022
@quraynai
Copy link

@ramaraochavali before we have this feature available. Do you have any suggest way to configure retry_budget?
Currently I have to remove cluster generated by istio and manually generate new one with retry_budget. That's kinda troublesome.

@ramaraochavali
Copy link
Contributor

Is replacing complete circuit breaker thresholds via envoy filter not working?

@quraynai
Copy link

it's working. just 100 lines of configs to maintain is kinda too much.

@ramaraochavali
Copy link
Contributor

ramaraochavali commented Nov 25, 2022 via email

@ktalg
Copy link

ktalg commented Dec 1, 2022

for _, cp := range efw.Patches[networking.EnvoyFilter_CLUSTER] {
applied := false
if cp.Operation != networking.EnvoyFilter_Patch_MERGE {
IncrementEnvoyFilterMetric(cp.Key(), Cluster, applied)
continue
}

It seems impossible to use operations other than MERGE on Cluster? What is the intent of this design?
@ramaraochavali

@ramaraochavali
Copy link
Contributor

Sorry - I did not understand, what operation are you trying to do?

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label May 30, 2023
@ejc3
Copy link

ejc3 commented Aug 21, 2023

I see a pending pull request that hasn't been merged. Will this be looked at again? Thank you!

@zirain zirain removed lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. labels Apr 19, 2024
@zirain zirain reopened this Apr 19, 2024
@ajvengo
Copy link

ajvengo commented Apr 22, 2024

Hello @zirain,

I confirm that CircuitBreaker thresholds cannot be modified for priority: DEFAULT using merge patch, because it will append a new item to the thresholds array instead of modifying thresholds[0] settings for default priority.

Envoy will use threshold[0] default settings generated by istiod because it looks for the first match.

I agree with @quraynai that we can try to workaround that by completely replacing thresholds via EnvoyFilter, but it's much more logical and convenient to expose this settings directly via DestinationRule.

I will be glad to help to backport this feature to branches 1.19 and 1.20

@zirain
Copy link
Member

zirain commented Apr 22, 2024

@ajvengo FYI: istio/api#2734 (comment)

@ramaraochavali
Copy link
Contributor

@howardjohn Any progress on Gateway API side on this?

@zirain
Copy link
Member

zirain commented Apr 22, 2024

kubernetes-sigs/gateway-api#1731 (comment)

@tejokumar
Copy link

We did some tests with and without retry-budgets using EnvoyFilter with Istio. Using retry-budgets performs better than having a static value to "max_retries". We desperately need this feature with Istio. Maintaining EnvoyFilter is very complicated. With EnvoyFilter approach, we have to maintain few 100s of lines of configuration. Those configurations include some parameters we set in DestinationRule as well.
With 100s of microservices in an Enterprise app, maintaining such complex EnvoyFilter is difficult and complex to automate. Thinking of creating a PR to address this feature.

@rajatsharma94
Copy link
Contributor

@tejokumar can you share an example EnvoyFilter that configures retry_budgets ?

From our experiences (and as @ktalg mentioned), it seems its not possible to use EnvoyFilter for this because of the "MERGE" behaviour on cluster.

@tejokumar
Copy link

Yes. The 'Merge' operation doesn't work on Arrays in EnvoyFilter. "Retry Budget" is part of thresholds array under envoy circuit-breaker configurations. We created a EnvoyFilter with two operations - REMOVE and ADD. It is ugly. Thats why we need this feature.

---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: my-app-name-retry-budget
  namespace: my-app-name
spec:
  workloadSelector:
    labels:
      app: my-app-name
  configPatches:
  - applyTo: CLUSTER
    match:
      context: SIDECAR_OUTBOUND
      cluster:
        portNumber: 443
        name: outbound|443|mesh|app.upstream-namespace.svc.cluster.local
        subset: mesh
    patch:
      operation: REMOVE
  - applyTo: CLUSTER
    match:
      context: SIDECAR_OUTBOUND
    patch:
      operation: ADD
      value:
        name: outbound|443|mesh|app.upstream-namespace.svc.cluster.local
        type: EDS
        loadAssignment:
          clusterName: outbound|443|mesh|app.upstream-namespace.svc.cluster.local
          policy:
            overprovisioningFactor: 100
        edsClusterConfig:
          edsConfig:
            ads: {}
            initialFetchTimeout: 0s
            resourceApiVersion: V3
          serviceName: outbound|443|mesh|app.upstream-namespace.svc.cluster.local
        connectTimeout: 1s
        circuitBreakers:
          thresholds:
          - maxConnections: 10240
            maxPendingRequests: 1024
            maxRequests: 2048
            retryBudget:
              budgetPercent:
                value: 10
              minRetryConcurrency: 3
            trackRemaining: true
        typedExtensionProtocolOptions:
          envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
            "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
            explicitHttpConfig:
              http2ProtocolOptions: {}
        outlierDetection:
          consecutive5xx: 5
          interval: 10s
          baseEjectionTime: 60s
          maxEjectionPercent: 20
          enforcingConsecutive5xx: 100
          enforcingSuccessRate: 0
        commonLbConfig:
          healthyPanicThreshold: {}
          localityWeightedLbConfig: {}
        transportSocket:
          name: envoy.transport_sockets.tls
          typedConfig:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
            commonTlsContext:
              tlsParams:
                tlsMinimumProtocolVersion: TLSv1_2
                tlsMaximumProtocolVersion: TLSv1_3
              tlsCertificateSdsSecretConfigs:
              - name: file-cert:/etc/certs/cert-chain.pem~/etc/certs/key.pem
                sdsConfig:
                  apiConfigSource:
                    apiType: GRPC
                    transportApiVersion: V3
                    grpcServices:
                    - envoyGrpc:
                        clusterName: sds-grpc
                    setNodeOnFirstMessageOnly: true
                  resourceApiVersion: V3
              combinedValidationContext:
                defaultValidationContext:
                  matchSubjectAltNames:
                  - exact: upstream-san
                validationContextSdsSecretConfig:
                  name: file-root:/etc/certs/root-cert.pem
                  sdsConfig:
                    apiConfigSource:
                      apiType: GRPC
                      transportApiVersion: V3
                      grpcServices:
                      - envoyGrpc:
                          clusterName: sds-grpc
                      setNodeOnFirstMessageOnly: true
                    resourceApiVersion: V3
              alpnProtocols:
              - h2
        metadata:
          filterMetadata:
            istio:
              config: "/apis/networking.istio.io/v1alpha3/namespaces/upstream-namespace/destination-rule/upstream-dr"
              services:
              - host: app.upstream-namespace.svc.cluster.local
                name: app
                namespace: upstream-namespace
              subset: mesh
        filters:
        - name: istio.metadata_exchange
          typedConfig:
            "@type": type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange
            protocol: istio-peer-exchange

It is very hard to generate this config using any automation script or tools. Also, this EnvoyFIlter should be created in the client namespace, unlike DestinationRule and VirtualService - which are created in server namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

10 participants