Change default target burst capacity #12774

Merged: 3 commits merged on Mar 24, 2022


psschwei (Contributor) commented Mar 23, 2022

Signed-off-by: Paul S. Schweigert [email protected]

With the current TBC default (200), setting min-scale=2 causes the activator to flip in and out of the request path on each request. This PR sets the default TBC to 210 so that it is not a direct multiple of the container-concurrency-target-default (100), which makes activator flipageddon harder to trigger.
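A minimal sketch of why 200 flips and 210 does not. It assumes a simplified model of the autoscaler's excess-burst-capacity (EBC) check, EBC = ready pods * per-pod capacity - TBC - in-flight requests, with the activator kept in the path while EBC < 0, and a per-pod capacity equal to the default concurrency target of 100. The function `activatorInPath` is hypothetical and is not Knative's actual code.

```go
package main

import "fmt"

// activatorInPath is a hypothetical, simplified model of the autoscaler's
// excess-burst-capacity (EBC) check; it is not Knative's actual code.
//
//	EBC = readyPods*perPodCapacity - tbc - observed
//
// The activator stays in the request path while EBC < 0.
// A TBC of -1 is treated as "unlimited": the activator is always in the path.
func activatorInPath(readyPods int, perPodCapacity, tbc, observed float64) bool {
	if tbc == -1 {
		return true
	}
	ebc := float64(readyPods)*perPodCapacity - tbc - observed
	return ebc < 0
}

func main() {
	const minScale = 2             // pods kept ready by min-scale=2
	const perPodCapacity = 100.0   // assumed default concurrency target per pod

	for _, tbc := range []float64{200, 210} {
		for _, observed := range []float64{0, 1} {
			fmt.Printf("TBC=%v in-flight=%v -> activator in path: %v\n",
				tbc, observed, activatorInPath(minScale, perPodCapacity, tbc, observed))
		}
	}
}
```

Under this model, with TBC=200 the answer changes between an idle revision (EBC = 0, activator out) and a single in-flight request (EBC = -1, activator in), which is the flip described above. With TBC=210, EBC is already -10 when idle, so the activator stays in the path in both cases.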

More discussion in #11926 and #12241, where we also considered setting TBC to -1 (which always keeps the activator in the path unless overridden) before reaching lazy consensus on just bumping the default slightly.

Changes the default target-burst-capacity to 210 to fix a configuration issue that caused the activator to rapidly swap in and out of the request path.

/assign @dprotaso @rhuss @evankanderson @nader-ziada

knative-prow-robot added the size/S (denotes a PR that changes 10-29 lines, ignoring generated files), area/API (API objects and controllers), and area/autoscale labels on Mar 23, 2022
codecov bot commented Mar 23, 2022

Codecov Report

Merging #12774 (d748815) into main (14eae34) will increase coverage by 0.00%.
The diff coverage is 100.00%.

```diff
@@           Coverage Diff           @@
##             main   #12774   +/-   ##
=======================================
  Coverage   87.46%   87.46%           
=======================================
  Files         196      196           
  Lines        9754     9760    +6     
=======================================
+ Hits         8531     8537    +6     
  Misses        935      935           
  Partials      288      288           
```
| Impacted Files | Coverage Δ |
|---|---|
| pkg/autoscaler/config/config.go | 97.91% <100.00%> (ø) |
| pkg/reconciler/gc/gc.go | 94.59% <0.00%> (-2.47%) ⬇️ |
| pkg/reconciler/revision/background.go | 90.00% <0.00%> (+1.81%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 14eae34...d748815.

psschwei (Contributor, Author)

/retest

dprotaso (Member)

Just out of curiosity (since I haven't been following in detail): is there still a scenario, now less likely, where a single request will flip the activator in/out of the path?

dprotaso (Member)

/lgtm
/approve

/hold for a few more eyes

Feel free to unhold if there's no input tomorrow.

knative-prow-robot added the do-not-merge/hold (indicates that a PR should not merge because someone has issued a /hold command) and lgtm (indicates that a PR is ready to be merged) labels on Mar 23, 2022
knative-prow-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, psschwei


knative-prow-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 23, 2022
psschwei (Contributor, Author)

> Just out of curiosity (since I haven't been following in detail): is there still a scenario, now less likely, where a single request will flip the activator in/out of the path?

Hmm... I guess any combination of min-scale * cc = 210 would do it. If we wanted to be REALLY safe, we could guarantee that doesn't happen by using 211 (or some other prime number) instead of 210 as the default.

knative-prow-robot removed the lgtm label on Mar 24, 2022
psschwei (Contributor, Author)

/retest

Review thread on:

```go
MaxScaleUpRate: 1000,
MaxScaleDownRate: 2,
TargetBurstCapacity: 200,
TargetUtilization: defaultTargetUtilization,
```

Contributor:

Looks good in general, but I'm not sure about the alignment change. I think go lint would complain about this reformatting.

psschwei (Contributor, Author):

That was done by gofmt. I think the alignment blocks are delimited by comments (which is why L48-51 got realigned). If I move the comment to the end of the line instead of onto its own line before it, the indentation stays the same.

Review thread on:

```diff
@@ -112,6 +112,7 @@ func defaultConfigMapData() map[string]string {
"scale-to-zero-grace-period": gracePeriod.String(),
"tick-interval": "2s",
"min-scale": "0",
"target-burst-capacity": "200",
```

Contributor:

Shouldn't this be 211 now, too?

psschwei (Contributor, Author):

I borrowed this from #12241, which didn't change the TBC value for the tests.

Before I saw that, I tried using TBC=211 and it caused a bunch of the tests to fail. I didn't dig too deep, so it could be that the tests are tuned to the 200 value (though that's just speculation; I didn't look into it for more than a few minutes).

If we'd like to get the tests passing with the updated TBC value, I can work on that. One thing I did find was that increasing minActivators by one (2 -> 3) fixed two-thirds of the errors, though I'm not sure exactly why.

psschwei (Contributor, Author)

> If we wanted to be REALLY safe, we could guarantee that doesn't happen by using 211 (or some other prime number) instead of 210 as the default.

Well, unless cc=1 and min-scale=211, but for all intents and purposes...

dprotaso (Member)

> Well, unless cc=1 and min-scale=211, but for all intents and purposes...

Lol, at least it's documented in this PR discussion.
/hold cancel
/lgtm

knative-prow-robot added the lgtm label and removed the do-not-merge/hold label on Mar 24, 2022
nader-ziada (Member)

/lgtm

dprotaso (Member)

/retest

chizhg (Member) commented Mar 24, 2022

/test istio-latest-no-mesh-tls_serving_main

chizhg (Member) commented Mar 24, 2022

/test istio-latest-no-mesh_serving_main

Labels: approved, area/API, area/autoscale, lgtm, size/S

7 participants