fix: adds affinity and other scheduling to the operator #29977
Conversation
Unreported flaky test detected, please review

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.broker.KcOidcBrokerTest#testPostBrokerLoginFlowWithOTP_bruteForceEnabled
Keycloak CI - Java Distribution IT (windows-latest - temurin - 19)
Thank you for this PR. What I'm reading here is that you're about to expose the Kubernetes constructs as-is. In doing this, we'll offer a non-opinionated way to schedule Pods, which is IMHO OK, as we don't seem to have an opinionated and agreed-upon solution. Eventually, more coarse-grained options might emerge as we learn how this is used in different environments. I'd say even then, exposing this as a supported feature makes sense, as we won't be able to foresee every possible scenario. Even if the deployment uses an external Infinispan, there are still the embedded Infinispans for now, and therefore I'd advise the Operator not to take any caching configuration into account - at least for now. @pruivo is working on a PR, planned for KC26, which would allow running Keycloak without its embedded distributed caches, but this is still some time in the future. So for now, I'd say

cc: @ryanemerson
@ahus1 I'm fine with removing the defaults around spread constraints and moving that to a topic for the scaling guide.
No, please add a default best-effort spread. I'm just against making the default settings dependent on the cache configuration.
The default spread is just an affinity for the same zone. The PR has been updated with that; it should just need some docs. @vmuzikar WDYT?
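For context, a best-effort "prefer the same zone, spread across nodes" default along these lines would typically render into the server Pod spec roughly as follows. This is a sketch, not the Operator's exact output; the weights and the `app.kubernetes.io/component: server` label selector are illustrative:

```yaml
# Hypothetical rendering of the default placement rules:
# - soft pod affinity for the same zone (avoid stretch clusters)
# - soft pod anti-affinity for the same node (improve availability)
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 10
      podAffinityTerm:
        topologyKey: topology.kubernetes.io/zone
        labelSelector:
          matchLabels:
            app.kubernetes.io/component: server
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app.kubernetes.io/component: server
```

Because both rules are `preferred` rather than `required`, the scheduler can still place Pods when the preference cannot be satisfied.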
Unreported flaky test detected, please review

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.oauth.TokenIntrospectionTest#testUnsupportedToken
@shawkins Thank you for the changes!
...tor/src/main/java/org/keycloak/operator/controllers/KeycloakDeploymentDependentResource.java (review threads resolved)
@vmuzikar added the docs, should be fully ready for review now.
@shawkins Great work! I've just put a little brain dump below. There's also a merge conflict to resolve.
nit: I also like it when we have test coverage for the CR serialization, as those tests run in a few ms and bring additional coverage - but in this case the PodTemplate test should be sufficient.
@@ -204,6 +204,25 @@ It is achieved by providing certain JVM options.

For more details, see <@links.server id="containers" />.

=== Scheduling

You may control several aspects of the server Pod scheduling via the Keycloak CR. The scheduling stanza exposes the standard Kubernetes affinity, tolerations, topology spread constraints, and the priority class name to fine-tune the scheduling and placement of your server Pods. By default, if you do not specify a custom affinity, your Pods will have an affinity for the same zone (to prevent stretch clusters) and an anti-affinity for the same node (to improve availability).
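To make the stanza described above concrete, a Keycloak CR using these options might look like the following. The nested field shapes follow the standard Kubernetes types; the exact `spec.scheduling` path and all values here are a sketch based on this discussion, not authoritative API documentation:

```yaml
# Illustrative Keycloak CR; the scheduling stanza shape is assumed
# from this PR's description, and all values are examples only.
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: example-kc
spec:
  instances: 3
  scheduling:
    priorityClassName: keycloak-high-priority   # hypothetical PriorityClass
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          podAffinityTerm:
            topologyKey: topology.kubernetes.io/zone
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: server
    tolerations:
    - key: dedicated          # hypothetical taint key
      operator: Equal
      value: keycloak
      effect: NoSchedule
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: server
```

Note that specifying any custom affinity would replace the default same-zone/anti-node behavior described above.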
Wouldn't it be good for users to also briefly explain why we want to prevent stretched clusters? Since we support the affinity configuration, we should mention the drawbacks of spanning different AZs with the embedded Infinispan, correct? That will avoid some future confusion when users apply different affinity settings.
@shawkins WDYT?
Added a little more on this. I'd defer to @ryanemerson if we want a more detailed explanation.
I think what we have for now is fine. I think we need a dedicated guide explaining different architectures, stretch vs non-stretch etc, as it's a complex topic.
@ryanemerson @ahus1 @vmuzikar in the docs should we call out specifically recommending an affinity for the same zone as the database?
The good thing about pod affinity is that it is now (1) supported by the Operator and no longer part of the unsupported pod template, and (2) we have a reasonable default. Deploying Keycloak in the same region as the database seems like a good idea, though I haven't tested how much it actually impacts Keycloak. And if the database is running outside of Kubernetes, you would have to hard-code the zone in the affinity rules, rather than using "try to run all pods in the same zone, but I don't care which". So I'd say it is probably good enough for now. Once we work with this a bit more, I could see us providing some predefined scheduling templates to hide the complexity and to promote well-tested and well-understood scheduling methods. That would then be a follow-up PR.
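To illustrate the "hard-code the zone" case mentioned above (a database outside the cluster in a known zone), a user-supplied affinity could pin the server Pods via node affinity. This is a sketch; the zone name is purely hypothetical:

```yaml
# Hypothetical user-supplied affinity pinning Pods to the zone
# where an external database lives; zone value is illustrative.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east-1a   # example zone, replace with the database's zone
```

A `required` rule like this trades availability for locality: if the zone has no schedulable capacity, the Pods stay pending.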
+1 to Alexander's comment. I think that more advanced/opinionated setting examples should exist as part of the Keycloak HA guide or similar, where we explain in detail the trade-offs of different architectures and how they apply to specific use-cases. |
@shawkins Just one last minor thing about the example formatting, otherwise LGTM
Should be good now.
Added one minor comment. LGTM otherwise.
closes: keycloak#29258

Signed-off-by: Steve Hawkins <[email protected]>
@shawkins LGTM, thank you.
@mabartos I believe your comments have been resolved. I'm overriding your review request so we can merge. Once you're back, let me know if you have any further concerns; we can address them as a follow-up.
closes: #29258
Draft of a scheduling spec with the options that are most likely to be requested - affinity, topologyspread, priorityclass, and tolerations.
The handling is similar to other constructs - if the user has populated the field in the unsupported spec, that takes precedence.
A new label, app.kubernetes.io/component, is added to distinguish the server pods.
The default node anti-affinity is added if no other affinity has been specified. This mechanism may not have great usability. It could be better to have a withDefaults field on the scheduling spec.
Similarly, I'm proposing to go a step further and handle the default affinity or spread depending upon whether an external cache is in use. Issues with doing this:
@ahus1 @vmuzikar @abelmatos WDYT?