
Expand UnschedulablePlugins/PendingPlugins to include PreBind plugins #125330

Closed
sanposhiho opened this issue Jun 5, 2024 · 6 comments · Fixed by #125360
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@sanposhiho
Member

sanposhiho commented Jun 5, 2024

Currently, the scheduler honors the Unschedulable, UnschedulableAndUnresolvable, and Pending statuses only when they are returned from PreFilter, Filter, Reserve, or Permit (WaitOnPermit).
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/types.go#L210-L214

It doesn't expect other extension points to return those statuses, and it treats every non-success status from them as an error, which means scheduling of the Pod is retried immediately.

In fact, DRA returns Pending at PreBind, and as described above, that status is currently ignored.
So we should expand UnschedulablePlugins/PendingPlugins to include PreBind plugins (or perhaps plugins at all extension points?).
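To make the current versus proposed behavior concrete, here is a minimal Go sketch. It models status codes and extension points with hypothetical local types (`Code`, `handle`, `current`, `proposed`) rather than the real `k8s.io/kubernetes/pkg/scheduler/framework` API, and it reduces the possible outcomes to three coarse labels. It is a simplification of the behavior described above, not the scheduler's actual code path.

```go
package main

import "fmt"

// Code is a simplified, hypothetical model of a scheduler status code.
// It is NOT the real framework.Code type.
type Code int

const (
	Success Code = iota
	Error
	Unschedulable
	UnschedulableAndUnresolvable
	Pending
)

// isRejected reports whether c is one of the codes that the scheduler can
// honor (rather than treat as an error) at supported extension points.
func isRejected(c Code) bool {
	return c == Unschedulable || c == UnschedulableAndUnresolvable || c == Pending
}

// Extension points whose rejections are honored. "current" follows the
// issue text; "proposed" adds PreBind, as the issue suggests.
var (
	current  = map[string]bool{"PreFilter": true, "Filter": true, "Reserve": true, "Permit": true}
	proposed = map[string]bool{"PreFilter": true, "Filter": true, "Reserve": true, "Permit": true, "PreBind": true}
)

// Outcome is a coarse label for what happens to the Pod.
type Outcome string

const (
	Proceed Outcome = "proceed"
	// Requeue: the rejecting plugin is recorded and the Pod waits for a relevant event.
	Requeue Outcome = "requeue-on-plugin-event"
	// Retry: the status is treated as an error and scheduling is retried.
	Retry Outcome = "retry-as-error"
)

// handle classifies a status returned at the given extension point.
func handle(honored map[string]bool, point string, c Code) Outcome {
	switch {
	case c == Success:
		return Proceed
	case isRejected(c) && honored[point]:
		return Requeue
	default:
		return Retry
	}
}

func main() {
	// DRA returning Pending at PreBind, before and after the proposal.
	fmt.Println(handle(current, "PreBind", Pending))  // retry-as-error
	fmt.Println(handle(proposed, "PreBind", Pending)) // requeue-on-plugin-event
	// A plain error remains an error under either rule.
	fmt.Println(handle(proposed, "PreBind", Error)) // retry-as-error
}
```

The only difference between the two rules is whether PreBind appears in the set of honored extension points; statuses that are not rejections are unaffected either way.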

/cc @pohly
/sig scheduling
/kind feature

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 5, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 5, 2024
@pohly
Contributor

pohly commented Jun 5, 2024

> So, we should expand UnschedulablePlugins/PendingPlugins to include PreBind plugins. (or maybe all extension points too?)

Both seem reasonable to me.

@pohly
Contributor

pohly commented Jun 5, 2024

Because Pending is not handled for PreBind, the following error is logged:

```
E0604 15:45:50.980929  306340 schedule_one.go:1048] "Error scheduling pod; retrying" err="waiting for resource driver" pod="test/test-draqld28"
```

@pohly
Contributor

pohly commented Jun 5, 2024

This patch gets rid of the error. I am unsure whether it has other implications. Performance was the same with and without it.

```diff
diff --git a/pkg/scheduler/schedule_one.go b/pkg/scheduler/schedule_one.go
index 6b6992095d8..8c7868fedec 100644
--- a/pkg/scheduler/schedule_one.go
+++ b/pkg/scheduler/schedule_one.go
@@ -292,6 +292,17 @@ func (sched *Scheduler) bindingCycle(
 
        // Run "prebind" plugins.
        if status := fwk.RunPreBindPlugins(ctx, state, assumedPod, scheduleResult.SuggestedHost); !status.IsSuccess() {
+               if status.IsRejected() {
+                       fitErr := &framework.FitError{
+                               NumAllNodes: 1,
+                               Pod:         assumedPodInfo.Pod,
+                               Diagnosis: framework.Diagnosis{
+                                       NodeToStatusMap:      framework.NodeToStatusMap{scheduleResult.SuggestedHost: status},
+                                       UnschedulablePlugins: sets.New(status.Plugin()),
+                               },
+                       }
+                       return framework.NewStatus(status.Code()).WithError(fitErr)
+               }
                return status
        }
 
```

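The idea behind the patch can be sketched without the scheduler packages. The types below (`Status`, `FitError`, `handlePreBindStatus`) are hypothetical simplifications of `framework.Status` and `framework.FitError`. The real patch returns a `*framework.Status` that carries the `FitError`, whereas this sketch simply returns an `error`. The plugin names `DynamicResources` and `VolumeBinding` and the node name `node-1` are used for illustration only.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the framework's *Status:
// a code plus the name of the plugin that produced it.
type Status struct {
	Code   string // "Success", "Error", "Unschedulable", "UnschedulableAndUnresolvable", "Pending"
	Plugin string
	Reason string
}

func (s Status) IsSuccess() bool { return s.Code == "Success" }

// IsRejected mirrors the idea of Status.IsRejected: true for the codes
// that should send the Pod back to wait for a relevant event.
func (s Status) IsRejected() bool {
	switch s.Code {
	case "Unschedulable", "UnschedulableAndUnresolvable", "Pending":
		return true
	}
	return false
}

// FitError is a hypothetical, trimmed-down analogue of framework.FitError.
type FitError struct {
	PodName              string
	NodeToStatus         map[string]Status
	UnschedulablePlugins map[string]struct{}
}

func (e *FitError) Error() string {
	return fmt.Sprintf("pod %s: rejected by %d plugin(s)", e.PodName, len(e.UnschedulablePlugins))
}

// handlePreBindStatus follows the shape of the patch: a rejected PreBind
// status is wrapped so that the rejecting plugin is recorded; any other
// non-success status (for example, a plain error) is returned as-is.
func handlePreBindStatus(pod, node string, st Status) error {
	if st.IsSuccess() {
		return nil
	}
	if st.IsRejected() {
		return &FitError{
			PodName:              pod,
			NodeToStatus:         map[string]Status{node: st},
			UnschedulablePlugins: map[string]struct{}{st.Plugin: {}},
		}
	}
	return fmt.Errorf("%s: %s", st.Plugin, st.Reason)
}

func main() {
	err := handlePreBindStatus("test/test-draqld28", "node-1",
		Status{Code: "Pending", Plugin: "DynamicResources", Reason: "waiting for resource driver"})
	if fe, ok := err.(*FitError); ok {
		_, recorded := fe.UnschedulablePlugins["DynamicResources"]
		fmt.Println("fit error, plugin recorded:", recorded)
	}

	err = handlePreBindStatus("test/test-draqld28", "node-1",
		Status{Code: "Error", Plugin: "VolumeBinding", Reason: "binding failed"})
	_, isFit := err.(*FitError)
	fmt.Println("plain error, fit error:", isFit)
}
```

Because only rejected statuses are wrapped, a PreBind plugin that returns nothing but success or plain errors behaves exactly as before.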
@sanposhiho
Member Author

sanposhiho commented Jun 5, 2024

That is what I had in mind. It shouldn't have any other impact, because volumebinding, the only in-tree PreBind plugin besides DRA, returns only success or error statuses.
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go#L355-L378

We do need to mention the change in the release note for custom plugin developers, though.

@pohly
Contributor

pohly commented Jun 6, 2024

I've turned this into a PR: #125360
