kube-scheduler scheduling is unbalanced: pods hang and fail to run on an overloaded node even though idle nodes are available #125503
Labels
- kind/bug: Categorizes issue or PR as related to a bug.
- kind/support: Categorizes issue or PR as a support question.
- needs-sig: Indicates an issue or PR lacks a `sig/foo` label and requires one.
- needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
What happened?
My cluster has 10 nodes. I rely on the default scheduling mechanism and have not configured any manual scheduling for workloads (no node taints, no label/affinity scheduling, etc.), and my pods do not declare resource requests or limits (see the sketch of such a declaration below). After running for a while, one of my pods consumes about 50% of its node's memory, and the other pods on that node push the node's total memory usage above 90%. Two problems follow:

1. When I restart this pod, it is scheduled back onto the same node, even though other nodes are at only about 30% memory usage.
2. The scheduler seems to favor this node: newly started pods are also assigned to this 90%-loaded machine. As a result, these pods can only hang and never run, and the business stalls.

Why does this happen?
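For clarity, this is roughly what declaring requests and limits on a pod would look like; the pod name, image, and resource values below are hypothetical and only illustrate the fields my workloads do not currently set:

```yaml
# Hypothetical pod spec; the name, image, and sizes are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: example-image:latest
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"    # the scheduler reserves this amount on the chosen node
      limits:
        cpu: "1"
        memory: "4Gi"    # the kubelet enforces this ceiling at runtime
```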
Side note: I understand that manual scheduling mechanisms could be used to work around this, but according to the official documentation the default scheduler simply scores nodes, and I am not aware of any additional mechanism that would produce this behavior. I am very curious about this problem, and I have found that many people in the Kubernetes community have run into it.
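From what I can tell from the documentation, the scoring it describes is done by the NodeResourcesFit plugin and is based on the resources pods request, not on a node's actual utilization, so with no requests declared every node may look equally empty to the scorer. A minimal sketch of that default scoring configuration, assuming the kubescheduler.config.k8s.io/v1 API (the weights are illustrative):

```yaml
# Sketch of the default scoring strategy; assumes kubescheduler.config.k8s.io/v1
# and the built-in NodeResourcesFit plugin. Weights are illustrative.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: LeastAllocated      # scores nodes by requested resources, not live usage
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
```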
What did you expect to happen?
I would like to understand the two problems described above.
How can we reproduce it (as minimally and precisely as possible)?
This is a long-standing problem observed in a real cluster over time, so I do not have a minimal reproduction.
Anything else we need to know?
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)