Cluster health check fails in most cases because the socket connection to the target cluster is unstable #2310

Closed
pikehuang opened this issue Oct 24, 2023 · 0 comments · Fixed by #2311
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@pikehuang
Contributor

What happened:
The platform controller calls checkHealth to update the cluster status. In our production environment we have 100 clusters in total, and 70 of them are reported with a failed cluster status, even though those clusters are running normally when we SSH in to check them. The details are shown below:
[Screenshot: failed-clusters]
[Screenshot: total-failed-clusters]

What you expected to happen:
We expect the cluster status to match the cluster's real state, which is running for most of its lifetime.

How to reproduce it (as minimally and precisely as possible):
Put the cluster under heavy network pressure, or move the cluster from a cloud environment to an IDC environment.

Anything else we need to know?:
The health check result is incorrect in most cases. If a monitoring system is in place, the issue is easy to spot.

Environment:

  • TKE version: any
  • Global or business cluster: business cluster
  • Kubernetes version (use kubectl version): any
  • Install addons: no
  • Others:
@pikehuang pikehuang added the kind/bug Categorizes issue or PR as related to a bug. label Oct 24, 2023
pikehuang added a commit to pikehuang/tke that referenced this issue Oct 24, 2023
bug address: tkestack#2310

simple description:
cluster health check will failed in most case bacause the socket
connection to the target cluster is not stable tkestack#2310
pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023
in health check go get rid of socket connection failure tkestack#2310
pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023
in health check to go get rid of socket connection failure tkestack#2310
leoryu pushed a commit that referenced this issue Oct 25, 2023
in health check to go get rid of socket connection failure #2310