Cluster health check fails in most cases because the socket connection to the target cluster is not stable #2310

pikehuang · 2023-10-24T12:57:37Z

What happened:
The platform controller calls checkHealth to update the cluster status. In production we have 100 clusters in total, and 70 of them report a failed cluster status. However, the failed clusters turn out to be running well when we SSH in to check them.

What you expected to happen:
We expect the reported cluster status to match the cluster's real status, which is Running for most of its lifetime.

How to reproduce it (as minimally and precisely as possible):
Put the cluster under heavy network pressure, or move the cluster from a cloud environment to an IDC environment.

Anything else we need to know?:
The health check result is wrong in most cases. If a monitoring system is in place, the issue is easy to spot.

Environment:

TKE version: any
Global or business cluster: business cluster
Kubernetes version (use kubectl version): any
Install addons: no
Others:

pikehuang added the kind/bug Categorizes issue or PR as related to a bug. label Oct 24, 2023

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 24, 2023

fix bug 2310

e509795

bug address: tkestack#2310 simple description: cluster health check will failed in most case bacause the socket connection to the target cluster is not stable tkestack#2310

pikehuang mentioned this issue Oct 24, 2023

fix(health check): check cluster status for multiple times in health check go get rid of socket connection failure #2310 #2311

Merged
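
The PR title describes the fix only as checking the cluster status multiple times so that a single socket failure does not mark the cluster as failed. A minimal sketch of that idea follows. It is not the code merged in #2311: `checkWithRetry`, `flakyProbe`, `errTransient`, the attempt count, and the interval are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical sentinel standing in for a flaky socket error.
var errTransient = errors.New("transient socket failure")

// checkWithRetry runs probe up to attempts times, sleeping interval between
// tries. It returns nil on the first success; if every attempt fails, it
// returns the last error wrapped with the attempt count.
func checkWithRetry(probe func() error, attempts int, interval time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = probe(); lastErr == nil {
			return nil
		}
		// Do not sleep after the final attempt.
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("health check failed after %d attempts: %w", attempts, lastErr)
}

// flakyProbe returns a probe that fails its first `failures` calls and
// succeeds afterwards, simulating an unstable connection (assumption).
func flakyProbe(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errTransient
		}
		return nil
	}
}

func main() {
	// Two transient failures are absorbed by three attempts.
	fmt.Println("unstable link:", checkWithRetry(flakyProbe(2), 3, 10*time.Millisecond))
	// A probe that keeps failing is still reported as failed.
	fmt.Println("dead link:", checkWithRetry(flakyProbe(5), 3, 10*time.Millisecond))
}
```

With this approach the status is set to failed only after every attempt in a round fails, which trades slower detection of real outages for fewer false alarms on unstable links.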

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times in health …

7daf005

…check go get rid of socket connection failure tkestack#2310

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times in health

55e4937

check go get rid of socket connection failure tkestack#2310

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times in health

d9f04f5

check go get rid of socket connection failure tkestack#2310

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times in health …

5d90a2c

…check go get rid of socket connection failure tkestack#2310

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times

622f85f

in health check go get rid of socket connection failure tkestack#2310

pikehuang added a commit to pikehuang/tke that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times

9731331

in health check to go get rid of socket connection failure tkestack#2310

leoryu closed this as completed in #2311 Oct 25, 2023

leoryu pushed a commit that referenced this issue Oct 25, 2023

fix(health check): check cluster status for multiple times (#2311)

b7900fc

in health check to go get rid of socket connection failure #2310
