k0s upgrade of single node cluster borked it #5287

till opened this issue Nov 23, 2024 · 6 comments
till commented Nov 23, 2024

We tried to upgrade a single node cluster yesterday and it ended up in a somewhat borked state. The initial version of k0s was 1.27.5+k0s.0 and we upgraded one minor version at a time (always to the latest patch release, e.g. 1.28.x) all the way to 1.31.2.

This is the config (extract from k0sctl's config):

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: runway-support
spec:
  controllerManager:
    extraArgs:
      flex-volume-plugin-dir: /opt/libexec/k0s/kubelet-plugins/volume/exec
  workerProfiles:
  - name: flatcar
    values:
      volumePluginDir: /opt/libexec/k0s/kubelet-plugins/volume/exec
  api:
    address: PRIVATE.IP.OF.THE.NODE
    externalAddress: PUBLIC.IP.OF.THE.NODE
  extensions:
    helm:
      repositories:
      - name: cpo
        url: https://kubernetes.github.io/cloud-provider-openstack
      charts:
      - name: cinder-csi
        order: 2
        chartname: cpo/openstack-cinder-csi
        version: 2.28.1
        namespace: kube-system
        values: |
          clusterID: support
          secret:
            enabled: true
            hostMount: false
            create: false
            name: cloud-config
          storageClass:
            enabled: true
          csi:
            plugin:
              nodePlugin:
                kubeletDir: /var/lib/k0s/kubelet
      - name: openstack-cloud-controller-manager
        order: 1
        chartname: cpo/openstack-cloud-controller-manager
        version: 2.28.2
        namespace: kube-system
        values: |
          cluster:
            name: support
          secret:
            enabled: true
            name: cloud-config
            create: false
          tolerations:
          - key: node.cloudprovider.kubernetes.io/uninitialized
            value: "true"
            effect: NoSchedule
          extraVolumes:
          - name: flexvolume-dir
            hostPath:
              path: /opt/libexec/k0s/kubelet-plugins/volume/exec
          - name: k8s-certs
            hostPath:
              path: /var/lib/k0s/pki
          # Where the additional volumes should be mounted into the pods:
          extraVolumeMounts:
            - name: flexvolume-dir
              mountPath: /opt/libexec/k0s/kubelet-plugins/volume/exec
              readOnly: true
            - name: k8s-certs
              mountPath: /var/lib/k0s/pki
              readOnly: true
    storage:
      create_default_storage_class: false
      type: external_storage
  storage:
    type: kine
  telemetry:
    enabled: false

The upgrades were done using k0sctl and all the upgrades seemed to have worked until we got to 1.31.2. The upgrade hung and eventually errored that two pods in kube-system were not ready.

One of the failing pods at the time was CoreDNS, but before someone jumps on "it's a DNS problem": it did not seem to be.

We had other pods failing in other namespaces; the common thing was that they all use the Kubernetes API (via the kubernetes service, i.e. the virtual IP https://10.96.0.1). All the remaining failures were actually DNS related, because CoreDNS was not running.

The fun thing is, this IP works from the node itself, but not from within a pod. Other service IPs did work (anything that was not this service, basically), but of course DNS was also broken because CoreDNS was stuck in a weird crash loop, unable to reach the Kubernetes API via the 10.96.0.1 address.

I think I read almost everything one can find on Google; the fact that the service is called kubernetes of course makes it extra hard to search for.

Basically, the k8s API was only reachable from the host, not from a pod:

From the host: https://10.96.0.1:443 👍

From a pod: https://10.96.0.1:443 👎

I also verified that other service IPs worked; anything but the kubernetes service was reachable on the IP level.

I also looked through conntrack and saw a ton of connections stuck in a waiting state. These presumably reflect the API call timeouts seen in the pod logs.
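
Something along these lines lists them (assuming the conntrack CLI is available on the node; the stuck TCP entries typically show up as SYN_SENT / [UNREPLIED]):

conntrack -L -d 10.96.0.1 -p tcp
conntrack -L -d 10.96.0.1 -p tcp | grep -c SYN_SENT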


A couple things that we tried (that did not help):

  • tried to tcpdump (e.g. tcpdump -i eth0 dst 10.96.0.1) on the host: nothing. It seems like traffic to the API service never leaves the pod. I've also tried to capture traffic from specific pods, and I can generally see traffic, but nothing for the API service.
  • tried to create a route for the service (some blog posts etc. suggested that, but to no avail)
  • cleared iptables and prayed that a restart of kube-router/-proxy would recreate everything (it does recreate rules, but not "enough" to make the kubernetes service work; see the rule check sketched after this list)
  • manually disabled kube-proxy and let kube-router also advertise services
  • upgraded flatcar to the latest stable, etc.
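
One way to sanity-check whether the service got programmed at all (a sketch assuming kube-proxy in its default iptables mode; with IPVS or kube-router's service proxy the chains look different):

iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1
iptables-save -t nat | grep 10.96.0.1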

The only thing that sort of helped (not really, but kinda) was patching all deployments and overriding the environment variables that do service discovery. Really ugly hack, but at least it made CoreDNS run. ;)
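
For the record, the override per deployment was roughly this (node IP redacted as above; the apiserver listens on port 6443 on the node):

kubectl --namespace kube-system set env deployment/coredns \
  KUBERNETES_SERVICE_HOST=PRIVATE.IP.OF.THE.NODE \
  KUBERNETES_SERVICE_PORT=6443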

Anyway, I was hoping anyone else had any insights into this. I still have the node/cluster around to test and prod.

till commented Nov 23, 2024

Also, I feel I need to mention this: it does not seem to be a general problem — we upgraded another cluster this week (similar versions) and it worked nicely.

The difference in configuration is that the other cluster is a multi-node setup that uses etcd for storage/state and Calico for networking. Otherwise they are identical, in that they run on OpenStack, Flatcar Linux, etc.

@twz123 twz123 added the question Further information is requested label Nov 25, 2024
@twz123 twz123 added the question Further information is requested label Nov 25, 2024
twz123 commented Nov 25, 2024

Here's what I understand to be the gist of the issue:

  • Cluster upgrade from 1.27 to 1.31, minor release by minor release
  • Something™ is wrong with the CoreDNS pod after that
  • Other pods are failing, too, but CoreDNS seems to be the root cause

Is that correct?

To rule out the obvious things: I assume k0s sysinfo is fine on the node? Does the kubelet report the node to be ready?

To figure out what's up with the CoreDNS pods, could you maybe try to provide the output of the following:

  • kubectl get node -owide
  • kubectl --namespace kube-system get po -owide
  • kubectl --namespace kube-system describe deployments coredns
  • kubectl --namespace kube-system get events
  • The kubectl describe output of the CoreDNS pods
  • The log output of the failing CoreDNS pods.

Also, can you check if you're referencing custom images in your k0s configuration? Please have a look at /etc/k0s/k0s.yaml directly on the node.
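
For example, something along these lines should surface any image overrides (assuming the default config path):

grep -n 'image' /etc/k0s/k0s.yaml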

till commented Nov 25, 2024

Not exactly; I'll try to explain: the reason CoreDNS does not work is that it cannot access the k8s API.

But it's not just CoreDNS: basically any pod that needs the k8s API (e.g. CoreDNS, grafana-agent, haproxy-ingress, metrics-server, etc.) has the same problem and crashes on start or eventually (depending on how it handles the failure).

The error is always similar to this:

Error while initializing connection to Kubernetes apiserver.
This most likely means that the cluster is misconfigured
(e.g., it has invalid apiserver certificates or service
accounts configuration).

Reason: Get "https://10.96.0.1:443/version": dial tcp 10.96.0.1:443: i/o timeout

The error message differs from service to service, but ultimately they all fail with an i/o timeout.
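
For what it's worth, the timeout is easy to reproduce with a throwaway pod (curlimages/curl is just an arbitrary image that ships curl):

kubectl run api-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -kv -m 5 https://10.96.0.1:443/version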

k0s sysinfo

(ipv6 is turned off)

Total memory: 7.8 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 130.5 GiB (pass)
Relative disk space available for /var/lib/k0s: 92% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.6.60-flatcar (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 1 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (pass)
    cgroup controller "freezer": available (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: built-in (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

kubectl get node

NAME               STATUS   ROLES           AGE    VERSION       INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                             KERNEL-VERSION   CONTAINER-RUNTIME
node-001.support   Ready    control-plane   391d   v1.31.2+k0s   10.108.110.145   IP.HERE   Flatcar Container Linux by Kinvolk 4081.2.0 (Oklo)   6.6.60-flatcar   containerd://1.7.22

others

... for the others, we tried to hack around service discovery by supplying KUBERNETES_* environment variables, which kind of works in some cases.

See this as an example:

Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Tue, 31 Oct 2023 14:38:14 +0000
Labels:                 k0s.k0sproject.io/stack=coredns
                        k8s-app=kube-dns
                        k8slens-edit-resource-version=v1
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 14
                        k0s.k0sproject.io/last-applied-configuration:
                          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"labels":{"k8s-app":"kube-dns","kubernetes.io/name":"CoreDNS"},"name":"coredns","n...
                        k0s.k0sproject.io/stack-checksum: 09279c44a5e9f6a2e04b3aa27a72a402
Selector:               k8s-app=kube-dns
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           k8s-app=kube-dns
  Annotations:      kubectl.kubernetes.io/restartedAt: 2024-11-22T19:08:16Z
                    prometheus.io/port: 9153
                    prometheus.io/scrape: true
  Service Account:  coredns
  Containers:
   coredns:
    Image:       quay.io/k0sproject/coredns:1.11.4
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:   http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8181/ready delay=30s timeout=1s period=2s #success=1 #failure=3
    Environment:
      KUBERNETES_SERVICE_HOST:  10.108.110.145
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_PORT:          tcp://10.108.110.145:6443
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:          ConfigMap (a volume populated by a ConfigMap)
    Name:          coredns
    Optional:      false
  Node-Selectors:  kubernetes.io/os=linux
  Tolerations:     CriticalAddonsOnly op=Exists
                   node-role.kubernetes.io/master:NoSchedule op=Exists
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-844d4cb9fc (0/0 replicas created), coredns-89dcbdccb (0/0 replicas created), coredns-87b4975cd (0/0 replicas created), coredns-747cf669bf (0/0 replicas created), coredns-85ff44746b (0/0 replicas created), coredns-6d84fd48c9 (0/0 replicas created), coredns-599c9c8bb8 (0/0 replicas created), coredns-6f4cdf5bc4 (0/0 replicas created), coredns-5ff89486 (0/0 replicas created), coredns-84886868fb (0/0 replicas created)
NewReplicaSet:   coredns-857c6f89fb (1/1 replicas created)
Events:          <none>

But I can revert it.

events (kube-system)

We did not patch the metrics-server, so it currently keeps restarting:

LAST SEEN   TYPE      REASON      OBJECT                               MESSAGE
8m34s       Warning   Unhealthy   pod/metrics-server-7f9ccb4c8-d4tqd   Liveness probe failed: Get "https://10.244.0.187:10250/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
38m         Warning   Unhealthy   pod/metrics-server-7f9ccb4c8-d4tqd   Readiness probe failed: Get "https://10.244.0.187:10250/readyz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
3m25s       Warning   BackOff     pod/metrics-server-7f9ccb4c8-d4tqd   Back-off restarting failed container metrics-server in pod metrics-server-7f9ccb4c8-d4tqd_kube-system(c98ca056-47b7-40a2-8609-63f44b86bede)

get pods (kube-system)

NAME                                                     READY   STATUS             RESTARTS           AGE     IP               NODE               NOMINATED NODE   READINESS GATES
coredns-857c6f89fb-g7lkl                                 1/1     Running            0                  2d21h   10.244.0.224     node-001.support   <none>           <none>
kube-router-8fqnq                                        1/1     Running            0                  2d21h   10.108.110.145   node-001.support   <none>           <none>
metrics-server-7f9ccb4c8-d4tqd                           0/1     CrashLoopBackOff   1501 (3m42s ago)   3d3h    10.244.0.187     node-001.support   <none>           <none>
openstack-cinder-csi-controllerplugin-6f8d6dcf99-xsnhk   6/6     Running            142 (2d21h ago)    391d    10.244.0.198     node-001.support   <none>           <none>
openstack-cinder-csi-nodeplugin-l2mm9                    3/3     Running            12 (3d1h ago)      391d    10.108.110.145   node-001.support   <none>           <none>
openstack-cloud-controller-manager-9l9zk                 1/1     Running            2 (2d22h ago)      3d1h    10.108.110.145   node-001.support   <none>           <none>

Again, we patched the pods that need the k8s API and that we actually needed. We didn't patch metrics-server, and it's still failing (since Friday).

logs metrics-server (kube-system)

In addition, here are the logs from the (unpatched) metrics-server:

panic: unable to load configmap based request-header-client-ca-file: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 10.96.0.1:443: i/o timeout

goroutine 1 [running]:
main.main()
	sigs.k8s.io/metrics-server/cmd/metrics-server/metrics-server.go:37 +0x8b

Just keeps repeating and restarting.

till commented Nov 25, 2024

Also, can you check if you're referencing custom images in your k0s configuration? Please have a look at /etc/k0s/k0s.yaml directly on the node.

I have checked — no custom images.

till commented Nov 26, 2024

From what I gathered, the kubernetes service is a virtual IP.

I can see it defined and it has one endpoint (the externalAddress). The ports look correct (6443 to 443).
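
For reference, this was checked with the usual kubectl commands, roughly:

kubectl get service kubernetes -o wide
kubectl get endpoints kubernetes -o wide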

That service IP is used in the service discovery environment variables in the pod, but the connection does not work: i/o timeout.

The same service IP works from the host OS though.

I see no requests to it with a tcpdump when the request comes from within a pod (e.g. metrics-server). It's like traffic never leaves the pod?

till commented Nov 27, 2024

@twz123 anything else I can look into? Or any idea what could be the culprit?
