Busted Kubernetes image cache prevents pods starting? #219

Closed
negz opened this issue Mar 13, 2022 · 15 comments

Comments

@negz

negz commented Mar 13, 2022

I notice when starting Colima with Kubernetes (on my M1 Mac) I usually see something like the following:

$ ./_output/binaries/colima-Darwin-arm64 start --runtime containerd --with-kubernetes --kubernetes-version v1.23.4+k3s1 -c 4 -m 4
INFO[0000] starting colima                              
INFO[0000] creating and starting ...                     context=vm
INFO[0021] starting ...                                  context=containerd
INFO[0026] waiting for startup to complete ...           context=containerd
INFO[0026] downloading and installing ...                context=kubernetes
INFO[0037] loading oci images ...                        context=kubernetes
> unpacking docker.io/rancher/local-path-provisioner:v0.0.21 (sha256:7d89362db230fd1178ba9465bc3582b09872ee6be72037e2ad80cf812751299e)...done
> unpacking docker.io/rancher/mirrored-coredns-coredns:1.8.6 (sha256:a6d8488c231616918f517bd33321ade37f6b55e9355450cbf512d053e4df505e)...ctr: failed to resolve rootfs: content digest sha256:edaa71f2aee883484133da046954ad70fd6bf1fa42e5ae
WARN[0039] error loading oci images: exit status 1       context=kubernetes
WARN[0039] startup may delay a bit as images will be pulled from oci registry  context=kubernetes
INFO[0040] starting ...                                  context=kubernetes
INFO[0045] updating config ...                           context=kubernetes
INFO[0046] Switched to context "colima".                 context=kubernetes
INFO[0046] done 

Note the issue importing mirrored-coredns-coredns.

This results in the CoreDNS pod being unable to start. From kubectl -n kube-system describe po coredns-5789895cd-7qtql:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m54s                  default-scheduler  Successfully assigned kube-system/coredns-5789895cd-7qtql to colima
  Warning  Failed     103s (x12 over 3m49s)  kubelet            Error: failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:edaa71f2aee883484133da046954ad70fd6bf1fa42e5aec3f7dae199c626299c: not found
  Normal   Pulled     88s (x13 over 3m49s)   kubelet            Container image "rancher/mirrored-coredns-coredns:1.8.6" already present on machine

ctr i ls also complains, but I'm able to fix the issue by running ctr i rm against the offending image. The Kubernetes pod is then able to pull it successfully. 🤔

colima:/Users/negz/control/abiosoft/colima$ sudo ctr -n k8s.io i ls
ERRO[0000] failed resolving platform for image docker.io/rancher/mirrored-coredns-coredns:1.8.6  error="content digest sha256:edaa71f2aee883484133da046954ad70fd6bf1fa42e5aec3f7dae199c626299c: not found"
ERRO[0000] failed resolving platform for image sha256:edaa71f2aee883484133da046954ad70fd6bf1fa42e5aec3f7dae199c626299c  error="content digest sha256:edaa71f2aee883484133da046954ad70fd6bf1fa42e5aec3f7dae199c626299c: not found"

colima:/Users/negz/control/abiosoft/colima$ sudo ctr -n k8s.io i rm docker.io/rancher/mirrored-coredns-coredns:1.8.6
docker.io/rancher/mirrored-coredns-coredns:1.8.6
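
The manual cleanup above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and the matched message text is copied from the ctr output quoted above, so it may not match other containerd versions. Note that the digest-only entry from the second error line is returned too; whether it needs removing separately is not established in this thread, so review the list before deleting anything.

```shell
#!/bin/sh
# Print the image references that ctr reports as unresolvable, one per line.
# Reads ctr's log output on stdin. The pattern is based on the
# "failed resolving platform for image" lines quoted in this issue.
broken_image_refs() {
  sed -n 's/.*failed resolving platform for image \([^ ]*\).*/\1/p' | sort -u
}

# Hypothetical helper: remove every reference reported as broken so that the
# kubelet pulls a fresh copy. Meant to be run inside the VM.
remove_broken_images() {
  # Merge stderr into stdout because the errors are log lines, not table rows.
  sudo ctr -n k8s.io images ls 2>&1 | broken_image_refs |
    while read -r ref; do
      sudo ctr -n k8s.io images rm "$ref"
    done
}
```

To only inspect what would be removed, pipe the listing through the first function: sudo ctr -n k8s.io images ls 2>&1 | broken_image_refs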
@negz
Author

negz commented Mar 13, 2022

I'm not really sure what's causing this - maybe aarch64 is missing for this OCI image in the upstream image cache tarball? Potential fixes for this at the colima end would be to either:

  • Add a --kubernetes-skip-cache flag
  • Just delete the whole cache if populating it results in an error?
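
A rough sketch of the second option, to make the idea concrete. Everything here is an assumption rather than colima's actual code: the function name and the CTR override variable are hypothetical, and "the cache" is taken to mean every image currently present in the k8s.io namespace, which is only reasonable before Kubernetes has started.

```shell
#!/bin/sh
# CTR can be overridden so the logic can be exercised without containerd.
CTR="${CTR:-sudo ctr -n k8s.io}"

# Hypothetical wrapper: try to preload images from a tarball. If the import
# fails, remove every image in the namespace so nothing half-imported is left
# behind and the kubelet pulls from the registry instead.
preload_or_clear() {
  tarball="$1"
  if $CTR images import "$tarball"; then
    return 0
  fi
  echo "image import failed; clearing preloaded images" >&2
  # -q prints only the image references, one per line.
  $CTR images ls -q 2>/dev/null |
    while read -r ref; do
      $CTR images rm "$ref" || true
    done
  # Startup should continue either way; images get pulled on demand.
  return 0
}
```

The function always returns success on purpose: a failed preload is treated as a slower startup, not a fatal error.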

@abiosoft
Owner

This is the reason why I temporarily disabled the configurable Kubernetes version. I noticed that different versions lead to different behaviours, and I do not have the luxury of time to ensure smooth execution for all k3s versions.

@abiosoft
Owner

Just to clarify, does this only happen with v1.23.4+k3s1, or have you had the issue with other versions as well?

@negz
Author

negz commented Mar 13, 2022 via email

@abiosoft
Owner

@negz if you would not mind, can you kindly try with the docker runtime as well and see if you see a similar issue.

You do not have to affect your current setup as you can use a separate profile for that.

colima start docker --with-kubernetes # start with a profile named 'docker', defaults to docker runtime
colima delete docker # teardown when done

@negz
Author

negz commented Mar 24, 2022

@abiosoft Sorry, I thought I had replied here! It was a while ago now (and I've since gone full yak-shave and moved to a hand-crafted qemu VM) but I'm 99% sure Docker worked fine.

@mhemken-vts

I can confirm for v1.22.4+k3s1. docker works fine but containerd gets the rootfs error.

@Crayeth

Crayeth commented May 17, 2022

Monterey 12.3.1 m1 mac pro:
colima start --with-kubernetes -r containerd

colima version 0.4.1
git commit: 5d39343b2bdc827e554d78ae306ebc836bf1d02c

runtime: containerd
arch: aarch64
client: v0.19.0
server: v1.5.11

kubernetes
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.23.6+k3s1

Same issue:

unpacking docker.io/rancher/mirrored-coredns-coredns:1.9.1 (sha256:dff8c12625df3489aae62009b23cf369b6b2d8348d55d906be91c52fe3f0854c)...ctr: failed to resolve rootfs: content digest sha256:f40c41555cd4d25d273942094af1267824b9d3e901fdb7c60565dcb6175955df: not found

@abiosoft
Owner

@Crayeth does it prevent the cluster from starting for you?

@Crayeth

Crayeth commented May 17, 2022

does it prevent the cluster from starting for you?

No, the cluster starts; however, coredns is stuck in CreateContainerError.
I tried both upgrading and downgrading through --kubernetes-version but that doesn't seem to fix it (coredns 1.8.7, 1.8.4 all give the same issue). I removed the colima and lima caches and config and ran colima delete every time.

I also noticed my colima version 0.4.1 (via brew) downloads colima-v0.4.0-5/alpine-lima-clm-3.15.4-aarch64.iso not sure if that is correct.

@abiosoft
Owner

abiosoft commented May 17, 2022

@Crayeth can you manually remove the offending image and see if it starts?

colima nerdctl -- -n k8s.io rmi -f docker.io/rancher/mirrored-coredns-coredns

If that works, I can find a workaround.

I also noticed my colima version 0.4.1 (via brew) downloads colima-v0.4.0-5/alpine-lima-clm-3.15.4-aarch64.iso not sure if that is correct.

Yeah, this is accurate.

@Crayeth

Crayeth commented May 17, 2022

can you manually remove the offending image and see if it starts?

well, it doesn't find any image

➜ colima nerdctl -- -n k8s.io rmi -f rancher/mirrored-coredns-coredns
ERRO[0000] no such image rancher/mirrored-coredns-coredns 

On a sidenote it seems like the coredns images have the wrong arch on them?
coredns/coredns#5363
Maybe the wrong images get mirrored by rancher?

@abiosoft
Owner

well, it doesn't find any image

@Crayeth does it still fail when specifying the tag 1.9.1 ?

colima nerdctl -- -n k8s.io rmi -f docker.io/rancher/mirrored-coredns-coredns:1.9.1

On a sidenote it seems like the coredns images have the wrong arch on them?
coredns/coredns#5363
Maybe the wrong images get mirrored by rancher?

This is likely related.

@Crayeth

Crayeth commented May 17, 2022

does it still fail when specifying the tag 1.9.1 ?

colima nerdctl -- -n k8s.io rmi -f docker.io/rancher/mirrored-coredns-coredns:1.9.1
WARN[0000] failed to enumerate rootfs                    error="content digest sha256:f40c41555cd4d25d273942094af1267824b9d3e901fdb7c60565dcb6175955df: not found"
Untagged: docker.io/rancher/mirrored-coredns-coredns:1.9.1@sha256:dff8c12625df3489aae62009b23cf369b6b2d8348d55d906be91c52fe3f0854c

As a next step I stopped and started (not deleted) colima and now it seems to run!

coredns-d76bd69b-plmzg                    1/1     Running   1 (59s ago)   82s

@abiosoft
Owner

The issue is indeed related to the image architecture; a fix is underway.

Going forward, the images will be loaded successfully irrespective of the arch, and any images still missing afterwards will be pulled by k3s on startup.

Kubernetes startup should no longer be affected.
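
For anyone who wants to try an arch-independent load by hand, one possible approach is ctr's --all-platforms flag on images import. This is a sketch under assumptions: it is not confirmed in this thread that the fix uses this flag, and the function name and the CTR override variable are invented for illustration.

```shell
#!/bin/sh
# CTR can be overridden (for example CTR=echo) to see the command
# without running it.
CTR="${CTR:-sudo ctr -n k8s.io}"

# Hypothetical helper: import a tarball of OCI images for every platform it
# contains, instead of only the host platform. A failure is reported but not
# treated as fatal, since missing images can still be pulled at startup.
import_all_platforms() {
  if ! $CTR images import --all-platforms "$1"; then
    echo "warning: image import failed; images will be pulled from the registry" >&2
  fi
  return 0
}
```

Running it with CTR=echo prints the underlying ctr arguments, which is a cheap way to check what would be executed inside the VM.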
