Error Messages And What To Do About Them
This page lists error messages and possible remedies. It is assumed that standard issues have been checked and resolved, such as:
- lack of disk space
- running out of memory or swap
The workers connect to the master using an SSH-like protocol, authenticated with private/public key pairs. The worker knows the master's public key in advance, so as to prevent a MITM attack, and the master holds a list of the workers' public keys.
This is seen on the Concourse worker when the `tsa_host_key.pub` file on the worker, which contains the SSH public key, does not match the `tsa_host_key` file on the Concourse master, which contains the SSH private key.
Check this by running `ssh-keygen -elf tsa_host_key` on the master and comparing the output with `ssh-keygen -elf tsa_host_key.pub` on the worker. Also try running `ssh -p 2222 -v cicd-master` on the worker and check the output for `debug1: Server host key:`.
Put the correct public key on the worker, or the correct private key on the master. If necessary, generate a new key pair: https://concourse-ci.org/concourse-generate-key.html
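The fingerprint check can be sketched end to end. The snippet below generates a fresh key pair and verifies that the two halves match; run it in an empty directory, and note that the key type and size are illustrative (`tsa_host_key` follows the Concourse naming convention):

```shell
# Generate a fresh TSA host key pair with no passphrase.
ssh-keygen -t rsa -b 2048 -N '' -f tsa_host_key -q
# Fingerprint of the private key (as held on the master).
priv_fp=$(ssh-keygen -lf tsa_host_key | awk '{print $2}')
# Fingerprint of the public key (as distributed to the workers).
pub_fp=$(ssh-keygen -lf tsa_host_key.pub | awk '{print $2}')
# Matching fingerprints confirm the worker's .pub belongs to the master's private key.
[ "$priv_fp" = "$pub_fp" ] && echo "key pair matches"
```

If the fingerprints differ between master and worker, redistribute the correct half of the pair rather than regenerating both unless necessary.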
The actual line might be this:

```
May 28 13:53:02 cicd-worker-5 concourse[31512]: {"timestamp":"1559051582.143739462","source":"guardian","message":"guardian.create.create-failed-cleaningup.start","log_level":1,"data":{"cause":"runc run: exit status 1: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:46: preparing rootfs caused \\\"permission denied\\\"\""\n","handle":"aaaaaaaaa-aaaa-aaaa-aaaa-94420d6db781","session":"45.3"}}
```
This is seen when the container count on a worker reaches its maximum (255).
To resolve:
You have to manually kill the worker and its volumes; remember this will fail the other builds that are using the same worker. Optionally, you can land the worker and wait for its containers to finish their assigned tasks and get cleaned up, which will make room for new containers.
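The two recovery paths above map onto the fly CLI roughly as follows. This is a dry-run sketch (the commands are only echoed, not executed), and the target name `ci` and worker name `cicd-worker-5` are placeholders:

```shell
#!/bin/sh
# Placeholder target and worker names - substitute your own.
TARGET=ci
WORKER=cicd-worker-5
# Gentler option: land the worker so running containers can drain first.
LAND_CMD="fly -t $TARGET land-worker --worker $WORKER"
# Forceful option: prune the worker; builds still using it will fail.
PRUNE_CMD="fly -t $TARGET prune-worker --worker $WORKER"
# Dry run: print the commands instead of executing them.
echo "$LAND_CMD"
echo "$PRUNE_CMD"
```

Remove the variables and run the `fly` commands directly once you have confirmed the worker name with `fly -t <target> workers`.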
```
Backend error: Exit status: 404, message: {"Type":"ProcessNotFoundError","Message":"unknown process: task","Handle":"","ProcessID":"task","Binary":""}
```
This is seen when a worker restarts while a pipeline is running.
To resolve:
Check why your worker restarted. If Concourse is deployed into Kubernetes, `kubectl describe pod pod-name` will show you why the pod restarted. In my experience it is usually due to probe failures caused by OOM, so adding additional memory will solve this.
I would recommend checking the resources allocated to your Concourse instance and the limits of the underlying nodes on which Concourse is running. There is no single fix for OOM errors, other than tuning the resource values to match your workload.
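As a rough sketch, if the workers run in Kubernetes, memory is tuned through the pod's resource requests and limits. The field names below follow the standard pod spec; the numbers are placeholders, not recommendations:

```yaml
# Illustrative values only - size to your actual workload.
resources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
```

Setting the memory request close to the limit reduces the chance of the node overcommitting and the kubelet evicting or OOM-killing the worker pod mid-build.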