Confusing documentation around cords or a bug #565

olivermt · 2023-11-04T10:15:56Z

Hello,

I have been struggling a lot with getting kamal >1.0.0 working with cords.
I am not 100% sure if there is a bug or if I am doing something wrong.
Either way I think my questions / observations should either lead to a bugfix or to improved documentation text :)

Are cords automatic?

The docs leave me very unsure whether we should be handling the cord ourselves and responding with something other than 200 OK in the healthcheck, or whether the healthchecker itself knows how to look for the cord. The release upgrade text reads as if the healthcheck now simply uses the cord, but when we ran our second deploy on kamal 1.1.0 after upgrading from Mrsk it started failing with the very strange "failed healthy unhealthy" or something along those lines.

First of all, this error message could probably be a lot more informative, like "cord is configured but container is still responding healthy". And again, there is nothing explicitly telling you to respond to this file by sending different statuses to signal to Traefik what's up.

If cord is not automatic, initial deploy will always fail

Since the cord is only there on a rolling deploy, an initial deploy does not get a cord set up. I discovered this after doing a kamal app remove and doing a fresh deploy trying to handle the cord myself. If you are supposed to handle the cord yourself then this is currently broken for any new users and/or new server setups.

I have no idea how to reliably detect if a deploy is an initial one so I can't really do any heuristics to say if a missing cord should result in a 404 or not.

Edit: There is a kamal-cord directory on the fresh install, as I guess it's a default container mount now, but the build logs have zero trace of cords being created and the regular touch ..cordstuff../cord

Edit2: cc @djmb as the cord PR author :)

olivermt · 2023-11-04T10:23:04Z

Also, if a cord check fails with the healthy (unhealthy) thingy, things seem to fully break as it just exits, so the deploy lock is also not released, which I am not sure makes any sense.

djmb · 2023-11-06T08:46:36Z

Hi @olivermt!

The cord should work automatically, so you shouldn't need to do anything yourself.

It does two things to your docker container:

Mounts a volume into its /tmp directory and copies the "cord file" into that container
Rewrites its healthcheck command to also check for the existence of the cord file

This allows us to delete the file to force the healthcheck to fail before we stop the container. It sounds maybe from your description that the healthcheck is not failing?
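
As a rough local illustration of the mechanism described above (no Docker involved), the sketch below chains an application check with a cord-file check in the same shape as the rewritten healthcheck. `app_check` is a hypothetical stand-in for the real healthcheck command, and the temporary directory is an assumption standing in for the mounted cord volume.

```shell
#!/bin/sh
# Temp directory standing in for the cord volume mounted into the container (assumption).
cord_dir=$(mktemp -d)
touch "$cord_dir/cord"

# Hypothetical stand-in for the app's own healthcheck; it always succeeds here.
app_check() { return 0; }

# Composite check: the app check must pass AND the cord file must exist.
healthcheck() {
  (app_check || exit 1) && (stat "$cord_dir/cord" > /dev/null 2>&1 || exit 1)
}

status_with_cord=0
healthcheck || status_with_cord=$?

# Deleting the cord file makes the composite check fail even though app_check still passes.
rm "$cord_dir/cord"
status_without_cord=0
healthcheck || status_without_cord=$?

echo "with cord: $status_with_cord, without cord: $status_without_cord"
rm -r "$cord_dir"
```

This should print `with cord: 0, without cord: 1`. If a real healthcheck keeps returning 0 after the cord is removed, the cord half of the command is either missing or is looking at a file that still exists.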

But to help debug could you share the logs from where it fails (redacting anything private from them!)?

Also do you have any custom configuration under the healthcheck key?

And could you extract the actual healthcheck from one of the containers? (via docker inspect)

olivermt · 2023-11-14T18:01:46Z

Hello!

"It sounds maybe from your description that the healthcheck is not failing?"

Correct, it just keeps responding healthy.

I don't quite understand what you mean with the docker inspect, but what I can tell you is that it only fails on a host that has two separate files and deploys pointing at it (separated by labels on the services).

So there is probably something when you run two services on one host (which is supported, but not recommended?).

I am just gonna move this to another small host instead of forcing them onto the same one, but I can hold off if you want me to debug some more for you, if you want to make sure the two(or more)-on-same-host should work.

djmb · 2023-11-15T15:22:02Z

@olivermt - the cord files are namespaced by the app and destination so two services shouldn't be an issue (but I've not tested that out).

Re: the docker healthcheck - sorry I meant could you run:

$ docker inspect <container_id> -f '{{ .Config.Healthcheck.Test }}'

It should output something like:

[CMD-SHELL (curl -f https://localhost:3000/up -m 5 || exit 1) && (stat /tmp/kamal-cord/cord > /dev/null || exit 1)]

Maybe there's something in the healthcheck that means that it always returns a healthy result?

tsangiotis · 2024-02-01T14:21:30Z

I am trying to deploy a django app with kamal. I cannot healthcheck with cord. The container seems unhealthy.

The command requested above gives the following:

$ docker inspect <container_id> -f '{{ .Config.Healthcheck.Test }}'
[CMD-SHELL (curl -f https://localhost:8000/health || exit 1) && (stat /tmp/kamal-cord/cord > /dev/null || exit 1)]

I also added an ls /tmp/kamal-cord on my entrypoint and cannot find the directory. It is as if it is not there.

$ ls /tmp/kamal-cord
ls: cannot access '/tmp/kamal-cord': No such file or directory
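
To tell "the directory is not there at all" apart from "the directory is there but the cord file is gone", a small check along these lines could be run inside the running container (for example through docker exec). `describe_cord` is a hypothetical helper, not part of Kamal, and the demonstration uses a temporary directory rather than a real container.

```shell
#!/bin/sh
# describe_cord (hypothetical helper): report what is visible at a cord directory path.
describe_cord() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing directory"
  elif [ ! -e "$dir/cord" ]; then
    echo "directory present, cord file missing"
  else
    echo "cord present"
  fi
}

# Inside a container the call would be: describe_cord /tmp/kamal-cord
# Local demonstration of the three states using a temp directory:
demo=$(mktemp -d)
before=$(describe_cord "$demo/kamal-cord")
mkdir "$demo/kamal-cord"
during=$(describe_cord "$demo/kamal-cord")
touch "$demo/kamal-cord/cord"
after=$(describe_cord "$demo/kamal-cord")
printf '%s\n' "$before" "$during" "$after"
rm -r "$demo"
```

A "missing directory" result in a running container would suggest the volume was never mapped into it, while "directory present, cord file missing" would be the expected state of an old container after its cord has been cut.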

ShubhamPalriwala · 2024-03-03T21:29:54Z

Facing the exact same issue with a NextJs App with Kamal + Traefik!

wenderjean · 2024-03-04T18:40:27Z

Do you guys have any updates on this issue? I'm getting the same here, my "replaced" containers never become unhealthy 🤔

djmb · 2024-03-05T08:53:03Z

The cord file is a workaround to allow us to force the container to be unhealthy while it still can handle requests. Sending docker stop will work the wrong way round - the container will stop handling requests, then it will take Traefik a few seconds to notice - in the meantime it will be serving errors.

How it should work is:

Before starting a new container Kamal creates a new directory on the host and adds an empty file cord in it. This will look something like this in the logs:

Running /usr/bin/env mkdir -p .kamal/cords/my-app-production-<newsha> ; touch .kamal/cords/my-app-production-<newsha>/cord

Then when starting the new container, it will map the directory into it (by default to /tmp/kamal-cord) and modify the healthcheck to add a check for the existence of that file.

This will look like:

Running docker run --detach --restart unless-stopped --name <container_name> <SNIP> --health-cmd "(<HEALTHCHECK>) && (stat /tmp/kamal-cord/cord > /dev/null || exit 1)" --health-interval "1s" --volume $(pwd)/.kamal/cords/my-app-production-<newsha>:/tmp/kamal-cord <SNIP>

Then when that new container has started up and is healthy, Kamal deletes the cord file from the old container, which should cause the healthcheck to fail and the container to be marked as unhealthy.

This will appear as:

Running /usr/bin/env rm -r /local/app/.kamal/cords/my-app-production-<oldsha>
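
The steps above can be simulated locally without Docker. This is only a sketch: the temporary directory stands in for the cords directory on the host, `create_cord`, `cut_cord` and `cord_ok` are hypothetical helpers mirroring the logged commands, and the directory names are placeholders rather than real versions.

```shell
#!/bin/sh
# Temp directory standing in for the host's cords directory (assumption).
cords_root=$(mktemp -d)

# Hypothetical helpers mirroring the logged mkdir/touch and rm -r commands.
create_cord() { mkdir -p "$cords_root/$1" && touch "$cords_root/$1/cord"; }
cut_cord()    { rm -r "$cords_root/$1"; }
# Mirrors the cord half of the healthcheck: succeeds only while the cord file exists.
cord_ok()     { stat "$cords_root/$1/cord" > /dev/null 2>&1; }

old="my-app-production-oldsha"   # placeholder directory names
new="my-app-production-newsha"

create_cord "$old"   # cord created when the old container was deployed
create_cord "$new"   # a new cord is created before the new container starts
cut_cord "$old"      # once the new container is healthy, the old cord is removed

old_state=healthy; cord_ok "$old" || old_state=unhealthy
new_state=healthy; cord_ok "$new" || new_state=unhealthy
echo "old: $old_state, new: $new_state"
rm -r "$cords_root"
```

This should print `old: unhealthy, new: healthy`. Because each version gets its own directory, removing the old cord leaves the new container's check untouched.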

Since the problem is that the container is not getting marked as unhealthy, that suggests the healthcheck is continuing to succeed, I guess either because the cord file is not deleted, or because we have a healthcheck that doesn't fail when it is deleted.

The way to debug this is to find out exactly what the health check is and then see what happens when you run it manually in the container.

So from the host run:

$ docker inspect <container_name> -f '{{ .Config.Healthcheck.Test }}'
[CMD-SHELL (curl -f https://localhost:3000/up -m 10 || exit 1) && (stat /tmp/kamal-cord/cord > /dev/null || exit 1)]

Then exec into the container and see why that command doesn't return an error code:

$ docker exec -it <container_name> bash
# (curl -f https://localhost:3000/up -m 10 || exit 1) && (stat /tmp/kamal-cord/cord > /dev/null || exit 1)
# echo $?
0
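
If the combined command returns 0, it may help to run each half on its own and compare exit codes. The sketch below assumes a hypothetical helper `check_part`; the two command strings are placeholders to be replaced with the halves printed by docker inspect.

```shell
#!/bin/sh
# check_part (hypothetical helper): run one command string and report its exit status.
check_part() {
  label=$1
  status=0
  sh -c "$2" > /dev/null 2>&1 || status=$?
  echo "$label: exit $status"
}

# Placeholders; inside the container these would be the real halves, for example
#   app_half='curl -f <healthcheck URL> -m 10'
#   cord_half='stat /tmp/kamal-cord/cord'
app_half='true'                                 # placeholder that always succeeds
cord_half='stat /nonexistent-kamal-cord/cord'   # path assumed not to exist

app_result=$(check_part "app check" "$app_half")
cord_result=$(check_part "cord check" "$cord_half")
echo "$app_result"
echo "$cord_result"
```

With these placeholders the app check reports exit 0 and the cord check reports a non-zero exit. In a real container, a cord check that still exits 0 after the old cord directory was removed would mean the file is still visible inside the container.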
