
pod: add exit policies #13859

Merged: 2 commits into containers:main on May 2, 2022

Conversation

@vrothberg (Member) commented Apr 13, 2022

Add the notion of an "exit policy" to a pod. This policy controls the
behaviour when the last container of a pod exits. Initially, there are
two policies:

  • "continue" : the pod continues running. This is the default policy
    when creating a pod.

  • "stop" : stop the pod when the last container exits. This is the
    default behaviour for play kube.

In order to implement the deferred stop of a pod, add a worker queue to
the libpod runtime. The queue picks up work items and, in this case,
helps resolve deadlocks that would otherwise occur if we attempted to
stop a pod during container cleanup.

Note that the default restart policy of play kube is "Always". Hence,
in order to really solve #13464, the YAML files must set a custom
restart policy; the tests use "OnFailure".

Fixes: #13464
Signed-off-by: Valentin Rothberg [email protected]
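
For illustration, the resulting CLI usage looks roughly like this (a sketch based on the `--exit-policy` flag added in this PR; the image name and the trailing comment are illustrative, not captured output):

```
podman pod create --exit-policy=stop --name mypod
podman run --pod mypod alpine true
podman pod ps   # after the last container exits, the pod should no longer be Running
```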

@openshift-ci bot (Contributor) commented Apr 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the `approved` label on Apr 13, 2022.
@vrothberg (Member, Author)

@cdoern PTAL

(Outdated, resolved review threads on libpod/pod_api.go and libpod/container_internal.go)
@mheon (Member) commented Apr 13, 2022

I don't think this works with --rm containers, which will directly invoke podman rm, last I checked.

@mheon (Member) commented Apr 13, 2022

Actually, I take that back, RemoveContainer() should invoke cleanup() if the container was running.

@mheon (Member) commented Apr 13, 2022

We can potentially avoid the workqueue bits by doing this in a defer() in Cleanup() and RemoveContainer() - as long as it's the first defer in the function, even before the lock/defer unlock() - it will run after all locks are released, but still block the function from returning and Podman from exiting until it finishes.
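
A minimal, standalone Go sketch (not libpod code; the lock and print statements are stand-ins) of the defer ordering described above: defers run last-in-first-out, so a defer registered before the lock/unlock pair fires only after the deferred unlock has released the lock.

```go
package main

import (
	"fmt"
	"sync"
)

var containerLock sync.Mutex

func cleanup() {
	// Registered first, so it runs LAST: by the time this fires, the deferred
	// Unlock below has already released the lock, so a pod-stop step placed
	// here could take further locks without deadlocking, while still blocking
	// the function (and the process) from returning until it finishes.
	defer fmt.Println("stop the pod here, after all locks are released")

	containerLock.Lock()
	defer containerLock.Unlock()

	fmt.Println("regular cleanup work while holding the container lock")
}

func main() {
	cleanup()
}
```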

@mheon (Member) commented Apr 13, 2022

Though, that requires that we add it to every function that can stop a single container, which could be complicated

@vrothberg (Member, Author)

We can potentially avoid the workqueue bits by doing this in a defer() in Cleanup() and RemoveContainer() - as long as it's the first defer in the function, even before the lock/defer unlock() - it will run after all locks are released, but still block the function from returning and Podman from exiting until it finishes.

See #13859 (comment). There are deadlocks all over the place. We have the choice to either update all call sites of cleanup() and change the locking, or use a work-queue-like approach. I really prefer the queue as it can't really bite us in the future.
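
As an illustration of the work-queue idea (a sketch, not the actual libpod implementation; the type and method names are made up): cleanup merely enqueues a job, and a runtime-owned goroutine executes it later, outside the caller's lock scope. A shutdown step that drains pending jobs also speaks to the question raised below of how to guarantee jobs finish before Podman exits.

```go
package main

import (
	"fmt"
	"sync"
)

// workQueue runs queued jobs on a single background goroutine owned by the runtime.
type workQueue struct {
	jobs chan func()
	wg   sync.WaitGroup
}

func newWorkQueue(size int) *workQueue {
	q := &workQueue{jobs: make(chan func(), size)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for job := range q.jobs {
			job() // e.g. "stop the pod of container X if it was the last one running"
		}
	}()
	return q
}

// queue hands a job to the worker; the caller returns immediately and holds no
// locks while the job eventually runs, which avoids the cleanup deadlocks.
func (q *workQueue) queue(job func()) { q.jobs <- job }

// shutdown waits for all pending jobs to finish before the runtime exits.
func (q *workQueue) shutdown() {
	close(q.jobs)
	q.wg.Wait()
}

func main() {
	q := newWorkQueue(10)
	q.queue(func() { fmt.Println("stop pod once its last container has exited") })
	q.shutdown()
}
```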

@mheon (Member) commented Apr 13, 2022

Note my use of the capitalized Cleanup - if we hit the public API entrypoints, we can avoid all locks because no locks have been taken before Cleanup() starts

@vrothberg (Member, Author)

Note my use of the capitalized Cleanup - if we hit the public API entrypoints, we can avoid all locks because no locks have been taken before Cleanup() starts

How would you call stopInfraIfNeeded after the API call has finished? I guess with a defer, and that's where the container is locked.

@vrothberg (Member, Author)

We could make the auto-stop behavior opt-in and let play kube default to it.

Given e2e tests are also barking, I think the behavior should really be opt-in for ordinary pods but be turned on for play kube ones.

@Luap99 (Member) commented Apr 13, 2022

Given e2e tests are also barking, I think the behavior should really be opt-in for ordinary pods but be turned on for play kube ones.

I can only speak for the "podman run check dnsname plugin with CNI" test; in this case it is just an implementation detail and should be fixed. It tries to resolve the pod name because it assumes the pod stays running; adding another container to keep the pod running will fix this. I don't see this as a real-world use case.

We could make the auto-stop behavior opt-in and let play kube default to it.

I think making this configurable is a good idea in case users have problems with the new behaviour, but IMO the default should be to stop the infra container. I see no reason why they have to keep running since they do nothing.

@rhatdan (Member) commented Apr 14, 2022

I thought the use case we added, or were adding, was that pods go away when the last container exits. They can start without a container, but if a container is added and removed, the infra container goes away.

@vrothberg (Member, Author) commented Apr 14, 2022

I thought the use case we added, or were adding, was that pods go away when the last container exits. They can start without a container, but if a container is added and removed, the infra container goes away.

The infra container will now be stopped, but the pod will continue to exist. We could add a --rm flag to pod create, similar to the one for containers, to remove the pod on exit.

The question now is whether we want to auto-stop the infra container for ordinary pods. Certain tests that create a file in the pod's mount NS are breaking:

  • First container does a touch /foo
  • Second container does a ls /foo

The second one is now breaking as the mount NS is destroyed when we stop the infra container.

@rhatdan (Member) commented Apr 27, 2022

Can we change the test to do:

podman pod create --name mypod -v test:/test
podman run --pod mypod $IMAGE touch /test/foobar
podman run --pod mypod $IMAGE ls /test/foobar

Second container should restart the pod, correct?

@vrothberg (Member, Author)

@rhatdan I feel rather strongly that the behavior should be configurable. If the change already breaks the tests, there is no get-out-of-jail-free card if it breaks users.

I started working on it yesterday.

@vrothberg changed the title from "pod: stop when all containers are done" to "pod: add exit policies" on Apr 28, 2022.
@vrothberg (Member, Author)

@rhatdan PTAL

@@ -72,6 +72,9 @@ func init() {
flags.StringVarP(&createOptions.Name, nameFlagName, "n", "", "Assign a name to the pod")
_ = createCommand.RegisterFlagCompletionFunc(nameFlagName, completion.AutocompleteNone)

flags.StringVarP(&createOptions.ExitPolicy, "exit-policy", "", "continue", "Behaviour when the last container exits")
Review comment (Member):

This is fine for now, but this should be set in containers.conf.

@rhatdan (Member) commented Apr 28, 2022

LGTM

@mheon (Member) commented Apr 28, 2022

I'm still unsure how we guarantee all pending jobs have finished before the runtime shuts down and Podman exits

@TomSweeneyRedHat (Member)

All kinds of red unhappy tests, and you need a rebase @vrothberg

@vrothberg (Member, Author)

Moving to WIP for now to do the plumbing for containers.conf.

@openshift-ci bot added the `do-not-merge/work-in-progress` label on Apr 29, 2022.
vrothberg added a commit to vrothberg/common that referenced this pull request Apr 29, 2022
Add a new `pod_exit_policy` field to the containers.conf's engine table.
A pod's exit policy determines the behaviour when the last container of
a pod exits.

Required-in: containers/podman/pull/13859
Signed-off-by: Valentin Rothberg <[email protected]>
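
For reference, the option described in this commit would be set in containers.conf roughly as follows (a sketch; the value is just an example, and the built-in default stays "continue" for podman pod create):

```toml
[engine]
# Exit policy applied to pods created without an explicit --exit-policy:
# "continue" keeps the pod running, "stop" stops it when the last container exits.
pod_exit_policy = "stop"
```
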
@vrothberg (Member, Author)

-> containers/common#1017 to move the definitions and parsing directly to c/common.

@flouthoc (Collaborator) commented Apr 29, 2022

@vrothberg Just a small doubt: why do we need an extra exit_policy? Shouldn't the pod just follow how restartPolicy is configured? The default is Always, so it is expected to restart, but if it's Never or OnFailure the pod should just end with Succeeded status.

Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

@rhatdan (Member) commented Apr 29, 2022

Containers follow the restart policy, not the Pod. Theoretically, if I add a restart=always container to a Pod, the container should restart even if the Pod is set to stop.

@flouthoc (Collaborator) commented Apr 29, 2022

@rhatdan Ah, I see, but shouldn't containers inherit the restartPolicy from the pod spec, as stated here for all the containers in a pod: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

AFAIK, in the Kubernetes construct it is not possible to add a restart policy for a specific container; containers always inherit restartPolicy from the pod. Some references: https://pkg.go.dev/k8s.io/api/core/v1#Container and https://pkg.go.dev/k8s.io/api/core/v1#PodSpec

I just want to make sure that a standard spec behaves in a similar manner between Podman and k8s; other than that, I think this should be fine. So yeah, I guess it's fine.

Edit: Nvm, not a blocker.

@vrothberg (Member, Author)

Containers follow the restart policy, not the Pod. Theoretically, if I add a restart=always container to a Pod, the container should restart even if the Pod is set to stop.

Yes, that is correct. That is why the tests set the restart policy to "OnFailure" to actually trigger the "stop" exit policy when being run by play kube.

It is also a topic that I wanted to reflect on next week: should the default restart policy really be "always"? I see why K8s does that but should Podman do that as well?

@vrothberg (Member, Author)

@vrothberg Just a small doubt: why do we need an extra exit_policy? Shouldn't the pod just follow how restartPolicy is configured? The default is Always, so it is expected to restart, but if it's Never or OnFailure the pod should just end with Succeeded status.

It won't without this patch, as the infra container would continue to run. Hence, we need to stop the pod once all containers (other than the infra container) are done. However, we cannot do that unconditionally, as it would break backwards compatibility with how podman pod create works. That's why we needed to make it configurable via an "exit policy".

Required for using the newly added pod exit policies.

Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg changed the title from "WIP - pod: add exit policies" to "pod: add exit policies" on May 2, 2022.
@openshift-ci bot removed the `do-not-merge/work-in-progress` label on May 2, 2022.
@vrothberg (Member, Author)

Moving to WIP for now to do the plumbing for containers.conf.

Done.

@vrothberg (Member, Author)

@mheon PTanotherL

@Luap99 (Member) left a comment

LGTM

@@ -72,6 +72,10 @@ func init() {
flags.StringVarP(&createOptions.Name, nameFlagName, "n", "", "Assign a name to the pod")
_ = createCommand.RegisterFlagCompletionFunc(nameFlagName, completion.AutocompleteNone)

policyFlag := "exit-policy"
flags.StringVarP(&createOptions.ExitPolicy, policyFlag, "", string(containerConfig.Engine.PodExitPolicy), "Behaviour when the last container exits")
Review comment (Member):

Suggested change
flags.StringVarP(&createOptions.ExitPolicy, policyFlag, "", string(containerConfig.Engine.PodExitPolicy), "Behaviour when the last container exits")
flags.StringVar(&createOptions.ExitPolicy, policyFlag, string(containerConfig.Engine.PodExitPolicy), "Behaviour when the last container exits")

small nit in case you have to repush

@mheon (Member) commented May 2, 2022

LGTM on my end

@vrothberg (Member, Author)

Nice, @rhatdan, want to merge?

@rhatdan (Member) commented May 2, 2022

/lgtm

@openshift-ci bot added the `lgtm` label on May 2, 2022.
@openshift-merge-robot merged commit c3d871a into containers:main on May 2, 2022.
@vrothberg deleted the fix-13464 branch on May 3, 2022 07:03.
@github-actions bot added the `locked - please file new issue/PR` label on Sep 21, 2023.
@github-actions bot locked as resolved and limited conversation to collaborators on Sep 21, 2023.

Successfully merging this pull request may close these issues.

Running a pod on OSX leaves infra container running forever
8 participants