Allow sidecars to specify probe port #8949

emaildanwilson · 2020-08-06T20:21:29Z

Currently the API for container livenessProbe and readinessProbe doesn't allow port to be specified since containerPort is already provided. In a multi-container use case, containerPort can only be specified on one container which provides a clear way to identify the container and port which receives traffic. As a side effect, it's currently not possible to configure a sidecar container with liveness/readiness probes using either httpGet or tcpSocket.

A non-ideal workaround for readinessProbe exists to design the primary container's health check code to also check the health of any dependent sidecars. This isn't a typical consideration of the service developer and would not work in many cases for third party containers.

Since k8s restarts the specific container that fails a livenessProbe and NOT the entire pod, this same workaround is not effective for livenessProbe and instead would result in unnecessary restarts of the primary container.

The text was updated successfully, but these errors were encountered:

emaildanwilson · 2020-08-12T15:28:13Z

I can pick this up and work on a PR. before I do though...

Is there agreement that:

This should be fixed
The approach is to update the field mask to allow livenessProbe and readinessProbe to be defined on sidecar containers only.

cc @markusthoemmes

mattmoor · 2020-09-15T15:55:56Z

The workaround of proxying through the single exposed container doesn't work for exec probes, which are the defacto way of health checking GRPC services.

The point around the liveness probe restarting the container vs. the pod is another great point / problem with this suggested workaround.

My personal inclination would be to see us allow {liveness,readiness}Probe on sidecars.

cc @dprotaso @markusthoemmes @savitaashture
cc @dgerd (for historical purposes, and because we miss you 😉 )

github-actions · 2020-12-15T01:40:40Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

markusthoemmes · 2021-01-13T09:40:39Z

Ack, allowing the probes seems okay for sidecar containers.

mattmoor · 2021-01-14T22:37:29Z

/lifecycle frozen

evankanderson · 2021-03-22T03:35:29Z

/area
/kind api-change

/good-first-issue
Since it sounds like there's agreement on the general API shape. If not, remove /remove good-first-issue.

/triage acecpted

knative-prow-robot · 2021-03-22T03:35:30Z

@evankanderson:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/area
/kind api-change

/good-first-issue
Since it sounds like there's agreement on the general API shape. If not, remove /remove good-first-issue.

/triage acecpted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow-robot · 2021-03-22T03:35:30Z

@evankanderson: The label(s) area/api-change, triage/acecpted cannot be applied, because the repository doesn't have them.

In response to this:

/area
/kind api-change

/good-first-issue
Since it sounds like there's agreement on the general API shape. If not, remove /remove good-first-issue.

/triage acecpted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

evankanderson · 2021-03-22T03:35:51Z

/kind api-change
/triage accepted

salaboy · 2021-07-07T07:11:12Z

/assign

salaboy · 2021-07-07T07:56:22Z

Ok, so I started looking at this issue (as my "good first issue"), please remember that I am new with the internals.
This is what I understood so far:

For the primary container (the one which specifies a containerPort) the readiness probe is cleared up (set to nil), because the queue proxy is going to execute the readynessprobe instead. And this is done here: https://github.com/knative/serving/blob/main/pkg/reconciler/revision/resources/deploy.go#L163
For the primary container (the one which specifies a containerPort) the liveness probe, if HTTP is used, is overriden with the networking.BackendHTTPPort to be redirected to the queue proxy, If TCP is used the port provided by the user is used.
For all the sidecars containers, this is not done

My questions based on this:

Should we redirect the probes for the sidecar containers to the queue container as it is done for the main container? In other words, the sidecars' probes should be executed by the kubelet? I can't think of a reason why they shouldn't
Internally no manipulation is done for sidecars probes, but there is a validation web hook screaming at you when you try to add probes to sidecar containers:

Error from server (BadRequest): error when creating "service.yaml": admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.containers[1].livenessProbe, spec.template.spec.containers[1].livenessProbe.initialDelaySeconds, spec.template.spec.containers[1].livenessProbe.periodSeconds

Update: This validation can be turned of by removing this check: https://github.com/knative/serving/blob/main/pkg/apis/serving/k8s_validation.go#L362

It will be great for me to understand historically why this was not allowed. As my gut feeling tells me that having probes coming from different places might cause some race conditions or random behaviour.

I will keep adding my findings here until I have a PR with some changes to discuss in more detail, if someone can shed some light on the expected behaviour that will be great.

salaboy · 2021-07-08T15:53:09Z

@dprotaso @evankanderson I've created a simple PR with a test showing that the probes are maintained if the validation is removed from the validation web hook. The question still remains, do we want the sidecar containers probes to be performed by the kubelet?

I feel that I am missing context here to make a decision on that. In my head it makes some sense to only manage the "serving" container. But at the same time I do realise that I don't fully understand the entire lifecycle of KServices, which might be requiring the queue proxy container to run the probes for sidecars as well.

vishal-chdhry · 2023-01-26T13:21:19Z

/assign

vishal-chdhry · 2023-01-28T18:12:12Z

Hi @dprotaso! I would love to contribute to knative, but i am new to this project, can you please give me some pointers on what to change?

evankanderson · 2023-02-08T20:34:23Z

Hey, I'm not Dave, but I saw your note today.

I think the notes left by @salaboy are roughly correct -- when we added sidecar containers, we weren't sure how liveness and readiness probes should interact with the overall Pod health and in particular, the ability of the application and queue-proxy to serve traffic.

I think relaxing the validation Mauricio pointed to is possible and "safe" -- the largest risk would be an increase in cold-start time for pods which define readiness probes on their additional containers.

We intercept the probes on the primary container (as determined by specifying one containerPort, rather than zero) to enable the queue-proxy and activator to probe for initial readiness at a faster interval than supported by kubelet, but this mechanism should either continue to work (since it was independent of kubelet-reported readiness) or should introduce additional readiness delay (delaying cold start, but this can be a documented outcome).

KaranJagtiani · 2023-08-19T06:36:52Z

Is this issue still valid? I'm new to Knative and I want to contribute. Based on the history of this issue, the PR (that was raised and closed) and the code, what I understand is relaxing the liveliness probe validation on the sidecar container would be safe to implement but we still need to keep primary container validation based on the containerPort. I'm I getting this right or are there some other data points that I need to understand? @evankanderson @dprotaso

evankanderson · 2023-08-19T14:24:32Z

I don't think any work has happened on this since my comment above, so this should be available for work.

KaranJagtiani · 2023-08-20T18:03:14Z

In that case can I be assigned the issue? @evankanderson

evankanderson · 2023-08-20T19:11:46Z

/assign @KaranJagtiani

dprotaso · 2023-09-11T18:33:32Z

We intercept the probes on the primary container (as determined by specifying one containerPort, rather than zero) to enable the queue-proxy and activator to probe for initial readiness at a faster interval than supported by kubelet, but this mechanism should either continue to work (since it was independent of kubelet-reported readiness) or should introduce additional readiness delay (delaying cold start, but this can be a documented outcome).

The activator also does active probing (to the queue-proxy) after initial readiness in order track healthy endpoints for load balancing decisions.

I think relaxing the validation Mauricio pointed to is possible and "safe"

Thinking about this I'd say it's not - we're deviating from what's considered a 'ready' Pod by only considering a subset of containers.

So it seems like we could:

Ensure all probes go through the queue proxy (excluding exec probes)
When probes exist on the sidecar containers, the activator skips it's probing and rely's on the K8s API to understand Pod readiness (this already happens)

dprotaso · 2023-09-19T18:21:22Z

To elaborate a bit more - even if we complete #1 since our sidecar is unable to perform exec probes we should probably fall back on #2 - and have the activator look at pod readiness

I believe we currently disable probing when we have an exec probe configured for a revision

ReToCode · 2024-01-31T13:45:46Z

I believe we currently disable probing when we have an exec probe configured for a revision

Only partially, we still do some TCP probing from Q-P. A full test set can be found in the FT. I'm working on a PR #14853 to enable multi-container probing that works for liveness and readiness probes.

ReToCode · 2024-02-01T15:31:10Z

/assign

emaildanwilson added the kind/feature Well-understood/specified features, ready for coding. label Aug 6, 2020

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2020

knative-prow-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 14, 2021

knative-prow-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Mar 22, 2021

knative-prow-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API triage/accepted Issues which should be fixed (post-triage) labels Mar 22, 2021

dprotaso added this to Needs Triaging in API Feature: Multi-Container Support Jun 12, 2021

dprotaso added this to the 0.24.x milestone Jun 12, 2021

dprotaso moved this from Needs Triaging to Beta in API Feature: Multi-Container Support Jun 12, 2021

dprotaso modified the milestones: 0.24.x, 0.25.x Jul 6, 2021

knative-prow-robot assigned salaboy Jul 7, 2021

salaboy mentioned this issue Jul 7, 2021

[WIP] Allowing Probes to sidecar containers #11634

Closed

dprotaso modified the milestones: 0.25.x, 0.26.x Aug 26, 2021

dprotaso removed this from the 0.26.x milestone Oct 6, 2021

dprotaso modified the milestones: v1.1.0, v1.2.0 Jan 4, 2022

dprotaso modified the milestones: v1.2.0, v1.3.0 Feb 2, 2022

skonto mentioned this issue Feb 14, 2022

Not possible to define livenessProbes or readinessProbes on multi-container #12615

Closed

dprotaso modified the milestones: v1.3.0, v1.4.0 Mar 10, 2022

dprotaso modified the milestones: v1.4.0, v1.5.0 May 10, 2022

dprotaso modified the milestones: v1.5.0, Icebox Jul 14, 2022

knative-prow bot assigned vishal-chdhry Jan 26, 2023

vishal-chdhry removed their assignment Feb 8, 2023

knative-prow bot assigned KaranJagtiani Aug 20, 2023

KaranJagtiani mentioned this issue Aug 22, 2023

Allow probes to be defined for sidecar containers #14291

Closed

dtrifiro mentioned this issue Dec 5, 2023

add readiness probe on TGIS container (caikit+tgis) opendatahub-io/caikit-tgis-serving#156

Open

ReToCode mentioned this issue Jan 31, 2024

Add multi-container probing #14853

Merged

knative-prow bot assigned ReToCode Feb 1, 2024

knative-prow bot closed this as completed in #14853 Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow sidecars to specify probe port #8949

Allow sidecars to specify probe port #8949

emaildanwilson commented Aug 6, 2020

emaildanwilson commented Aug 12, 2020

mattmoor commented Sep 15, 2020

github-actions bot commented Dec 15, 2020

markusthoemmes commented Jan 13, 2021

mattmoor commented Jan 14, 2021

evankanderson commented Mar 22, 2021

knative-prow-robot commented Mar 22, 2021

knative-prow-robot commented Mar 22, 2021

evankanderson commented Mar 22, 2021

salaboy commented Jul 7, 2021

salaboy commented Jul 7, 2021 •

edited

Loading

salaboy commented Jul 8, 2021

vishal-chdhry commented Jan 26, 2023

vishal-chdhry commented Jan 28, 2023

evankanderson commented Feb 8, 2023

KaranJagtiani commented Aug 19, 2023

evankanderson commented Aug 19, 2023

KaranJagtiani commented Aug 20, 2023

evankanderson commented Aug 20, 2023

dprotaso commented Sep 11, 2023 •

edited

Loading

dprotaso commented Sep 19, 2023 •

edited

Loading

ReToCode commented Jan 31, 2024

ReToCode commented Feb 1, 2024

Allow sidecars to specify probe port #8949

Allow sidecars to specify probe port #8949

Comments

emaildanwilson commented Aug 6, 2020

emaildanwilson commented Aug 12, 2020

mattmoor commented Sep 15, 2020

github-actions bot commented Dec 15, 2020

markusthoemmes commented Jan 13, 2021

mattmoor commented Jan 14, 2021

evankanderson commented Mar 22, 2021

knative-prow-robot commented Mar 22, 2021

knative-prow-robot commented Mar 22, 2021

evankanderson commented Mar 22, 2021

salaboy commented Jul 7, 2021

salaboy commented Jul 7, 2021 • edited Loading

salaboy commented Jul 8, 2021

vishal-chdhry commented Jan 26, 2023

vishal-chdhry commented Jan 28, 2023

evankanderson commented Feb 8, 2023

KaranJagtiani commented Aug 19, 2023

evankanderson commented Aug 19, 2023

KaranJagtiani commented Aug 20, 2023

evankanderson commented Aug 20, 2023

dprotaso commented Sep 11, 2023 • edited Loading

dprotaso commented Sep 19, 2023 • edited Loading

ReToCode commented Jan 31, 2024

ReToCode commented Feb 1, 2024

salaboy commented Jul 7, 2021 •

edited

Loading

dprotaso commented Sep 11, 2023 •

edited

Loading

dprotaso commented Sep 19, 2023 •

edited

Loading