
Trino pods go down instantly when autoscaling terminates pods, even though terminationGracePeriodSeconds is set to 300 seconds #22483

Closed
hsushmitha opened this issue Jun 24, 2024 · 9 comments

Comments

@hsushmitha

We have set terminationGracePeriodSeconds to 300s on the Trino coordinator and worker nodes. During autoscaling, when the number of worker pods increases and decreases, pods terminate instantly without waiting for the queries running on them to finish.
We have also set shutdown.grace-period=300s on the Trino coordinator and workers.
The expectation is that the Trino worker pods wait up to 300 seconds for their tasks to complete instead of terminating instantly.

In Starburst we have set starburstWorkerShutdownGracePeriodSeconds: 300 (which corresponds to shutdown.grace-period=300s) and deploymentTerminationGracePeriodSeconds: 300 (which corresponds to terminationGracePeriodSeconds), and there the worker pods wait up to 300 seconds for query tasks to run to completion before terminating, as expected.
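
For orientation, the two settings described above act at different layers. A minimal sketch of the intended setup, with illustrative file locations (the exact Helm chart keys depend on the chart version in use):

    # Pod spec of the worker deployment (Kubernetes side)
    spec:
      terminationGracePeriodSeconds: 300   # time Kubernetes allows between SIGTERM and SIGKILL

    # config.properties on the workers (Trino side)
    # shutdown.grace-period controls how long a worker in SHUTTING_DOWN state waits
    # before draining its tasks, and again before exiting
    shutdown.grace-period=300s

Note that Kubernetes sends SIGTERM at the start of the grace period; if the process exits immediately on SIGTERM, the pod terminates right away regardless of the configured grace period.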

@nineinchnick
Member

Is this about the Trino Helm chart? If yes, can you include the values to reproduce this?

@hsushmitha
Author

It is about the Trino Helm chart. Attaching the deployment configs and values file to reproduce the issue.

values.txt
deployment-coordinator.txt
deployment-worker.txt

@nineinchnick
Member

Which chart version are you using? How do you apply the changes you included in the deployment-*.txt files?

In the latest chart version, you have to set coordinator.terminationGracePeriodSeconds and worker.terminationGracePeriodSeconds. See https://trinodb.github.io/charts/charts/trino/
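
A minimal values.yaml sketch of those two settings, using the 300-second value from this issue:

    coordinator:
      terminationGracePeriodSeconds: 300
    worker:
      terminationGracePeriodSeconds: 300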

@hsushmitha
Author

We are using Helm chart version trino-0.8.0. We deploy the changes with helm upgrade trino . -f values.yaml -n trino. The attached files above are YAML files; since we couldn't attach YAML files, we attached them as .txt versions.

@nineinchnick
Member

nineinchnick commented Jun 25, 2024

That's very old. I don't know how the chart was structured back then, and I can't help anymore. Can you try using the latest version?

@hsushmitha
Author

We have upgraded the Helm chart to 0.25.0, and terminationGracePeriodSeconds is set to 300. The Trino pods are still terminating instantly, without staying in the Terminating state for 300s.

@nineinchnick
Member

I checked that the default Trino Docker image entrypoint doesn't handle signals sent to the container in any special way. The Trino server also doesn't do this. To handle graceful shutdown, you have to configure the pod's lifecycle in the worker.lifecycle section. See the Helm chart docs for an example.

@hsushmitha
Author

Hi, we have set the lifecycle preStop hook and terminationGracePeriodSeconds in values.yaml:

  lifecycle:
  # worker.lifecycle -- To enable [graceful
  # shutdown](https://trino.io/docs/current/admin/graceful-shutdown.html),
  # define a lifecycle preStop like below. Set the
  # `terminationGracePeriodSeconds` to a value greater than or equal to the
  # configured `shutdown.grace-period`. Configure `shutdown.grace-period` in
  # `additionalConfigProperties` as `shutdown.grace-period=2m` (default is 2
  # minutes). Also configure `accessControl` because the `default` system
  # access control does not allow graceful shutdowns.
  # @raw
  # Example:
  # ```yaml
    preStop:
      exec:
        command: ["/bin/sh", "-c", "curl -v -X PUT -d '\"SHUTTING_DOWN\"' -H \"Content-type: application/json\" -H \"X-Trino-User: trino\" https://localhost:8080/v1/info/state"]
  # ```

  terminationGracePeriodSeconds: 300

We have also set shutdown.grace-period in additionalWorkerConfigProperties:

additionalWorkerConfigProperties:
  - shutdown.grace-period=300s

We still see the worker pods getting terminated abruptly, without staying in the Terminating state for 300s, which causes queries to fail. Is there anything else that needs to be set to make sure the pods shut down gracefully?
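
One item from the chart comment quoted above that is easy to miss: accessControl also has to be configured, because the default system access control does not allow the shutdown request sent by the preStop hook. A hedged sketch of file-based access-control rules that grant the user used in the preStop curl (trino here) write access to system information, per Trino's graceful-shutdown documentation; how these rules are wired into the chart's accessControl value depends on the chart version:

    {
      "catalogs": [
        {"allow": "all"}
      ],
      "system_information": [
        {"user": "trino", "allow": ["read", "write"]}
      ]
    }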

@hashhar
Member

hashhar commented Oct 14, 2024

If any of the tasks take longer than the termination grace period, queries are going to fail.

See docs at https://trino.io/docs/current/admin/graceful-shutdown.html which explain how graceful shutdown works.

The grace period hence needs to be at least as long as the longest tasks (for simplicity, assume queries) that execute on your cluster.
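
Putting the thread together, a sizing sketch with illustrative numbers: per the graceful-shutdown docs, a worker in SHUTTING_DOWN state sleeps for shutdown.grace-period, drains its active tasks, then sleeps for the grace period again before exiting, so the pod's termination grace period has to cover that whole sequence.

    # Illustrative values only -- size them from your longest-running queries.
    additionalWorkerConfigProperties:
      - shutdown.grace-period=300s
    worker:
      terminationGracePeriodSeconds: 700  # roughly 2x shutdown.grace-period plus the longest expected task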

@trinodb trinodb locked and limited conversation to collaborators Oct 14, 2024
@hashhar hashhar converted this issue into discussion #23775 Oct 14, 2024

This issue was moved to a discussion.

You can continue the conversation there.
