Fail workflows when hitting k8s resource size limit (status code 413) #913

Closed
jessesuen opened this issue Jul 21, 2018 · 0 comments

Is this a BUG REPORT or FEATURE REQUEST?:

It's possible for a workflow to get into a state where the controller is unable to update it because the workflow object itself is too large. The symptom is a message like this in the controller logs:

Error updating workflow: the server responded with the status code 413 but did not return more information (put workflows.argoproj.io data-prep-ld52n)

K8s limits resource sizes to about 1MB. When this happens, we want to fail the workflow with a meaningful message. Currently, the workflow stays in a running state until it is deleted: it never completes, and the controller keeps reprocessing it indefinitely.
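The desired behavior can be sketched as a decision on the status code of the update attempt: a 413 is treated as terminal, because retrying the same oversized object cannot succeed, while other failures are still retried. This is only an illustration, not the controller's actual code. The `updateOutcome` type and the `classifyUpdate` function are hypothetical names, and a real controller would inspect the error returned by its Kubernetes client rather than a bare status code.

```go
package main

import (
	"fmt"
	"net/http"
)

// updateOutcome is a hypothetical result type used only in this sketch.
type updateOutcome int

const (
	outcomeOK           updateOutcome = iota // update was stored
	outcomeRetry                             // transient failure, requeue the workflow
	outcomeFailWorkflow                      // permanent failure, mark the workflow failed
)

// classifyUpdate decides what to do after an update attempt,
// given the HTTP status code returned by the API server.
func classifyUpdate(statusCode int) updateOutcome {
	switch {
	case statusCode >= 200 && statusCode < 300:
		return outcomeOK
	case statusCode == http.StatusRequestEntityTooLarge: // 413
		// The object is too large to store; resubmitting it unchanged
		// would fail in exactly the same way.
		return outcomeFailWorkflow
	default:
		return outcomeRetry
	}
}

func main() {
	for _, code := range []int{200, 409, 413} {
		fmt.Println(code, classifyUpdate(code))
	}
}
```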

What happened:

time="2018-07-20T20:00:01Z" level=info msg="All of node data-prep-ld52n.grader-1E547080-DB87-40D2-994E-EA21E401FC57 dependencies [] completed" namespace=default workflow=data-prep-ld52n
time="2018-07-20T20:00:01Z" level=warning msg="Deadline exceeded" namespace=default workflow=data-prep-ld52n
time="2018-07-20T20:00:01Z" level=warning msg="Error updating workflow: the server responded with the status code 413 but did not return more information (put workflows.argoproj.io data-prep-ld52n)" namespace=default workflow=data-prep-ld52n
time="2018-07-20T20:00:01Z" level=info msg="Processing workflow" namespace=default workflow=data-prep-ld52n

What you expected to happen:

If we cannot store the workflow because its payload is too large, we should at least fail the workflow by setting its phase to Error with a message. NOTE: to do this, we would need to drop the current payload and update only status.phase.
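One way to drop the payload, as the note above suggests, is to send a small patch that carries only the status fields instead of the full object. The sketch below builds such a JSON body with the standard library. The `statusPatch` and `statusFields` types and the `buildErrorPatch` function are hypothetical, and the `status.message` field name is an assumption. Only `status.phase` and the `Error` value come from this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusFields holds the only fields this sketch sends.
// The "message" field name is an assumption for illustration.
type statusFields struct {
	Phase   string `json:"phase"`
	Message string `json:"message"`
}

// statusPatch is a hypothetical wrapper that nests the fields under "status".
type statusPatch struct {
	Status statusFields `json:"status"`
}

// buildErrorPatch returns a small JSON body that sets the phase to Error
// without including any of the (oversized) node status.
func buildErrorPatch(message string) ([]byte, error) {
	return json.Marshal(statusPatch{
		Status: statusFields{Phase: "Error", Message: message},
	})
}

func main() {
	patch, err := buildErrorPatch("workflow is too large to store (status code 413)")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Because the body contains only two short strings, it stays far below the size limit no matter how large the stored workflow has grown.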

How to reproduce it (as minimally and precisely as possible):

Run a large workflow (one whose size will reach 1+ MB).

Anything else we need to know?:

Environment:

  • Argo version: v2.1.1
@jessesuen jessesuen added this to the v2.2 milestone Aug 1, 2018
icecoffee531 pushed a commit to icecoffee531/argo-workflows that referenced this issue Jan 5, 2022
…rgoproj#915)

* feat: extend resource eventsource field filter. Closes argoproj#913

* re-run codegen with lastest panddoc

* infof