retryStrategy with DAGs fails, even if the step passes after retries #885

Closed
ankushagarwal opened this issue Jun 18, 2018 · 4 comments
@ankushagarwal

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
When I use retryStrategy in the onExit step, the workflow is marked as Failed despite the steps succeeding on retries.

What you expected to happen:
The workflow should be marked as Succeeded when a step passes after retries (when using retryStrategy).

How to reproduce it (as minimally and precisely as possible):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-exit-handler-
spec:
  entrypoint: success
  onExit: exit-handler
  templates:
  - name: success
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo Success"]
  - name: exit-handler
    dag:
      tasks:
      - arguments: {}
        name: step1
        template: randomly-fail
      - arguments: {}
        dependencies:
        - step1
        name: step2
        template: randomly-fail
    inputs: {}
    metadata: {}
    outputs: {}
  - name: randomly-fail
    retryStrategy:
      limit: 10
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["exit $(( ${RANDOM} % 3 ))"]

In the exit handler, I have defined two steps, step1 and step2 (step2 depends on step1). Each attempt exits with code 0 only when ${RANDOM} % 3 is 0, so an attempt fails roughly two times out of three. I have set retryStrategy with limit: 10, so I would expect the workflow to be marked as succeeded once each step succeeds on one of its retries, but it isn't.
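
As a rough check on how often this test case should end in success, here is a back-of-the-envelope estimate. It assumes that ${RANDOM} % 3 is close to uniform over 0, 1, 2 (so an attempt fails about two times out of three, since only exit code 0 counts as success), that attempts are independent, and that limit: 10 allows up to 10 retries after the first attempt, i.e. 11 attempts per step. None of these assumptions are stated in the report itself.

```latex
\begin{align*}
P(\text{one attempt fails}) &= \tfrac{2}{3} \\
P(\text{all 11 attempts of a step fail}) &= \left(\tfrac{2}{3}\right)^{11} = \tfrac{2048}{177147} \approx 0.0116 \\
P(\text{step1 and step2 both eventually succeed}) &= \left(1 - \left(\tfrac{2}{3}\right)^{11}\right)^{2} \approx 0.977
\end{align*}
```

Under those assumptions the exit handler should succeed in roughly 98% of runs, so a Failed status on a run where every step shows a successful retry points at how the result is aggregated rather than at bad luck.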

Anything else we need to know?:

Environment:

  • Argo version:
$ argo version
argo: v2.1.0
  BuildDate: 2018-05-01T20:03:06Z
  GitCommit: 9379638189cc194f1b34ff7295f0832eac1c1651
  GitTreeState: clean
  GitTag: v2.1.0
  GoVersion: go1.9.3
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version :
$ kubectl version -o yaml
clientVersion:
  buildDate: 2018-04-13T22:27:55Z
  compiler: gc
  gitCommit: d4ab47518836c750f9949b9e0d387f20fb92260b
  gitTreeState: clean
  gitVersion: v1.10.1
  goVersion: go1.9.5
  major: "1"
  minor: "10"
  platform: darwin/amd64
serverVersion:
  buildDate: 2018-04-07T22:06:59Z
  compiler: gc
  gitCommit: cb151369f60073317da686a6ce7de36abe2bda8d
  gitTreeState: clean
  gitVersion: v1.9.6-gke.1
  goVersion: go1.9.3b4
  major: "1"
  minor: 9+
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
% argo get test-exit-handler-whph6
Name:                test-exit-handler-whph6
Namespace:           kubeflow-test-infra
ServiceAccount:      default
Status:              Failed
Created:             Sun Jun 17 19:28:20 -0700 (44 seconds ago)
Started:             Sun Jun 17 19:28:20 -0700 (44 seconds ago)
Finished:            Sun Jun 17 19:28:39 -0700 (25 seconds ago)
Duration:            19 seconds

STEP                               PODNAME                             DURATION  MESSAGE
 ✔ test-exit-handler-whph6         test-exit-handler-whph6             3s

 ✖ test-exit-handler-whph6.onExit
 ├-✔ step1(0)                      test-exit-handler-whph6-4256942720  3s
 └-✔ step2
   ├-✖ step2(0)                    test-exit-handler-whph6-71614339    4s        failed with exit code 2
   └-✔ step2(1)                    test-exit-handler-whph6-1749523334  4s
@bbc88ks
Member

bbc88ks commented Jun 20, 2018

I think it's a problem with DAGs in general and how they are processing child nodes. onExit retries work fine with container steps.
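
For comparison, a minimal sketch of the container-step case described here, which was not posted in the thread: the exit handler points directly at the retrying container template, with no DAG in between. The generateName value is hypothetical, the templates are copied from the report above, and the manifest has not been verified against v2.1.0.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  # Hypothetical name, chosen only for this sketch.
  generateName: test-exit-container-retry-
spec:
  entrypoint: success
  # The exit handler is the retrying container template itself (no DAG).
  onExit: randomly-fail
  templates:
  - name: success
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo Success"]
  - name: randomly-fail
    retryStrategy:
      limit: 10
    container:
      image: alpine:latest
      command: [sh, -c]
      # Exits 0 only when RANDOM % 3 is 0; exit codes 1 and 2 count as failures.
      args: ["exit $(( ${RANDOM} % 3 ))"]
```

If this variant ends as Succeeded while the DAG-based exit handler from the report ends as Failed, that would be consistent with the problem being in DAG child-node handling.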

@jessesuen
Member

Thanks for reporting. Will look into this.

@jessesuen self-assigned this on Aug 1, 2018
@jessesuen
Member

Thanks for the test case @ankushagarwal. I reproduced this, and @bbc88ks is correct that this is a problem with retries in DAGs in general. It does not necessarily have to do with onExit.
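
A reduced test case along these lines might look like the sketch below, which drops onExit and uses the DAG as the entrypoint. This is an illustration only, not the manifest actually used to reproduce the bug: the generateName and the dag-with-retries template name are hypothetical, and the randomly-fail template is copied from the original report.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  # Hypothetical name, chosen only for this sketch.
  generateName: test-dag-retry-
spec:
  # The DAG is the entrypoint; there is no onExit handler at all.
  entrypoint: dag-with-retries
  templates:
  - name: dag-with-retries
    dag:
      tasks:
      - name: step1
        template: randomly-fail
      - name: step2
        dependencies: [step1]
        template: randomly-fail
  - name: randomly-fail
    retryStrategy:
      limit: 10
    container:
      image: alpine:latest
      command: [sh, -c]
      # Exits 0 only when RANDOM % 3 is 0; exit codes 1 and 2 count as failures.
      args: ["exit $(( ${RANDOM} % 3 ))"]
```

If the bug is in DAG retry handling rather than in onExit, this workflow would be expected to show the same symptom: a Failed status even though every task has a successful retry attempt.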

@jessesuen changed the title from "Using retryStrategy in onExit fails the workflow even though the step passes after retries" to "retryStrategy with DAGs fails, even if the step passes after retries" on Aug 1, 2018
@jessesuen
Member

Reopening since the fix in f223e5a caused a regression.

@jessesuen reopened this on Aug 1, 2018