Artifact load on 503 Service Unavailable error is not retried for S3 artifact repository #9248

Closed
1 of 3 tasks
danajp opened this issue Jul 28, 2022 · 0 comments · Fixed by #9249
Labels
area/artifacts S3/GCP/OSS/Git/HDFS etc type/bug

Comments

danajp commented Jul 28, 2022

Checklist

  • Double-checked my configuration.
  • Tested using the latest version.
  • Used the Emissary executor.

Summary

What happened/what you expected to happen?

The init container in one of my workflow pods failed to download an artifact from S3 because S3 returned 503 Service Unavailable. I expected the download to be retried.

What version are you running?

v3.3.3

Diagnostics

Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.

I can't induce 503s from S3 on demand, but if I could, this workflow would reproduce the problem:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: my-workflow
spec:
  entrypoint: work

  templates:
    - name: work
      steps:
        - - name: create-artifact
            template: create-artifact
        - - name: use-artifact
            template: use-artifact
            arguments:
              artifacts:
                - name: data
                  from: "{{steps.create-artifact.outputs.artifacts.data}}"

    - name: create-artifact
      outputs:
        artifacts:
          - name: data
            path: /data
      script:
        image: alpine
        command: [sh]
        source: |
          mkdir -p /data
          echo stuff > /data/foo.txt

    - name: use-artifact
      inputs:
        artifacts:
          - name: data
            path: /data
      script:
        image: alpine
        command: [sh]
        source: |
          echo "here's the stuff:"
          cat /data/foo.txt

Here are the relevant logs from a similar real world workflow that failed:

$ kubectl logs -c init my-workflow-1658912400-2107992629
...
time="2022-07-27T09:05:03.033Z" level=info msg="Downloading artifact: app-repo"
time="2022-07-27T09:05:03.033Z" level=info msg="S3 Load path: /argo/inputs/artifacts/app-repo.tmp, key: my-workflow-1658912400/my-workflow-1658912400-2107992629/app-repo.tgz"
time="2022-07-27T09:05:03.033Z" level=info msg="Creating minio client using static credentials" endpoint=s3.amazonaws.com
time="2022-07-27T09:05:03.033Z" level=info msg="Getting file from s3" bucket=my-bucket endpoint=s3.amazonaws.com key=my-workflow-1658912400/my-workflow-1658912400-2107992629/app-repo.tgz path=/argo/inputs/artifacts/app-repo.tmp
time="2022-07-27T09:05:03.056Z" level=warning msg="Non-transient error: 503 Service Unavailable"
time="2022-07-27T09:05:03.056Z" level=error msg="executor error: artifact app-repo failed to load: failed to get file: 503 Service Unavailable"
time="2022-07-27T09:05:03.056Z" level=info msg="Alloc=8890 TotalAlloc=14344 Sys=22994 NumGC=4 Goroutines=3"
time="2022-07-27T09:05:03.057Z" level=fatal msg="artifact app-repo failed to load: failed to get file: 503 Service Unavailable"

The executor treats 503 Service Unavailable as a non-transient error (see the "Non-transient error" warning in the log), so the artifact download fails immediately instead of being retried.
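To illustrate the behavior I would expect, here is a minimal Go sketch of classifying a 503 as transient and retrying with backoff. This is not the executor's actual code: `statusError`, `isTransient`, and `retry` are hypothetical names made up for this example, and the set of status codes treated as transient is my assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// statusError is a hypothetical error type that carries an HTTP status code.
type statusError struct {
	Code int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("%d %s", e.Code, http.StatusText(e.Code))
}

// isTransient reports whether err wraps a statusError whose code usually
// signals a temporary condition. The list of codes is an assumption.
func isTransient(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	switch se.Code {
	case http.StatusTooManyRequests,
		http.StatusBadGateway,
		http.StatusServiceUnavailable,
		http.StatusGatewayTimeout:
		return true
	}
	return false
}

// retry calls fn up to attempts times, doubling delay between tries.
// It stops early on success or on a non-transient error, and returns
// the number of calls made together with the last error.
func retry(attempts int, delay time.Duration, fn func() error) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil || !isTransient(err) {
			return i, err
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return attempts, err
}

func main() {
	calls := 0
	// Simulated download: fails twice with a wrapped 503, then succeeds.
	n, err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("failed to get file: %w",
				&statusError{Code: http.StatusServiceUnavailable})
		}
		return nil
	})
	fmt.Println(n, err) // prints "3 <nil>"
}
```

With a classification like this, the failure in the log above would have triggered another download attempt instead of a fatal executor error.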


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

danajp added a commit to danajp/argo-workflows that referenced this issue Jul 28, 2022
terrytangyuan pushed a commit that referenced this issue Jul 29, 2022
alexec added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Sep 6, 2022
juchaosong pushed a commit to juchaosong/argo-workflows that referenced this issue Nov 3, 2022
reddymh pushed a commit to reddymh/argo-workflows that referenced this issue Jan 2, 2023
2 participants