artifactGC does not run when the output artifact uses archive: none #12857

Closed
3 of 4 tasks
vilmosnagy opened this issue Mar 28, 2024 · 1 comment · Fixed by #13091
Assignees: tczhao
Labels: area/artifacts (S3/GCP/OSS/Git/HDFS etc), area/gc (Garbage collection, such as TTLs, retentionPolicy, delays, and more), P3 (Low priority), type/bug

Comments

vilmosnagy commented Mar 28, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

When I use an output artifact with archive: { none: {} } together with artifactGC, the artifacts are not deleted, and garbage collection fails with the error:

Artifact Garbage Collection failed for strategy OnWorkflowCompletion, err:The specified key does not exist.

Version

v3.5.4

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifactgc-test
spec:
  entrypoint: entrypoint
  arguments:
    parameters:
      - name: service-account
        value: 'data-manager-workflows' # an existing service account
  serviceAccountName: data-manager-workflows
  templates:
    - name: entrypoint
      dag:
        tasks:
          - name: generate-task-without-archiving
            template: generate-task-without-archiving

    - name: generate-task-without-archiving
      script:
        image: alpine:latest
        command: [ "sh", "-e" ]
        source: |
          echo "test message" > /opt/hello-world.txt
          md5sum /opt/*
      outputs:
        artifacts:
          - name: collected-artifacts
            path: /opt/
            archive: { none: { } }
            artifactGC:
              strategy: OnWorkflowCompletion
              serviceAccountName: "{{workflow.parameters.service-account}}"
            s3:
              key: fanout/collected-other-datasets/{{workflow.name}}/
              bucket: "io-realcity-apps-demo-mars-test-data"
              endpoint: "storage.googleapis.com"
              accessKeySecret:
                name: "s3-credentials"
                key: "accesskey"
              secretKeySecret:
                name: "s3-credentials"
                key: "secretkey"

Logs from the garbage collection pod

❯ kubectl logs -n demo-mars --context rc-test artifactgc-test7m7pm-artgc-wfcomp-3448090692 
time="2024-03-28T15:43:27.994Z" level=info msg="S3 Delete artifact: key: fanout/collected-other-datasets/artifactgc-test7m7pm/"
time="2024-03-28T15:43:28.094Z" level=info msg="Creating minio client using static credentials" endpoint=storage.googleapis.com
time="2024-03-28T15:43:28.095Z" level=info msg="Deleting object from s3" bucket=io-realcity-apps-demo-mars-test-data endpoint=storage.googleapis.com key=fanout/collected-other-datasets/artifactgc-test7m7pm/
time="2024-03-28T15:43:28.324Z" level=warning msg="Non-transient error: The specified key does not exist."
time="2024-03-28T15:43:28.388Z" level=info msg="S3 Delete artifact: key: fanout/collected-other-datasets/artifactgc-test7m7pm/"
time="2024-03-28T15:43:28.394Z" level=info msg="Creating minio client using static credentials" endpoint=storage.googleapis.com
time="2024-03-28T15:43:28.488Z" level=info msg="Deleting object from s3" bucket=io-realcity-apps-demo-mars-test-data endpoint=storage.googleapis.com key=fanout/collected-other-datasets/artifactgc-test7m7pm/
time="2024-03-28T15:43:28.624Z" level=warning msg="Non-transient error: The specified key does not exist."
time="2024-03-28T15:43:28.645Z" level=info msg="S3 Delete artifact: key: fanout/collected-other-datasets/artifactgc-test7m7pm/"
time="2024-03-28T15:43:28.689Z" level=info msg="Creating minio client using static credentials" endpoint=storage.googleapis.com
time="2024-03-28T15:43:28.689Z" level=info msg="Deleting object from s3" bucket=io-realcity-apps-demo-mars-test-data endpoint=storage.googleapis.com key=fanout/collected-other-datasets/artifactgc-test7m7pm/
time="2024-03-28T15:43:28.869Z" level=warning msg="Non-transient error: The specified key does not exist."
time="2024-03-28T15:43:28.909Z" level=info msg="S3 Delete artifact: key: fanout/collected-other-datasets/artifactgc-test7m7pm/"
time="2024-03-28T15:43:28.912Z" level=info msg="Creating minio client using static credentials" endpoint=storage.googleapis.com
time="2024-03-28T15:43:28.912Z" level=info msg="Deleting object from s3" bucket=io-realcity-apps-demo-mars-test-data endpoint=storage.googleapis.com key=fanout/collected-other-datasets/artifactgc-test7m7pm/
time="2024-03-28T15:43:29.023Z" level=warning msg="Non-transient error: The specified key does not exist."
time="2024-03-28T15:43:29.103Z" level=info msg="S3 Delete artifact: key: fanout/collected-other-datasets/artifactgc-test7m7pm/"
time="2024-03-28T15:43:29.291Z" level=info msg="Creating minio client using static credentials" endpoint=storage.googleapis.com
time="2024-03-28T15:43:29.292Z" level=info msg="Deleting object from s3" bucket=io-realcity-apps-demo-mars-test-data endpoint=storage.googleapis.com key=fanout/collected-other-datasets/artifactgc-test7m7pm/
time="2024-03-28T15:43:29.446Z" level=warning msg="Non-transient error: The specified key does not exist."

Logs from your workflow's wait container

No relevant wait container logs. The artifact is uploaded successfully to:

io-realcity-apps-demo-mars-test-data/fanout/collected-other-datasets/artifactgc-test7m7pm/hello-world.txt

vilmosnagy (Author) commented

My guess is that a drv.IsDirectory(&artifact) check is needed somewhere here, and if the path is a directory, the whole directory should probably be deleted?

https://github.com/argoproj/argo-workflows/blob/main/cmd/argoexec/commands/artifact/delete.go#L88-L95

But I'm not a Go developer and definitely not familiar with the Argo source, and this could cause side effects: e.g. it deletes the whole directory, so if other, non-workflow-related files were in there, it would delete those as well.
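
To make the guess above concrete, here is a minimal sketch of a directory-aware delete. It does not use the Argo code base: fakeStore, isDirectory, listObjects, deleteObject and deleteArtifact are all hypothetical names, and the in-memory map only stands in for a flat S3-style bucket. It is not meant to represent the fix that was eventually merged.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errNoSuchKey mimics the "specified key does not exist" error from the GC logs.
var errNoSuchKey = errors.New("the specified key does not exist")

// fakeStore is a hypothetical in-memory stand-in for a flat, S3-style bucket.
type fakeStore struct {
	objects map[string]bool
}

func newFakeStore(keys ...string) *fakeStore {
	s := &fakeStore{objects: map[string]bool{}}
	for _, k := range keys {
		s.objects[k] = true
	}
	return s
}

// isDirectory reports whether key is only a prefix of other objects,
// not an object in its own right.
func (s *fakeStore) isDirectory(key string) bool {
	if s.objects[key] {
		return false
	}
	return len(s.listObjects(key)) > 0
}

// listObjects returns, in sorted order, every object key under the prefix.
func (s *fakeStore) listObjects(prefix string) []string {
	prefix = strings.TrimSuffix(prefix, "/") + "/"
	var keys []string
	for k := range s.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// deleteObject removes exactly one key and fails if that key is absent,
// which is what happens when GC is handed a directory prefix.
func (s *fakeStore) deleteObject(key string) error {
	if !s.objects[key] {
		return errNoSuchKey
	}
	delete(s.objects, key)
	return nil
}

// deleteArtifact deletes a single object, or every object under the key
// when the key turns out to be a directory prefix.
func deleteArtifact(s *fakeStore, key string) error {
	if !s.isDirectory(key) {
		return s.deleteObject(key)
	}
	for _, k := range s.listObjects(key) {
		if err := s.deleteObject(k); err != nil {
			return fmt.Errorf("deleting %q: %w", k, err)
		}
	}
	return nil
}

func main() {
	dir := "fanout/collected-other-datasets/artifactgc-test7m7pm/"
	s := newFakeStore(dir+"hello-world.txt", "fanout/unrelated.txt")

	fmt.Println("plain delete:", s.deleteObject(dir))
	fmt.Println("directory-aware delete:", deleteArtifact(s, dir))
	fmt.Println("remaining:", s.listObjects("fanout"))
}
```

Note that the side effect mentioned above shows up here as well: everything under the prefix is removed, whether or not the workflow wrote it. Only keys outside the prefix survive.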

tczhao added the area/artifacts (S3/GCP/OSS/Git/HDFS etc) label Mar 28, 2024
tczhao self-assigned this Mar 28, 2024
agilgur5 added the area/gc (Garbage collection, such as TTLs, retentionPolicy, delays, and more) and P3 (Low priority) labels Mar 28, 2024
agilgur5 changed the title from "artifactGC does not run when the output artifact is not archived" to "artifactGC does not run when the output artifact uses archive: none" Mar 29, 2024
juliev0 closed this as completed in a929c8f Jul 3, 2024
agilgur5 pushed a commit that referenced this issue Jul 6, 2024
Signed-off-by: Tianchu Zhao <[email protected]>
Signed-off-by: Anton Gilgur <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]>
(cherry picked from commit a929c8f)