persistentvolumeclaims already exists #1130

Closed
canadiannomad opened this issue Dec 14, 2018 · 5 comments · Fixed by #1363
Comments

@canadiannomad

BUG REPORT

What happened:

$ argo submit --watch generic.yaml -n argo
Name:                generic-nf42h
Namespace:           argo
ServiceAccount:      default
Status:              Error
Message:             persistentvolumeclaims "generic-nf42h-workdir" already exists
Created:             Fri Dec 14 12:16:30 -0500 (1 second ago)
Started:             Fri Dec 14 12:16:30 -0500 (1 second ago)
Finished:            Fri Dec 14 12:16:30 -0500 (1 second ago)
Duration:            0 seconds
Parameters:
  revision:          master

STEP              PODNAME                  DURATION  MESSAGE
 ● generic-nf42h
 └---◷ provision  generic-nf42h-319909044  1s

What you expected to happen:
generic-nf42h-workdir didn't exist before the run and exists after it; there shouldn't have been an error.

How to reproduce it (as minimally and precisely as possible):
I suspect this has to do with me running a bare-metal Kubernetes cluster with Rook, but I can't find any logs that clarify what is going wrong.

Anything else we need to know?:
The first step completes successfully, but if I add a second step the workflow never reaches it.

Environment:

  • Argo version:
$ argo version
argo: v2.2.1
  BuildDate: 2018-10-11T16:25:59Z
  GitCommit: 3b52b26190163d1f72f3aef1a39f9f291378dafb
  GitTreeState: clean
  GitTag: v2.2.1
  GoVersion: go1.10.3
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version:
$ kubectl version -o yaml
clientVersion:
  buildDate: 2018-10-24T06:54:59Z
  compiler: gc
  gitCommit: 17c77c7898218073f14c8d573582e8d2313dc740
  gitTreeState: clean
  gitVersion: v1.12.2
  goVersion: go1.10.4
  major: "1"
  minor: "12"
  platform: darwin/amd64
serverVersion:
  buildDate: 2018-12-03T20:56:12Z
  compiler: gc
  gitCommit: ddf47ac13c1a9483ea035a79cd7c10005ff21a6d
  gitTreeState: clean
  gitVersion: v1.13.0
  goVersion: go1.11.2
  major: "1"
  minor: "13"
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
$ argo get <workflowname>
NAME            STATUS   AGE   DURATION
generic-9rq8z   Error    10s   0s
  • executor logs:
$ kubectl logs <failedpodname> -c main
total 20
drwxr-xr-x    3 root     root          4096 Dec 14 17:33 .
drwxr-xr-x    1 root     root            28 Dec 14 17:33 ..
-rw-r--r--    1 root     root             0 Dec 14 17:33 file
drwx------    2 root     root         16384 Dec 14 17:33 lost+found
$ kubectl logs <failedpodname> -c wait
time="2018-12-14T17:33:22Z" level=info msg="Creating a docker executor"
time="2018-12-14T17:33:22Z" level=info msg="Executor (version: v2.2.0, build_date: 2018-08-30T08:52:54Z) initialized with template:\narchiveLocation: {}\ncontainer:\n  args:\n  - touch file; ls -al\n  command:\n  - sh\n  - -c\n  image: alpine/git\n  name: \"\"\n  resources: {}\n  volumeMounts:\n  - mountPath: /src\n    name: workdir\n  workingDir: /src\ninputs: {}\nmetadata: {}\nname: git-clone\noutputs: {}\n"
time="2018-12-14T17:33:22Z" level=info msg="Waiting on main container"
time="2018-12-14T17:33:23Z" level=info msg="main container started with container ID: 328ceb2261c0861eb60763c8531a9764a78898b1f715f71398586bd461606a43"
time="2018-12-14T17:33:23Z" level=info msg="Starting annotations monitor"
time="2018-12-14T17:33:23Z" level=info msg="docker wait 328ceb2261c0861eb60763c8531a9764a78898b1f715f71398586bd461606a43"
time="2018-12-14T17:33:23Z" level=info msg="Starting deadline monitor"
time="2018-12-14T17:33:23Z" level=info msg="Main container completed"
time="2018-12-14T17:33:23Z" level=info msg="No sidecars"
time="2018-12-14T17:33:23Z" level=info msg="No output artifacts"
time="2018-12-14T17:33:23Z" level=info msg="No output parameters"
time="2018-12-14T17:33:23Z" level=info msg="Annotations monitor stopped"
time="2018-12-14T17:33:23Z" level=info msg="Alloc=4412 TotalAlloc=11248 Sys=10086 NumGC=4 Goroutines=9"
  • workflow-controller logs:
$ kubectl logs -n kube-system $(kubectl get pods -l app=argo-workflow-controller -n kube-system -o name)
time="2018-12-14T17:33:18Z" level=info msg="Processing workflow" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Creating pvc generic-9rq8z-workdir" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=error msg="generic-9rq8z pvc create error: persistentvolumeclaims \"generic-9rq8z-workdir\" already exists" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Updated phase Running -> Error" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Updated message  -> persistentvolumeclaims \"generic-9rq8z-workdir\" already exists" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Marking workflow completed" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"generic-9rq8z\": the object has been modified; please apply your changes to the latest version and try again" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Re-appying updates on latest version and retrying update" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Update retry attempt 1 successful" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:18Z" level=info msg="Workflow update successful" namespace=argo workflow=generic-9rq8z
time="2018-12-14T17:33:39Z" level=info msg="Alloc=4414 TotalAlloc=35407 Sys=16650 NumGC=39 Goroutines=62"

Workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: generic-
spec:
  entrypoint: runtest
  arguments:
    parameters:
    - name: revision
      value: master
  volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
  templates:
  - name: runtest
    steps:
    - - name: provision
        template: git-clone

  - name: git-clone
    container:
      image: alpine/git
      workingDir: /src
      command: [sh, -c]
      args: [ "touch file; ls -al" ]
      volumeMounts:
      - name: workdir
        mountPath: /src
@canadiannomad
Author

Any ideas on where I can look to find out why it thinks the persistentvolumeclaim already exists despite it having created it? I'm still having this issue.

@jessesuen jessesuen added this to the v2.3 milestone Jan 22, 2019
@alexmt
Contributor

alexmt commented Jan 25, 2019

Hello @canadiannomad. Thank you for providing a sample workflow! I've tried to run it but could not reproduce the issue.

I would suggest watching the PVCs with kubectl get pvc -w and starting the workflow in another terminal window. Before the workflow starts there should be no PVCs with the generic- name prefix, and after it starts a PVC should be created and then deleted. Can you please attach the kubectl get pvc -w output and the controller logs?

@pplavetzki

pplavetzki commented Apr 12, 2019

I am also receiving this error. Here is the output of the command kubectl get pvc -w just after running argo submit hello-world.yaml:

hello-world-bgmwd-service-data                                  Pending                                                                        standard       <invalid>
hello-world-bgmwd-service-data                                  Pending                                                                        standard       <invalid>
hello-world-bgmwd-service-data                                  Pending                                                                        standard       <invalid>
hello-world-bgmwd-service-data                                  Pending   pvc-c2f7d1e6-5cc4-11e9-abb7-0050569aba93   0                         standard       <invalid>
hello-world-bgmwd-service-data                                  Bound     pvc-c2f7d1e6-5cc4-11e9-abb7-0050569aba93   200Mi      RWO            standard       <invalid>

Controller Log:

time="2019-04-12T01:46:20Z" level=info msg="Processing workflow" namespace=jedi-operator workflow=hello-world-bgmwd
time="2019-04-12T01:46:20Z" level=info msg="Updated phase  -> Running" namespace=jedi-operator workflow=hello-world-bgmwd
time="2019-04-12T01:46:20Z" level=info msg="Creating pvc hello-world-bgmwd-service-data" namespace=jedi-operator workflow=hello-world-bgmwd
time="2019-04-12T01:46:21Z" level=info msg="Created pod: hello-world-bgmwd (hello-world-bgmwd)" namespace=jedi-operator workflow=hello-world-bgmwd
time="2019-04-12T01:46:21Z" level=info msg="Pod node hello-world-bgmwd (hello-world-bgmwd) initialized Pending" namespace=jedi-operator workflow=hello-world-bgmwd
time="2019-04-12T01:46:21Z" level=info msg="Workflow update successful" namespace=jedi-operator workflow=hello-world-bgmwd

What Happened:

Name:                hello-world-bgmwd
Namespace:           jedi-operator
ServiceAccount:      jedi-operator
Status:              Error
Message:             persistentvolumeclaims "hello-world-bgmwd-service-data" already exists
Created:             Thu Apr 11 18:46:16 -0700 (10 minutes from now)
Started:             Thu Apr 11 18:23:07 -0700 (12 minutes ago)
Finished:            Thu Apr 11 18:23:07 -0700 (12 minutes ago)
Duration:            0 seconds

STEP                  PODNAME            DURATION  MESSAGE
 ◷ hello-world-bgmwd  hello-world-bgmwd  10m 

Environment:

argo: v2.2.0
  BuildDate: 2018-08-30T08:51:40Z
  GitCommit: af636ddd8455660f307d835814d3112b90815dfd
  GitTreeState: clean
  GitTag: v2.2.0
  GoVersion: go1.10.3
  Compiler: gc
  Platform: darwin/amd64

Workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: hello
  serviceAccountName: jedi-operator
  volumeClaimTemplates:
  - metadata:
      name: service-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 200Mi
  templates:
  - name: hello
    container:
      image: alpine
      command: ['/bin/sh', '-c']
      args: ['sleep 3;echo hello world']
      volumeMounts:
      - name: service-data
        mountPath: /var/data

K8S Version:

clientVersion:
  buildDate: "2019-03-25T15:53:57Z"
  compiler: gc
  gitCommit: 641856db18352033a0d96dbc99153fa3b27298e5
  gitTreeState: clean
  gitVersion: v1.14.0
  goVersion: go1.12.1
  major: "1"
  minor: "14"
  platform: darwin/amd64
serverVersion:
  buildDate: "2018-09-09T17:53:03Z"
  compiler: gc
  gitCommit: a4529464e4629c21224b3d52edfe0ea91b072862
  gitTreeState: clean
  gitVersion: v1.11.3
  goVersion: go1.10.3
  major: "1"
  minor: "11"
  platform: linux/amd64

Other Info:
This is using argo in a namespaced deployment without a cluster role:

Name:         argo-role
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"annotations":{},"name":"argo-role","namespace":"jedi-operator"},"r...
PolicyRule:
  Resources                         Non-Resource URLs  Resource Names  Verbs
  ---------                         -----------------  --------------  -----
  persistentvolumeclaims            []                 []              [create delete]
  pods/exec                         []                 []              [create get list watch update patch delete]
  pods                              []                 []              [create get list watch update patch delete]
  workflows.argoproj.io/finalizers  []                 []              [get list watch update patch delete]
  workflows.argoproj.io             []                 []              [get list watch update patch delete]
  configmaps                        []                 []              [get watch list]

Storage Class Info:

Name:            standard
IsDefaultClass:  No
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"standard"},"parameters":{"diskformat":"thin"},"provisioner":"kubernetes.io/vsphere-volume"}

Provisioner:           kubernetes.io/vsphere-volume
Parameters:            diskformat=thin
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
Events:                <none>

@jessesuen jessesuen modified the milestones: v2.3, v2.4 Apr 19, 2019
@sarabala1979 sarabala1979 self-assigned this May 9, 2019
@jessesuen
Member

The fix is that we need to be idempotent about PVC creation. The following logic does not tolerate the case where we have already created the PVC:

		pvc, err := pvcClient.Create(&pvcTmpl)
		if err != nil {
			return err
		}
		vol := apiv1.Volume{
			Name: refName,
			VolumeSource: apiv1.VolumeSource{
				PersistentVolumeClaim: &apiv1.PersistentVolumeClaimVolumeSource{
					ClaimName: pvc.ObjectMeta.Name,
				},
			},
		}
		woc.wf.Status.PersistentVolumeClaims[i] = vol

The fix should be: if we attempt to create the PVC and get an AlreadyExists error, AND the existing PVC has our ownership reference, then we can ignore the error.
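
A minimal sketch of what that idempotent handling could look like, reusing the names from the snippet above (pvcClient, pvcTmpl, refName, woc, i) and assuming the standard apimachinery helpers apierrors ("k8s.io/apimachinery/pkg/api/errors") and metav1 ("k8s.io/apimachinery/pkg/apis/meta/v1") are imported; this is illustrative only, not the exact change in #1363:

		pvc, err := pvcClient.Create(&pvcTmpl)
		if err != nil && apierrors.IsAlreadyExists(err) {
			// The PVC may already have been created by an earlier reconciliation
			// pass, so look it up and check whether this workflow owns it.
			existing, getErr := pvcClient.Get(pvcTmpl.Name, metav1.GetOptions{})
			if getErr != nil {
				return getErr
			}
			ownedByWorkflow := false
			for _, ref := range existing.OwnerReferences {
				if ref.UID == woc.wf.UID {
					ownedByWorkflow = true
					break
				}
			}
			if !ownedByWorkflow {
				// A PVC with the same name exists but belongs to something else,
				// so surface the original AlreadyExists error.
				return err
			}
			// Our own PVC already exists; treat the create as a no-op and reuse it.
			pvc, err = existing, nil
		}
		if err != nil {
			return err
		}
		vol := apiv1.Volume{
			Name: refName,
			VolumeSource: apiv1.VolumeSource{
				PersistentVolumeClaim: &apiv1.PersistentVolumeClaimVolumeSource{
					ClaimName: pvc.ObjectMeta.Name,
				},
			},
		}
		woc.wf.Status.PersistentVolumeClaims[i] = vol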

sarabala1979 added a commit to sarabala1979/argo that referenced this issue May 9, 2019
sarabala1979 added a commit that referenced this issue Jun 7, 2019
* Fixed: persistentvolumeclaims already exists  #1130
decarboxy added a commit to CyrusBiotechnology/argo that referenced this issue Jan 16, 2020
@audriusrudalevicius

audriusrudalevicius commented Jan 23, 2020

I'm still getting persistentvolumeclaims "ml-work-rdbcq-workdir-1" already exists about 1 in 10 times when I run my workflow.

What I want is to pre-download files before processing them, in parallel.

argo get ml-work-rdbcq
Name:                ml-work-rdbcq
Namespace:           default
ServiceAccount:      default
Status:              Running
Message:             persistentvolumeclaims "ml-work-rdbcq-workdir-1" already exists
Created:             Thu Jan 23 14:52:13 +0200 (1 minute ago)
Started:             Thu Jan 23 14:52:13 +0200 (1 minute ago)
Finished:            Thu Jan 23 14:52:13 +0200 (1 minute ago)
Duration:            0 seconds
Parameters:
  parallel:          6

STEP                                          PODNAME                   DURATION  MESSAGE
 ● ml-work-rdbcq (download-images)
 └-·-◷ download-images-shard(0:1) (download)  ml-work-rdbcq-4015706135  1m
   ├-◷ download-images-shard(1:2) (download)  ml-work-rdbcq-2730545253  1m
   ├-◷ download-images-shard(2:3) (download)  ml-work-rdbcq-4012591687  1m
   ├-◷ download-images-shard(3:4) (download)  ml-work-rdbcq-232276801   1m
   ├-◷ download-images-shard(4:5) (download)  ml-work-rdbcq-1959765255  1m
   └-◷ download-images-shard(5:6) (download)  ml-work-rdbcq-1994131533  1m

Pvc:

 kubectl get pvc
NAME                              STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ml-work-rdbcq-workdir-1           Bound         pvc-b3dc6520-d332-4425-a7d0-b497cee0d8bb   1Gi        RWO            standard       2m44s
ml-work-rdbcq-workdir-2           Bound         pvc-3d07926b-c765-4627-98e8-ef9df4d173db   1Gi        RWO            standard       2m44s
ml-work-rdbcq-workdir-3           Bound         pvc-fc163a62-ea2c-473b-bc5f-0e4085ba7e3c   1Gi        RWO            standard       2m44s
ml-work-rdbcq-workdir-4           Bound         pvc-64a05440-195f-4748-97fe-a8b1effe4089   1Gi        RWO            standard       2m44s
ml-work-rdbcq-workdir-5           Bound         pvc-f6103cf7-60c8-4bd3-9880-49af98527755   1Gi        RWO            standard       2m44s
ml-work-rdbcq-workdir-6           Bound         pvc-e4b82cc8-f666-4f82-a08e-dd787b7accf2   1Gi        RWO            standard       2m44s

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T22:30:22Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.4-gke.22", GitCommit:"a6ba43f5a24ac29e631bb627c9b2a719c4e93638", GitTreeState:"clean", BuildDate:"2019-11-26T00:40:25Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}

Argo version:
argoproj/workflow-controller:v2.4.3

Controller logs:

time="2020-01-23T12:52:13Z" level=info msg="Processing workflow" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Updated phase  -> Running" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Creating pvc ml-work-rdbcq-workdir-1" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Creating pvc ml-work-rdbcq-workdir-2" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Creating pvc ml-work-rdbcq-workdir-3" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Creating pvc ml-work-rdbcq-workdir-4" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Creating pvc ml-work-rdbcq-workdir-5" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Creating pvc ml-work-rdbcq-workdir-6" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Steps node ml-work-rdbcq (ml-work-rdbcq) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="node ml-work-rdbcq (ml-work-rdbcq) phase Pending -> Running" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="StepGroup node ml-work-rdbcq[0] (ml-work-rdbcq-3903295395) initialized Running" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:13Z" level=info msg="Pod node ml-work-rdbcq[0].download-images-shard(0:1) (ml-work-rdbcq-4015706135) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:15Z" level=info msg="Created pod: ml-work-rdbcq[0].download-images-shard(0:1) (ml-work-rdbcq-4015706135)" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:15Z" level=info msg="Pod node ml-work-rdbcq[0].download-images-shard(1:2) (ml-work-rdbcq-2730545253) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:17Z" level=info msg="Created pod: ml-work-rdbcq[0].download-images-shard(1:2) (ml-work-rdbcq-2730545253)" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:17Z" level=info msg="Pod node ml-work-rdbcq[0].download-images-shard(2:3) (ml-work-rdbcq-4012591687) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:18Z" level=info msg="Created pod: ml-work-rdbcq[0].download-images-shard(2:3) (ml-work-rdbcq-4012591687)" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:18Z" level=info msg="Pod node ml-work-rdbcq[0].download-images-shard(3:4) (ml-work-rdbcq-232276801) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:20Z" level=info msg="Created pod: ml-work-rdbcq[0].download-images-shard(3:4) (ml-work-rdbcq-232276801)" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:20Z" level=info msg="Pod node ml-work-rdbcq[0].download-images-shard(4:5) (ml-work-rdbcq-1959765255) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:22Z" level=info msg="Created pod: ml-work-rdbcq[0].download-images-shard(4:5) (ml-work-rdbcq-1959765255)" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:22Z" level=info msg="Pod node ml-work-rdbcq[0].download-images-shard(5:6) (ml-work-rdbcq-1994131533) initialized Pending" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:23Z" level=info msg="Created pod: ml-work-rdbcq[0].download-images-shard(5:6) (ml-work-rdbcq-1994131533)" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:23Z" level=info msg="Workflow step group node ml-work-rdbcq[0] (ml-work-rdbcq-3903295395) not yet completed" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:23Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"ml-work-rdbcq\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:23Z" level=info msg="Re-appying updates on latest version and retrying update" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:23Z" level=info msg="Update retry attempt 1 successful" namespace=default workflow=ml-work-rdbcq
time="2020-01-23T12:52:23Z" level=info msg="Workflow update successful" namespace=default workflow=ml-work-rdbcq

Pod description:

Name:               ml-work-rdbcq-4015706135
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-nnn-dask-pool-1-aadaae80-76z2/10.164.0.2
Start Time:         Thu, 23 Jan 2020 14:52:19 +0200
Labels:             workflows.argoproj.io/completed=false
                    workflows.argoproj.io/workflow=ml-work-rdbcq
Annotations:        kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container wait
                    workflows.argoproj.io/node-name=ml-work-rdbcq[0].download-images-shard(0:1)
                    workflows.argoproj.io/template={"name":"download","arguments":{},"inputs":{"parameters":[{"name":"partition","value":"1"}]},"outputs":{},"metadata":{},"container":{"name":"","image":"alpine:latest","c...
Status:             Succeeded
IP:                 10.40.126.49
Controlled By:      Workflow/ml-work-rdbcq
Containers:
  wait:
    Container ID:  docker:https://660bd400a5941c7d6b235d83f0b6b556be1aa7c8b48b67f2647cb75ef454ecfa
    Image:         argoproj/argoexec:v2.4.3
    Image ID:      docker-pullable:https://argoproj/argoexec@sha256:d7ab12ccc0c479cb856fa5aa6ab38c8368743f978bcbc4547bd8a67a83eb65f7
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Jan 2020 14:52:53 +0200
      Finished:     Thu, 23 Jan 2020 14:52:55 +0200
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:
      ARGO_POD_NAME:  ml-work-rdbcq-4015706135 (v1:metadata.name)
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /mainctrfs/mnt/pv from workdir-1 (rw)
      /var/run/docker.sock from docker-sock (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jc2wb (ro)
  main:
    Container ID:  docker:https://d479bf67ffec8a58c7f67bda778db0166c1f4c037f39e81583bd9bde5c2e5005
    Image:         alpine:latest
    Image ID:      docker-pullable:https://alpine@sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45
    Port:          <none>
    Host Port:     <none>
    Command:
      mkdir
      -p
      /mnt/pv/downloads_ml-work-rdbcq_1
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Jan 2020 14:52:54 +0200
      Finished:     Thu, 23 Jan 2020 14:52:54 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:        100m
      memory:     100Mi
    Environment:  <none>
    Mounts:
      /mnt/pv from workdir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jc2wb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  Socket
  workdir-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ml-work-rdbcq-workdir-1
    ReadOnly:   false
  default-token-jc2wb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jc2wb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age              From                                                Message
  ----     ------                  ----             ----                                                -------
  Warning  FailedScheduling        7m (x3 over 7m)  default-scheduler                                   pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               7m               default-scheduler                                   Successfully assigned default/ml-work-rdbcq-4015706135 to gke-nnn-dask-pool-1-aadaae80-76z2
  Normal   SuccessfulAttachVolume  7m               attachdetach-controller                             AttachVolume.Attach succeeded for volume "pvc-b3dc6520-d332-4425-a7d0-b497cee0d8bb"
  Normal   Pulled                  7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Container image "argoproj/argoexec:v2.4.3" already present on machine
  Normal   Created                 7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Created container wait
  Normal   Started                 7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Started container wait
  Normal   Pulling                 7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Pulling image "alpine:latest"
  Normal   Pulled                  7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Successfully pulled image "alpine:latest"
  Normal   Created                 7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Created container main
  Normal   Started                 7m               kubelet, gke-nnn-dask-pool-1-aadaae80-76z2  Started container main

Workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ml-work-
  namespace: default
spec:
  arguments:
    parameters:
    - name: parallel
      value: '6'
  entrypoint: download-images
  templates:
  - name: download-images
    steps:
    - - arguments:
          parameters:
          - name: partition
            value: '{{item}}'
        name: download-images-shard
        template: download
        withSequence:
          end: '{{workflow.parameters.parallel}}'
          start: '1'
    - - arguments:
          parameters:
          - name: partition
            value: '{{item}}'
        name: tag-downloaded-images-shard
        template: tag-images
        withSequence:
          end: '{{workflow.parameters.parallel}}'
          start: '1'
    - - arguments:
          parameters:
          - name: partition
            value: '{{item}}'
        name: clean-shard
        template: cleanup
        withSequence:
          end: '{{workflow.parameters.parallel}}'
          start: '1'
  - container:
      command:
      - mkdir
      - -p
      - /mnt/pv/downloads_{{workflow.name}}_{{inputs.parameters.partition}}
      image: alpine:latest
      resources:
        limits:
          cpu: 0.1
          memory: 100Mi
        requests:
          cpu: 0.1
          memory: 100Mi
      volumeMounts:
      - mountPath: /mnt/pv
        name: workdir-{{inputs.parameters.partition}}
    inputs:
      parameters:
      - name: partition
    name: download
  - container:
      command:
      - ls
      - -lh
      - /mnt/pv/downloads_{{workflow.name}}_{{inputs.parameters.partition}}
      image: alpine:latest
      resources:
        limits:
          cpu: 0.1
          memory: 100Mi
        requests:
          cpu: 0.1
          memory: 100Mi
      volumeMounts:
      - mountPath: /mnt/pv
        name: workdir-{{inputs.parameters.partition}}
    inputs:
      parameters:
      - name: partition
    name: tag-images
  - container:
      command:
      - echo
      - Finished tagging, we can call webhook
      image: alpine:latest
      resources:
        limits:
          cpu: 0.1
          memory: 100Mi
        requests:
          cpu: 0.1
          memory: 100Mi
      volumeMounts:
      - mountPath: /mnt/pv
        name: workdir-{{inputs.parameters.partition}}
    inputs:
      parameters:
      - name: partition
    name: collect
  - container:
      command:
      - rm
      - -fr
      - /mnt/pv/downloads_{{workflow.name}}_{{inputs.parameters.partition}}
      image: alpine:latest
      resources:
        limits:
          cpu: 0.1
          memory: 100Mi
        requests:
          cpu: 0.1
          memory: 100Mi
      volumeMounts:
      - mountPath: /mnt/pv
        name: workdir-{{inputs.parameters.partition}}
    inputs:
      parameters:
      - name: partition
    name: cleanup
  volumeClaimTemplates:
  - metadata:
      name: workdir-1
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: workdir-2
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: workdir-3
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: workdir-4
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: workdir-5
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: workdir-6
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

Adding a retry to download-images does not help. It feels like the controller doesn't wait for the resources to become available before declaring the step failed. Also, when it failed, the workflow status stayed Running even after more than an hour, while the pods showed status "Completed". This results in stuck workflows.

NAME                       READY     STATUS      RESTARTS   AGE
ml-work-rdbcq-4015706135   0/2       Completed   0          86m

icecoffee531 pushed a commit to icecoffee531/argo-workflows that referenced this issue Jan 5, 2022
* docs: Enhance the filters tutorial for argoproj#1097

Signed-off-by: Tim Collins <[email protected]>