failed to save outputs: interface conversion: error is *exec.Error, not *exec.ExitError #1207

Closed
hamedhsn opened this issue Feb 2, 2019 · 7 comments
@hamedhsn

hamedhsn commented Feb 2, 2019

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
I see this error message intermittently and seemingly at random. It is not specific to any one task: a task can fail with this error and then work fine on the next run.

The task itself runs fine and I can see its output, but if the next task depends on this one, the workflow does not proceed to it.

What you expected to happen:
I did not use to see this error. It started appearing at some point after I installed Kubeflow Pipelines and ran a task. I have since removed and redeployed Argo, but I still see the error every so often.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Argo version:
$ argo version
argo: v2.2.1
  BuildDate: 2018-10-11T16:25:59Z
  GitCommit: 3b52b26190163d1f72f3aef1a39f9f291378dafb
  GitTreeState: clean
  GitTag: v2.2.1
  GoVersion: go1.10.3
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version :
$ kubectl version -o yaml
clientVersion:
  buildDate: 2018-07-10T10:13:58Z
  compiler: gc
  gitCommit: 91e7b4fd31fcd3d5f436da26c980becec37ceefe
  gitTreeState: clean
  gitVersion: v1.11.0
  goVersion: go1.10.3
  major: "1"
  minor: "11"
  platform: darwin/amd64
serverVersion:
  buildDate: 2018-12-06T23:13:14Z
  compiler: gc
  gitCommit: 6bad6d9c768dc0864dab48a11653aa53b5a47043
  gitTreeState: clean
  gitVersion: v1.11.5-eks-6bad6d
  goVersion: go1.10.3
  major: "1"
  minor: 11+
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
$ argo get <workflowname>

argo get argo-gpu-s3-copy-4qzp8
Name: argo-gpu-s3-copy-4qzp8
Namespace: development
ServiceAccount: argo
Status: Error
Created: Sat Feb 02 19:10:05 +0000 (3 minutes ago)
Started: Sat Feb 02 19:10:05 +0000 (3 minutes ago)
Finished: Sat Feb 02 19:10:21 +0000 (3 minutes ago)
Duration: 16 seconds
Parameters:
s3-path: Shared_data/OULU/small_frames_npy
local-path: test2
bucker-name: onfido-mlplatform-in
node-selector: m4.xlarge

STEP PODNAME DURATION MESSAGE
⚠ argo-gpu-s3-copy-4qzp8
└-⚠ list-chunk argo-gpu-s3-copy-4qzp8-3040831338 16s failed to save outputs: interface conversion: error is *exec.Error, not *exec.ExitError

  • executor logs:
$ kubectl logs <failedpodname> -c init
$ kubectl logs <failedpodname> -c wait
  • workflow-controller logs:
$ kubectl logs -n kube-system $(kubectl get pods -l app=workflow-controller -n kube-system -o name)
@hamedhsn
Author

hamedhsn commented Feb 2, 2019

workflow-controller log:

time="2019-02-02T19:25:02Z" level=info msg="Updated phase  -> Running" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Steps node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) initialized Running" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="StepGroup node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) initialized Running" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Created pod: argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299)" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Pod node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) initialized Pending" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:02Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:03Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:03Z" level=info msg="Checking for deleted pods" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:04Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) message: PodInitializing"
time="2019-02-02T19:25:04Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:04Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:05Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:05Z" level=info msg="Checking for deleted pods" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:05Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:06Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:06Z" level=info msg="Checking for deleted pods" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:06Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) status Pending -> Running"
time="2019-02-02T19:25:09Z" level=info msg="Workflow step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) not yet completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"argo-gpu-s3-copy-58kdd\": the object has been modified; please apply your changes to the latest version and try again" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Re-appying updates on latest version and retrying update" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Update retry attempt 1 successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:09Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Processing workflow" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) status Running -> Error"
time="2019-02-02T19:25:10Z" level=info msg="Updating node argo-gpu-s3-copy-58kdd[0].check (argo-gpu-s3-copy-58kdd-1253578299) message: failed to save outputs: interface conversion: error is *exec.Error, not *exec.ExitError"
time="2019-02-02T19:25:10Z" level=info msg="Step group node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) deemed failed: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) phase Running -> Failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) message: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) finished: 2019-02-02 19:25:10.370256123 +0000 UTC" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="step group argo-gpu-s3-copy-58kdd[0] (argo-gpu-s3-copy-58kdd-2204826621) was unsuccessful: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Outbound nodes of argo-gpu-s3-copy-58kdd-1253578299 is [argo-gpu-s3-copy-58kdd-1253578299]" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Outbound nodes of argo-gpu-s3-copy-58kdd is [argo-gpu-s3-copy-58kdd-1253578299]" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) phase Running -> Failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) message: child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="node argo-gpu-s3-copy-58kdd (argo-gpu-s3-copy-58kdd) finished: 2019-02-02 19:25:10.3703671 +0000 UTC" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Checking deamoned children of argo-gpu-s3-copy-58kdd" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Updated phase Running -> Failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Updated message  -> child 'argo-gpu-s3-copy-58kdd-1253578299' failed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Marking workflow completed" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"argo-gpu-s3-copy-58kdd\": the object has been modified; please apply your changes to the latest version and try again" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Re-appying updates on latest version and retrying update" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Update retry attempt 1 successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:10Z" level=info msg="Workflow update successful" namespace=development workflow=argo-gpu-s3-copy-58kdd
time="2019-02-02T19:25:11Z" level=info msg="Labeled pod development/argo-gpu-s3-copy-58kdd-1253578299 completed"

@wadeholler

I just built from master (plus an unrelated tweak) and can confirm this behavior, except that I get it on every run.

@hamedhsn
Author

hamedhsn commented Feb 9, 2019

@wadeholler thanks. It fails about 80% of the time for me. I think we upgraded the k8s cluster version and then started to see this.
Is there any solution for it?

@elikatsis
Contributor

elikatsis commented Feb 9, 2019

@wadeholler there was a bug on the master branch, which I fixed in PR #1213. Try deleting argoexec:latest from your cluster and building it again using the new Dockerfile (if argoproj/argoexec:latest is not updated or if you made any modifications).

@wadeholler

That helped with the problem stated above, but now submodule support is broken:

failed to load artifacts: fatal: No url found for submodule path 'obsfuscated' in .gitmodules

@wadeholler

My previous reply was about repos that have a submodule reference but no .gitmodules file. The new argoexec changes, which force a submodule update, caused that issue; it is unrelated to the above. All is well now. Cheers.

@jessesuen
Member

I'm pretty sure I fixed an "error is *exec.Error, not *exec.ExitError" interface conversion panic as part of the PNS work. I will close this as fixed in v2.3, but please re-open if it is seen again.

icecoffee531 pushed a commit to icecoffee531/argo-workflows that referenced this issue Jan 5, 2022
- only attempt to list/join channels if postMessage call fails with
  not_in_channel
- respect rate limit 429 responses when iterating through paginated
  conversations.list result

fixes argoproj#1206

Signed-off-by: Robert King <[email protected]>