Workflow stuck at Running when it fails to load a Git artifact #10045
This was working in v3.4.0 but is not working in v3.4.2+.
Only the init container terminated:
Fixed by #10047 (Fixes #10045). Signed-off-by: Yuan Tang <[email protected]>
Hi @terrytangyuan, I have a simple use case which seems to be linked to this issue, still seen on v3.4.5:
I would expect the Workflow to stop and switch to the Error state here. On two different clusters (k8s v1.25.1 and v1.25.7), my workflow is stuck in the Running state although the Pod is in the Init:Error state. But if I build the controller myself using the Development Container as explained in the docs, on the v3.4.5 or latest tag, my workflow switches to Error as expected. With the Helm chart, I never get the expected Error state. Thank you.
Could you paste your controller image?
Here it is:
I think the Helm Chart might not be using the image that includes this fix.
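A quick way to check is to inspect the image the deployed controller actually runs and, if needed, pin the chart to a fixed tag. This is a hedged sketch: it assumes the controller Deployment is named workflow-controller in the argo namespace and that the community chart exposes a controller.image.tag value; adjust the release name and namespace to your install.

# Print the image the deployed controller runs (assumed name/namespace)
kubectl -n argo get deploy workflow-controller -o jsonpath='{.spec.template.spec.containers[0].image}'

# Pin the chart to a tag expected to contain the fix (<release> is a placeholder)
helm upgrade <release> argo/argo-workflows -n argo --set controller.image.tag=v3.4.5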
In my case this occurs randomly. If workflows transition to Failed, I see "marking node as failed since init container has non-zero exit code" in the logs, but if the logs contain "Pod failed before main container starts", the workflow stays at Running. I also tested with the latest images yesterday, but it got stuck at Running again.
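To tell which of the two code paths the controller took for a given run, one can grep its logs for both messages quoted above. A minimal sketch, reusing the log command from the issue template below and assuming the controller runs in the argo namespace:

kubectl logs -n argo deploy/workflow-controller | grep -E 'marking node as failed since init container|Pod failed before main container'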
And I can also add this: I am sure that the fix is present in this image:
Currently I am using this to test, and it gets stuck at Pending:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: init-fail
spec:
  entrypoint: init-container-example
  templates:
    - name: init-container-example
      container:
        image: alpine:latest
        command: ["echo", "bye"]
        volumeMounts:
          - name: foo
            mountPath: /foo
      initContainers:
        - name: hello
          image: alpine:latest
          command: ["abcd"]
          mirrorVolumeMounts: true
  volumes:
    - name: foo
      emptyDir: {}
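To reproduce, a sketch assuming the manifest above is saved as init-fail.yaml, the Argo CLI is installed, and Argo runs in the argo namespace:

# Submit the workflow and watch its phase (it should fail fast, but per this issue it hangs)
argo submit -n argo --watch init-fail.yaml

# Watch the pod; the bogus init-container command should put it in Init:Error
kubectl get pod -n argo -l workflows.argoproj.io/workflow=init-fail -w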
See #13858.
Pre-requisites
I have tested with the :latest image tag and the issue still exists.
What happened/what you expected to happen?
Pod errored:
But the workflow is still running.
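A hedged way to see the mismatch side by side, assuming the argo namespace and a hypothetical workflow name (substitute your own):

# Workflow phase as the controller sees it (stays Running per this issue)
kubectl get workflow <workflow-name> -n argo -o jsonpath='{.status.phase}'

# Pod status, which shows the init-container failure (e.g. Init:Error)
kubectl get pod -n argo -l workflows.argoproj.io/workflow=<workflow-name>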
Version
latest
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
Logs from your workflow's wait container
kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded