Updating the workflow archive entry for a retried workflow fails with "current transaction is aborted" #2427
Comments
I'll investigate.
@danxmoran you are correct: the archive should insert a new record. If the insert hits a primary key error, we assume that we are somehow trying to archive the workflow twice, so we need to do an update instead. You haven't included the error message from the workflow. Could I please ask for it? It would really help narrow this down.
I think the error in my workflow was a red herring: the WF spec was invalid. Looking at the logs again, I see what you mean about first trying to INSERT, then performing an UPDATE. I think the UPDATE step is failing, though, because it's trying to use the same transaction as the failed INSERT. I see this (abbreviated) in the logs:
I'm going to update the issue title since the original behavior I saw is expected to happen in this case 😄
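For readers following along, here is a minimal sketch of one way to keep the "INSERT first, UPDATE on conflict" flow inside a single transaction. In PostgreSQL, once a statement fails inside a transaction block, later statements are rejected with "current transaction is aborted" until a ROLLBACK or a ROLLBACK TO SAVEPOINT is issued. The sketch wraps the INSERT in a savepoint so the UPDATE can still run. It uses Python's built-in sqlite3 module so that it runs without a server; SQLite does not abort the transaction on a constraint error, so the rollback to the savepoint matters only on PostgreSQL. The table `archive_demo`, its columns, and both function names are hypothetical and are not the Argo schema or controller code.

```python
import sqlite3


def archive_with_savepoint(conn, uid, workflow_json):
    """Try INSERT first; on a primary-key conflict, fall back to UPDATE.

    Table and column names are hypothetical, not the real Argo schema.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        cur.execute("SAVEPOINT try_insert")
        try:
            cur.execute(
                "INSERT INTO archive_demo (uid, workflow) VALUES (?, ?)",
                (uid, workflow_json),
            )
            action = "inserted"
        except sqlite3.IntegrityError:
            # PostgreSQL would mark the whole transaction as aborted here;
            # rolling back to the savepoint is what makes it usable again.
            cur.execute("ROLLBACK TO SAVEPOINT try_insert")
            cur.execute(
                "UPDATE archive_demo SET workflow = ? WHERE uid = ?",
                (workflow_json, uid),
            )
            action = "updated"
        cur.execute("RELEASE SAVEPOINT try_insert")
        cur.execute("COMMIT")
    except Exception:
        # Any other failure discards the whole transaction.
        cur.execute("ROLLBACK")
        raise
    return action


def open_demo_db():
    # isolation_level=None: sqlite3 issues no implicit BEGIN, so the
    # explicit BEGIN/COMMIT statements above control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE archive_demo (uid TEXT PRIMARY KEY, workflow TEXT)"
    )
    return conn
```

Calling `archive_with_savepoint` twice with the same `uid` takes the INSERT path the first time and the UPDATE path the second time. Running the INSERT and the UPDATE in two separate transactions would be another way to avoid the aborted-transaction error.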
So
Checklist:
What happened:
One of my workflows failed with a transient error. I retried it through the UI and all of its steps succeeded, but the root node (a DAG template) was still marked as an Error. I looked through the logs and found this error in the UPPERIO debug logs:
I'm not sure if this is the cause of the Error state, but either way it looks like a problem.
What you expected to happen:
I expected the workflow controller to perform an update-or-insert when adding info to argo_archived_workflows, and not fail with a PK error on workflow retries.
How to reproduce it (as minimally and precisely as possible):
I'm not sure if this depends on the archived workflow being in a failed state, but I'm able to reproduce this every time I retry a failed workflow that's already been archived.
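As an illustration of the update-or-insert behavior expected above, here is a minimal sketch of a single-statement upsert. It assumes a database that supports `INSERT ... ON CONFLICT ... DO UPDATE` (PostgreSQL 9.5 or later, SQLite 3.24 or later; MySQL uses the different `ON DUPLICATE KEY UPDATE` syntax). It runs against an in-memory SQLite database so it needs no server. The table `archive_upsert_demo`, its columns, and the function names are hypothetical stand-ins, not the real argo_archived_workflows schema or the controller's code.

```python
import sqlite3


def open_upsert_demo_db():
    # Hypothetical table; the real archive table has a different schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archive_upsert_demo ("
        "uid TEXT PRIMARY KEY, phase TEXT, workflow TEXT)"
    )
    return conn


def upsert_archived_workflow(conn, uid, phase, workflow_json):
    """Insert the row, or overwrite it if the uid is already archived."""
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        conn.execute(
            """
            INSERT INTO archive_upsert_demo (uid, phase, workflow)
            VALUES (?, ?, ?)
            ON CONFLICT (uid) DO UPDATE SET
                phase = excluded.phase,
                workflow = excluded.workflow
            """,
            (uid, phase, workflow_json),
        )


if __name__ == "__main__":
    conn = open_upsert_demo_db()
    # First archive of a failed workflow, then the retried run.
    upsert_archived_workflow(conn, "wf-1", "Failed", "{}")
    upsert_archived_workflow(conn, "wf-1", "Succeeded", "{}")
    print(conn.execute("SELECT uid, phase FROM archive_upsert_demo").fetchall())
```

Because the conflict is resolved inside one statement, no error is raised on the retry, so the surrounding transaction is never put into an aborted state.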
Anything else we need to know?:
The log with the pq error shows that the controller is trying to run a SQL UPDATE, so I don't think the current implementation is too far off from what I'd expect. The SQL I see is:
The arguments I see include the (large) workflow JSON. With that payload abbreviated, they are:
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.