Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(controller): Fix argo retry with PVCs. Fixes #4275 #4277

Merged
merged 9 commits into from
Oct 14, 2020
Merged

fix(controller): Fix argo retry with PVCs. Fixes #4275 #4277

merged 9 commits into from
Oct 14, 2020

Conversation

jcmuller
Copy link
Contributor

@jcmuller jcmuller commented Oct 14, 2020

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • My organization is added to USERS.md.
  • I've signed the CLA.
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.

@@ -388,7 +388,7 @@ func (woc *wfOperationCtx) operate() {
}
}

err = woc.deletePVCs()
err = woc.deletePVCs(workflowStatus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? how come woc.wf.Status.Phase is not suitable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point woc.wf.Status.Phase hasn't been changed. This happens in the switch block on line 414 onwards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this block of code should be moved from line 391 to line 443

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would therefore need to change how to deal with the error scenario - we should not mark a workflow as success then changes to error. Probably just log it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[fix-argo-retry-with-pvcs 6c93455d] Move woc.deletePVCs() to after workflow phase has been updated

retryStrategy:
limit: 1
steps:
- - arguments: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any wayto make this easier to understand? it's just see lots and lots of very long YAML and I don't know what is trying to be tested:

  • reduce YAML to the minimum that is need to test bug

// Node 2 Failed
assert.Equal(node2.Phase, wfv1.NodeFailed)
// Hence, PVCs should stick around
assert.NotZero(len(woc.wf.Status.PersistentVolumeClaims), "PVCs not deleted")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.Len?

// This test ensures that the PVCs used in the steps are not deleted when
// the workflow fails
func TestDeletePVCDoesNotDeletePVCOnFailedWorkflow(t *testing.T) {
assert := assert.New(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - I will use this

cancel, controller := newController()
defer cancel()

wfcset := controller.wfclientset.ArgoprojV1alpha1().Workflows("")
Copy link
Contributor

@alexec alexec Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than create wfcset, which is overly complicated and long-winded, can you look for other examples that do newController(wf)


woc := newWorkflowOperationCtx(wf, controller)

assert.Equal(len(woc.wf.Status.PersistentVolumeClaims), 1, "1 PVC before operating")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.Len?


err = woc.deletePVCs()
if err != nil {
woc.log.WithError(err).Error("failed to delete PVCs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be warning

@jcmuller
Copy link
Contributor Author

@alexec Rebased onto master and addressed all comments.

Copy link
Contributor

@alexec alexec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexec alexec merged commit c02bb7f into argoproj:master Oct 14, 2020
@jcmuller jcmuller deleted the fix-argo-retry-with-pvcs branch October 14, 2020 21:13
@alexec alexec added this to the v2.12 milestone Oct 15, 2020
alexcapras pushed a commit to alexcapras/argo that referenced this pull request Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pods in retried workflow with PVC after a step that succeeded never get scheduled
2 participants