Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: terminate workflow should not get throttled Fixes #12778 #12792

Merged

Conversation

tczhao
Copy link
Member

@tczhao tczhao commented Mar 13, 2024

Fixes #12778

Motivation

To prevent cronworkflow from queueing up pending workflows when

  • reached parallelism limit
  • concurrenpolicy-replace already update the spec of the pending workflow to shutdown: terminated

Modifications

Allow workflow in the queue to be processed (operated) regardless of parallelism limit.

We have scenarios

  • reach parallelism limit, terminate running workflow (no issue)
  • reach parallelism limit, terminate pending workflow (we should not block the queue from processing these workflow, just like semaphore doesn't block the queue from processing pending workflow that got terminated
    func (woc *wfOperationCtx) releaseLocksForPendingShuttingdownWfs(ctx context.Context) bool {
    )

Verification

Add test,
test fail
Add code change
test succeed

@tczhao tczhao changed the title fix: terminate workflow should not get throttled fix: terminate workflow should not get throttled Fixes #12778 Mar 13, 2024
@agilgur5
Copy link
Member

agilgur5 commented Mar 13, 2024

  • concurrenpolicy-replace already update the spec of the pending workflow to shutdown: terminated

oh interesting, I didn't know that. that does make the fix much simpler.

  • reach parallelism limit, terminate running workflow (no issue)
  • reach parallelism limit, terminate pending workflow (we should not block the queue from processing these workflow, just like semaphore doesn't block the queue from processing pending workflow that got terminated

this makes sense to me and sounds consistent, so I'll approve, but I'm not an expert in the synchronization area of the codebase so would appreciate if someone else could take a look. EDIT: asked on Slack for a second pair of eyes.
in particular, technically speaking this change means that terminating Workflows are no longer counted toward parallelism limits, right?

@agilgur5
Copy link
Member

Also if you're interested, another synchronization race condition issue is #11984

@agilgur5 agilgur5 added the area/parallelism `parallelism` for the Controller, Workflows, or templates label Mar 13, 2024
@tczhao
Copy link
Member Author

tczhao commented Mar 14, 2024

technically speaking this change means that terminating Workflows are no longer counted toward parallelism limits, right?

Correct,
We either

  • terminate running workflow (in parallelism limit, release it from parallelism)
  • terminate pending/unknown workflow (bypass parallelism)
  • terminate finished workflow (no effect)

@sarabala1979 sarabala1979 self-assigned this Mar 25, 2024
Copy link
Member

@sarabala1979 sarabala1979 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sarabala1979 sarabala1979 merged commit a82a689 into argoproj:main Mar 26, 2024
31 checks passed
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Apr 19, 2024
agilgur5 pushed a commit that referenced this pull request Apr 19, 2024
Signed-off-by: Tianchu Zhao <[email protected]>
(cherry picked from commit a82a689)
@agilgur5
Copy link
Member

Backported cleanly to release-3.5 as c9dd50d

isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 6, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/parallelism `parallelism` for the Controller, Workflows, or templates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CronWorkflow with concurrencyPolicy: Replace or Forbid should not queue up pending workflows
3 participants