Fix cancel bug - WorkflowManager cancel in transaction #14608

Merged
merged 1 commit into ansible:devel on Oct 30, 2023

AlanCoding
Member

SUMMARY

Describing the bug here, since I have not seen it in the issue queue:

2023-10-26 05:36:07,011 WARNING  [d80bf2f3b13145809117ed4d9a705278] awx.main.dispatch checking dispatcher cancel for automationcontroller-0
2023-10-26 05:36:07,011 ERROR    [d80bf2f3b13145809117ed4d9a705278] awx.main.models.unified_jobs error encountered when checking task status
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 1445, in cancel_dispatcher_process
    canceled = ControlDispatcher('dispatcher', self.controller_node).cancel([self.celery_task_id])
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/control.py", line 41, in cancel
    return self.control_with_reply('cancel', *args, extra_data={'task_ids': task_ids}, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/control.py", line 56, in control_with_reply
    raise RuntimeError('Control-with-reply messages can only be done in autocommit mode')
RuntimeError: Control-with-reply messages can only be done in autocommit mode 
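
To illustrate why this error appears, here is a minimal sketch of the kind of guard that produces it. `FakeConnection` and this `control_with_reply` are hypothetical stand-ins written for illustration, not the AWX implementation; only the error message is taken from the traceback above. The assumption is that a request/reply exchange cannot complete while a transaction is open, so the call refuses to run unless the connection is in autocommit mode.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection (not AWX code)."""

    def __init__(self):
        self.autocommit = True

    def begin(self):
        # Opening a transaction leaves autocommit mode.
        self.autocommit = False

    def commit(self):
        # Committing returns the connection to autocommit mode.
        self.autocommit = True


def control_with_reply(connection, command):
    """Hypothetical request/reply call guarded like the one in the traceback."""
    if not connection.autocommit:
        raise RuntimeError('Control-with-reply messages can only be done in autocommit mode')
    # Assumed reply shape, for illustration only.
    return {'command': command, 'reply': 'ok'}


conn = FakeConnection()

# Outside a transaction the call goes through.
outside = control_with_reply(conn, 'cancel')

# Inside a transaction the guard raises, which is the failure logged above.
conn.begin()
try:
    control_with_reply(conn, 'cancel')
    inside_error = None
except RuntimeError as exc:
    inside_error = str(exc)
finally:
    conn.commit()
```

In this model, any cancel issued from code that runs inside a transaction hits the guard, which matches the logged failure.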

I was able to see the real bug with manual testing:

  • create a JT that runs a sleep task and has a host
  • create a WFJT with that JT in it
  • launch the WFJT
  • wait for the first events to come in for the JT
  • cancel the WFJT

The expectation is that the JT will be canceled. Instead, it temporarily gets the "canceled" status (which is incorrect) and then goes into "successful" status. This is the bug: a canceled workflow lets the jobs running inside it run out the clock.

I can verify that this change fixes it. It makes the cancel message fire as part of the task manager transaction, similar to the messages that tell the dispatcher to start a new job.
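
One way to picture "fire as part of the transaction" is a fire-and-forget message that is queued while the transaction is open and only delivered on commit. The sketch below models that idea in plain Python. `FakeTransaction`, `send_control`, and `cancel_workflow_jobs` are hypothetical names invented for illustration, and the on-commit callback list is an assumed model of the behavior, not the mechanism AWX actually uses.

```python
class FakeTransaction:
    """Hypothetical transaction that delivers queued messages only on commit."""

    def __init__(self):
        self._on_commit = []

    def on_commit(self, callback):
        # Queue work to run once the transaction commits.
        self._on_commit.append(callback)

    def commit(self):
        callbacks, self._on_commit = self._on_commit, []
        for callback in callbacks:
            callback()

    def rollback(self):
        # A rolled-back transaction never sends its queued messages.
        self._on_commit.clear()


sent = []  # records messages that were actually delivered


def send_control(command, task_ids):
    """Hypothetical fire-and-forget control message: no reply is awaited."""
    sent.append((command, tuple(task_ids)))


def cancel_workflow_jobs(txn, task_ids):
    # Tie the cancel message to the transaction instead of waiting for a reply.
    txn.on_commit(lambda: send_control('cancel', task_ids))


txn = FakeTransaction()
cancel_workflow_jobs(txn, ['task-1', 'task-2'])
sent_before_commit = list(sent)  # nothing is delivered yet

txn.commit()
sent_after_commit = list(sent)  # the cancel goes out with the commit
```

The trade-off in this model is that the sender no longer learns whether the cancel was acted on. In exchange, the message goes out together with the rest of the committed work.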

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

@AlanCoding AlanCoding merged commit 93c329d into ansible:devel Oct 30, 2023
21 checks passed
djyasin pushed a commit to djyasin/awx that referenced this pull request Sep 16, 2024
This fixes a bug where jobs within a workflow job were not canceled
  when the workflow job was canceled by the user.

The fix is to submit the cancel request as part of the
  transaction that WorkflowManager commits its work in.
  This requires that we send the message without expecting a reply,
  so this changes the control-with-reply cancel to just a control function.
2 participants