Send notifications for dependency failures #14603

AlanCoding · 2023-10-25T21:05:38Z

SUMMARY

I had a report that notifications were not firing when an update-on-launch dependency failed. This is fairly easy to verify. It's a simple case: handle_work_error sent websocket messages, but not notifications.

But once you get into the weeds of how to fix it, you realize there are race conditions. Consider how the callback receiver handles this:

        # Update host_status_counts while holding the row lock
        with transaction.atomic():
            uj = UnifiedJob.objects.select_for_update().get(pk=job_identifier)
            uj.host_status_counts = host_status_counts
            uj.save(update_fields=['host_status_counts'])

This solves a contention problem between the callback receiver and the dispatcher.

But wait: here we have a job that was created, spawned a project update, and is then marked failed by the dispatcher running the project update. At no point did either the dispatcher or the callback receiver go through the run path for that job. There shouldn't be any contention problem!

With the errback/callback methods as they are, there is still at least a vague possibility of the job starting, because the project update fails after updating its final status, but that goes well beyond realistic concerns. The much more realistic concern is that we set "Previous task failed task:" twice, so sending notifications here would introduce a new contention problem.

Systems-wise, why are we even dealing with contention problems? What if, instead, we only triggered the failure logic in a reliable system that has a periodic fallback? That would be the task manager. I have already argued that the errback methods are not needed, so this is a good time to delete them.

That solves the missing notification problem, and contention problems are not a risk with this solution.
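As a rough illustration of the idea (a hypothetical sketch, not the actual AWX task manager code), the failure logic can live in a single pass that only touches pending jobs, so a repeated pass from the periodic fallback has nothing left to do and cannot notify twice. The names `Job`, `fail_jobs_with_failed_dependencies`, and `send_notification`, the set of failed statuses, and the explanation string are all invented here for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a unified job; not the AWX model."""
    pk: int
    status: str = "pending"
    job_explanation: str = ""
    dependencies: list = field(default_factory=list)


# Assumed set of statuses that count as a failed dependency.
FAILED_STATUSES = ("failed", "error", "canceled")


def fail_jobs_with_failed_dependencies(jobs, send_notification):
    """One scheduler pass: fail pending jobs whose dependency failed.

    Only pending jobs are considered, so running the pass again finds
    nothing to do for jobs it already failed and sends no second
    notification.
    """
    newly_failed = []
    for job in jobs:
        if job.status != "pending":
            continue
        failed = [dep for dep in job.dependencies if dep.status in FAILED_STATUSES]
        if not failed:
            continue
        job.status = "failed"
        # Illustrative wording only, not the real explanation format.
        job.job_explanation = f"Previous task failed (dependency pk={failed[0].pk})"
        send_notification(job)
        newly_failed.append(job)
    return newly_failed


# Usage: a job whose project update dependency has already failed.
sent = []
project_update = Job(pk=1, status="failed")
job = Job(pk=2, dependencies=[project_update])

first = fail_jobs_with_failed_dependencies([project_update, job], sent.append)
second = fail_jobs_with_failed_dependencies([project_update, job], sent.append)
```

Because the status check and the notification happen in the same single-writer pass, the second call is a no-op for the job that was already failed.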

ISSUE TYPE

Bug, Docs Fix or other nominal change

COMPONENT NAME

API

fosterseth

good code cleanup

fosterseth · 2023-10-26T18:02:14Z

awx/main/scheduler/task_manager.py

            task_cls = task._get_task_class()
            task_cls.apply_async(
                [task.pk],
                opts,
                queue=task.get_queue_name(),
                uuid=task.celery_task_id,
-                callbacks=[{'task': handle_work_success.name, 'kwargs': {'task_actual': task_actual}}],


* Send notifications for dependency failures * Delete tests for deleted method * Remove another test for removed method

github-actions bot added the component:api label Oct 25, 2023

AlanCoding added 2 commits October 25, 2023 23:00

Send notifications for dependency failures

f43e67c

Delete tests for deleted method

7792801

AlanCoding force-pushed the early_fail_notifications branch from 65036c4 to 7792801 Compare October 26, 2023 03:01

Remove another test for removed method

62a705d

AlanCoding marked this pull request as ready for review October 26, 2023 14:12

AlanCoding requested review from fosterseth and chrismeyersfsu October 26, 2023 14:12

fosterseth approved these changes Oct 26, 2023


fosterseth reviewed Oct 26, 2023


AlanCoding merged commit 333ef76 into ansible:devel Oct 30, 2023
21 checks passed

djyasin pushed a commit to djyasin/awx that referenced this pull request Sep 16, 2024

Send notifications for dependency failures (ansible#14603)

4c9ee78

