Send notifications for dependency failures #14603
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
SUMMARY
I had a report that notifications were not firing for cases when an update-on-launch dependency failed. This is fairly easily verifiable. It's somewhat of a stupidly simple case where
handle_work_error
sent websockets, but not notifications.But then you get into the weeds of how to fix it, and realize there are race conditions, consider how the callback receiver handles this:
This solves a contention problem between the callback receiver and the dispatcher.
But wait - here, we have a job that was created, spawned a project update, and then is getting marked failed by the dispatcher running the project update. At no point did either the dispatcher or the callback receiver go through the run path for that job. There shouldn't be any contention problem!
With the errback/callback methods as they are, there is still at least the vague possibility of the job starting because the project update fails after updating its final status... but this is going too far beyond realistic concerns. The much more realistic concern is that we set "Previous task failed task:" twice so sending notifications here would introduce a new contention problem.
Systems-wise, why are we even dealing with the contention problems? What if, instead, we only triggered the failure logic in a reliable system that has a periodic fallback... this would be the task manager. I have already argued that the errback methods are not needed, so this is a good time to delete them.
Missing notification problem solved, contention problems are not a risk with this solution.
ISSUE TYPE
COMPONENT NAME