Fix bug that prevented dispatcher exit with downed DB #14469
Merged
SUMMARY
We had an issue where the dispatcher would go down in the middle of a job, which ultimately left the entire service deadlocked.
Previously, we made several changes to keep the dispatcher service from exiting if the database went down temporarily (default tolerance set to 40 seconds). We also moved to a new job canceling system, which signals cancellation via SIGTERM.
The problem is what happens when we exceed that 40-second threshold while a job is running. In that case:
pg_notify, decides it will exit
So in summary, we have two layers of signal processing, and the inner layer was misbehaving in that it did not call the parent process's signal handling method. This adds calls to do that.
In testing, I was able to see the dispatcher exit with this patch applied.
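The two-layer arrangement described above can be illustrated with a minimal sketch. This is not the code from this patch: `ParentSignalState`, `JobSignalState`, and `run_job_with_signals` are hypothetical names chosen for illustration, and the sketch only assumes that an inner handler takes over SIGTERM while a job runs and should forward the signal to the outer handler.

```python
import signal


class ParentSignalState:
    """Hypothetical outer layer: the main process's signal handling."""

    def __init__(self):
        self.exit_requested = False

    def handle(self, signum, frame):
        # Record that the whole process has been asked to exit.
        self.exit_requested = True


class JobSignalState:
    """Hypothetical inner layer: active only while a job is running."""

    def __init__(self, parent_handler=None):
        self.cancel_requested = False
        self.parent_handler = parent_handler

    def handle(self, signum, frame):
        # Inner concern: ask the running job to cancel.
        self.cancel_requested = True
        # Forward to the outer layer so process-level exit logic also runs.
        # Without this call the signal stops here and the parent never reacts.
        if callable(self.parent_handler):
            self.parent_handler(signum, frame)


def run_job_with_signals(job, job_state):
    """Install the inner SIGTERM handler while job() runs, then restore."""
    previous = signal.signal(signal.SIGTERM, job_state.handle)
    try:
        return job()
    finally:
        signal.signal(signal.SIGTERM, previous)
```

With the forwarding call in place, a single SIGTERM received during a job marks both the job as cancelled and the parent as exiting. If the forwarding call is removed, only the inner flag is set, which is the kind of misbehavior the summary describes.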
ISSUE TYPE
COMPONENT NAME