Prevent delayed opener error from crashing index servers #4811
Previously, an index and opener process dying could have resulted in the index gen_server crashing. This was observed in the CI test as described in: #4809
The process in more detail was as follows:
When an async opener result is handled in the index server, there is a period of time when the index server is linked to both the index and the opener process.
After we reply early to the waiting clients, a client may do something that causes the indexer to crash, which crashes the opener along with it. That generates two `{'EXIT', Pid, _}` messages queued in the index server's mailbox.
The index gen_server is still processing the async opener result callback, where it removes the opener from the `openers` table and then returns `ok` to the async opener.
The index gen_server continues processing the queued `EXIT` messages in `handle_info` and hits the `exit(...)` clause, since we ended up with an unknown process exiting.

To avoid the race condition and the extra opener `EXIT` message, unlink and reply early to the opener as soon as we have linked to the indexer or received the error. To avoid the small chance of still getting an `EXIT` message from the opener, in case it crashed right before we unlinked, flush any exit messages from it. We do similar flushing in two other places, so create a small utility function to avoid duplicating the code too much.

Fix: #4809
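
For illustration, the unlink-then-flush idea described above can be sketched roughly as follows. This is a minimal standalone example, not the code from this PR: the module name `flush_exit_sketch` and the functions `unlink_and_flush/1` and `demo/0` are hypothetical names chosen here, and the demo places a simulated `EXIT` tuple in the mailbox instead of reproducing the real race.

```erlang
-module(flush_exit_sketch).
-export([unlink_and_flush/1, demo/0]).

%% Hypothetical helper: unlink from Pid, then drop one 'EXIT' message
%% from Pid if it is already sitting in our mailbox. The `after 0`
%% clause makes the receive non-blocking.
unlink_and_flush(Pid) when is_pid(Pid) ->
    unlink(Pid),
    receive
        {'EXIT', Pid, _Reason} -> ok
    after 0 ->
        ok
    end.

%% Small demonstration, assuming the caller traps exits like a
%% gen_server that links to its workers would.
demo() ->
    process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    %% Simulate a stale exit message that arrived just before unlinking.
    self() ! {'EXIT', Pid, simulated},
    ok = unlink_and_flush(Pid),
    %% The process is no longer linked, so stopping it sends us nothing.
    Pid ! stop,
    receive
        {'EXIT', Pid, _} -> leaked
    after 0 ->
        flushed
    end.
```

In this sketch `demo/0` returns `flushed` when no `EXIT` message from the opener-like process remains in the mailbox. Without the flush, the server's `handle_info` would later see the stale message as an unknown process exiting.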