Retry transient shard failures in search #56045

Open
jimczi opened this issue Apr 30, 2020 · 6 comments
Labels
>enhancement, :Search Foundations/Search, Team:Search Foundations

Comments

@jimczi
Contributor

jimczi commented Apr 30, 2020

Today, a shard search request that fails is retried on the other replicas of that shard. If all replicas fail for a shard, we consider the shard as failed and move on with the other shards.
Users can choose whether to accept partial results by setting allow_partial_search_results; however, if they want the full results, they have no choice but to replay the query (assuming that the shard failures were transient).
I am opening this issue to discuss whether we could apply an exponential backoff to retry transient shard failures in search requests.
Failures such as:

  • Rejected execution exception
  • Shard unavailable exception

could be retried with a configurable exponential backoff. This would be useful for search requests that run in the background (with _async_search) and can afford to wait for a shard recovery.
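A minimal sketch of the proposed behaviour, written in Python for brevity rather than in Elasticsearch's own Java: a shard-level request is retried with exponentially growing delays, but only when the failure is one of the transient kinds listed above. The exception classes, `backoff_delays`, `search_shard_with_retry` and its parameters (`initial`, `factor`, `max_retries`, `cap`) are hypothetical names chosen for illustration, not Elasticsearch APIs.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


# Hypothetical stand-ins for the two transient failures listed above;
# they are NOT the real Elasticsearch exception classes.
class RejectedExecutionError(Exception):
    """The node rejected the shard request (e.g. its search queue was full)."""


class ShardUnavailableError(Exception):
    """No copy of the shard could serve the request right now."""


TRANSIENT_FAILURES = (RejectedExecutionError, ShardUnavailableError)


def backoff_delays(initial: float, factor: float, max_retries: int, cap: float) -> List[float]:
    """Delays initial, initial*factor, initial*factor**2, ..., each capped at `cap`."""
    return [min(cap, initial * factor ** i) for i in range(max_retries)]


def search_shard_with_retry(
    run_shard_request: Callable[[], T],
    initial: float = 0.05,
    factor: float = 2.0,
    max_retries: int = 5,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one (hypothetical) shard-level request, retrying transient failures only.

    Any other exception propagates on the first attempt. After `max_retries`
    retries the last transient failure is re-raised, so the caller can still
    treat the shard as failed (and honour allow_partial_search_results).
    """
    delays = backoff_delays(initial, factor, max_retries, cap)
    for attempt in range(max_retries + 1):
        try:
            return run_shard_request()
        except TRANSIENT_FAILURES:
            if attempt == max_retries:
                raise  # retries exhausted: surface the transient failure
            sleep(delays[attempt])
    raise AssertionError("unreachable: the loop always returns or raises")
```

Making `sleep` injectable keeps the sketch testable; a real implementation would more likely schedule the retry asynchronously instead of blocking a thread, and might add jitter to the delays so that many shard requests do not retry in lockstep.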

This issue is also loosely related to #37867 since low-priority search requests could be configured to retry automatically.

jimczi added the >feature, discuss, :Search/Search and :Distributed/Distributed labels Apr 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

elasticmachine added the Team:Search label Apr 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

elasticmachine added the Team:Distributed label Apr 30, 2020
@jimczi
Contributor Author

jimczi commented May 5, 2020

We discussed this in Fix-it Thursday and agreed on two possible improvements:

  • We shouldn't retry non-transient failures.
  • We could add a configurable delay to wait for non-assigned shards before raising an error.

These improvements are independent, so I'll open a separate issue for the latter so that it can be handled on its own.
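The second improvement, a configurable delay for unassigned shards, could be sketched as a bounded wait: return as soon as the shard has an assigned copy, and raise only once the delay has expired. `wait_for_shard_assignment`, `is_assigned`, `ShardNotAssignedError` and the timeout parameters are hypothetical; this sketch polls for simplicity, whereas a real implementation would more plausibly react to cluster state updates.

```python
import time
from typing import Callable


class ShardNotAssignedError(Exception):
    """Hypothetical error raised when the wait expires and the shard is still unassigned."""


def wait_for_shard_assignment(
    is_assigned: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Return as soon as is_assigned() is True; raise once `timeout` seconds pass."""
    deadline = clock() + timeout
    while not is_assigned():
        remaining = deadline - clock()
        if remaining <= 0:
            raise ShardNotAssignedError(f"shard still unassigned after {timeout}s")
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

With `timeout=0` the function fails on the first check, i.e. it behaves like raising the error immediately, so a zero default would keep today's behaviour unless users opt in to waiting.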
Shard failures are difficult to diagnose; for instance, there is no way to determine whether a circuit breaker exception is caused by the current shard request or by the node being overloaded.
We didn't reach a conclusion on this specific failure, but we agreed that we should categorize each error type in order to determine whether it should be retried.
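One way to record such a categorization is a lookup table from failure type to retry decision, with an explicit "undecided" bucket for ambiguous cases like the circuit breaker exception above. The failure-type keys (including `query_parsing`, used here only as an example of a non-transient failure), the `Retry` enum, `should_retry`, and the choice to treat unknown and undecided failures as non-retryable by default are all assumptions for illustration.

```python
from enum import Enum


class Retry(Enum):
    YES = "transient, safe to retry"
    NO = "non-transient, retrying cannot help"
    UNDECIDED = "ambiguous, no agreement yet"


# Hypothetical failure-type names; a real classification would be keyed
# on Elasticsearch's own exception classes.
FAILURE_CATEGORIES = {
    "rejected_execution": Retry.YES,      # node's queue was full; may clear up
    "shard_unavailable": Retry.YES,       # e.g. shard recovering or relocating
    "query_parsing": Retry.NO,            # the request itself is invalid
    "circuit_breaking": Retry.UNDECIDED,  # this request, or an overloaded node?
}


def should_retry(failure_type: str, retry_undecided: bool = False) -> bool:
    """Decide whether a shard failure is retried; unknown types are not retried."""
    category = FAILURE_CATEGORIES.get(failure_type, Retry.NO)
    if category is Retry.UNDECIDED:
        return retry_undecided
    return category is Retry.YES
```

Defaulting unknown types to "do not retry" errs on the side of the first point agreed above: non-transient failures should not be retried.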

jimczi removed the discuss label May 5, 2020
@jimczi
Contributor Author

jimczi commented May 5, 2020

I opened #56236 to handle non-assigned shards in search requests. This issue now focuses on classifying which shard failures should not be retried automatically.

pxsalehi removed the :Distributed/Distributed and Team:Distributed labels Jul 28, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

javanna added the :Search Foundations/Search label and removed the :Search/Search label Jul 17, 2024
elasticsearchmachine added the Team:Search Foundations label and removed the Team:Search label Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Projects: None yet
Development: No branches or pull requests
5 participants