Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.14] [Bug] During heavy indexing load it's possible for lazy rollover to trigger multiple rollovers (#109636) #110031

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

gmarouli
Copy link
Contributor

Backport

This will backport the following commits from main to 8.14:

Questions ?

Please refer to the Backport tool documentation

…rigger multiple rollovers (elastic#109636)

Let’s say we have `my-metrics`  data stream which is receiving a lot of
indexing requests. The following scenario can result in multiple
unnecessary rollovers:

1. We update the mapping and mark it to be lazy rolled over
2. We receive 5 bulk index requests that all contain a write request for this data stream.
3. Each of these requests are being picked up “at the same time”, they see that the data stream needs to be rolled over and they issue a lazy rollover request.
4. Currently, data stream my-metrics  has 5 tasks executing an unconditional rollover.
5. The data stream gets rolled over 5 times instead of one.

This scenario is captured in the `LazyRolloverDuringDisruptionIT`.

We have witnessed this also in the wild, where a data stream was rolled
over 281 times extra resulting in 281 empty indices.

This PR proposes:

- To create a new task queue with a more efficient executor that further batches/deduplicates the requests.
- We add two safe guards, the first to ensure we will not enqueue the rollover task if we see that a rollover has occurred already. The second safe guard is during task execution, if we see that the data stream does not have the `rolloverOnWrite` flag set to `true` we skip the rollover.
- When we skip the rollover we return the following response:

```
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": ".ds-my-data-stream-2099.05.07-000002",
  "new_index": ".ds-my-data-stream-2099.05.07-000002",
  "rolled_over": false,
  "dry_run": false,
  "lazy": false,
}
```
Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I just left one small comment

@@ -374,7 +374,7 @@ static Condition.Stats buildStats(@Nullable final IndexMetadata metadata, @Nulla
.flatMap(Arrays::stream)
.filter(shard -> shard.getShardRouting().primary())
.map(ShardStats::getStats)
.mapToLong(shard -> shard.docs.getCount())
.mapToLong(shard -> shard.docs == null ? 0L : shard.docs.getCount())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes (this one and the one a couple of lines higher) intentional? I see they're also on main, so I think they're ok, but they're not part of your original PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, I intentionally chose to leave them because it seems to me like a bug fix than a feature.

@gmarouli gmarouli merged commit 978557a into elastic:8.14 Jun 21, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants