Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create IndexVersionAllocationDecider #102708

Merged
merged 30 commits into from
Sep 4, 2024

Conversation

thecoop
Copy link
Member

@thecoop thecoop commented Nov 28, 2023

Create IndexVersionAllocationDecider alongside NodeVersionAllocationDecider. This uses IndexVersion to ensure we don't allocate to nodes that have different IndexVersions (but might have the same Version). Eventually, NodeVersionAllocationDecider will go away, but we're not there yet

@thecoop thecoop added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >refactoring labels Nov 28, 2023
@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team v8.12.0 labels Nov 28, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocation decider exists because we need to be able to copy data (both segments and operations) from the primary to the replica, which means that the replica must understand at least everything that the primary understands. However @rjernst just mentioned to me that IndexVersion doesn't cover certain changes that would prevent us from copying data from a newer version to an older version, such as the introduction of new mapping types. I think we need to account for all sorts of compatibility issues in this decider.

@thecoop thecoop changed the title Use IndexVersion in NodeVersionAllocationDecider Create IndexVersionAllocationDecider Dec 18, 2023
@thecoop
Copy link
Member Author

thecoop commented Dec 18, 2023

I've modified this to add IndexVersionAllocationDecider alongside NodeVersionAllocationDecider. This will improve the behaviour on serverless, but as discussed we need more work before we can replace NodeVersionAllocationDecider completely.

@thecoop
Copy link
Member Author

thecoop commented Feb 19, 2024

@DaveCTurner Could we get this in alongside NodeVersionAllocationDecider? It will prevent allocation problems when IndexVersion is bumped on serverless. We can remove VersionAllocationDecider later when we find a solution to the mapping type/option allocation problem

@thecoop
Copy link
Member Author

thecoop commented Feb 19, 2024

@elasticmachine update branch

@thecoop thecoop requested a review from rjernst June 21, 2024 11:01
@thecoop
Copy link
Member Author

thecoop commented Jun 21, 2024

@DaveCTurner I've updated this to use the mapping update index version added by #106748.

Do we also need to consider the highest index version used to write a shard, as added by #106779? Or if the mappings aren't updated, does it matter? If a new shard is created as part of a write, will have the index version of the node that created it?

@thecoop
Copy link
Member Author

thecoop commented Jun 21, 2024

So, IndexVersionAllocationDeciderTests.testRollingRestart is failing, however I'm not sure this test is valid as-is. This test is ensuring that indicies do not get moved from a higher version to a lower version node during a restart...but we're not longer considering the node versions we're moving to/from, we're now looking at the index metadata and what is actually stored.

Maybe combined with #106779, might it actually be valid to move a shard from a later version node to an earlier node, if nothing has actually changed about the shard?

I've also removed the distinction between primary and replica shard behaviour, now its just considering the shard as-is. I don't think there's a difference needed between the two

@thecoop thecoop removed the WIP label Jun 21, 2024
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few initial comments to the decider.

return allocation.decision(Decision.YES, NAME, "the primary shard is new or already existed on the node");
}
} else {
return isShardIndexVersionCompatible(shardRouting, node, allocation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use isIndexVersionCompatibleRelocatePrimary as well?

Comment on lines 45 to 46
// replica - just check the index mappings version & segment updated version
return isShardIndexVersionCompatible(shardRouting, node, allocation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should check for same or newer index version of the primary vs replica nodes in addition to the check against metadata?

@thecoop
Copy link
Member Author

thecoop commented Jul 1, 2024

I think we also need a data version check - make sure we don't allocate shards to an index version that is less than the highest index version used to write data. This needs to be stored in index metadata, which means we need some kind of listener to be notified of new commits being added, and which index version it is, and stores that version in the index metadata.

@thecoop
Copy link
Member Author

thecoop commented Aug 29, 2024

As discussed, I have re-created IndexVersionAllocationDecider by copying NodeVersionAllocationDecider, replacing references to node version with maximum index version

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Have a few comments, primarily on the test.

clusterState = ClusterState.builder(clusterState)
.nodes(
DiscoveryNodes.builder(clusterState.nodes())
.add(newNode("node3", VersionUtils.getPreviousVersion(), IndexVersionUtils.getPreviousVersion()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a newer node-version here to check that the index version decider is the one that prevents the allocation? And perhaps vice-versa in NodeVersionAllocationDeciderTests?

Copy link
Member Author

@thecoop thecoop Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. The one exception is deciding on snapshot allocations - which can only use index version. NodeVersionAllocationDecider references IndexVersion in isVersionCompatible - is it best to just remove that path so NodeVersionAllocationDecider always returns ALWAYS for snapshots, and leaves it up to IndexVersionAllocationDecider?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave that as is for this PR. To some extent, NodeVersionAllocationDecider ought to use the node-version until we are certain that the index-version will work, but that is umodified by this.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@thecoop thecoop merged commit b259622 into elastic:main Sep 4, 2024
15 checks passed
@thecoop thecoop deleted the indexversion-allocationdecider branch September 4, 2024 08:32
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
Create IndexVersionAllocationDecider as a counterpart to NodeVersionAllocationDecider, that checks the max index version rather than node version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >refactoring Team:Distributed Meta label for distributed team v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants