-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create IndexVersionAllocationDecider #102708
Conversation
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocation decider exists because we need to be able to copy data (both segments and operations) from the primary to the replica, which means that the replica must understand at least everything that the primary understands. However @rjernst just mentioned to me that IndexVersion
doesn't cover certain changes that would prevent us from copying data from a newer version to an older version, such as the introduction of new mapping types. I think we need to account for all sorts of compatibility issues in this decider.
I've modified this to add |
...java/org/elasticsearch/cluster/routing/allocation/decider/IndexVersionAllocationDecider.java
Outdated
Show resolved
Hide resolved
@DaveCTurner Could we get this in alongside NodeVersionAllocationDecider? It will prevent allocation problems when IndexVersion is bumped on serverless. We can remove VersionAllocationDecider later when we find a solution to the mapping type/option allocation problem |
@elasticmachine update branch |
@DaveCTurner I've updated this to use the mapping update index version added by #106748. Do we also need to consider the highest index version used to write a shard, as added by #106779? Or if the mappings aren't updated, does it matter? If a new shard is created as part of a write, will have the index version of the node that created it? |
So, Maybe combined with #106779, might it actually be valid to move a shard from a later version node to an earlier node, if nothing has actually changed about the shard? I've also removed the distinction between primary and replica shard behaviour, now its just considering the shard as-is. I don't think there's a difference needed between the two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few initial comments to the decider.
...java/org/elasticsearch/cluster/routing/allocation/decider/IndexVersionAllocationDecider.java
Outdated
Show resolved
Hide resolved
return allocation.decision(Decision.YES, NAME, "the primary shard is new or already existed on the node"); | ||
} | ||
} else { | ||
return isShardIndexVersionCompatible(shardRouting, node, allocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use isIndexVersionCompatibleRelocatePrimary
as well?
// replica - just check the index mappings version & segment updated version | ||
return isShardIndexVersionCompatible(shardRouting, node, allocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should check for same or newer index version of the primary vs replica nodes in addition to the check against metadata?
I think we also need a data version check - make sure we don't allocate shards to an index version that is less than the highest index version used to write data. This needs to be stored in index metadata, which means we need some kind of listener to be notified of new commits being added, and which index version it is, and stores that version in the index metadata. |
As discussed, I have re-created IndexVersionAllocationDecider by copying NodeVersionAllocationDecider, replacing references to node version with maximum index version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Have a few comments, primarily on the test.
...java/org/elasticsearch/cluster/routing/allocation/decider/IndexVersionAllocationDecider.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/cluster/routing/allocation/decider/IndexVersionAllocationDecider.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/cluster/routing/allocation/decider/IndexVersionAllocationDecider.java
Outdated
Show resolved
Hide resolved
clusterState = ClusterState.builder(clusterState) | ||
.nodes( | ||
DiscoveryNodes.builder(clusterState.nodes()) | ||
.add(newNode("node3", VersionUtils.getPreviousVersion(), IndexVersionUtils.getPreviousVersion())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a newer node-version here to check that the index version decider is the one that prevents the allocation? And perhaps vice-versa in NodeVersionAllocationDeciderTests
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. The one exception is deciding on snapshot allocations - which can only use index version. NodeVersionAllocationDecider
references IndexVersion in isVersionCompatible
- is it best to just remove that path so NodeVersionAllocationDecider always returns ALWAYS for snapshots, and leaves it up to IndexVersionAllocationDecider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave that as is for this PR. To some extent, NodeVersionAllocationDecider
ought to use the node-version until we are certain that the index-version will work, but that is umodified by this.
...st/java/org/elasticsearch/cluster/routing/allocation/IndexVersionAllocationDeciderTests.java
Show resolved
Hide resolved
...st/java/org/elasticsearch/cluster/routing/allocation/IndexVersionAllocationDeciderTests.java
Outdated
Show resolved
Hide resolved
...st/java/org/elasticsearch/cluster/routing/allocation/IndexVersionAllocationDeciderTests.java
Outdated
Show resolved
Hide resolved
...st/java/org/elasticsearch/cluster/routing/allocation/IndexVersionAllocationDeciderTests.java
Outdated
Show resolved
Hide resolved
…d NodeVersion tests modify Version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Create IndexVersionAllocationDecider as a counterpart to NodeVersionAllocationDecider, that checks the max index version rather than node version
Create IndexVersionAllocationDecider alongside NodeVersionAllocationDecider. This uses IndexVersion to ensure we don't allocate to nodes that have different IndexVersions (but might have the same Version). Eventually, NodeVersionAllocationDecider will go away, but we're not there yet