Ensure tasks preserve versions in MasterService #109900

DaveCTurner
`ClusterState#version`, `Metadata#version` and `RoutingTable#version`
are all managed solely by the `MasterService`, in the sense that it's a
definite bug for the cluster state update task executor to meddle with
them. Today if we encounter such a bug then we try to publish the
resulting state anyway, which hopefully fails (triggering a master
election) but may in theory succeed (potentially reverting older
cluster state updates). Neither is a particularly good outcome.

With this commit we add a check for consistency of these version numbers
during the cluster state computation and fail the state update without a
master failover if a discrepancy is found.

It also fixes a super-subtle bug in `TransportMigrateToDataTiersAction`
that can muck up these version numbers.
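
For illustration only, the idea behind the check can be sketched roughly as follows. This is not the code in this PR: `VersionPreservationSketch`, `computeNewState`, `requireUnchanged` and the simplified `ClusterState`, `Metadata` and `RoutingTable` classes are hypothetical stand-ins, and the choice of `IllegalStateException` is an assumption. The point is only that the state returned by the executor is compared against the previous state, and the update is rejected before publication if any of the three versions differ.

```java
import java.util.function.UnaryOperator;

class VersionPreservationSketch {

    // Hypothetical, heavily simplified stand-ins for the real Elasticsearch classes.
    static final class Metadata {
        final long version;
        Metadata(long version) { this.version = version; }
    }

    static final class RoutingTable {
        final long version;
        RoutingTable(long version) { this.version = version; }
    }

    static final class ClusterState {
        final long version;
        final Metadata metadata;
        final RoutingTable routingTable;
        ClusterState(long version, Metadata metadata, RoutingTable routingTable) {
            this.version = version;
            this.metadata = metadata;
            this.routingTable = routingTable;
        }
    }

    // Runs the executor, then rejects its result if it touched any version number.
    // Throwing here fails this one update; nothing is published.
    static ClusterState computeNewState(ClusterState previous, UnaryOperator<ClusterState> executor) {
        ClusterState result = executor.apply(previous);
        requireUnchanged("cluster state", previous.version, result.version);
        requireUnchanged("metadata", previous.metadata.version, result.metadata.version);
        requireUnchanged("routing table", previous.routingTable.version, result.routingTable.version);
        return result;
    }

    // Assumption: the exception type and message are illustrative only.
    private static void requireUnchanged(String what, long before, long after) {
        if (before != after) {
            throw new IllegalStateException(
                what + " version changed from " + before + " to " + after + " during task execution");
        }
    }
}
```

In this sketch an executor that returns a state with the same three versions passes through untouched, while one that bumps (or reverts) any of them fails its own update instead of reaching publication.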

Backport of #109850 to 8.14

DaveCTurner added the `>bug`, `:Distributed/Cluster Coordination`, `backport`, `v8.14.2` and `auto-merge` labels on Jun 19, 2024
elasticsearchmachine merged commit 00c9943 into elastic:8.14 on Jul 2, 2024
15 checks passed
DaveCTurner deleted the 2024/06/19/ensure-tasks-preserve-versions-8.14 branch on Jul 2, 2024, 13:08
Labels: `auto-merge`, `backport`, `>bug`, `:Distributed/Cluster Coordination`, `v8.14.3`
2 participants