ra_server: Update leaderboard after each command if cluster changed #417

Merged
merged 4 commits into main from update-leaderboard-on-recovery
Feb 8, 2024

Conversation

@dumbbell dumbbell (Member) commented Feb 6, 2024

Why

In a RabbitMQ test with a 5-node cluster where Erlang nodes are killed and restarted in a rolling fashion, I observed that the leaderboards might be inconsistent across the cluster.

Logs from node 1:

2024-02-06 12:29:15.792058+01:00 [debug] <0.251.0> Trying to restart local Ra server for store "rabbitmq_metadata" in Ra system "coordination"
2024-02-06 12:29:15.810508+01:00 [debug] <0.296.0> RabbitMQ metadata store: ra_log:init recovered last_index_term {60,1} first index 0
2024-02-06 12:29:15.813225+01:00 [debug] <0.296.0> RabbitMQ metadata store: post_init -> recover in term: 1 machine version: 0
2024-02-06 12:29:15.813291+01:00 [debug] <0.296.0> RabbitMQ metadata store: recovering state machine version 0:0 from index 0 to 0
2024-02-06 12:29:15.813347+01:00 [debug] <0.296.0> RabbitMQ metadata store: recovery of state machine version 0:0 from index 0 to 0 took 0ms
2024-02-06 12:29:15.813413+01:00 [info] <0.251.0> Waiting for Khepri leader for 30000 ms, 9 retries left
2024-02-06 12:29:15.813428+01:00 [debug] <0.296.0> RabbitMQ metadata store: recover -> recovered in term: 1 machine version: 0
2024-02-06 12:29:15.813484+01:00 [debug] <0.296.0> RabbitMQ metadata store: recovered -> follower in term: 1 machine version: 0
2024-02-06 12:29:15.813520+01:00 [debug] <0.296.0> RabbitMQ metadata store: is not new, setting election timeout.
2024-02-06 12:29:15.818609+01:00 [info] <0.251.0> Khepri leader elected
2024-02-06 12:29:15.820842+01:00 [debug] <0.296.0> RabbitMQ metadata store: leader call - leader not known. Command will be forwarded once leader is known.
2024-02-06 12:29:16.176876+01:00 [debug] <0.296.0> RabbitMQ metadata store: pre_vote election called for in term 1
2024-02-06 12:29:16.182866+01:00 [debug] <0.296.0> RabbitMQ metadata store: follower -> pre_vote in term: 1 machine version: 0
2024-02-06 12:29:16.183051+01:00 [debug] <0.296.0> RabbitMQ metadata store: pre_vote granted #Ref<0.2919399944.1500250113.30962> for term 1 votes 1
2024-02-06 12:29:16.183247+01:00 [debug] <0.296.0> RabbitMQ metadata store: election called for in term 2
2024-02-06 12:29:16.186666+01:00 [debug] <0.296.0> RabbitMQ metadata store: pre_vote -> candidate in term: 2 machine version: 0
2024-02-06 12:29:16.186815+01:00 [debug] <0.296.0> RabbitMQ metadata store: vote granted for term 2 votes 1
2024-02-06 12:29:16.186996+01:00 [notice] <0.296.0> RabbitMQ metadata store: candidate -> leader in term: 2 machine version: 0
2024-02-06 12:29:16.187220+01:00 [alert] <0.296.0> RA LEADER RECORD: ClusterName=rabbitmq_metadata Leader={rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'} Members=[{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'}]
2024-02-06 12:29:16.190313+01:00 [debug] <0.296.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:16.190648+01:00 [debug] <0.296.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:16.190878+01:00 [debug] <0.296.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-3-22536@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:16.191393+01:00 [debug] <0.296.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-3-22536@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-4-22584@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:16.200058+01:00 [debug] <0.296.0> RabbitMQ metadata store: enabling ra cluster changes in 2, index 61
2024-02-06 12:29:16.302892+01:00 [debug] <0.251.0> Khepri-based RabbitMQ metadata store ready

Logs from node 2:

2024-02-06 12:29:18.714368+01:00 [debug] <0.260.0> Trying to restart local Ra server for store "rabbitmq_metadata" in Ra system "coordination"
2024-02-06 12:29:18.732537+01:00 [debug] <0.310.0> RabbitMQ metadata store: ra_log:init recovered last_index_term {74,2} first index 0
2024-02-06 12:29:18.735899+01:00 [debug] <0.310.0> RabbitMQ metadata store: post_init -> recover in term: 2 machine version: 0
2024-02-06 12:29:18.735944+01:00 [debug] <0.310.0> RabbitMQ metadata store: recovering state machine version 0:0 from index 0 to 60
2024-02-06 12:29:18.736488+01:00 [info] <0.260.0> Waiting for Khepri leader for 30000 ms, 9 retries left
2024-02-06 12:29:18.741696+01:00 [debug] <0.310.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:18.741784+01:00 [debug] <0.310.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:18.741852+01:00 [debug] <0.310.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-3-22536@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:18.741969+01:00 [debug] <0.310.0> RabbitMQ metadata store: applying ra cluster change to [{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-3-22536@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-4-22584@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:18.743421+01:00 [debug] <0.310.0> RabbitMQ metadata store: recovery of state machine version 0:0 from index 0 to 60 took 8ms
2024-02-06 12:29:18.743487+01:00 [debug] <0.310.0> RabbitMQ metadata store: recover -> recovered in term: 2 machine version: 0
2024-02-06 12:29:18.748414+01:00 [debug] <0.310.0> RabbitMQ metadata store: recovered -> follower in term: 2 machine version: 0
2024-02-06 12:29:18.748445+01:00 [debug] <0.310.0> RabbitMQ metadata store: is not new, setting election timeout.
2024-02-06 12:29:18.748591+01:00 [info] <0.260.0> Khepri leader elected
2024-02-06 12:29:18.748642+01:00 [debug] <0.310.0> RabbitMQ metadata store: leader call - leader not known. Command will be forwarded once leader is known.
2024-02-06 12:29:19.036638+01:00 [debug] <0.310.0> RabbitMQ metadata store: pre_vote election called for in term 2
2024-02-06 12:29:19.040483+01:00 [debug] <0.310.0> RabbitMQ metadata store: follower -> pre_vote in term: 2 machine version: 0
2024-02-06 12:29:19.040652+01:00 [debug] <0.310.0> RabbitMQ metadata store: pre_vote granted #Ref<0.772528231.2305294338.197063> for term 2 votes 1
2024-02-06 12:29:19.043116+01:00 [debug] <0.310.0> RabbitMQ metadata store: pre_vote -> follower in term: 2 machine version: 0
2024-02-06 12:29:19.043212+01:00 [debug] <0.310.0> RabbitMQ metadata store: is not new, setting election timeout.
2024-02-06 12:29:19.044068+01:00 [debug] <0.310.0> RabbitMQ metadata store: enabling ra cluster changes in 2, index 61
2024-02-06 12:29:19.061918+01:00 [alert] <0.310.0> RA LEADER RECORD: ClusterName=rabbitmq_metadata Leader={rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'} Members=[{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-2-22488@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-3-22536@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-4-22584@localhost'},{rabbitmq_metadata,'rmq-ct-rolling_kill_restart-5-22632@localhost'}]
2024-02-06 12:29:19.062191+01:00 [info] <0.310.0> RabbitMQ metadata store: detected a new leader {rabbitmq_metadata,'rmq-ct-rolling_kill_restart-1-22440@localhost'} in term 2
2024-02-06 12:29:19.062378+01:00 [debug] <0.310.0> RabbitMQ metadata store: Leader node 'rmq-ct-rolling_kill_restart-1-22440@localhost' may be down, setting pre-vote timeout
2024-02-06 12:29:19.151724+01:00 [debug] <0.260.0> Khepri-based RabbitMQ metadata store ready

Logs on nodes 3, 4 and 5 are similar to node 2.

Note that messages starting with "RA LEADER RECORD" were added to `ra_leaderboard:record/3` during debugging and are not committed to Ra.
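
For reference, a minimal sketch of that temporary instrumentation (hypothetical wrapper module; only `ra_leaderboard:record/3` itself is real, the actual debug log was added inside it):

```erlang
%% Hypothetical debugging aid, not part of the committed change: log
%% every leaderboard update at alert level before recording it.
-module(ra_leaderboard_debug).
-include_lib("kernel/include/logger.hrl").
-export([record/3]).

record(ClusterName, Leader, Members) ->
    ?LOG_ALERT("RA LEADER RECORD: ClusterName=~w Leader=~w Members=~w",
               [ClusterName, Leader, Members]),
    ra_leaderboard:record(ClusterName, Leader, Members).
```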

We can see that node 1 is elected the leader on recovery before the `ra_cluster_change` commands are applied. It stays the leader afterwards but never updates its leaderboard.

How

Instead of emitting an `update_leaderboard` effect after a cluster change command is appended, or after some specific events, we now check whether the cluster changed after a command was applied and update the leaderboard if it did.

This check is performed on the leader and on all followers. It is also performed once a Ra server has finished recovery.
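
A minimal sketch of that post-apply check (illustrative function name, not the committed `ra_server` code; the cluster is assumed to be a map keyed by server id):

```erlang
%% Illustrative only: after a command is applied, refresh the
%% leaderboard iff the set of cluster members changed.
maybe_update_leaderboard(ClusterName, LeaderId, OldCluster, NewCluster) ->
    OldMembers = lists:sort(maps:keys(OldCluster)),
    NewMembers = lists:sort(maps:keys(NewCluster)),
    case NewMembers =:= OldMembers of
        true ->
            %% Membership unchanged: nothing to do.
            ok;
        false ->
            ra_leaderboard:record(ClusterName, LeaderId, NewMembers)
    end.
```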

V2: This revision changes the approach. The first iteration updated the leaderboard after a `ra_cluster_change` command was applied, in addition to the existing updates.

@dumbbell dumbbell added the bug label Feb 6, 2024
@dumbbell dumbbell marked this pull request as draft February 6, 2024 12:53
@dumbbell dumbbell marked this pull request as ready for review February 6, 2024 12:55
@dumbbell dumbbell (Member, Author) commented Feb 6, 2024

FTR, the RabbitMQ testcase that fails because of this:

gmake -C deps/rabbit ct-clustering_recovery t=cluster_size_5:rolling_kill_restart

It takes a few runs locally to hit the problem.

@dumbbell dumbbell force-pushed the update-leaderboard-on-recovery branch from 4fd69f1 to 7f8727f February 6, 2024 13:11
@dumbbell dumbbell marked this pull request as draft February 6, 2024 14:37
@dumbbell dumbbell force-pushed the update-leaderboard-on-recovery branch from f8410be to 7eb516b February 6, 2024 14:55
@dumbbell dumbbell changed the title from "ra_server: Emit update_leaderboard effect after applying ra_cluster_change" to "ra_server: Update leaderboard after each command if cluster changed" February 6, 2024
@dumbbell dumbbell force-pushed the update-leaderboard-on-recovery branch from 7eb516b to 301ca62 February 6, 2024 14:58
Review comments on src/ra_server_proc.erl (outdated, resolved)
dumbbell and others added 4 commits February 8, 2024 16:48

V3: Refactor the approach and ensure the leaderboard is updated after recovery as well as whenever the cluster changes on any member. In Raft, the cluster is changed as soon as a cluster change command is written to the log. Hence, to correctly recover the current cluster, a state machine needs to scan to the end of the log on recovery to make sure it picks up any such changes.

Ra didn't do this and was thus vulnerable to picking up the wrong cluster configuration (which of course would affect subsequent leader elections).
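
A sketch of what that recovery scan amounts to (hypothetical helper; the `{'$ra_cluster_change', Meta, NewCluster, ReplyMode}` command shape is an assumption, simplified from Ra's internal log entry format):

```erlang
%% Illustrative recovery scan: fold over all log entries up to the last
%% written index (not just the last applied one) and keep the most
%% recent cluster configuration observed.
recover_cluster(LogEntries, InitialCluster) ->
    lists:foldl(
      fun({_Idx, _Term, {'$ra_cluster_change', _Meta, NewCluster, _}}, _Acc) ->
              NewCluster;
         ({_Idx, _Term, _OtherCmd}, Acc) ->
              Acc
      end, InitialCluster, LogEntries).
```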
When the WAL detects a new writer, it creates a new ETS mem table and inserts an initial record into the `open_mem_tables` table.

This is so that the `open_mem_tables` record can later be updated using `ets:update_element/3`.

However, the initial record indicated that the table already contained a single entry, which isn't yet true: the entries are only written a little later, during batch completion.

This is an attempt to avoid this condition by setting the "To" field to one less than the "From" field, i.e. an empty range.
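
A sketch of the fix under those assumptions (the `{UId, From, To, Tid}` record layout and both function names are inferred from the description above, not copied from the WAL code):

```erlang
%% When a new writer appears, advertise an empty range: To = From - 1
%% means "no entries written yet".
open_mem_table(OpenTbl, UId, From, Tid) ->
    true = ets:insert(OpenTbl, {UId, From, From - 1, Tid}).

%% During batch completion, bump the "To" field (element 3 of the
%% record) in place to the last index actually written.
complete_batch(OpenTbl, UId, LastIdx) ->
    true = ets:update_element(OpenTbl, UId, {3, LastIdx}).
```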
@kjnilsson kjnilsson force-pushed the update-leaderboard-on-recovery branch from ae4a61f to 2c9a7fe February 8, 2024 16:50
@kjnilsson kjnilsson marked this pull request as ready for review February 8, 2024 16:52
@kjnilsson kjnilsson merged commit 7910c4f into main Feb 8, 2024
7 checks passed
@dumbbell dumbbell deleted the update-leaderboard-on-recovery branch February 8, 2024 17:06
@dumbbell dumbbell added this to the 2.9.1 milestone Feb 12, 2024