Ra checkpoints #141

kjnilsson · 2019-11-12T13:46:03Z

Some state machines deliberately leave a long log in place (e.g. RabbitMQ quorum queues may do this while the number of messages on a queue is increasing). Such machines may want to create checkpoints, which are effectively snapshots that do not truncate the log.

This would help speed up the recovery phase, which can be very slow when the log is very large, as the server can use the latest checkpoint rather than the snapshot as its recovery starting point. Checkpoints could even be written during an orderly shutdown of the Ra server.
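To make the recovery benefit concrete, here is a minimal sketch in Python (not Ra's Erlang code) of choosing a recovery starting point. The function names and the convention that index 0 means "replay the whole log" are assumptions made for illustration only.

```python
from typing import Optional


def recovery_start_index(snapshot_index: Optional[int],
                         checkpoint_indexes: list[int]) -> int:
    """Pick the highest index that has saved machine state.

    Hypothetical helper: returns 0 when neither a snapshot nor a
    checkpoint exists, meaning the full log has to be replayed.
    """
    candidates = list(checkpoint_indexes)
    if snapshot_index is not None:
        candidates.append(snapshot_index)
    return max(candidates, default=0)


def entries_to_replay(last_log_index: int, start_index: int) -> int:
    """Number of log entries that must be re-applied after start_index."""
    return max(0, last_log_index - start_index)
```

With a snapshot at index 100, checkpoints at 5000 and 90000, and a log ending at 100000, recovery would replay 10000 entries from the latest checkpoint rather than 99900 from the snapshot.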

It can also help reduce memory overhead of RabbitMQ quorum queues as they currently keep something akin to checkpoints in memory.

Machines would use a new effect, {checkpoint, ra_index(), machine_state()}, to emit new checkpoints. A server could have many checkpoints, but it is likely that we'd need some upper limit. Once the upper limit is reached, Ra would "thin" the list of checkpoints such that the oldest and newest checkpoints are always retained but the checkpoints between them become further and further apart.
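The exact thinning policy is not specified here. One possible policy, sketched in Python purely as an illustration (the function name, the `max_checkpoints` parameter and the "drop every second interior checkpoint" rule are all assumptions, not Ra's behaviour), could look like this:

```python
def thin_checkpoints(indexes: list[int], max_checkpoints: int) -> list[int]:
    """Reduce a list of checkpoint indexes to at most max_checkpoints.

    Assumed policy: always keep the oldest and newest checkpoint and,
    while over the limit, drop every second checkpoint in between so
    the gaps between the retained ones roughly double.
    """
    if max_checkpoints < 2:
        raise ValueError("need room for at least the oldest and newest")
    kept = sorted(indexes)
    while len(kept) > max_checkpoints:
        interior = kept[1:-1]
        # Keep every second interior checkpoint; the ends are untouched.
        kept = [kept[0]] + interior[1::2] + [kept[-1]]
    return kept
```

Each pass strictly shrinks the list, so the loop terminates, and repeated thinning over time leaves older checkpoints spaced increasingly far apart.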

Checkpoints can be promoted to snapshots using a new {release_cursor, ra_index()} effect. This promotes the checkpoint with the highest index that is lower than or equal to the release_cursor index to a snapshot, and deletes all checkpoints up to and including the promoted checkpoint.
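The selection rule can be sketched as follows. This is a Python model of the rule as described, not Ra's implementation; the function name and the `(promoted, remaining)` return shape are hypothetical.

```python
from typing import Optional


def promote_checkpoint(checkpoints: list[int],
                       release_cursor: int) -> tuple[Optional[int], list[int]]:
    """Model of handling a release_cursor index against known checkpoints.

    Returns the index of the checkpoint to promote to a snapshot (or None
    if no checkpoint is at or below the cursor) and the checkpoint indexes
    that remain afterwards.
    """
    ordered = sorted(checkpoints)
    eligible = [i for i in ordered if i <= release_cursor]
    if not eligible:
        # Nothing to promote; all checkpoints are kept.
        return None, ordered
    promoted = eligible[-1]
    # Everything up to and including the promoted checkpoint is deleted.
    remaining = [i for i in ordered if i > promoted]
    return promoted, remaining
```

For example, with checkpoints at 1000, 5000 and 9000 and a release cursor of 6000, the checkpoint at 5000 is promoted and only the one at 9000 remains.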

Any checkpoints with an index lower than the current snapshot's index should be removed.

Ra should avoid writing a checkpoint if the last checkpoint was written only a small number of indexes ago (e.g. fewer than 4096), to avoid a proliferation of checkpoint work.
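A minimal sketch of such a guard, in Python for illustration only. The constant uses the 4096 example from above; the names and the choice to treat a gap of exactly 4096 as sufficient are assumptions.

```python
from typing import Optional

# Example value from the proposal; a real limit would likely be configurable.
MIN_CHECKPOINT_INTERVAL = 4096


def should_write_checkpoint(last_checkpoint_index: Optional[int],
                            requested_index: int,
                            min_interval: int = MIN_CHECKPOINT_INTERVAL) -> bool:
    """Decide whether a requested checkpoint is far enough from the last one."""
    if last_checkpoint_index is None:
        # No checkpoint exists yet, so there is nothing to rate-limit against.
        return True
    return requested_index - last_checkpoint_index >= min_interval
```

A machine that emits the checkpoint effect too eagerly would then simply have the extra requests ignored rather than written to disk.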

Checkpoints are kept in the same structure as snapshots in a directory called checkpoints that is adjacent to the snapshots directory.

kjnilsson added the effort-medium label Nov 12, 2019

mkuratczyk mentioned this issue Jul 12, 2023

Improvement request: Faster Quorum Queue startup rabbitmq/rabbitmq-server#3027

Closed

the-mikedavis self-assigned this Nov 29, 2023

the-mikedavis mentioned this issue Jan 29, 2024

Checkpoints #415

Merged

kjnilsson closed this as completed in #415 Feb 13, 2024
