
Ra checkpoints #141

Closed
kjnilsson opened this issue Nov 12, 2019 · 0 comments · Fixed by #415

@kjnilsson
Contributor

kjnilsson commented Nov 12, 2019

Some state machines that deliberately leave a long log in place (e.g. RabbitMQ quorum queues may do this as the number of messages on the queue grows) may want to create checkpoints: effectively snapshots that do not truncate the log.

This would help speed up the recovery phase, which can be very slow when the log is very large, because the server can use the latest checkpoint rather than the snapshot as its recovery starting point. Checkpoints could even be written during an orderly shutdown of the Ra server.
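A minimal Python sketch of choosing the recovery starting point, assuming checkpoints and the snapshot are identified only by their log index and that log indexes start at 1. The function names are hypothetical and not part of Ra's (Erlang) API.

```python
from typing import Optional, Sequence


def recovery_start_index(snapshot_index: Optional[int],
                         checkpoint_indexes: Sequence[int]) -> Optional[int]:
    """Pick the highest index whose full machine state is on disk.

    Both checkpoints and the snapshot hold a complete machine state, so
    recovery can start from whichever is newest and replay only later
    log entries. Returns None when neither exists.
    """
    candidates = list(checkpoint_indexes)
    if snapshot_index is not None:
        candidates.append(snapshot_index)
    return max(candidates) if candidates else None


def entries_to_replay(last_log_index: int, start: Optional[int]) -> int:
    # Assumes indexes start at 1: with no snapshot or checkpoint,
    # every entry up to last_log_index has to be re-applied.
    return last_log_index - (start if start is not None else 0)


start = recovery_start_index(1000, [4096, 8192])
print(start, entries_to_replay(10000, start))  # 8192 1808
```

With only the snapshot at index 1000, recovery would replay 9000 entries; starting from the checkpoint at 8192 cuts that to 1808.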

It could also help reduce the memory overhead of RabbitMQ quorum queues, which currently keep something akin to checkpoints in memory.

Machines would use a new effect, `{checkpoint, ra_index(), machine_state()}`, to emit new checkpoints. A server could have many checkpoints, but we would likely need some upper limit. Once the upper limit is reached, Ra would "thin" the list of checkpoints such that the oldest and newest checkpoints are always retained, while the checkpoints between them become further and further apart.
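One possible thinning policy, sketched in Python under the assumption that only checkpoint indexes matter: whenever the count exceeds the limit, drop every other interior checkpoint. This keeps the oldest and newest and roughly doubles the spacing of the survivors on each pass. The function name and the policy itself are illustrative, not what Ra necessarily implements; an age-weighted variant that keeps recent checkpoints denser would also fit the description.

```python
def thin_checkpoints(indexes: list[int], limit: int) -> list[int]:
    """Thin checkpoint indexes down to at most `limit` entries.

    The oldest and newest checkpoints are always kept. Each pass keeps
    only every second interior checkpoint, so the retained checkpoints
    drift further apart as the log keeps growing.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2 to keep oldest and newest")
    kept = sorted(set(indexes))
    while len(kept) > limit:
        interior = kept[1:-1]
        # interior[1::2] keeps interior positions 1, 3, 5, ...
        kept = [kept[0]] + interior[1::2] + [kept[-1]]
    return kept


checkpoints = [i * 4096 for i in range(11)]  # 0, 4096, ..., 40960
print(thin_checkpoints(checkpoints, limit=8))
# [0, 8192, 16384, 24576, 32768, 40960]
```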

Checkpoints can be promoted to snapshots using a new `{release_cursor, ra_index()}` effect, which promotes the checkpoint with the highest index lower than or equal to the release cursor index to a snapshot and deletes all checkpoints up to and including the promoted checkpoint.

Any checkpoints with an index lower than the current snapshot should be removed.
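A Python sketch of the promotion and pruning rules above, assuming checkpoints are tracked as a sorted list of indexes. The issue does not say what happens when no checkpoint is at or below the release cursor; here the existing snapshot is simply kept, which is an assumption. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def apply_release_cursor(checkpoints: list[int], cursor: int,
                         snapshot_index: Optional[int]
                         ) -> tuple[Optional[int], list[int]]:
    """Promote the newest checkpoint at or below `cursor` to the snapshot.

    `checkpoints` must be sorted ascending. Returns the new snapshot index
    and the remaining checkpoints: the promoted checkpoint and every
    checkpoint below the resulting snapshot are removed.
    """
    pos = bisect_right(checkpoints, cursor)  # checkpoints[:pos] are <= cursor
    if pos > 0:
        promoted = checkpoints[pos - 1]
        if snapshot_index is None or promoted > snapshot_index:
            snapshot_index = promoted
    if snapshot_index is None:
        # Nothing to promote and no snapshot yet: keep everything.
        return None, list(checkpoints)
    remaining = [c for c in checkpoints if c > snapshot_index]
    return snapshot_index, remaining


print(apply_release_cursor([4096, 8192, 12288], cursor=10000, snapshot_index=1000))
# (8192, [12288])
```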

Ra should skip writing a checkpoint if the last checkpoint was written fewer than some minimum number of indexes ago (e.g. 4096), to avoid a proliferation of checkpoint work.
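The rate limit can be expressed as a single comparison. In this Python sketch the function and constant names are hypothetical, and 4096 is only the example value from the issue.

```python
from typing import Optional

# Example value from the issue text, not a fixed Ra setting.
MIN_CHECKPOINT_INTERVAL = 4096


def should_write_checkpoint(last_checkpoint_index: Optional[int], index: int,
                            min_interval: int = MIN_CHECKPOINT_INTERVAL) -> bool:
    """Return True if a checkpoint requested at `index` should be written."""
    if last_checkpoint_index is None:
        # No checkpoint yet, so the first request is always honoured.
        return True
    return index - last_checkpoint_index >= min_interval


print(should_write_checkpoint(4096, 8191), should_write_checkpoint(4096, 8192))
# False True
```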

Checkpoints would be kept in the same structure as snapshots, in a directory called `checkpoints` adjacent to the snapshots directory.
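A sketch of the proposed on-disk layout using `pathlib`. The per-checkpoint directory name (zero-padded hex term and index) is a hypothetical choice for illustration, not necessarily Ra's actual snapshot naming scheme; the functions are also hypothetical.

```python
import tempfile
from pathlib import Path


def snapshot_and_checkpoint_dirs(server_dir: Path) -> tuple[Path, Path]:
    """Return the sibling snapshots/ and checkpoints/ directories."""
    return server_dir / "snapshots", server_dir / "checkpoints"


def checkpoint_path(server_dir: Path, term: int, index: int) -> Path:
    # Hypothetical naming: fixed-width upper-case hex term and index, so a
    # plain lexical sort of directory names matches numeric order.
    _, checkpoints = snapshot_and_checkpoint_dirs(server_dir)
    return checkpoints / f"{term:016X}_{index:016X}"


def list_checkpoint_indexes(server_dir: Path) -> list[int]:
    """Parse the index part of each checkpoint directory name."""
    _, checkpoints = snapshot_and_checkpoint_dirs(server_dir)
    if not checkpoints.is_dir():
        return []
    return sorted(int(p.name.split("_")[1], 16)
                  for p in checkpoints.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as d:
    server = Path(d)
    for term, idx in [(1, 4096), (2, 8192)]:
        checkpoint_path(server, term, idx).mkdir(parents=True)
    print(list_checkpoint_indexes(server))  # [4096, 8192]
```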


2 participants