Ra checkpoints #141
Some state machines deliberately leave a long log in place (e.g. RabbitMQ quorum queues may do this while the number of messages on a queue is increasing). Such machines may want to create checkpoints, which are effectively snapshots that do not truncate the log.
This would help speed up the recovery phase, which can be very slow when the log is very large, because the server can use the latest checkpoint rather than the snapshot as its recovery starting point. Checkpoints could even be written during an orderly shutdown of the Ra server.
It can also help reduce the memory overhead of RabbitMQ quorum queues, as they currently keep something akin to checkpoints in memory.
Machines would use a new effect:
{checkpoint, ra_index(), machine_state()}
to emit new checkpoints. A server could have many checkpoints, but it is likely that we'd need some upper limit. Once the upper limit is reached, Ra would "thin" the list of checkpoints such that the oldest and newest checkpoints are always retained but the checkpoints between them become further and further apart.
Checkpoints can be promoted to snapshots using a new
{release_cursor, ra_index()}
effect, which will promote the checkpoint with the highest index that is lower than or equal to the release_cursor index to a snapshot and delete all checkpoints up to and including the promoted checkpoint.
Any checkpoints with an index lower than the current snapshot should be removed.
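The two list operations described above (thinning and promotion) could be sketched roughly as follows. This is only an illustration, not Ra's implementation: the module name `checkpoint_list`, the functions `promote/2` and `thin/2`, and the representation of checkpoints as a sorted list of indexes are all assumptions. The thinning strategy (repeatedly dropping the interior checkpoint whose neighbours are closest together) is one possible choice; the proposal does not specify one.

```erlang
-module(checkpoint_list).
-export([promote/2, thin/2]).

%% Assumption: checkpoints are modelled as a list of unique indexes sorted
%% in ascending order. A real implementation would track on-disk state.

%% Promote the highest checkpoint index that is =< ReleaseIdx and drop all
%% checkpoints up to and including it. Returns {Promoted | none, Remaining}.
-spec promote(non_neg_integer(), [non_neg_integer()]) ->
    {non_neg_integer() | none, [non_neg_integer()]}.
promote(ReleaseIdx, Checkpoints) ->
    {Eligible, Remaining} =
        lists:splitwith(fun(Idx) -> Idx =< ReleaseIdx end, Checkpoints),
    case Eligible of
        [] -> {none, Checkpoints};
        _ -> {lists:last(Eligible), Remaining}
    end.

%% Thin the list down to at most Max entries (Max must be >= 2), always
%% keeping the oldest and newest checkpoints.
-spec thin(pos_integer(), [non_neg_integer()]) -> [non_neg_integer()].
thin(Max, Checkpoints) when length(Checkpoints) =< Max ->
    Checkpoints;
thin(Max, Checkpoints) when Max >= 2 ->
    thin(Max, drop_closest(Checkpoints)).

%% Remove the interior checkpoint whose two neighbours are closest together.
drop_closest(Checkpoints) ->
    {_Gap, Victim} = lists:min(gaps(Checkpoints)),
    lists:delete(Victim, Checkpoints).

%% For each interior checkpoint, the distance between its neighbours.
gaps([Prev, Cur, Next | Rest]) ->
    [{Next - Prev, Cur} | gaps([Cur, Next | Rest])];
gaps(_) ->
    [].
```

For example, `checkpoint_list:promote(250, [100, 200, 300, 400])` picks 200 as the new snapshot and leaves 300 and 400 as checkpoints.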
Ra should avoid writing a checkpoint if the previous checkpoint was written fewer than some minimum number of indexes ago (e.g. 4096), to avoid a proliferation of checkpoint work.
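A minimal sketch of such a guard is shown below. The module name `checkpoint_interval`, the function `should_checkpoint/3`, and the use of `undefined` for "no checkpoint written yet" are hypothetical; only the example interval of 4096 comes from the proposal.

```erlang
-module(checkpoint_interval).
-export([should_checkpoint/3]).

%% Decide whether a checkpoint requested at Idx should actually be written.
%% LastIdx is the index of the most recent checkpoint, or 'undefined' if
%% none has been written yet (an assumption made for this sketch).
-spec should_checkpoint(non_neg_integer(),
                        non_neg_integer() | undefined,
                        pos_integer()) -> boolean().
should_checkpoint(_Idx, undefined, _MinInterval) ->
    true;
should_checkpoint(Idx, LastIdx, MinInterval) ->
    Idx - LastIdx >= MinInterval.
```

With a minimum interval of 4096, a request at index 5000 after a checkpoint at index 1000 would be skipped, while one at index 5096 would be written.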
Checkpoints are kept in the same structure as snapshots, in a directory called checkpoints that is adjacent to the snapshots directory.
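From the machine's side, emitting the two proposed effects might look something like the sketch below. The counter machine, its commands (`increment`, `reset`), and the module name `checkpoint_machine` are made up for illustration, and the effect shapes are the ones proposed in this issue rather than a confirmed Ra API. The module follows the `init/1` and `apply/3` callback shapes of `ra_machine` but does not declare the behaviour, so it compiles without Ra.

```erlang
-module(checkpoint_machine).
-export([init/1, apply/3]).

%% apply/3 shares its name with an auto-imported BIF; opt out of the
%% auto-import so the local definition is unambiguous.
-compile({no_auto_import, [apply/3]}).

%% Hypothetical counter machine.
init(_Config) ->
    #{count => 0}.

%% Each increment offers Ra a checkpoint at the current index. Under the
%% proposal, Ra (not the machine) would decide whether to write it.
apply(#{index := Idx}, increment, #{count := Count} = State0) ->
    State = State0#{count := Count + 1},
    {State, ok, [{checkpoint, Idx, State}]};
%% After a reset, earlier log entries are no longer needed, so the machine
%% asks Ra to promote a checkpoint at or below this index to a snapshot.
apply(#{index := Idx}, reset, State0) ->
    State = State0#{count := 0},
    {State, ok, [{release_cursor, Idx}]}.
```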