
Atomic storage API #121

Merged · 25 commits · Nov 14, 2023
Conversation


@kevin-harrison kevin-harrison commented Oct 29, 2023

Please make sure these boxes are checked, before submitting a new PR.

  • You ran the local CI checker with ./check.sh with no errors
  • You reference which issue is being closed in the PR text (if applicable)
  • You updated the OmniPaxos book (if applicable)

Issues

Fix #97, Fix #49

Breaking Changes

  • The Storage trait has been reworked to facilitate atomic writes and allow for arbitrary crash-recovery. The indices passed to these functions now refer to indices into the entry log as if no trimming or snapshotting had happened.
  • PersistentStorage and its config structs have been reworked to use RocksDB and implement the new Storage trait. The dependence on commitlog is therefore gone, addressing issue #49 (Implement better trim functionality).
  • All indices are now of type usize instead of u64. This includes the indices used in the Storage trait, Message structs, and Omnipaxos public functions.
  • Both Storage::get_promise implementations in the omnipaxos_storage module now return None instead of the default Ballot when there is no promise ballot saved in storage.
  • The batch_accept feature flag has been removed and is now the default behavior.
  • The Promise, AcceptSync, and AcceptDecide message struct fields have been changed.
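
To illustrate the guarantee the reworked trait is meant to provide, here is a minimal, self-contained sketch (all names here are illustrative, not the actual OmniPaxos trait): every state change passed to a single write call is applied together or not at all, which is what makes arbitrary crash-recovery safe.

```rust
// Hypothetical sketch of an "apply all ops or none" storage write.
// A persistent backend (e.g. RocksDB) would use a write batch; in memory
// we stage the changes and only commit them if every op validates.
#[derive(Clone, Debug)]
enum StorageOp {
    AppendEntry(String),
    SetDecidedIdx(usize),
    SetAcceptedRound(u32),
}

#[derive(Default)]
struct MemStorage {
    log: Vec<String>,
    decided_idx: usize,
    accepted_round: u32,
}

impl MemStorage {
    fn write_atomically(&mut self, ops: Vec<StorageOp>) -> Result<(), String> {
        // Stage all changes first...
        let mut log = self.log.clone();
        let (mut decided_idx, mut accepted_round) = (self.decided_idx, self.accepted_round);
        for op in ops {
            match op {
                StorageOp::AppendEntry(e) => log.push(e),
                StorageOp::SetDecidedIdx(i) if i <= log.len() => decided_idx = i,
                StorageOp::SetDecidedIdx(_) => return Err("decided_idx beyond log".into()),
                StorageOp::SetAcceptedRound(r) => accepted_round = r,
            }
        }
        // ...then commit them as one unit.
        self.log = log;
        self.decided_idx = decided_idx;
        self.accepted_round = accepted_round;
        Ok(())
    }
}
```

If any op fails, none of the staged changes reach storage, so a crash (or an error) can never leave behind a half-applied batch.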

Other Changes

  • MemoryStorage now implements the new Storage trait but its implementation of write_atomically does not guarantee atomicity (acceptable since memory can't persist state through crash-recovery).
  • Batched entries are now correctly flushed when reconfiguring and when deciding entries.
  • The syncing behavior in the prepare phase is now consistent between leaders and followers.


@haraldng haraldng left a comment


Just some initial comments. I'll do a more detailed review after these comments are fixed.

For next time, please don't do refactorings before the actual feature is approved. E.g., here the atomic storage changes wouldn't have been bloated with the (trivial) replacement of u64 with usize.

@haraldng
Copy link
Owner

haraldng commented Nov 2, 2023

After our discussion yesterday, I looked at the current code to see how we can keep as much as possible of the current batching while introducing transactions for atomic storage. I think what we need is a struct InternalStorageTxn for InternalStorage. This enables a clean transaction-object API in the Sequence Paxos code. When we handle the InternalStorageTxn in InternalStorage, we will know which entries to actually flush, and based on that we will create the real transaction for storage. So something like this:

// transaction object for InternalStorage
struct InternalStorageTxn {
    state_ops: Vec<StorageOp>,    // StorageOp is an enum for modifying accepted_round, decided_idx, etc.
    append_op: AppendOp,          // AppendOp is an enum for append, append_on_prefix, etc.
    snapshot: Option<(usize, S)>, // snapshot index and snapshot
}

// handle_acceptsync() in SequencePaxos
let state_ops = vec![StorageOp::AcceptedRound(4), StorageOp::DecidedIdx(10)];
let append_op = AppendOp::AppendOnPrefix(accsync.sync_idx, accsync.suffix);
let internal_txn = InternalStorageTxn { state_ops, append_op, snapshot: None };
self.internal_storage.do_txn(internal_txn);

// do_txn(it: InternalStorageTxn) in InternalStorage
let append_res = match it.append_op {
    AppendOp::Append(entries, true) => self.state_cache.append_entries(entries),
    AppendOp::Append(entries, false) => self.state_cache.append_entries_without_batching(entries),
    // ... AppendOp::AppendOnPrefix etc.
};
let mut txn = StorageTxn::new(); // transaction object for Storage
txn.insert(it.state_ops);
if let Some(entries_to_be_flushed) = append_res {
    txn.insert(StorageOp::Append(entries_to_be_flushed));
}
if let Some((idx, snapshot)) = it.snapshot {
    txn.insert(StorageOp::Snapshot(idx, snapshot));
}
self.state_cache.real_log_len = self.storage.flush(txn)?;
Ok(self.get_accepted_idx())

@kevin-harrison kevin-harrison marked this pull request as ready for review November 7, 2023 13:07
@kevin-harrison kevin-harrison changed the title Atomic storage API (WIP) Atomic storage API Nov 7, 2023

@haraldng haraldng left a comment


Good job! The design looks really good and seems to have made things a lot cleaner. The comments are mostly related to the code structure. Please fix them all or comment on places you have other ideas.

Comment on lines 19 to 20
// Flush any pending writes
self.internal_storage.flush_batch().expect(WRITE_ERROR_MSG);
@haraldng (Owner):
There is nothing wrong with this, but it could be optimized.

If we in general only send Accepted after flushing, then this is actually not strictly necessary (any chosen entry must be flushed on a majority, and if we haven't flushed it, we will get it in the AcceptSync).

To prevent flushing entries that will get overwritten anyway, we should perhaps do some checks here first. The most basic check is: if the leader is more updated than us, we don't flush. There are probably more sophisticated checks we could do; for now I think we should do this basic check and leave other optimizations for the future.
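
The basic check suggested above could look something like this (a sketch with hypothetical names; in the real code the leader's accepted index would come from its sync message):

```rust
// Hedged sketch of the "don't flush if the leader is more updated" check.
// If the leader's log is already at least as long as ours, our pending batch
// will be overwritten by the AcceptSync anyway, so flushing it buys nothing.
fn should_flush_pending(my_accepted_idx: usize, leader_accepted_idx: usize) -> bool {
    my_accepted_idx > leader_accepted_idx
}
```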

@haraldng (Owner):

On a second thought, let's leave this as it is and we can create an issue and fix it in the future. So we don't add more complexity to this PR which already has many changes.


@haraldng haraldng left a comment


Mostly just add more comments/docs. Can now go ahead and refactor the storage file.


@haraldng haraldng left a comment


Overwrite instead of delete where possible in PersistentStorage.

@haraldng haraldng merged commit 45888af into haraldng:master Nov 14, 2023
6 checks passed
@haraldng haraldng mentioned this pull request Nov 14, 2023
@soruh

soruh commented Dec 24, 2023

All indices are now of type usize instead of u64. This includes the indices used in the Storage trait, Message structs, and Omnipaxos public functions.

I couldn't find the reason for this anywhere. It seems unfortunate to effectively constrain omnipaxos to 64-bit systems.
Using 32-bit ids and assuming 150,000 ops/s (taken from the EuroSys '23 paper quoted in the README) would give approximately 8 hours until the 32-bit id space is exhausted...
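
The arithmetic behind that estimate can be checked directly (a back-of-the-envelope sketch, not OmniPaxos code):

```rust
// 2^index_bits possible log indices, consumed at ops_per_sec appends.
fn hours_until_exhausted(index_bits: u32, ops_per_sec: u64) -> f64 {
    (2f64).powi(index_bits as i32) / ops_per_sec as f64 / 3600.0
}
// hours_until_exhausted(32, 150_000) comes out to roughly 7.95 hours,
// matching the estimate above; with 64-bit indices the same rate would
// last millions of years.
```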

Am I missing something obvious / what is the reason for this change?

@haraldng

The change is not performance-related but rather for API reasons: we wanted reading from the omnipaxos log to be similar to the Rust Vec API.

Since it is now usize, the user should be able to use it with any compatible system.

@soruh
Copy link

soruh commented Dec 27, 2023

That makes sense ergonomics-wise. My concern would be that a system with a 32-bit architecture and one with a 64-bit architecture could not be part of the same cluster. And, for that matter, any 32-bit system could exhaust the 32-bit log index in a reasonable time.

Development

Successfully merging this pull request may close these issues.

  • Proposal: support truly atomic storage implementations
  • Implement better trim functionality
3 participants