Atomic storage API #121
Conversation
… to write batches. Removes batch_accept feature flag.
Just some initial comments. I'll do a more detailed review after these comments are fixed.
For next time, please don't do refactorings before the actual feature is approved. For example, the atomic storage changes here wouldn't then have been bloated with the (trivial) change of replacing u64 with usize.
…tries on Decide messages.
After our discussion yesterday, I looked at the current code to see how we can keep as much as possible of the current batching and at the same time introduce transactions for atomic storage. I think what we need is a struct like this:

```rust
// Transaction object for InternalStorage
struct InternalStorageTxn {
    state_ops: Vec<StorageOp>, // StorageOp is an enum for modifying accepted_round, decided_idx, etc.
    append_op: AppendOp,       // AppendOp is an enum for append, append_on_prefix, etc.
    snapshot: Option<(usize, S)>, // snapshot index and snapshot
}

// handle_acceptsync() in SequencePaxos
let state_ops = vec![StorageOp::AcceptedRound(4), StorageOp::DecidedIdx(10)];
let append_op = AppendOp::AppendOnPrefix(accsync.sync_idx, accsync.suffix);
let internal_txn = InternalStorageTxn { state_ops, append_op, .. };
self.internal_storage.do_txn(internal_txn);

// do_txn(it: InternalStorageTxn) in InternalStorage
let append_res = match it.append_op {
    AppendOp::Append(entries, true) => self.state_cache.append_entries(entries),
    AppendOp::Append(entries, false) => self.state_cache.append_entries_without_batching(entries),
    // ... AppendOnPrefix etc.
};
let mut txn = StorageTxn::new(); // transaction object for Storage
txn.insert(it.state_ops);
if let Some(entries_to_be_flushed) = append_res {
    txn.insert(StorageOp::Append(entries_to_be_flushed));
}
if let Some((idx, snapshot)) = it.snapshot {
    txn.insert(StorageOp::Snapshot(idx, snapshot));
}
self.state_cache.real_log_len = self.storage.flush(txn)?;
Ok(self.get_accepted_idx())
```
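The transaction flow sketched above can be condensed into a small self-contained example. The `StorageOp` variants and the in-memory `MockStorage` below are illustrative assumptions, not the actual omnipaxos types:

```rust
// Illustrative sketch of the proposed flow: state ops and an append are
// collected into one transaction and applied together by flush().
enum StorageOp {
    AcceptedRound(u32),
    DecidedIdx(usize),
    Append(Vec<u64>), // u64 stands in for the log entry type
}

struct MockStorage {
    accepted_round: u32,
    decided_idx: usize,
    log: Vec<u64>,
}

impl MockStorage {
    // Applies all ops "atomically": in a real backend this would be a
    // single write batch; here we just apply them in order.
    fn flush(&mut self, txn: Vec<StorageOp>) -> usize {
        for op in txn {
            match op {
                StorageOp::AcceptedRound(r) => self.accepted_round = r,
                StorageOp::DecidedIdx(i) => self.decided_idx = i,
                StorageOp::Append(mut entries) => self.log.append(&mut entries),
            }
        }
        self.log.len() // new log length, as in the sketch above
    }
}
```

A caller would build the transaction as `vec![StorageOp::AcceptedRound(4), StorageOp::DecidedIdx(10), StorageOp::Append(entries)]` and hand it to `flush` in one call.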
…yncing, trimming, and snapshotting.
InternalStorage. Hides unicache types in structs.
Good job! The design looks really good and seems to have made things a lot cleaner. The comments are mostly related to the code structure. Please fix them all, or comment on the places where you have other ideas.
```rust
// Flush any pending writes
self.internal_storage.flush_batch().expect(WRITE_ERROR_MSG);
```
There is nothing wrong with this, but it could be optimized.
If, in general, we only send Accepted after flushing, then this is not strictly necessary: any chosen entry must be flushed by a majority, and if we haven't flushed it, we will get it in the AcceptSync.
To prevent flushing entries that will get overwritten anyway, we should perhaps do some checks here first. The most basic check: if the leader is more up to date than us, we don't flush. There are probably more sophisticated checks we could do; for now, I think we should do this basic check and leave other optimizations for the future.
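A minimal sketch of the basic staleness check described above. The `Replica` struct, its fields, and `maybe_flush` are hypothetical names for illustration, not omnipaxos APIs:

```rust
// Hypothetical sketch: skip flushing pending writes when the leader is
// already further along than we could get by flushing, since our
// unflushed suffix would be overwritten by the upcoming AcceptSync anyway.
struct Replica {
    accepted_idx: usize,         // highest index we have flushed/accepted
    pending_writes: Vec<String>, // batched entries not yet flushed
}

impl Replica {
    // Returns true if a flush was performed.
    fn maybe_flush(&mut self, leader_accepted_idx: usize) -> bool {
        // Basic check: if the leader has accepted at least as much as our
        // log would hold after flushing, the flush is wasted work.
        if leader_accepted_idx >= self.accepted_idx + self.pending_writes.len() {
            self.pending_writes.clear(); // will be replaced by AcceptSync
            return false;
        }
        self.accepted_idx += self.pending_writes.len();
        self.pending_writes.clear();
        true
    }
}
```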
On a second thought, let's leave this as it is and we can create an issue and fix it in the future. So we don't add more complexity to this PR which already has many changes.
Mostly just add more comments/docs. Can now go ahead and refactor the storage file.
Overwrite instead of delete where possible in PersistentStorage.
I couldn't find the reason for this anywhere. It seems unfortunate to effectively constrain omnipaxos to 64-bit systems. Am I missing something obvious / what is the reason for this change?
The change is not performance-related but rather for API reasons: we wanted reading from the omnipaxos log to be similar to the Rust Vec API. Since it is now usize, the user should be able to use it on any compatible system.
That makes sense ergonomics-wise; my concern would be that a system with a 32-bit architecture and one with a 64-bit architecture could not be part of the same cluster. Or, for that matter, any 32-bit system could exhaust the 32-bit log index in reasonable time.
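One way to sidestep the architecture mismatch raised above (a hypothetical sketch, not something this PR implements) is to keep `usize` in the public API but fix the index width to `u64` on the wire:

```rust
// Hypothetical sketch: usize in the public API, fixed-width u64 on the
// wire, so 32-bit and 64-bit nodes agree on the message format.
fn encode_idx(idx: usize) -> [u8; 8] {
    (idx as u64).to_le_bytes()
}

fn decode_idx(bytes: [u8; 8]) -> Result<usize, &'static str> {
    let raw = u64::from_le_bytes(bytes);
    // On a 32-bit system this conversion fails once the log index
    // no longer fits in that platform's usize.
    usize::try_from(raw).map_err(|_| "log index exceeds this platform's usize")
}
```

This keeps the Vec-like ergonomics while making the incompatibility an explicit, checkable error instead of a silent format mismatch.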
Please make sure these boxes are checked, before submitting a new PR.
- [x] `./check.sh` with no errors

Issues
Fix #97, Fix #49

Breaking Changes
- The `Storage` trait has been reworked to facilitate atomic writes and allow for arbitrary crash-recovery. The indices passed to these functions now refer to the indices of the entry log as if no trimming or snapshotting has happened.
- `PersistentStorage` and its config structs have been reworked to use RocksDB and implement the new `Storage` trait. The dependence on commitlog is therefore gone, addressing issue "Implement better trim functionality" #49.
- Indices are now `usize` instead of `u64`. This includes the indices used in the `Storage` trait, `Message` structs, and `Omnipaxos` public functions.
- `Storage::get_promise` implementations in the `omnipaxos_storage` module now return `None` instead of the default `Ballot` when there is no promise ballot saved in storage.
- The `batch_accept` feature flag has been removed and is now the default behavior.
- `Promise`, `AcceptSync`, and `AcceptDecide` message struct fields have been changed.

Other Changes
- `MemoryStorage` now implements the new `Storage` trait, but its implementation of `write_atomically` does not guarantee atomicity (acceptable since memory can't persist state through crash-recovery).
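The "as if no trimming or snapshotting has happened" index convention can be illustrated with a small sketch. The `CompactedLog` type and its translation logic are assumptions for illustration, not the actual omnipaxos implementation:

```rust
// Illustrative sketch: the public API uses logical indices (as if the log
// were never trimmed), while storage holds only the suffix after compaction.
struct CompactedLog {
    compacted_idx: usize, // number of entries removed by trim/snapshot
    suffix: Vec<u64>,     // physical entries still in storage
}

impl CompactedLog {
    // Translate a logical index into the physical suffix; returns None for
    // entries that were compacted away or that don't exist yet.
    fn get(&self, logical_idx: usize) -> Option<&u64> {
        let physical = logical_idx.checked_sub(self.compacted_idx)?;
        self.suffix.get(physical)
    }

    // The logical log length counts compacted entries as well.
    fn len(&self) -> usize {
        self.compacted_idx + self.suffix.len()
    }
}
```

The benefit of this convention is that callers never need to know whether or where the log was compacted; the storage layer does the translation internally.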