Cache the mapping of sequence to log block index in transaction log iterator #12538
base: main
Conversation
Force-pushed f26e11d to de39db5
Force-pushed ae84b5c to d1ad917
@ajkr @jowlyzhang Hi, could you please review my code when you have time? Any comments or suggestions are welcome. Reusing the iterator may be a workaround, but it seems weird and undefined to use an iterator after it becomes invalid. Additionally, the comments for the Next() method state that it requires Valid() to be true.
Force-pushed d1ad917 to 06ab2da
We have also observed that WAL tailing using consecutive calls to
ReportCorruption(fragment.size(),
                 "missing start of fragmented record(2)");
if (!skipped_) {
  ReportCorruption(fragment.size(),
Can also clear skipped_ here.
Done, thanks for the review.
@@ -330,6 +330,26 @@ TEST_F(DBTestXactLogIterator, TransactionLogIteratorBlobs) {
            "Delete(0, key2)",
            handler.seen);
}

TEST_F(DBTestXactLogIterator, TransactionIteratorCache) {
This test would pass before this change too. Maybe check that the cache is effective by checking some statistics on IO.
Updated, thanks for the review.
db/log_reader.cc
Outdated
@@ -130,13 +132,16 @@ bool Reader::ReadRecord(Slice* record, std::string* scratch,
  prospective_record_offset = physical_record_offset;
  scratch->assign(fragment.data(), fragment.size());
  in_fragmented_record = true;
  skipped_ = false;
Need to also clear after kUserDefinedTimestampSizeType.
Done, thanks for the review.
db/log_reader.h
Outdated
@@ -170,6 +185,10 @@ class Reader {
  // is only for WAL logs.
  UnorderedMap<uint32_t, size_t> recorded_cf_to_ts_sz_;

  // If the log reader has skipped ahead, bytes may need to be dropped
  // until the reader seeks to the start of a record.
  bool skipped_;
Consider changing the name to something like skip_first_fragmented_record_
since this field will be set to false after skipping a record.
Done ✅, thanks for the review.
include/rocksdb/transaction_log.h
Outdated
// If true, the mapping of db sequence to WAL file block index will be
// cached. This prevents repeated reading from the beginning of the
// target WAL log when GetUpdatesSince() is called.
// Default: true
We should not enable this by default since this won't work with WAL compression, where the first record sets the compression type.
Done, thanks for the review.
GetUpdatesSince() will now return a NotSupported error when WAL compression and with_cache are both set.
if (read_options_.with_cache_) {
  // cache the mapping of sequence to log block index when seeking to the
  // start or end sequence
  if ((current_batch_seq_ <= starting_sequence_number_ &&
What's the use case to cache for the start sequence?
When there are multiple slaves catching up, the start sequence is not necessarily increasing. Also, there may be scenarios that do not iterate to the end sequence but call GetUpdatesSince frequently. Caching at both the beginning and the end sounds reasonable to me.
uint64_t block_index;
SeqWithFileBlockIdx(uint64_t log_num, uint64_t seq_num, uint64_t block_idx)
    : log_number(log_num), seq_number(seq_num), block_index(block_idx){};
bool operator<(const SeqWithFileBlockIdx& other) const {
Would block_index be different with the same log_number and seq_number?
There shouldn't be such a situation, but retaining the check on block_idx seems harmless.
Update: removed the check and added an assertion to ensure the block indexes are the same.
db/transaction_log_impl.h
Outdated
std::lock_guard<std::mutex> lk{mutex_};
if (cache_.size() > size_) {
  auto iter = cache_.begin();
  std::advance(iter, rand_.Next() % cache_.size());
What about always erasing the oldest one? I assume GetUpdatesSince() is called with likely increasing sequence numbers.
Changed to replace the oldest one. Thanks for the advice.
Force-pushed 0f276f3 to 89231e3
Force-pushed a382537 to 0da6a66
Force-pushed 0da6a66 to ec7a7e4
Context/Summary:
The official documentation recommends using the GetUpdatesSince method for master-slave synchronization. Every time this method is called, it searches for the start sequence from the beginning of the target WAL log. Seeking to the start sequence can cost too much when the WAL file is large. If low master-slave delay is required, SeekToStartSequence() may be called very frequently and cause a large amount of read IO (although in most cases the reads hit the page cache).
There are some issues discussing this:
#12516
#889
In most scenarios, the start sequences passed to the GetUpdatesSince method are contiguous. Caching the previously sought position avoids most repeated reads of the WAL file.
This commit caches the mapping of sequence number to WAL file block index, both after seeking to the start sequence of the iterator and after iterating to the end of the iterator. When seeking to the start sequence, the log reader looks up the largest cached batch sequence that is not larger than the target sequence and moves the WAL file read pointer to the start of that block, instead of seeking from the beginning of the file.
The optimization effect on the GetUpdatesSince interface is quite noticeable.
Test:
covered by ./db_log_iter_test