
Verify explicitly synced data is recoverable #12152

Open
ajkr opened this issue Dec 15, 2023 · 10 comments
@ajkr
Contributor

ajkr commented Dec 15, 2023

This issue is a next step mentioned in my blog post (https://rocksdb.org/blog/2022/10/05/lost-buffered-write-recovery.html):

An untested guarantee is that RocksDB recovers all writes that the user explicitly flushed out of the buffers lost in the crash. We may recover more writes than these due to internal flushing of buffers, but never less. Our test oracle needs to be further extended to track the lower bound on the sequence number that is expected to survive a crash.

The first step is to handle the single column family, WAL disabled case. We could use either seqno for tracking or a new write counter internal to the test oracle. We should update the expected minimum value of that counter for (1) explicit flushes, and (2) OnFlushCompleted() events. We should verify the recovery reaches at least the expected minimum value.

@hx235
Contributor

hx235 commented Dec 18, 2023

In case it's not already obvious: for the WAL-disabled case, we will only consider atomic_flush=true. For a flush, we do not need to verify recoverability of writes concurrent with that flush, since RocksDB makes no guarantee about whether such concurrent writes are included in the flush.

@ajkr ajkr removed the up-for-grabs Up for grabs label Dec 21, 2023
@ajkr ajkr assigned ajkr and unassigned ajkr Dec 21, 2023
@jacobxli-xx

Hi, ajkr@ and hx235@.
Glad to take this task :)

@ajkr
Contributor Author

ajkr commented Dec 21, 2023

@jacobxli-xx Thank you, it is assigned now.

@ajkr
Contributor Author

ajkr commented Dec 26, 2023

@jacobxli-xx Here are some questions I had about what you are planning to do:

  1. For explicit sync mechanisms, will the scope be limited to verifying writes that completed before the explicit sync are recovered?
  2. For implicit sync mechanisms (i.e., automatic flush), will the scope be limited to verifying writes that are persisted according to OnFlushCompleted()?

If the answer to either is no, we'd like to know the reason and, eventually, the plan, so we can gauge whether we'd be on board with it.

@jacobxli-xx

jacobxli-xx commented Jan 5, 2024

Hey, Andrew.
The answer to both questions is yes.
For both explicit flush and auto flush, we are going to limit the scope of verification to recovering all writes up to the maximum sequence number recorded before OnFlushCompleted() (test cases limited to one column family).

@hx235
Contributor

hx235 commented Jan 18, 2024

@jacobxli-xx If you are considering storing and mmap-ing a map<std::string, uint64_t> cf_flushed_seqno, I wonder why we can't just store and mmap the maximum of all the seqnos in the map?

@jacobxli-xx

@hx235 The mmap-ed map<std::string, uint64_t> cf_flushed_seqno stores each CF name and the corresponding maximum flushed seqno for that CF. The reason we do not store only the maximum of all seqnos is that when we explicitly flush a specific CF, it is not guaranteed that all CFs will be flushed, so the maximum of all seqnos is not guaranteed to equal the maximum seqno of the CF we flushed.

Although our test cases are limited to one CF, we use this map for forward compatibility, as it can be reused for the multi-CF case.

@ajkr
Contributor Author

ajkr commented Jan 24, 2024

when we explicitly flush a specific CF, it is not guaranteed that all CFs will be flushed, so the maximum of all seqnos is not guaranteed to equal the maximum seqno of the CF we flushed.

It is guaranteed in the case where it is needed (atomic_flush=true) -

if (FLAGS_atomic_flush) {
  return db_->Flush(flush_opts, column_families_);
}

It is not guaranteed when WAL is enabled and atomic_flush is disabled, but there it is not needed because WAL ensures consistent recovery across column families.

WAL-disabled, atomic_flush-disabled may be worth testing at some future point but that is still an open topic to discuss. My thoughts are here - #11841.

Another case that may be worth testing at some point is WAL-disabled, atomic_flush-enabled, with manual flush triggered on a subset of column families (the case your suggestion would cover). Still, it's not something we've decided to do yet, so adding extra complexity on the assumption that we will feels premature.

@jacobxli-xx

jacobxli-xx commented Feb 22, 2024

The thoughts on the WAL-disabled, atomic_flush-disabled case in #11841 sound reasonable to me. Using only one seqno should be fine. I will modify the implementation accordingly.

@jacobxli-xx

Hi, an update on the status: planning on sending out the PR by Apr 10th.
