
CompactRange with option BottommostLevelCompaction::kForceOptimized doesn't work correctly in a range split situation. RocksDB v6.29.5 #12702

Open
coconutn256 opened this issue May 27, 2024 · 2 comments

Comments

coconutn256 commented May 27, 2024

Situation

We use RocksDB as the storage engine for our distributed KV database. CompactRange with BottommostLevelCompaction::kForceOptimized is used to clear out-of-range data after a partition split.
E.g. a partition with range [0, 200) may be split into two partitions with ranges [0, 100) and [100, 200).
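
For context, a minimal sketch of how this is wired up on our side (the key names and the helper function are illustrative, not our real code): after the split, the partition that keeps [0, 100) deletes the other half and forces a bottommost compaction over it to reclaim space.

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Illustrative only: the partition keeping [0, 100) clears [100, 200).
rocksdb::Status ClearRightHalf(rocksdb::DB* db) {
  rocksdb::Slice begin("key_100");
  rocksdb::Slice end("key_200");

  // Drop the keys that now belong to the other partition.
  rocksdb::Status s = db->DeleteRange(rocksdb::WriteOptions(),
                                      db->DefaultColumnFamily(), begin, end);
  if (!s.ok()) {
    return s;
  }

  // Force a bottommost-level compaction over that range to reclaim space.
  rocksdb::CompactRangeOptions cro;
  cro.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForceOptimized;
  return db->CompactRange(cro, &begin, &end);
}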

Expected behavior

CompactRange should clear the out-of-range data for both of the split partitions.

Actual behavior

The partition that keeps the right half of the range always succeeds, but the other one sometimes fails.

Steps to reproduce the behavior

  1. Write enough data in a specific range (e.g. about 2x the column family's max_compaction_bytes).
  2. Do a manual compaction.
  3. Write data only in the left part of the range.
  4. Do a manual compaction again. In the bottommost level, only the SSTs in the left part get compacted; see the sketch below.
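
A rough sketch of these steps (key layout, value sizes, and loop counts are made up; only the ordering of the four steps matters):

#include <cstdio>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Illustrative key helper: keys key_000000 .. key_199999 stand for the
// partition's full range; the left half is the first 100000 keys.
static std::string Key(int i) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "key_%06d", i);
  return std::string(buf);
}

void Reproduce(rocksdb::DB* db) {
  rocksdb::WriteOptions wo;
  rocksdb::CompactRangeOptions cro;
  cro.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForceOptimized;

  // 1. write enough data across the whole range (well beyond
  //    max_compaction_bytes at the bottommost level)
  for (int i = 0; i < 200000; ++i) {
    db->Put(wo, Key(i), std::string(4096, 'x'));
  }
  // 2. first manual compaction pushes everything to the bottommost level
  db->CompactRange(cro, nullptr, nullptr);

  // 3. write only into the left half of the range
  for (int i = 0; i < 100000; ++i) {
    db->Put(wo, Key(i), std::string(4096, 'y'));
  }
  // 4. second manual compaction: at the bottommost level only the left-half
  //    SSTs get picked, and the untouched right-half files are never compacted
  db->CompactRange(cro, nullptr, nullptr);
}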

Possible Reason

In the CompactionPicker::CompactRange function, a compaction with kForceOptimized tries to skip SSTs with large file numbers, i.e. files created during the current manual compaction. However, in the split situation, the SSTs in the left part of the range are always compacted, while the SSTs in the right part are not. With enough data in RocksDB, the picked inputs may contain only SSTs from the left part (code 1), which were already compacted by the compaction from the level above. The picker then ends up returning nullptr, which terminates the manual compaction (code 2).
This function does not seem to have changed in the latest version; covering_the_whole_range should probably be taken into account in code 2 before returning nullptr. Please check. (A toy model of the shrinking loop is sketched after the quoted code below.)

Compaction* CompactionPicker::CompactRange(...) {
...
// 1. resize inputs with limit
      if (input_level_total + output_level_total >= limit) {
        covering_the_whole_range = false;
        // still include the current file, so the compaction could be larger
        // than max_compaction_bytes, which is also to make sure the compaction
        // can make progress even `max_compaction_bytes` is small (e.g. smaller
        // than an SST file).
        inputs.files.resize(i + 1);
        break;
      }
...
// 2. all ssts were generated by the current compaction, return nullptr
  // for BOTTOM LEVEL compaction only, use max_file_num_to_ignore to filter out
  // files that are created during the current compaction.
  if (compact_range_options.bottommost_level_compaction ==
          BottommostLevelCompaction::kForceOptimized &&
      max_file_num_to_ignore != port::kMaxUint64) {
    assert(input_level == output_level);
    // inputs_shrunk holds a continuous subset of input files which were all
    // created before the current manual compaction
    std::vector<FileMetaData*> inputs_shrunk;
    size_t skip_input_index = inputs.size();
    for (size_t i = 0; i < inputs.size(); ++i) {
      if (inputs[i]->fd.GetNumber() < max_file_num_to_ignore) {
        inputs_shrunk.push_back(inputs[i]);
      } else if (!inputs_shrunk.empty()) {
        // inputs[i] was created during the current manual compaction and
        // need to be skipped
        skip_input_index = i;
        break;
      }
    }
    if (inputs_shrunk.empty()) {
      return nullptr;
    }
...
}
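
To make the failure mode concrete, here is a standalone toy model (not RocksDB code) of the shrinking loop quoted above. In the split scenario every picked bottommost file was just created by the current manual compaction, so the loop produces an empty result and the picker gives up; the file numbers below are hypothetical.

#include <cstdint>
#include <iostream>
#include <vector>

// Toy model of the shrinking loop: files whose number is not less than
// max_file_num_to_ignore were produced by the current manual compaction at
// the level above and get skipped.
std::vector<uint64_t> ShrinkInputs(const std::vector<uint64_t>& input_file_numbers,
                                   uint64_t max_file_num_to_ignore) {
  std::vector<uint64_t> shrunk;
  for (uint64_t num : input_file_numbers) {
    if (num < max_file_num_to_ignore) {
      shrunk.push_back(num);
    } else if (!shrunk.empty()) {
      // stop once a freshly created file follows older ones
      break;
    }
  }
  return shrunk;
}

int main() {
  // Split scenario: only the left half was rewritten just before, so every
  // picked bottommost file has a file number >= max_file_num_to_ignore.
  std::vector<uint64_t> picked = {105, 106, 107};
  std::vector<uint64_t> shrunk =
      ShrinkInputs(picked, /*max_file_num_to_ignore=*/100);
  std::cout << "files left to compact: " << shrunk.size() << "\n";  // prints 0
  return 0;
}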
@wolfdan666

Hello, I got the same error. Did you find a solution for this problem?

@coconutn256
Author

Hello, I got the same error. Did you find a solution for this problem?

We simply use kForce. It works well but may cause heavier write amplification.
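
For reference, the workaround amounts to something like this (a sketch, not a recommendation): kForce unconditionally rewrites the bottommost files in the range, so it never hits the inputs_shrunk.empty() early return, at the cost of extra write amplification.

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Workaround sketch: same call as before, but with kForce instead of
// kForceOptimized.
rocksdb::Status ClearRange(rocksdb::DB* db, const rocksdb::Slice& begin,
                           const rocksdb::Slice& end) {
  rocksdb::CompactRangeOptions cro;
  cro.bottommost_level_compaction = rocksdb::BottommostLevelCompaction::kForce;
  return db->CompactRange(cro, &begin, &end);
}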
