-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow more intra-L0 compaction when L0 is small #12214
Conversation
I happened to notice #10865 - are these related? |
Yes, this is following #10865 (comment) and will solve this same problem. |
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
include/rocksdb/advanced_options.h
Outdated
// For leveled compaction, when total L0 size is no more than this threshold, | ||
// RocksDB will attempt intra-L0 compaction first before trying to compact | ||
// L0 files down. This is used to reduce write amplification when L0 size | ||
// can be very small. | ||
// | ||
// Note that this option does not affect the intra-L0 compaction that happens | ||
// when there are too many L0 files. | ||
// | ||
// UINT64_MAX - 1 (0xfffffffffffffffe) is a special flag to allow RocksDB | ||
// to pick default. | ||
// 0 means disabled. | ||
// | ||
// Default: | ||
// when atomic_flush = 0: 0 | ||
// when atomic_flush=1: | ||
// max_bytes_for_level_base / max_bytes_for_level_multiplier | ||
uint64_t intra_l0_compaction_size = 0xfffffffffffffffe; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we need a new option. Maybe.
What if the threshold is determined based on the actual size(Lbase)? For example, size(L0) < size(Lbase) / max_bytes_for_level_multiplier. Maybe scale down the limit even more. The intuition is we don't want to run L0->Lbase compactions that are very inefficient for write-amp. Does that approach let unit tests pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Updated to remove the option and use size(Lbase) / max(10, max_bytes_for_level_multiplier)
as the threshold. It turns out that no unit test change is needed.
@cbi42 has updated the pull request. You must reimport the pull request before landing. |
@cbi42 has updated the pull request. You must reimport the pull request before landing. |
I also added some db_bench result to the PR summary. |
@cbi42 has updated the pull request. You must reimport the pull request before landing. |
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary: introduce a new option `intra_l0_compaction_size` to allow more intra-L0 compaction when total L0 size is under a threshold. This option applies only to leveled compaction. It is enabled by default and set to `max_bytes_for_level_base / max_bytes_for_level_multiplier` only for atomic_flush users. When atomic_flush=true, it is more likely that some CF's total L0 size is small when it's eligible for compaction. This option aims to reduce write amplification in this case. Pull Request resolved: facebook/rocksdb#12214 Test Plan: - new unit test - benchmark: ``` TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom --write_buffer_size=51200 --max_bytes_for_level_base=5242880 --level0_file_num_compaction_trigger=4 --statistics=1 main: fillrandom : 234.499 micros/op 4264 ops/sec 234.499 seconds 1000000 operations; 0.5 MB/s rocksdb.compact.read.bytes COUNT : 1490756235 rocksdb.compact.write.bytes COUNT : 1469056734 rocksdb.flush.write.bytes COUNT : 71099011 branch: fillrandom : 128.494 micros/op 7782 ops/sec 128.494 seconds 1000000 operations; 0.9 MB/s rocksdb.compact.read.bytes COUNT : 807474156 rocksdb.compact.write.bytes COUNT : 781977610 rocksdb.flush.write.bytes COUNT : 71098785 ``` Reviewed By: ajkr Differential Revision: D52637771 Pulled By: cbi42 fbshipit-source-id: 4f2c7925d0c3a718635c948ea0d4981ed9fabec3 (cherry picked from commit 4b684e96b71cd6f1d50e29fae8b55c323ccd2869)
Summary: introduce a new option
intra_l0_compaction_size
to allow more intra-L0 compaction when total L0 size is under a threshold. This option applies only to leveled compaction. It is enabled by default and set tomax_bytes_for_level_base / max_bytes_for_level_multiplier
only for atomic_flush users. When atomic_flush=true, it is more likely that some CF's total L0 size is small when it's eligible for compaction. This option aims to reduce write amplification in this case.Test plan: