Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats for redundant insertions into block cache #6681

Closed

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Apr 9, 2020

Summary: Since read threads do not coordinate on loading data into block
cache, two threads between Lookup and Insert can end up loading and
inserting the same data. This is particularly concerning with
cache_index_and_filter_blocks since those are hot and more likely to
be race targets if ejected from (or not pre-populated in) the cache.

Particularly with moves toward disaggregated / network storage, the cost
of redundant retrieval might be high, and we should at least have some
hard statistics from which we can estimate impact.

Example with full filter thrashing "cliff":

$ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10
...
$ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((130 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 | egrep '^rocksdb.block.cache.(.*add|.*redundant)' | grep -v compress | sort
rocksdb.block.cache.add COUNT : 14181
rocksdb.block.cache.add.failures COUNT : 0
rocksdb.block.cache.add.redundant COUNT : 476
rocksdb.block.cache.data.add COUNT : 12749
rocksdb.block.cache.data.add.redundant COUNT : 18
rocksdb.block.cache.filter.add COUNT : 1003
rocksdb.block.cache.filter.add.redundant COUNT : 217
rocksdb.block.cache.index.add COUNT : 429
rocksdb.block.cache.index.add.redundant COUNT : 241
$ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((120 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 | egrep '^rocksdb.block.cache.(.*add|.*redundant)' | grep -v compress | sort
rocksdb.block.cache.add COUNT : 1182223
rocksdb.block.cache.add.failures COUNT : 0
rocksdb.block.cache.add.redundant COUNT : 302728
rocksdb.block.cache.data.add COUNT : 31425
rocksdb.block.cache.data.add.redundant COUNT : 12
rocksdb.block.cache.filter.add COUNT : 795455
rocksdb.block.cache.filter.add.redundant COUNT : 130238
rocksdb.block.cache.index.add COUNT : 355343
rocksdb.block.cache.index.add.redundant COUNT : 172478

Test Plan: Some manual testing (above) and unit test covering key metrics is included

Summary: Since read threads do not coordinate on loading data into block
cache, two threads between Lookup and Insert can end up loading and
inserting the same data. This is particularly concerning with
cache_index_and_filter_blocks since those are hot and more likely to
be race targets if ejected from (or not pre-populated in) the cache.

Particularly with moves toward disaggregated / network storage, the cost
of redundant retrieval might be high, and we should at least have some
hard statistics from which we can estimate impact.

TODO (if this change is endorsed): unit tests
Maybe: update Java for backlog of statistics

Test Plan: TODO
Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be really useful to track this; however, I don't think we should change the semantics of the existing counters in GetContext / Statistics since people are monitoring those and would see a significant drop once this is rolled out. So instead of breaking down the existing ADD into two disjoint categories ADD and REPLACE, I think it would be better to introduce a subcategory of ADD called... what? ADD_OVERWRITE ? ADD_REPLACE ? ... and increment ADD unconditionally upon cache insertion (like now) and the new subcategory only if the insertion was redundant.

P.S. I'm also wondering about the use of the word redundant; it makes sense in the context of the block cache since there cached objects are supposed to be immutable (although historically this wasn't really true, see e.g. #5719) but from the perspective of the cache classes themselves it's more of an overwrite/replace...

@pdillinger
Copy link
Contributor Author

Makes sense, though it probably means extending the Cache class to expose replace vs. add behavior of Insert (not in the status). And perhaps it makes sense for correctness to make Cache able to check (opt-in, debug build only) that a replacement contains the same data. However, this is not totally possible with the existing interface, because 'charge' is not required to be the length of 'value'.

Thoughts? @ltamasi

@ltamasi
Copy link
Contributor

ltamasi commented Apr 13, 2020

I think the Status change is actually backwards compatible-ish; both s.ok() and s == Status::OK() would still work (since operator== only checks code_ but not subcode_). Only paranoid checks that explicitly check subcode would break, which I think is fine.

So I think we're good as long as we don't change the meaning of the existing stats (adding new ones for the replace case is perfectly fine).

@pdillinger pdillinger changed the title WIP/RFC: stats for redundant insertions into block cache Stats for redundant insertions into block cache Apr 17, 2020
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pdillinger
Copy link
Contributor Author

Vexing failure in https://travis-ci.org/github/facebook/rocksdb/jobs/677335996 that does not repeat locally, with ASAN, with Clang, with Clang+ASAN. And for some reason UBSAN is not compiling for me.

@pdillinger
Copy link
Contributor Author

And for some reason UBSAN is not compiling for me.

Working now (must use environment variables not make variables 😡). No error with UBSAN nor TSAN.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pdillinger
Copy link
Contributor Author

(must use environment variables not make variables 😡).

Also applies to ROCKSDB_NO_FBCODE, which quietly interfered with reproducer. Fix in #6740

cache/clock_cache.cc Show resolved Hide resolved
Comment on lines 162 to 182
namespace {
// To access protected Status API
class InsertStatusOkOverwritten : public Status {
public:
InsertStatusOkOverwritten()
: Status(Status::kOk, static_cast<Status::SubCode>(1)) {
// Confirm compatible with normal OK checks
assert(ok());
assert(*this == Status::OK());
}
};
} // anonymous namespace

const Status ShardedCache::kInsertStatusOkOverwritten =
InsertStatusOkOverwritten();

bool ShardedCache::IsInsertStatusOkOverwritten(const Status& s) {
return s.code() == kInsertStatusOkOverwritten.code() &&
s.subcode() == kInsertStatusOkOverwritten.subcode();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, any reason the new subcode can't go into the existing Status class? We could simply add e.g. kOverwritten there and add a default param value to the existing static Status OK() factory method.

Yes, yes, default parameter values are not nice but I think it's the lesser evil in this case (besides, factory methods for the other codes already have them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like this extension to Status is seasoned enough to bake into the public API (forever). In fact, one could argue that using variants of OK status for logic is an anti-pattern. We're doing it here because (a) we're wedging something into an existing public API, and (b) only using it for stats gathering.

So I think leaving the Status public API as it is the the lesser of evils for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's an anti-pattern, I would argue this is a bigger one. :) It introduces the same subtype of OK and breaks the code_ == kOk => subcode_ == kNone invariant but does it in a sneakier way (it's not obvious to someone reading the code of Status that the invariant no longer holds). It also hijacks the pre-existing enum value 1 in the process, so now kInsertStatusOkOverwritten.subcode() == kMutexTimeout.

From a C++ perspective, there are a couple of issues (granted, these are somewhat theoretical; I'm not saying the current code would blow up in practice):

  • Status is not meant to be a base class. It does not have a virtual destructor, so technically speaking Status* status = new InsertStatusOkOverwritten; delete status; is undefined behavior.
  • Assignments like const Status ShardedCache::kInsertStatusOkOverwritten = InsertStatusOkOverwritten(); are susceptible to the slicing problem. (If kInsertStatusOkOverwritten had members of its own, they would be sliced off here.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing regarding the public API: I feel that the Cache interface is special in the sense that its intended client is the RocksDB library itself, not the application. So as long as we change the API and the RocksDB code that uses it in lockstep, we're good. At least that's the theory. Obviously we can't stop people from taking Cache and using it for their own nefarious purposes. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's an anti-pattern, I would argue this is a bigger one. :)

I still disagree. Any change to public APIs puts a 1000x multiplier on the impact of any hygienic evil involved. Status is already a little confusing in its == semantics. Elaborating Ok Status with variants would extend the confusion.

My hack is extremely localized. You haven't convinced me otherwise. In fact, you called potential direct users of Cache "nefarious".

it's not obvious to someone reading the code of Status that the invariant no longer holds

I'm just not worried about someone relying on that invariant (very unlikely, odd, and not future-proof) and also being a direct user of Cache::Lookup.

I feel that the Cache interface is special in the sense that its intended client is the RocksDB library itself

Exposing a shared_ptr in BlockBasedTableOptions is pretty inviting to custom implementations, IMHO. If you think we should codify your intention, we can move all functions from Cache to a new, non-public CacheImpl (extending Cache) and use casts (like implementations of Cache::Handle). I'm not opposed to that if we don't expect an uprising.

I'm not against breaking the interface, but I would like to do that in big steps rather than little steps. This API change is a little step that's possible to avoid with a hack. The hack would obviously be removed when we make a "big step" breaking change to the interface. (I would not be surprised if we change/extend the interface to support optimizations.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's an anti-pattern, I would argue this is a bigger one. :)

I still disagree. Any change to public APIs puts a 1000x multiplier on the impact of any hygienic evil involved. Status is already a little confusing in its == semantics. Elaborating Ok Status with variants would extend the confusion.

Many codes already have subcodes; I don't see why OK can't have one. In fact, your patch also introduces an OK variant, albeit one that's pretending to be a "Timeout Acquiring Mutex". There is no difference in terms of backwards compatibility either: 1) both are safe except for explicit subcode checks, and 2) the whole issue of backwards compatibility is kind of moot since we control the (intended) client classes (e.g. BlockBasedTable) as well. TBH I don't see a difference between the two, other than your current one attempts to hide the new subcode and is more hackish.

I feel that the Cache interface is special in the sense that its intended client is the RocksDB library itself

Exposing a shared_ptr in BlockBasedTableOptions is pretty inviting to custom implementations, IMHO. If you think we should codify your intention, we can move all functions from Cache to a new, non-public CacheImpl (extending Cache) and use casts (like implementations of Cache::Handle). I'm not opposed to that if we don't expect an uprising.

I'm not against breaking the interface, but I would like to do that in big steps rather than little steps. This API change is a little step that's possible to avoid with a hack. The hack would obviously be removed when we make a "big step" breaking change to the interface. (I would not be surprised if we change/extend the interface to support optimizations.)

OK now I feel there's some confusion here. Yes, the Cache interface is public so people can provide their own cache implementations. What I meant is that the intended client (or caller) of any cache implementation (LRU, clock, or a fancy custom cache) is BlockBasedTable et al., which are under our control. Having said that, I'm not suggesting any change that would break the compilation for anybody, not even for folks who are using Cache for other purposes in their applications. Cache::Insert would still have the exact same signature and return Status; the only change is that Status that could take on an additional code+subcode combination.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. If someone has a custom Cache implementation, they would have to tweak it so that Insert returns the new Status upon overwrite, or they don't get the new stats. Again, this is true regardless of how the new code+subcode combination is added.

db/db_test_util.h Outdated Show resolved Hide resolved
db/db_test_util.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. Re-import the pull request

1 similar comment
@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 249eff0.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 249eff0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants