TableCache in highly concurrent query with limited max_open_files #6699
Comments
One way is to use a concept similar to the one Guava uses in Java: https://github.com/google/guava/wiki/StripedExplained — and then lock using a StripedLock keyed by file name. It will block other threads from loading the same file. We can also make this optional via Options. @siying, what do you think? Can I go ahead with the change and a PR? Or do you not like this approach at all?
I think we want to at least have the option to close these performance race conditions. See e.g. #6681 for a similar issue in the block cache. I prototyped a solution in pdillinger@513affa. The link you provided suggests (in my reading) that if file1 and file2 are in the same shard, opening file2 would not start until file1 has finished opening, which is bad. My prototype for the block cache has no such limitation, because the lock is only held at the beginning and at the completion of loading data. Other threads wanting the same data wait on a condition variable for the shard. Locks are never held during I/O, and extraneous wake-ups should be few.
I think that to solve the problem of the table cache specifically, a striped lock in the table cache to make sure each file is only being loaded once sounds fine to me. As long as the max_open_files path is not touched, I don't worry about it.
I quickly looked at your solution; it seems very complex to me and would take a long time to release (it needs a lot of testing). The idea of a striped lock is super simple: you create N locks (say 1024) and then simply take the lock at position hash % 1024, use it, and do the job. The theoretical maximum concurrency is 1024 threads without a context switch. If one hides the control behind a single mutex or condition variable, it suffers from context switches whenever more threads meet; from my experience this stops working when there is a large number of threads, because hits on the same lock accumulate and stall everything. You can pick any size of lock array to lower the probability of a hash clash between two threads. We use this in many highly concurrent applications and it just works. Pseudo code:
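(A minimal sketch of that idea in C++; the class name, stripe count, and the usage shown in comments are illustrative assumptions, not actual RocksDB table-cache code.)

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>

// Fixed pool of mutexes; a key (e.g. the SST file number) picks its stripe
// by hash % pool size, so two threads only contend if their keys collide.
class StripedLock {
 public:
  std::mutex& GetLock(uint64_t file_number) {
    return locks_[std::hash<uint64_t>{}(file_number) % kNumStripes];
  }

 private:
  static constexpr size_t kNumStripes = 1024;
  std::array<std::mutex, kNumStripes> locks_;
};

// Hypothetical usage around a table-cache lookup/load:
//   std::lock_guard<std::mutex> guard(striped.GetLock(file_number));
//   handle = cache->Lookup(key);
//   if (handle == nullptr) {
//     // Only one thread opens the file; others wait on the stripe.
//     handle = OpenTableAndInsert(file_number);
//   }
```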
This approach can be used even for keys, blocks, etc. Again, only the array size defines the theoretical concurrency. If there is a conflict from time to time, it still affects only a 1/size fraction of operations, which can be an overhead of 1% or less.
My code uses the existing lock sharding of the LRUCache, usually 64 locks (see #6592, input welcome), which is essentially what you're referring to as a "striped lock", but IMHO "striped lock" is a misnomer: with hashing there's no better likelihood that "adjacent" files will map to a different lock than "distant" files, as one would expect of "striping." A meta-idea of my prototype was that the distinction between aggressive loading (by each thread on cache miss) vs. cooperative loading should be made in the Cache object, and Cache API users should be written so that both are possible. As seen in my changes to block_based_table_reader.cc, the added complexity for the API user is essentially one extra variable and two extra ifs. Not much, though maybe that could be simplified further. Anyway, this allows for shared cache infrastructure, with per-use customization, without disjoint but related one-off implementations. The danger in adding simple code expected to become obsolete is configuration parameters that become obsolete. :-/
Yes, the LRU cache is sharded with 6 bits; that sharding is there to increase the concurrency of the LRU itself. I was trying to solve the input to the cache: when that input is expensive to produce (like opening a file), it needs to be evaluated only once. I understand that your solution basically works based on cooperation, along these lines:
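My rough understanding of that cooperative flow, sketched in C++ (this is not the actual prototype in pdillinger@513affa; the shard structure and function names here are made up for illustration):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>

// Illustrative only: one shard of a cooperative loader. The shard mutex is
// held only around bookkeeping, never across the I/O of opening a file.
struct LoaderShard {
  std::mutex mu;
  std::condition_variable cv;
  std::unordered_set<std::string> loading;  // keys currently being loaded
};

// Returns true if the caller should perform the load itself; returns false
// once the key is already cached (possibly loaded by another thread while
// we waited on the condition variable).
bool ClaimLoad(LoaderShard& shard, const std::string& key,
               const std::function<bool()>& already_cached) {
  std::unique_lock<std::mutex> lock(shard.mu);
  for (;;) {
    if (already_cached()) return false;
    if (shard.loading.insert(key).second) return true;  // we own the load
    shard.cv.wait(lock);  // someone else is loading it; wait for completion
  }
}

// Called by the loading thread after inserting the result into the cache.
void FinishLoad(LoaderShard& shard, const std::string& key) {
  {
    std::lock_guard<std::mutex> lock(shard.mu);
    shard.loading.erase(key);
  }
  shard.cv.notify_all();  // wake threads waiting on this shard
}
```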
This concept is similar to Guava's LoadingCache. The client simply does a "get" and, if the key is not there, starts loading; when the load is done, it updates the cache and notifies all blocked threads that requested the same key. The LoadingCache takes a "loader" during cache construction, which is very nice and readable, and one then uses it like a map (get, invalidate, ...). That is true and I agree with it. There are actually only 4 cases that use the cache now (if I am not wrong):
The first one is, I think, the most complex. A "striped lock" or "sharded loading" (pick any name) is, I think, the easiest way (about 10 lines of code) that simply does the job. The probability of hitting the same lock is minimal compared to the cost of opening a file, and I am not even counting on the fact that we do not have 1000 cores today: the I/O and CPU overhead right now is a hundred times worse than stopping one thread in the rare case of two keys sharing the same hash value out of 1000 possible ones. So I was talking about the flow sketched earlier: take the stripe lock, look up the cache, open the file if missing, insert, and release. Of course, having something like LoadingCache would be nice, but when I was testing performance it was always much slower than a simple ConcurrentMap, because of the extra work it has to do. BTW, a striped lock has many more features (multiple-key locking without the possibility of deadlock); I have just used it as the standard solution for this simple case.
A lock held to span cache->Lookup and cache->Insert: OK. For a key application of ours, TableCache::FindTable is about 0.5% of CPU, and almost none of that is in ShardedCache::Lookup. That suggests modest needs for the number of lock shards (in the steady state at least), and that the overhead of acquiring and releasing an uncontended lock should be negligible.
If anyone is interested, I have posted an LRU concurrency performance analysis here: https://groups.google.com/forum/#!topic/rocksdb/1_uo3Pr6DiE
Fixed with #6707?
Yes, it seems fixed to me. I haven't seen the issue so far.
Expected behavior
TableCache::FindTable should not allow the same file to be loaded twice at the same time.
Actual behavior
TableCache::FindTable loads the same file many times when concurrent queries hit the same table. Because the cache size is limited to max_open_files, already loaded tables are pushed out as duplicates are inserted (other threads are loading the same tables).
This causes a snowball effect that slows queries down to an absolute minimum, because files are opened again and again. My scenario:
To mitigate this issue I used max_open_files = -1, which is not what I want to use.
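For reference, that mitigation is just the standard option:

```cpp
rocksdb::Options options;
// Workaround used above: -1 removes the table-cache file limit, so opened
// tables are never evicted (at the cost of keeping all file descriptors
// and index/filter metadata in memory).
options.max_open_files = -1;
```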
The NO_FILE_OPENS metric shows the actual behavior. After some time, when the load lowers a little, it drops: entries are no longer pushed out of the table cache as fast, so it catches up (the number of open files is less than max_open_files). Then it is stable, because the files always fit in the cache (300 < 500).
Steps to reproduce the behavior
I wrote a simple test (a rough sketch of such a test follows the numbers below), but it is visible even from the source code that there is no protection against parallel loads.
NO_FILE_OPENS(notParallelTest=true): 5
NO_FILE_OPENS(notParallelTest=false): 10
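For illustration, a rough sketch of that kind of test in C++ (it assumes an existing DB directory already populated with more SST files than the table cache can hold; the path, key layout, thread count, and limits are arbitrary, not the original test):

```cpp
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/statistics.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.max_open_files = 50;  // small table cache to force evictions
  options.statistics = rocksdb::CreateDBStatistics();

  rocksdb::DB* db = nullptr;
  // Assumes this DB already contains data spread across many SST files.
  if (!rocksdb::DB::Open(options, "/tmp/tablecache_repro", &db).ok()) return 1;

  // Many threads repeatedly read the same keys, so they keep hitting the
  // same SST files in TableCache::FindTable at the same time.
  auto reader = [db]() {
    std::string value;
    for (int i = 0; i < 10000; ++i) {
      db->Get(rocksdb::ReadOptions(), "key" + std::to_string(i % 1000),
              &value);
    }
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < 32; ++t) threads.emplace_back(reader);
  for (auto& t : threads) t.join();

  // Without protection against parallel loads, this count grows much faster
  // than the number of distinct SST files actually read.
  std::printf("NO_FILE_OPENS: %llu\n",
              (unsigned long long)options.statistics->getTickerCount(
                  rocksdb::NO_FILE_OPENS));
  delete db;
  return 0;
}
```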