
Implementation of Local Cache Algorithm Based on ARC #36376

Closed

Conversation

KinderRiven
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry:

Compared with the LRU algorithm, a local cache algorithm based on ARC can effectively avoid cache pool pollution caused by one-time large-scale scans.
If this commit is accepted, we can support choosing a different local cache algorithm in the config file in subsequent commits.

@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Apr 18, 2022
@alesapin
Member

cc @kssenii

@alesapin alesapin added the can be tested Allows running workflows for external contributors label Apr 18, 2022
@kssenii kssenii self-assigned this Apr 18, 2022
@KinderRiven
Contributor Author

Something went wrong in testing, but I couldn't find out what caused the problem when I submitted the code. Can you help me see what's causing it? Or maybe it's due to the test system itself? Thank you! @kssenii

@KinderRiven
Contributor Author

KinderRiven commented Apr 19, 2022

The following figure compares the ARC and LRU algorithms in ClickHouse. We run SSB query Q1.1 four times (a small query that simulates small-scale, intensive user access), then run SSB Q2.1 once (a big query that may scan the whole table, but which is rare in real scenarios), and then run Q1.1 four more times.
As the figure shows, when Q1.1 is executed after Q2.1, the LRU-based system replaces almost all of the data in the buffer pool (which may be unnecessary, because large-scale loads are uncommon), so the subsequent Q1.1 runs show a significant latency rise. The ARC-based system does not have this problem, because it keeps frequently accessed cache data in a higher-level queue to avoid pollution.

[image: Q1.1 latency before and after Q2.1, ARC vs LRU]

@KinderRiven
Contributor Author

We can specify the cache algorithm in the configuration file to manage the local cache (the default is LRU).

[image: configuration file screenshot]
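For illustration, a disk configuration with a selectable cache algorithm might look like the sketch below. The `cache_algorithm` setting name is hypothetical — the actual setting introduced by this PR appears only in the screenshot above:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_cached>
                <type>s3</type>
                <data_cache_enabled>true</data_cache_enabled>
                <!-- hypothetical setting name, for illustration only -->
                <cache_algorithm>ARC</cache_algorithm>
            </s3_cached>
        </disks>
    </storage_configuration>
</clickhouse>
```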

@kssenii
Member

kssenii commented Apr 19, 2022

Something went wrong in testing, but I couldn't find out what caused the problem when I submitted the code. Can you help me see what's causing it? Or maybe it's due to the test system itself? Thank you! @kssenii

It was a temporary issue with our CI; it should be OK now. I just re-ran all checks, and they should start properly.

@kssenii
Member

kssenii commented Apr 20, 2022

Let's temporarily turn on your cache algorithm for our CI. Go to ClickHouse/tests/config/config.d/storage_conf.xml and change the disk config there. Then it will be applied for some stateless and stateful tests, and also for checks tagged s3 storage.

@KinderRiven
Contributor Author

Let's temporarily turn on your cache algorithm for our CI. Go to ClickHouse/tests/config/config.d/storage_conf.xml and change the disk config there. Then it will be applied for some stateless and stateful tests, and also for checks tagged s3 storage.

I may have found a system error caused by an assert in the local cache. I need to fix it first and then start subsequent testing.
PR

@KinderRiven
Contributor Author

KinderRiven commented Apr 21, 2022

Let's temporarily turn on your cache algorithm for our CI. Go to ClickHouse/tests/config/config.d/storage_conf.xml and change the disk config there. Then it will be applied for some stateless and stateful tests, and also for checks tagged s3 storage.

02240_system_remote_filesystem_cache and 02241_remote_filesystem_cache_on_insert failed, but after my investigation, it may not be due to the addition of the ARC algorithm: I rebased on the latest official master branch, and the failures still occur when retesting locally. My guess is that the tests' reference output was not updated in time after the local cache code changed. @kssenii

@KinderRiven
Contributor Author

After recommitting the official master code to test LRU, the test failures still occur, so I think it really is a CI problem.

src/Common/FileCacheFactory.cpp — resolved review comment (outdated)
tests/config/config.d/storage_conf.xml — resolved review comment (outdated)
src/Common/ARCFileCache.h — resolved review comment (outdated)
src/Common/ARCFileCache.cpp — resolved review comment
@KinderRiven
Contributor Author

Stateless tests flaky check (address, actions) — Timeout, fail: 0, passed: 150
The reason for the above timeout is that we added additional ARC tests, which may make the reserved time (at most 1800 seconds) insufficient.

@KinderRiven KinderRiven force-pushed the local_cache_with_arc_v1 branch 3 times, most recently from b72f4c2 to 9f9f14b Compare May 1, 2022 12:36
return cell.hit_count >= move_threshold;
}

bool ARCFileCache::tryMoveLowToHigh(const FileSegmentCell & cell, std::lock_guard<std::mutex> & cache_lock)
Member


I checked https://www.usenix.org/legacy/events/fast03/tech/full_papers/megiddo/megiddo.pdf and https://dbs.uni-leipzig.de/file/ARC.pdf and it does not really look like ARC described there. To what article did you refer?

Contributor Author


Sorry for not replying in time; I have been busy with some real-life matters. It does look that way, and I will optimize the implementation according to the paper.

@KinderRiven
Contributor Author

KinderRiven commented May 12, 2022

@kssenii I looked at some cache algorithm papers, and our implementation is probably closer to LRU-K (https://dl.acm.org/doi/pdf/10.1145/170036.170081). The ARC algorithm may not be well suited to the current situation, because it requires each cache page to have a fixed size (whereas file segment sizes are not fixed in the local cache implementation). We could first implement an LRU-K algorithm (which is still effective for workloads with scan flushing) and then consider other algorithms (such as an improved ARC algorithm). What do you think?

@kssenii
Member

kssenii commented May 23, 2022

Sorry for not replying for some time.

We could first implement an LRU-K algorithm (which is still effective for workloads with scan flushing) and then consider other algorithms (such as an improved ARC algorithm). What do you think?

I agree 👍🏻. I'll read the paper you mentioned and review once again.

src/Common/ARCFileCache.cpp — 6 resolved review comments
@KinderRiven
Contributor Author

Sorry for the late reply. I may have to close this PR for the following reasons:
[1] Avoiding cache pollution has already been addressed in my previous PR1 and PR2. For example, avoiding excessive downloading is actually similar to the LRU-K implementation.
[2] The current cache algorithm is not decoupled from the cache functionality, so implementing a new cache algorithm may introduce too much redundancy. For example, setting the query cache size and avoiding excessive downloads are currently only supported in LRUFileCache; a new caching algorithm would have to implement these functions again, which is obviously not good.
[3] In the future, if the caching algorithm can be decoupled from the caching functionality, I will introduce some new caching algorithms and consider how to achieve the above goals.

@alexey-milovidov
Member

@KinderRiven there is a work on decoupling caching strategy here: #34651
