CN113050894A - Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm - Google Patents
- Publication number
- CN113050894A
- Authority
- CN
- China
- Prior art keywords
- lru
- ccf
- cache replacement
- data
- hot
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/068—Hybrid storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses an agricultural spectral data cache replacement algorithm based on the cuckoo algorithm. The massive growth of agricultural spectral data in recent years poses a tremendous challenge to storage systems. A single storage medium such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) cannot meet practical requirements because of its inherent physical limitations, whereas a storage architecture that combines SSDs and HDDs is a feasible solution; the cache replacement policy that manages hybrid storage of agricultural spectral data therefore becomes the key to improving storage performance. We propose a hot-data detection scheme, the Counting Cuckoo Filter Hot-probe Method (CCF), which has very good space and time efficiency and supports deletion. Combining CCF with an adaptive two-level LRU (Least Recently Used) yields the CCF-LRU cache replacement policy, which identifies hot data using CCF and manages the cache using the adaptive two-level LRU. Experimental results show that cache replacement strategies combined with a hot-probe scheme clearly improve the cache hit rate compared with traditional strategies, and that CCF-LRU has lower time and space complexity and a higher hit rate than other cache replacement strategies that incorporate hot-probe schemes.
Description
Technical Field
The invention relates to a hybrid storage system for agricultural spectral data. It combines the cuckoo algorithm with an LRU (least recently used) cache replacement algorithm to provide a hybrid storage cache replacement strategy based on a counting cuckoo filter hot-detection scheme, which reduces time and space complexity and improves the cache hit rate of the agricultural spectrum hybrid storage system.
Background
With the rapid development and adoption of new technologies such as 5G, big data and Internet-of-Things agriculture, the scale and speed of data generation are growing exponentially. This massive growth poses great challenges to storage systems, and performance improvement of HDDs (Hard Disk Drives), the main data storage medium, has hit a bottleneck because of inherent physical limitations. The SSD (Solid State Drive) is a newer storage medium whose read/write performance is much better than that of the HDD, but its unit cost is much higher, so replacing HDDs entirely with SSDs is very expensive. A hybrid storage system built from SSDs and HDDs is therefore a good solution that balances performance and cost.
An SSD-HDD hybrid storage system generally uses the SSD as a cache for the HDD, combining the high performance of the SSD with the low cost and large capacity of the HDD to give users a storage system that is both fast and large. Because SSDs are expensive, SSD capacity is limited, so the cache replacement policy that manages the hybrid storage is the key to improving performance. Better-performing hybrid storage cache replacement policies use the access frequency of data to classify it as hot or cold and are designed around the different replacement costs of hot and cold data. Hot-data detection (or cold-data detection), which separates hot data from cold data, is therefore a core technique.
This patent proposes a counting cuckoo filter hot-detection scheme (CCF) built on the cuckoo filter, and combines the CCF with an adaptive two-level LRU to form the CCF-LRU cache replacement strategy. The policy identifies hot data using the CCF and manages the cache using the adaptive two-level LRU. Experimental results show that cache replacement strategies combined with a hot-probe scheme clearly improve the cache hit rate compared with traditional strategies, and that CCF-LRU has lower time and space complexity and a higher hit rate than other cache replacement strategies that incorporate hot-probe schemes.
Disclosure of Invention
In the agricultural spectrum hybrid storage system described in this patent, the SSD serves as a cache between the HDD and memory and stores copies of HDD data. FIG. 7 shows the main components of the system, their relationships and the data interaction paths; black arrows denote data flow paths. When memory issues a read request, CCF-LRU determines whether the request hits in the SSD: on a hit the data is read from the SSD, and on a miss the data is read from the HDD and a copy is placed in the SSD. When memory issues a write request, the data is first written to the SSD, and the write-back to the HDD is delayed.
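The read and write paths above can be illustrated with a minimal Python sketch. It assumes plain dicts stand in for the SSD and HDD, and it leaves the replacement decision (CCF-LRU in the patent) to a hypothetical `evict` hook; the class name `HybridStore` and its methods are illustrative, not taken from the patent.

```python
from typing import Callable, Dict, Set


class HybridStore:
    """Hypothetical SSD-over-HDD cache path; dicts stand in for devices."""

    def __init__(self, hdd: Dict[int, bytes], capacity: int,
                 evict: Callable[[Dict[int, bytes]], int]):
        self.hdd = hdd
        self.ssd: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()   # LBAs written to SSD, not yet to HDD
        self.capacity = capacity
        self.evict = evict             # replacement policy hook (e.g. CCF-LRU)

    def _make_room(self) -> None:
        while len(self.ssd) >= self.capacity:
            victim = self.evict(self.ssd)
            if victim in self.dirty:   # delayed write-back happens on eviction
                self.hdd[victim] = self.ssd[victim]
                self.dirty.discard(victim)
            del self.ssd[victim]

    def read(self, lba: int) -> bytes:
        if lba in self.ssd:            # SSD hit
            return self.ssd[lba]
        data = self.hdd[lba]           # miss: read from the HDD ...
        self._make_room()
        self.ssd[lba] = data           # ... and keep a copy in the SSD
        return data

    def write(self, lba: int, data: bytes) -> None:
        if lba not in self.ssd:
            self._make_room()
        self.ssd[lba] = data           # write to the SSD first
        self.dirty.add(lba)            # HDD update is deferred

    def flush(self) -> None:
        for lba in list(self.dirty):   # write all pending data back
            self.hdd[lba] = self.ssd[lba]
        self.dirty.clear()
```

For a quick experiment, `evict=lambda ssd: next(iter(ssd))` evicts in insertion (FIFO) order; a real deployment would plug in the CCF-LRU decision instead.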
This patent combines the CCF and an adaptive two-level LRU into the CCF-LRU policy, which uses the CCF to identify hot data and the adaptive two-level LRU to manage the cache. As shown in FIG. 4, CCF-LRU maintains a two-level LRU chain: a hot chain that stores hot data and a cold chain that stores cold data. The tail of the hot chain is connected to the head of the cold chain, so data evicted from the hot chain is inserted at the head of the cold chain.
On each access request, the policy first determines whether the request hits. On a hit, as shown in FIG. 5(a), the data at the requested LBA is returned and the CCF decides whether hot or cold data was hit: hot data is moved to the head of the hot chain, and cold data to the head of the cold chain. On a miss, as shown in FIG. 5(b), the data is read from external storage into the LRU chain and returned. Data read for the first time is treated as cold by default and inserted at the head of the cold chain. If the LRU chain is full at that moment, the CCF judges the data at the tail of the cold chain: cold data is evicted directly, while hot data is reinserted at the head of the hot chain, and this check is repeated until some data is evicted. The CCF-LRU cache replacement policy is given in Algorithm 5.
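The hit and miss handling above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class `TwoLevelLRU` is hypothetical, the CCF is replaced by an `is_hot` callback, and the re-promotion loop is bounded (an assumption not in the patent) so that it terminates even if every cold-chain entry is judged hot. In each `OrderedDict` the last item is the chain head and the first item is the tail.

```python
from collections import OrderedDict
from typing import Callable, Dict


class TwoLevelLRU:
    """Hypothetical hot/cold chains of CCF-LRU; `is_hot` stands in for the CCF."""

    def __init__(self, capacity: int, hot_len: int,
                 is_hot: Callable[[int], bool]):
        assert 0 < hot_len < capacity     # keeps the cold chain non-empty when full
        self.capacity = capacity          # L = L_hot + L_cold
        self.hot_len = hot_len            # L_hot
        self.is_hot = is_hot
        self.hot: "OrderedDict[int, bytes]" = OrderedDict()
        self.cold: "OrderedDict[int, bytes]" = OrderedDict()

    def _push_hot(self, lba: int, data: bytes) -> None:
        self.hot[lba] = data              # insert at the hot head
        self.hot.move_to_end(lba)
        while len(self.hot) > self.hot_len:
            tail, tdata = self.hot.popitem(last=False)
            self.cold[tail] = tdata       # hot tail -> cold head

    def _evict_one(self) -> int:
        for _ in range(self.capacity + 1):  # bound is an assumption
            lba, data = self.cold.popitem(last=False)
            if not self.is_hot(lba):
                return lba                # cold tail is cold: evict it
            self._push_hot(lba, data)     # hot: reinsert at the hot head
        lba, _ = self.cold.popitem(last=False)
        return lba

    def access(self, lba: int, backing: Dict[int, bytes]) -> bytes:
        for chain in (self.hot, self.cold):
            if lba in chain:              # hit: CCF decides which head
                data = chain.pop(lba)
                if self.is_hot(lba):
                    self._push_hot(lba, data)
                else:
                    self.cold[lba] = data
                return data
        data = backing[lba]               # miss: read from external storage
        if len(self.hot) + len(self.cold) >= self.capacity:
            self._evict_one()
        self.cold[lba] = data             # first read: cold by default
        return data
```

Keeping the classifier as a callback separates the chain bookkeeping from hot-data detection, so the same chains can be driven by the CCF or by any other hot-probe scheme for comparison.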
The chain lengths of a two-level LRU cache replacement policy are usually fixed or adjusted manually, which makes it hard for the policy to adapt to different scenarios and changing workloads. Extensive practice shows that a suitable length ratio between the two chains improves the hit rate, so adjusting the chain lengths adaptively is worthwhile.
In CCF-LRU, the lengths of the two chains are adjusted adaptively at run time. The total length L of the LRU chain is the sum of the hot-chain and cold-chain lengths, $L = L_{hot} + L_{cold}$, where $L_{hot}$ is adjusted within the range $0.2L$ to $0.8L$. Initially $L_{hot} = 0.2L$ and $L_{cold} = 0.8L$. After every Q accesses, the number of hot data blocks in the last $0.01L$ entries of the hot chain is compared with the number in the first $0.01L$ entries of the cold chain. If the hot-chain segment contains more hot blocks, the first $0.01L$ entries of the cold chain are merged onto the tail of the hot chain; if the cold-chain segment contains more hot blocks, the last $0.01L$ entries of the hot chain are merged onto the head of the cold chain; if the counts are equal, nothing is done. In this way the lengths of the two chains adapt to the workload. The adaptive two-level LRU is shown in FIG. 6.
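The periodic length adjustment can be sketched as a small Python function. The helper name `adjust_lengths` is hypothetical; it assumes the cache is full, so the list lengths play the roles of $L_{hot}$ and $L_{cold}$, uses integer arithmetic for $0.01L$, $0.2L$ and $0.8L$, and simply skips a move that would leave the $[0.2L, 0.8L]$ range. Index 0 of each list is the chain head.

```python
from typing import Callable, List


def adjust_lengths(hot: List[int], cold: List[int], total_len: int,
                   is_hot: Callable[[int], bool]) -> None:
    """Hypothetical helper, called once every Q accesses (index 0 = head)."""
    seg = max(1, total_len // 100)                  # 0.01L entries
    lo, hi = total_len // 5, total_len * 4 // 5     # 0.2L and 0.8L bounds
    hot_tail, cold_head = hot[-seg:], cold[:seg]
    hot_in_hot_tail = sum(1 for blk in hot_tail if is_hot(blk))
    hot_in_cold_head = sum(1 for blk in cold_head if is_hot(blk))
    if hot_in_hot_tail > hot_in_cold_head:
        if len(hot) + len(cold_head) <= hi:
            hot.extend(cold_head)                   # cold head -> hot tail
            del cold[:len(cold_head)]
    elif hot_in_cold_head > hot_in_hot_tail:
        if len(hot) - len(hot_tail) >= lo:
            del hot[len(hot) - len(hot_tail):]      # hot tail -> cold head
            cold[:0] = hot_tail
    # Equal counts: leave both chains unchanged.
```

The segment comparison costs $O(0.01L)$ classifier calls per adjustment, so running it only every Q accesses keeps the overhead small.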
Drawings
FIG. 1: schematic structure of the cuckoo filter.
FIG. 2: schematic model of the counting cuckoo filter hot-probe method.
FIG. 3: schematic diagram of the LRU cache replacement policy.
FIG. 4: two-level LRU chain of the CCF-LRU cache replacement strategy.
FIG. 5: schematic diagram of the CCF-LRU cache replacement strategy.
FIG. 6: schematic diagram of the adaptive two-level LRU cache replacement strategy.
FIG. 7: overview of the agricultural spectral hybrid storage system.
Detailed Description
1. Cuckoo filter
The cuckoo filter (CF) used by the cuckoo algorithm is a hash table of n buckets, each of which can store b entries. For each item x, a hash computation yields a j-bit fingerprint f (Equation 1), and the hash functions $h_1(x)$ and $h_2(x)$ then determine the two candidate bucket indices (Equations 2-3).
$$f = \mathrm{fingerprint}(x) \qquad (1)$$
$$h_1(x) = \mathrm{hash}(x) \qquad (2)$$
$$h_2(x) = h_1(x) \oplus \mathrm{hash}(f) \qquad (3)$$
The XOR operation in Equation 3 ensures that $h_1(x)$ can equally be recovered by XOR-ing $h_2(x)$ with the hash of the fingerprint f. In other words, if fingerprint f is stored in bucket a, the index $\eta_b$ of the other candidate bucket b can be computed from the index $\eta_a$ of bucket a and the fingerprint alone, without the original item (Equation 4):
$$\eta_b = \eta_a \oplus \mathrm{hash}(f) \qquad (4)$$
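The XOR relationship between the two candidate buckets can be checked with a minimal Python sketch. It follows the standard cuckoo filter construction, which XORs the bucket index with a hash of the fingerprint; the SHA-256-based hash, the 8-bit fingerprint (j = 8), n = 8 buckets, and the helper names `fingerprint`, `h1` and `other_bucket` are illustrative assumptions, not taken from the patent. n is assumed to be a power of two so that masking keeps the XOR result a valid bucket index.

```python
import hashlib

N_BUCKETS = 8   # n (assumed power of two so the XOR result stays in range)
FP_MASK = 0xFF  # 8-bit fingerprints, i.e. j = 8 (assumed)


def _hash(data: bytes) -> int:
    """Hypothetical 64-bit hash built from SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(x: bytes) -> int:            # Eq. (1)
    return _hash(b"fp:" + x) & FP_MASK


def h1(x: bytes) -> int:                     # Eq. (2)
    return _hash(x) & (N_BUCKETS - 1)


def other_bucket(eta: int, f: int) -> int:   # Eqs. (3) and (4)
    return (eta ^ _hash(bytes([f]))) & (N_BUCKETS - 1)


x = b"LBA-1024"
f = fingerprint(x)
eta_a = h1(x)
eta_b = other_bucket(eta_a, f)               # this is h2(x)
assert other_bucket(eta_b, f) == eta_a       # XOR is its own inverse
```

Because the alternate index depends only on the current index and the stored fingerprint, a kicked-out fingerprint can be relocated without knowing the original item.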
The CF uses this property to support insert, query and delete operations. The left half of FIG. 1 shows a CF with 8 buckets of 4 entries each (n = 8, b = 4). To insert a new element $x_i$, the CF computes the fingerprint $f_i$ and the two candidate bucket indices using Equations 1-3. If either candidate bucket has an empty entry, $f_i$ is written into it. If both buckets are full, the CF randomly kicks out an entry from one of them (shown as $f_4$ in FIG. 2) and writes $f_i$ in its place. For the victim $f_4$, the CF computes the index $\eta_b$ of its other candidate bucket using Equation 4; if that bucket has an empty entry, $f_4$ is written there. If that bucket is also full, the CF kicks out one of its entries (shown as $f_{12}$ in FIG. 2), writes $f_4$ in its place, and continues with the new victim $f_{12}$. The process repeats until every fingerprint has found an entry or the number of kicks exceeds the maximum kick-out count, in which case the last victim fingerprint is discarded and the insertion ends.
The query and delete operations of the CF are simpler. As in the right half of FIG. 2, to query element $x_8$ the CF computes the fingerprint $f_8$ and the two candidate bucket indices using Equations 1-3 and checks whether either candidate bucket contains $f_8$: if so, it returns True; otherwise it returns False. Deletion builds on the query: if the query returns True, fingerprint $f_8$ is deleted; if it returns False, nothing is done.
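The insert, query and delete operations described above can be sketched as a small Python class. This is a minimal sketch under stated assumptions: the class name `CuckooFilter`, the SHA-256-based hashing, 8-bit fingerprints, a seeded `random.Random` for choosing victims, and the default `max_kicks=500` are all illustrative choices rather than values given in the patent.

```python
import hashlib
import random
from typing import List, Tuple


class CuckooFilter:
    """Hypothetical cuckoo filter: n buckets x b entries, 8-bit fingerprints."""

    def __init__(self, n: int = 8, b: int = 4, max_kicks: int = 500, seed: int = 0):
        assert n & (n - 1) == 0, "n must be a power of two"
        self.n, self.b, self.max_kicks = n, b, max_kicks
        self.buckets: List[List[int]] = [[] for _ in range(n)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, i: int, fp: int) -> int:                  # Eq. (4)
        return (i ^ self._hash(bytes([fp]))) & (self.n - 1)

    def _indices(self, x: bytes) -> Tuple[int, int, int]:
        fp = self._hash(b"fp:" + x) & 0xFF                   # Eq. (1)
        i1 = self._hash(x) & (self.n - 1)                    # Eq. (2)
        return fp, i1, self._alt(i1, fp)                     # Eq. (3)

    def insert(self, x: bytes) -> bool:
        fp, i1, i2 = self._indices(x)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:                # empty entry found
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))                        # both full
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            # Kick out the victim and put the current fingerprint in its slot.
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)                             # victim's other bucket
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        return False                                         # last victim is discarded

    def contains(self, x: bytes) -> bool:
        fp, i1, i2 = self._indices(x)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, x: bytes) -> bool:
        fp, i1, i2 = self._indices(x)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)                   # delete one copy
                return True
        return False
```

Note that a `False` from `insert` means some previously stored fingerprint was dropped, which is why cuckoo filters are normally sized well below full capacity.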
2. LRU cache replacement policy
LRU is a common cache replacement policy that, as shown in FIG. 3, evicts the least recently used page. The policy stores data in a linked list: newly accessed data is inserted at the head of the list, data that is accessed while already in the list is moved to the head, and when the list is full, the data at its tail is discarded. The LRU cache replacement strategy is simple to implement and achieves a high hit rate, but it does not consider access frequency, so sporadic and periodic batch operations sharply reduce its hit rate and cause severe cache pollution.
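The linked-list behaviour of LRU maps directly onto Python's `collections.OrderedDict`, which keeps insertion order and supports `move_to_end` and `popitem(last=False)`. A minimal sketch, with the class name `LRUCache` chosen for illustration; the last item of the dict plays the role of the list head.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Hypothetical LRU cache; the last OrderedDict item is the list head."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # accessed data moves to the head
        return self.data[key]

    def put(self, key: Hashable, value: object) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value             # new data is inserted at the head
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # full: discard the tail
```

Both operations run in O(1), but nothing here records how often a key was used, which is the gap the counting cuckoo filter below is meant to fill.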
3. Hot-detection scheme based on a counting cuckoo filter
This patent proposes a hot-detection scheme based on a counting cuckoo filter. The scheme maintains a count table alongside the CF hash table; the count table has the same shape as the hash table, n buckets of b entries each. Each entry stores an N-bit counter, and whenever an entry in the hash table changes, the corresponding count-table entry changes with it. Every Q accesses, all counter values are halved (each counter is shifted right by 1 bit), as shown in FIG. 2.
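The count table and its periodic halving can be sketched in a few lines of Python. The class `CountTable` and its methods are hypothetical; saturating the N-bit counter at $2^N - 1$ is an assumption (the patent does not say what happens on overflow), and only calls to `bump` are counted as accesses here.

```python
from typing import List


class CountTable:
    """Hypothetical n x b table of N-bit counters mirroring the CF hash table."""

    def __init__(self, n: int, b: int, n_bits: int, q: int):
        self.counts: List[List[int]] = [[0] * b for _ in range(n)]
        self.max_val = (1 << n_bits) - 1   # largest N-bit value
        self.q = q                         # halve every Q accesses
        self.accesses = 0

    def bump(self, i: int, slot: int) -> None:
        # Saturating increment (assumed overflow behaviour).
        self.counts[i][slot] = min(self.counts[i][slot] + 1, self.max_val)
        self.accesses += 1
        if self.accesses % self.q == 0:
            self.age()

    def age(self) -> None:
        for row in self.counts:
            for k in range(len(row)):
                row[k] >>= 1               # halve: shift right by 1 bit
```

Halving acts as exponential decay, so blocks that were hot long ago gradually fall back below the hot threshold.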
Insertion: on an insertion request, the fingerprint f of the requested LBA and its two candidate bucket indices are computed first. If either candidate bucket has an empty entry, f is written there and its counter is set to 1. Otherwise, a victim fingerprint is selected at random; the victim and its counter are saved temporarily and evicted, f is written in their place, and its counter is set to 1. The alternate bucket index of the victim is then computed with Equation 4. If the alternate bucket has no empty entry, the previous step is repeated (a new victim is evicted from that bucket to make room) until a bucket with an empty entry is found or the number of repetitions exceeds the maximum kick-out count, in which case the victim fingerprint and its counter are discarded. If the alternate bucket has an empty entry, it is checked for the victim's fingerprint: if the fingerprint is absent, the victim fingerprint and its counter are written there; if it is already present, they are discarded. See Algorithm 1.
Access: on an access request, the fingerprint f of the requested LBA and its two candidate bucket indices are computed first, and the two candidate buckets are checked for f. If f is found, the corresponding counter is incremented by 1; otherwise, an insertion is performed. See Algorithm 2.
Deletion: on a deletion request, the fingerprint f of the requested LBA and its two candidate bucket indices are computed first, and the two candidate buckets are checked for f. If f is found, it is deleted together with its counter; otherwise, nothing is done. See Algorithm 3.
Hot/cold judgment: on a judgment request, the fingerprint f of the requested LBA and its two candidate bucket indices are computed first, and the two candidate buckets are checked for f. If f is absent, this is the first access to the LBA and the data is judged cold directly. If f is present, the corresponding counter is read from the count table: if its leftmost K bits are non-zero, that is, the counter value (the approximate access count of the LBA) is at least $2^{N-K}$, the data is judged hot; otherwise it is judged cold. See Algorithm 4.
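Algorithms 1-4 can be combined into one Python sketch of the counting cuckoo filter. All names are hypothetical, and several details are assumptions rather than patent specifics: 16-bit fingerprints derived from SHA-256, counters stored next to fingerprints as `[fingerprint, counter]` pairs (equivalent to a parallel count table), saturating counters, and the default sizes. Every `access` counts toward the Q-access halving period.

```python
import hashlib
import random
from typing import List, Optional, Tuple

Entry = List[int]  # [fingerprint, counter]


class CountingCuckooFilter:
    """Hypothetical CCF hot-probe sketch covering Algorithms 1-4."""

    def __init__(self, n=64, b=4, n_bits=4, k=2, q=1000, max_kicks=100, seed=0):
        assert n & (n - 1) == 0, "n must be a power of two"
        self.n, self.b, self.n_bits, self.k, self.q = n, b, n_bits, k, q
        self.max_kicks, self.cmax = max_kicks, (1 << n_bits) - 1
        self.table: List[List[Optional[Entry]]] = [[None] * b for _ in range(n)]
        self.rng = random.Random(seed)
        self.accesses = 0

    @staticmethod
    def _h(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, i: int, fp: int) -> int:                 # Eq. (4)
        return (i ^ self._h(fp.to_bytes(2, "big"))) & (self.n - 1)

    def _locate(self, lba: int) -> Tuple[int, int, int]:    # Eqs. (1)-(3)
        key = lba.to_bytes(8, "big")
        fp = self._h(b"fp:" + key) & 0xFFFF
        i1 = self._h(key) & (self.n - 1)
        return fp, i1, self._alt(i1, fp)

    def _find(self, fp: int, i1: int, i2: int) -> Optional[Entry]:
        for i in (i1, i2):
            for e in self.table[i]:
                if e is not None and e[0] == fp:
                    return e
        return None

    def _free_slot(self, i: int) -> Optional[int]:
        for s, e in enumerate(self.table[i]):
            if e is None:
                return s
        return None

    def insert(self, lba: int) -> None:                     # Algorithm 1
        fp, i1, i2 = self._locate(lba)
        for i in (i1, i2):
            s = self._free_slot(i)
            if s is not None:
                self.table[i][s] = [fp, 1]
                return
        i = self.rng.choice((i1, i2))                       # both full: kick
        s = self.rng.randrange(self.b)
        victim, self.table[i][s] = self.table[i][s], [fp, 1]
        for _ in range(self.max_kicks):
            i = self._alt(i, victim[0])
            s = self._free_slot(i)
            if s is not None:
                # Write the victim only if its fingerprint is not already there.
                if self._find(victim[0], i, i) is None:
                    self.table[i][s] = victim
                return
            s = self.rng.randrange(self.b)
            victim, self.table[i][s] = self.table[i][s], victim
        # Kick limit exceeded: the last victim and its counter are discarded.

    def access(self, lba: int) -> None:                     # Algorithm 2
        e = self._find(*self._locate(lba))
        if e is None:
            self.insert(lba)
        else:
            e[1] = min(e[1] + 1, self.cmax)                 # saturating (assumed)
        self.accesses += 1
        if self.accesses % self.q == 0:                     # periodic halving
            for row in self.table:
                for slot in row:
                    if slot is not None:
                        slot[1] >>= 1

    def delete(self, lba: int) -> None:                     # Algorithm 3
        fp, i1, i2 = self._locate(lba)
        for i in (i1, i2):
            for s, e in enumerate(self.table[i]):
                if e is not None and e[0] == fp:
                    self.table[i][s] = None
                    return

    def is_hot(self, lba: int) -> bool:                     # Algorithm 4
        e = self._find(*self._locate(lba))
        if e is None:
            return False                                    # first access: cold
        return (e[1] >> (self.n_bits - self.k)) > 0         # top K bits non-zero
```

With `n_bits=4` and `k=2`, an LBA becomes hot once its counter reaches $2^{4-2} = 4$; `is_hot` is the kind of predicate the two-level LRU chains would consult when deciding where to place or evict a block.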
Claims (3)
1. The patent provides a Counting Cuckoo Filter Hot-detection scheme (CCF) based on the cuckoo algorithm.
2. The patent designs an adaptive two-level LRU cache replacement algorithm whose chain lengths are adjusted adaptively.
3. The method of claims 1 and 2, wherein a CCF-LRU cache replacement policy is formed by combining the CCF and the adaptive two-level LRU (Least Recently Used); the policy identifies hot data using the CCF and manages the cache using the adaptive two-level LRU.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110424188.2A CN113050894A (en) | 2021-04-20 | 2021-04-20 | Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113050894A | 2021-06-29 |
Family
ID=76519742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110424188.2A Pending CN113050894A (en) | 2021-04-20 | 2021-04-20 | Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113050894A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113360516A (en) * | 2021-08-11 | 2021-09-07 | 成都信息工程大学 | Set member management method based on first-in first-out and minimum active number strategy |
CN117331860A (en) * | 2023-10-16 | 2024-01-02 | 中国电子技术标准化研究院 | Multi-stream solid state disk address mapping method based on bitmap and cuckoo filter |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000004452A1 (en) * | 1998-07-16 | 2000-01-27 | Intel Corporation | Method and apparatus for managing temporal and non-temporal data in a single cache structure |
CN102760101A (en) * | 2012-05-22 | 2012-10-31 | 中国科学院计算技术研究所 | SSD-based (Solid State Disk) cache management method and system |
CN109542803A (en) * | 2018-11-20 | 2019-03-29 | 中国石油大学(华东) | A kind of mixing multi-mode dsc data cache policy based on deep learning |
US20200310969A1 (en) * | 2019-04-01 | 2020-10-01 | Arm Limited | Replacement of cache entries in a set-associative cache |
2021
- 2021-04-20: application CN202110424188.2A filed in CN (CN113050894A), active, pending
Non-Patent Citations (1)
Title |
---|
WANG, YY et al.: "A path cost-based GRASP for minimum independent dominating set problem", NEURAL COMPUTING & APPLICATIONS, 19 December 2017 (2017-12-19), pages 143 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102760101B (en) | SSD-based (Solid State Disk) cache management method and system | |
Nam et al. | Assuring demanded read performance of data deduplication storage with backup datasets | |
US20130173853A1 (en) | Memory-efficient caching methods and systems | |
Cheng et al. | LRU-SP: a size-adjusted and popularity-aware LRU replacement algorithm for web caching | |
WO2009033419A1 (en) | A data caching processing method, system and data caching device | |
CN107066393A (en) | The method for improving map information density in address mapping table | |
Lv et al. | Operation-aware buffer management in flash-based systems | |
CN102314397B (en) | Method for processing cache data block | |
CN103176754A (en) | Reading and storing method for massive amounts of small files | |
CN110795363B (en) | Hot page prediction method and page scheduling method of storage medium | |
CN110532200B (en) | Memory system based on hybrid memory architecture | |
JP6711121B2 (en) | Information processing apparatus, cache memory control method, and cache memory control program | |
CN108762671A (en) | Hybrid memory system based on PCM and DRAM and management method thereof | |
CN113050894A (en) | Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm | |
Wu et al. | APP-LRU: A new page replacement method for PCM/DRAM-based hybrid memory systems | |
CN113419976B (en) | Self-adaptive segmented caching method and system based on classification prediction | |
Park et al. | A lookahead read cache: improving read performance for deduplication backup storage | |
WO2013075306A1 (en) | Data access method and device | |
CN112486994A (en) | Method for quickly reading data of key value storage based on log structure merging tree | |
CN106055679A (en) | Multi-level cache sensitive indexing method | |
CN111506517B (en) | Flash memory page level address mapping method and system based on access locality | |
Wang et al. | CCF-LRU: hybrid storage cache replacement strategy based on counting cuckoo filter hot-probe method | |
CN102097128B (en) | Self-adaptive buffer area replacement method based on flash memory | |
CN109002400B (en) | Content-aware computer cache management system and method | |
Liu et al. | ROCO: Using a solid state drive cache to improve the performance of a host-aware shingled magnetic recording drive |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |