CN113050894A - Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm - Google Patents

Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm

Info

Publication number
CN113050894A
Authority
CN
China
Prior art keywords
lru
ccf
cache replacement
data
hot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110424188.2A
Other languages
Chinese (zh)
Inventor
王吟吟
杨余旺
柯亚琪
陈霆希
曹宏鑫
葛道阔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Jiangsu Academy of Agricultural Sciences
Original Assignee
Nanjing University of Science and Technology
Jiangsu Academy of Agricultural Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology, Jiangsu Academy of Agricultural Sciences
Priority to CN202110424188.2A
Publication of CN113050894A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/068 Hybrid storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an agricultural spectral data cache replacement algorithm based on the cuckoo algorithm. The massive growth of agricultural spectral data in recent years poses a major challenge to storage systems. A single storage medium such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) cannot meet practical requirements because of its inherent physical limitations, and a storage architecture that combines SSD and HDD is a feasible solution, so the cache replacement policy that manages hybrid storage of agricultural spectral data becomes the key to improving storage performance. We propose a hot-data detection scheme based on the Count Cuckoo Filter Hot-probe Method (CCF), which is efficient in both space and time and supports deletion. Combining CCF with an adaptive two-level LRU (Least Recently Used) forms the CCF-LRU cache replacement policy, which identifies hot data using CCF and manages the cache using the adaptive two-level LRU. Experimental results show that cache replacement policies combined with a hot-probe scheme achieve a markedly higher cache hit rate than traditional policies, and that CCF-LRU has lower time and space complexity and a higher hit rate than other cache replacement policies that incorporate hot-probe schemes.

Description

Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm
Technical Field
The invention relates to a hybrid storage system for agricultural spectral data. Based on the cuckoo algorithm and combined with the LRU (Least Recently Used) cache replacement algorithm, it provides a hybrid storage cache replacement policy built on a counting cuckoo filter hot-data detection scheme, which reduces time and space complexity and improves the cache hit rate of the agricultural spectrum hybrid storage system.
Background
With the rapid development and application of new technologies such as 5G, big data and Internet-of-Things agriculture, the scale and speed of data generation are growing exponentially. This growth poses great challenges to storage systems, and the performance of HDDs (Hard Disk Drives), the main data storage medium, has reached a bottleneck because of inherent physical limitations. The SSD (Solid State Drive) is a newer storage medium whose read/write performance is much better than that of the HDD, but its unit cost is also much higher, so completely replacing HDDs with SSDs is very expensive. A hybrid storage system built from SSD and HDD is therefore a good compromise between performance and cost.
An SSD-HDD hybrid storage system generally uses the SSD as a cache for the HDD, combining the high performance of the SSD with the low cost and large capacity of the HDD to give users a high-performance, large-capacity storage system. Because SSDs are expensive, SSD space is limited, so the cache replacement policy that manages the hybrid storage becomes the key to performance. Better-performing hybrid storage cache replacement policies use the access frequency of data to decide whether it is cold or hot, and design the replacement policy around the different replacement costs of cold and hot data. Hot-data detection (or cold-data detection), which classifies data as cold or hot, is therefore a core technology.
This patent provides a counting cuckoo filter hot-data detection scheme (CCF) based on the cuckoo filter. CCF is combined with an adaptive two-level LRU to form the CCF-LRU cache replacement policy, which identifies hot data using CCF and manages the cache using the adaptive two-level LRU. Experimental results show that cache replacement policies combined with a hot-probe scheme achieve a markedly higher cache hit rate than traditional policies, and that CCF-LRU has lower time and space complexity and a higher hit rate than other cache replacement policies that incorporate hot-probe schemes.
Disclosure of Invention
In the agricultural spectrum hybrid storage system described in this patent, the SSD serves as a buffer between the HDD and memory and stores copies of data held on the HDD. FIG. 7 shows the relationship and data interaction paths between the main components of the system. The black arrows represent data flow paths. When memory issues a read request, CCF-LRU determines whether the request hits in the SSD; if so, the data is read from the SSD, and if not, the data is read from the HDD and a copy is placed in the SSD. When memory issues a write request, the data is first written to the SSD and is written back to the HDD later.
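The read and write paths described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `HybridStore`, the use of dictionaries to stand in for the SSD and HDD, the `dirty` set and the explicit `flush` method are all assumptions, and cache capacity and eviction are deliberately left out.

```python
class HybridStore:
    """Toy model of the SSD-as-cache layout; dicts stand in for both devices."""

    def __init__(self):
        self.ssd = {}        # cached copies, keyed by LBA
        self.hdd = {}        # backing store, keyed by LBA
        self.dirty = set()   # LBAs written to the SSD but not yet to the HDD

    def read(self, lba):
        if lba in self.ssd:              # SSD hit: serve from the cache
            return self.ssd[lba]
        data = self.hdd[lba]             # SSD miss: read from the HDD
        self.ssd[lba] = data             # and keep a copy in the SSD
        return data

    def write(self, lba, data):
        self.ssd[lba] = data             # write goes to the SSD first
        self.dirty.add(lba)              # write-back to the HDD is deferred

    def flush(self):
        """Deferred write-back of all dirty blocks (hypothetical trigger)."""
        for lba in sorted(self.dirty):
            self.hdd[lba] = self.ssd[lba]
        self.dirty.clear()
```

For example, after `store.write(2, b"b")` the block exists only in `store.ssd` until `store.flush()` copies it to `store.hdd`.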
This patent combines CCF with an adaptive two-level LRU to form the CCF-LRU policy, which uses CCF to identify hot data and the adaptive two-level LRU to manage the cache. As shown in FIG. 4, CCF-LRU maintains a two-level LRU chain: a hot chain that stores hot data and a cold chain that stores cold data. The tail of the hot chain is connected to the head of the cold chain, and data evicted from the hot chain is inserted at the head of the cold chain.
When an access request arrives, it is first determined whether it hits. On a hit, as shown in FIG. 5(a), the data at the requested LBA is returned and the CCF determines whether the block is hot or cold: a hot block is moved to the head of the hot chain, and a cold block is moved to the head of the cold chain. On a miss, as shown in FIG. 5(b), the data is read from external storage into the LRU chain and returned. Data read for the first time is treated as cold by default and is inserted at the head of the cold chain. If the LRU chain is full at that point, the block at the tail of the cold chain is judged by the CCF: if it is cold, it is evicted directly; if it is hot, it is reinserted at the head of the hot chain, and this step is repeated until a block has been evicted. The CCF-LRU cache replacement policy is given in Algorithm 5.
[Image BDA0003028625550000021: Algorithm 5, CCF-LRU cache replacement policy]
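The hit and miss handling described above can be sketched with two ordered maps. This is an illustrative reading of the text, not a transcription of Algorithm 5: the class name `TwoLevelLRU`, the fixed `hot_capacity`, and the `is_hot` callback (a stand-in for the CCF judgment) are assumptions, and the bounded retry loop in `_evict_one` is added here only so the sketch terminates when every cached block is hot.

```python
from collections import OrderedDict


class TwoLevelLRU:
    def __init__(self, capacity, hot_capacity, is_hot):
        assert 0 <= hot_capacity <= capacity
        self.capacity = capacity
        self.hot_capacity = hot_capacity
        self.is_hot = is_hot              # stand-in for the CCF hot/cold judgment
        self.hot = OrderedDict()          # first key is the head of the chain
        self.cold = OrderedDict()

    @staticmethod
    def _push_head(chain, lba):
        chain[lba] = True
        chain.move_to_end(lba, last=False)

    def _push_hot(self, lba):
        self._push_head(self.hot, lba)
        if len(self.hot) > self.hot_capacity:
            demoted, _flag = self.hot.popitem(last=True)   # hot-chain tail
            self._push_head(self.cold, demoted)            # goes to cold-chain head

    def access(self, lba):
        """Return True on a cache hit, False on a miss."""
        if lba in self.hot or lba in self.cold:
            self.hot.pop(lba, None)
            self.cold.pop(lba, None)
            if self.is_hot(lba):
                self._push_hot(lba)
            else:
                self._push_head(self.cold, lba)
            return True
        self._push_head(self.cold, lba)    # first read: cold by default
        if len(self.hot) + len(self.cold) > self.capacity:
            self._evict_one()
        return False

    def _evict_one(self):
        # Assumption: retries are bounded so the loop ends even if all blocks are hot.
        for _attempt in range(len(self.cold)):
            victim, _flag = self.cold.popitem(last=True)   # cold-chain tail
            if not self.is_hot(victim):
                return victim                              # cold: evict directly
            self._push_hot(victim)                         # hot: back to hot head
        victim, _flag = self.cold.popitem(last=True)
        return victim
```

A caller would pass the hot/cold judgment of its own detector as `is_hot`, for instance `TwoLevelLRU(1024, 256, detector.is_hot)` with a hypothetical `detector` object.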
In a conventional two-level LRU cache replacement policy, the lengths of the linked lists are fixed or adjusted manually, which makes it hard to adapt to different working scenarios and changing workloads. Practice shows that a suitable ratio between the two list lengths improves the hit rate, so adjusting the list lengths adaptively is worthwhile.
In CCF-LRU, the lengths of the two linked lists are adjusted adaptively during operation. The total length L of the LRU list equals the sum of the hot and cold list lengths, L = L_hot + L_cold, and L_hot is adjusted within the range 0.2L to 0.8L. In the initial state, L_hot = 0.2L and L_cold = 0.8L. After every Q accesses, the number of hot data blocks in the segment of length 0.01L at the tail of the hot list is compared with the number in the segment of length 0.01L at the head of the cold list. If the hot-list segment contains more hot blocks, the 0.01L segment at the head of the cold list is merged onto the tail of the hot list; if the cold-list segment contains more, the 0.01L segment at the tail of the hot list is merged onto the head of the cold list; if the counts are equal, nothing is done. In this way the lengths of the two lists adjust adaptively. The adaptive two-level LRU is shown in FIG. 6.
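One adjustment step of this rule can be sketched as below. The function name `rebalance`, the head-first Python lists, and the `is_hot` callback are assumptions for illustration; the segment length is taken as 1% of the total, rounded down but at least one block, and the 0.2L to 0.8L bounds are checked with integer arithmetic.

```python
def count_hot(blocks, is_hot):
    """Number of blocks in the segment that the detector judges hot."""
    return sum(1 for block in blocks if is_hot(block))


def rebalance(hot, cold, is_hot):
    """One adjustment step; hot and cold are lists ordered head-first.

    Returns the new (hot, cold) pair, run once after every Q accesses.
    """
    total = len(hot) + len(cold)
    step = max(1, total // 100)                  # 0.01L segment (assumed rounding)
    hot_tail, cold_head = hot[-step:], cold[:step]
    in_hot_tail = count_hot(hot_tail, is_hot)
    in_cold_head = count_hot(cold_head, is_hot)
    # Grow the hot list, but not beyond 0.8L.
    if in_hot_tail > in_cold_head and 5 * (len(hot) + step) <= 4 * total:
        return hot + cold_head, cold[step:]
    # Shrink the hot list, but not below 0.2L.
    if in_cold_head > in_hot_tail and 5 * (len(hot) - step) >= total:
        return hot[:-step], hot_tail + cold
    return hot, cold                             # equal counts or bound reached
```

With ten blocks and a hot list of two, a hotter hot-list tail pulls one block from the head of the cold list, while the reverse case is blocked because the hot list is already at its 0.2L minimum.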
Drawings
FIG. 1: Schematic diagram of the cuckoo filter structure.
FIG. 2: Schematic diagram of the counting cuckoo filter hot-probe method model.
FIG. 3: Schematic diagram of the LRU cache replacement policy.
FIG. 4: Schematic diagram of the two-level LRU chain of the CCF-LRU cache replacement policy.
FIG. 5: Schematic diagram of the CCF-LRU cache replacement policy.
FIG. 6: Schematic diagram of the adaptive two-level LRU cache replacement policy.
FIG. 7: Overview of the agricultural spectrum hybrid storage system.
Detailed Description
1. Cuckoo filter
The cuckoo filter (CF) consists of a hash table of n buckets, each of which can store b entries. For each item x, a j-bit fingerprint f is obtained by hashing (equation 1), and two candidate bucket indices are then determined by the hash functions h1(x) and h2(x) (equations 2-3).
f=fingerprint(x) (1)
h1(x)=hash(x) (2)
h2(x)=h1(x)⊕hash(f) (3)
The XOR operation in equation 3 ensures that h1(x) can also be obtained by XOR from h2(x) and the fingerprint f. In other words, if there are two buckets a and b and the fingerprint f is stored in bucket a, the index η_b of the other bucket can be obtained by XOR-ing the index η_a of bucket a with the (hashed) fingerprint f; see equation 4.
η_b=η_a⊕hash(f) (4)
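The symmetry of the two bucket indices can be checked with a short sketch. It assumes the usual partial-key cuckoo hashing form, in which the second index is the first index XOR-ed with a hash of the fingerprint; the SHA-256 based helper `_h`, the 8-bit fingerprint, and the constant names are illustrative choices, and the bucket count is a power of two so that the XOR result stays in range.

```python
import hashlib

N_BUCKETS = 8      # n; a power of two keeps the XOR result inside [0, n)
FP_BITS = 8        # j, the fingerprint width in bits


def _h(data: bytes) -> int:
    """64-bit hash used for both indices and fingerprints (arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(x: bytes) -> int:
    # Equation 1; value 0 is avoided so it could mark an empty entry (assumption).
    return _h(b"fp" + x) % ((1 << FP_BITS) - 1) + 1


def h1(x: bytes) -> int:
    # Equation 2
    return _h(x) % N_BUCKETS


def alt_index(index: int, fp: int) -> int:
    # Equations 3 and 4: the other candidate bucket, from an index and a fingerprint.
    return (index ^ _h(fp.to_bytes(2, "big"))) % N_BUCKETS


for key in (b"lba-17", b"lba-42", b"lba-99"):
    fp = fingerprint(key)
    i1 = h1(key)
    i2 = alt_index(i1, fp)
    # Applying the same XOR twice returns to the starting bucket.
    assert alt_index(i2, fp) == i1
```

Because the alternate index depends only on the current index and the stored fingerprint, an entry can be relocated without access to the original item.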
The CF uses this property to perform insert, query and delete operations. The left half of FIG. 1 shows a CF with 8 buckets of 4 entries each (n = 8, b = 4). When a new element x_i is to be inserted, the CF computes the fingerprint f_i and the two candidate bucket indices by equations 1-3. If either candidate bucket has an empty entry, the fingerprint f_i is written there. If both buckets are full, the CF randomly evicts an entry from one of them (f_4 in FIG. 2) and writes f_i into the freed entry. For the victim f_4, the index η_b of its other candidate bucket is computed by equation 4; if that bucket has an empty entry, f_4 is written there. If that bucket is also full, the CF evicts one of its entries (f_12 in FIG. 2), writes f_4 in its place, and f_12 becomes the new victim. This process repeats until every fingerprint has found an entry or the number of repetitions exceeds the maximum kick-out count, in which case the last victim fingerprint is discarded; the insert operation is then complete.
Query and delete operations are simpler. As in the right half of FIG. 2, to query element x_8 the CF computes the fingerprint f_8 and the two candidate bucket indices by equations 1-3 and looks for f_8 in the candidate buckets, returning True if it is found and False otherwise. The delete operation builds on the query: if the query returns True, the fingerprint f is deleted; if it returns False, nothing is done.
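The three operations can be sketched in a small class. This is a generic cuckoo filter written for illustration under stated assumptions, not code from the patent: the class and method names, the SHA-256 based hashing, the 8-bit fingerprints, the seeded random generator and the default `max_kicks` value are all hypothetical choices.

```python
import hashlib
import random


class CuckooFilter:
    def __init__(self, n_buckets=8, bucket_size=4, max_kicks=50, seed=0):
        assert n_buckets & (n_buckets - 1) == 0, "bucket count must be a power of two"
        self.n, self.b, self.max_kicks = n_buckets, bucket_size, max_kicks
        self.buckets = [[] for _ in range(n_buckets)]   # each holds up to b fingerprints
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fp(self, x: bytes) -> int:
        return self._hash(b"fp" + x) % 255 + 1          # 8-bit, never zero

    def _alt(self, index: int, fp: int) -> int:
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.n

    def insert(self, x: bytes) -> bool:
        fp = self._fp(x)
        i1 = self._hash(x) % self.n
        i2 = self._alt(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:           # empty entry available
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _kick in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            # Write the pending fingerprint and pick up the evicted victim.
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)                        # victim's other bucket
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        return False                                    # last victim is discarded

    def contains(self, x: bytes) -> bool:
        fp = self._fp(x)
        i1 = self._hash(x) % self.n
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]

    def delete(self, x: bytes) -> bool:
        fp = self._fp(x)
        i1 = self._hash(x) % self.n
        for i in (i1, self._alt(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

As with any fingerprint-based filter, `contains` can report a false positive when two items share a fingerprint and a candidate bucket, which is why deletion should only be requested for items that were actually inserted.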
2. LRU cache replacement policy
LRU is a common cache replacement policy that, as shown in FIG. 3, evicts the least recently used page. The policy stores data in a linked list. Newly accessed data is inserted at the head of the list, and when data already in the list is accessed it is moved to the head. When the list is full, the data at the tail is discarded. LRU is simple to implement and achieves a high hit rate, but it does not consider access frequency, so sporadic or periodic batch operations cause the hit rate to drop sharply and pollute the cache badly.
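A plain LRU list of this kind can be sketched with Python's `collections.OrderedDict`, treating the first key as the head of the list. The class name `LRUCache` and the hit/miss return value are illustrative assumptions.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()       # first key = head (most recently used)

    def access(self, key):
        """Return True on a hit, False on a miss."""
        if key in self.items:
            self.items.move_to_end(key, last=False)   # move to the head
            return True
        self.items[key] = True
        self.items.move_to_end(key, last=False)       # insert at the head
        if len(self.items) > self.capacity:
            self.items.popitem(last=True)             # discard the tail
        return False


cache = LRUCache(3)
for key in (1, 2, 3):
    cache.access(key)
# A one-off batch scan of new keys pushes out everything cached before it.
for key in (4, 5, 6):
    cache.access(key)
assert cache.access(1) is False
```

The closing lines illustrate the cache pollution mentioned above: three keys touched once each displace the previously cached keys, regardless of how often those had been used.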
3. Counting-based cuckoo filter heat detection scheme
This patent proposes a hot-data detection scheme based on a counting cuckoo filter. The scheme maintains a count table alongside the CF hash table; like the hash table, the count table consists of n buckets of b entries each. Each entry stores an N-bit counter, and whenever an entry in the hash table changes, the corresponding entry in the count table changes with it. After every Q accesses, the counter values are halved (each counter is shifted right by 1 bit), as shown in FIG. 2.
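The count table and its periodic halving can be sketched as follows. The class name `DecayingCounters`, the `bump` and `decay` methods, and the choice to saturate a counter at its N-bit maximum are assumptions made for this illustration; the source does not say what happens on overflow.

```python
class DecayingCounters:
    def __init__(self, n_buckets, bucket_size, counter_bits, period):
        self.max_value = (1 << counter_bits) - 1    # largest N-bit value
        self.period = period                        # Q accesses between decays
        self.accesses = 0
        self.table = [[0] * bucket_size for _ in range(n_buckets)]

    def bump(self, bucket, slot):
        """Count one access to the entry at (bucket, slot)."""
        current = self.table[bucket][slot]
        # Assumption: counters saturate instead of wrapping around.
        self.table[bucket][slot] = min(self.max_value, current + 1)
        self.accesses += 1
        if self.accesses % self.period == 0:
            self.decay()

    def decay(self):
        """Halve every counter by shifting it right by one bit."""
        for row in self.table:
            for slot in range(len(row)):
                row[slot] >>= 1
```

Halving all counters on a fixed schedule lets blocks that were popular in the past cool down, so the counters track recent rather than lifetime access frequency.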
Inserting: when an insertion request arrives, the fingerprint f of the requested LBA and the two candidate bucket indices are computed first, and the two candidate buckets are checked for an empty entry. If one exists, the fingerprint f is written and its counter is set to 1. If not, a victim fingerprint is chosen at random, the victim and its counter are saved temporarily and evicted, and the fingerprint f is written in their place with its counter set to 1. The index of the victim's alternate bucket is then computed by equation 4. If the alternate bucket has no empty entry, the previous step is repeated until an alternate bucket with an empty entry is found or the number of repetitions exceeds the maximum kick-out count, at which point the victim fingerprint and its counter are discarded. If the alternate bucket has an empty entry, it is checked for the victim fingerprint: if the fingerprint is absent, the victim and its counter are written; if it is already present, the victim and its counter are discarded. See Algorithm 1.
[Image BDA0003028625550000041: Algorithm 1, CCF insert operation]
Accessing: when an access request arrives, the fingerprint f of the requested LBA and the two candidate bucket indices are computed first, and the two candidate buckets are checked for the fingerprint f. If it is present, the corresponding counter is incremented by 1. If it is absent, an insert operation is performed. See Algorithm 2.
[Image BDA0003028625550000042: Algorithm 2, CCF access operation]
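The insert and access operations can be sketched together, storing each entry as a `[fingerprint, counter]` pair. This follows the prose description rather than the algorithm listings, which are not reproduced here; the class name, the SHA-256 based hashing, the 8-bit fingerprints, the seeded random generator and the default `max_kicks` are illustrative assumptions, and counter width and periodic halving are omitted for brevity.

```python
import hashlib
import random


class CountingCuckooFilter:
    def __init__(self, n_buckets=8, bucket_size=4, max_kicks=50, seed=0):
        assert n_buckets & (n_buckets - 1) == 0, "bucket count must be a power of two"
        self.n, self.b, self.max_kicks = n_buckets, bucket_size, max_kicks
        self.buckets = [[] for _ in range(n_buckets)]   # entries: [fingerprint, counter]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, index, fp):
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.n

    def _locate(self, lba):
        key = str(lba).encode()
        fp = self._hash(b"fp" + key) % 255 + 1
        i1 = self._hash(key) % self.n
        return fp, i1, self._alt(i1, fp)

    def _find(self, index, fp):
        for entry in self.buckets[index]:
            if entry[0] == fp:
                return entry
        return None

    def insert(self, lba):
        fp, i1, i2 = self._locate(lba)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append([fp, 1])         # new entry, counter = 1
                return True
        i = self.rng.choice((i1, i2))
        entry = [fp, 1]
        for _kick in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            # Write the pending entry; the old occupant and its counter become the victim.
            entry, self.buckets[i][slot] = self.buckets[i][slot], entry
            i = self._alt(i, entry[0])
            if len(self.buckets[i]) < self.b:
                if self._find(i, entry[0]) is None:     # skip if already present there
                    self.buckets[i].append(entry)
                return True
        return False                                    # victim and counter discarded

    def access(self, lba):
        """Record one access and return the entry's counter value."""
        fp, i1, i2 = self._locate(lba)
        for i in (i1, i2):
            entry = self._find(i, fp)
            if entry is not None:
                entry[1] += 1
                return entry[1]
        self.insert(lba)                                # first access: insert
        return 1
```

Keeping the counter in the same pair as the fingerprint means that a kick-out moves both together, which matches the requirement that the count table follow every change to the hash table.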
Deleting: when a deletion request arrives, the fingerprint f of the requested LBA and the two candidate bucket indices are computed first, and the two candidate buckets are checked for the fingerprint f. If it is present, the fingerprint f and its counter are deleted; otherwise nothing is done. See Algorithm 3.
[Images BDA0003028625550000043 and BDA0003028625550000051: Algorithm 3, CCF delete operation]
Hot/cold judgment: when a judgment request arrives, the fingerprint f of the requested LBA and the two candidate bucket indices are computed first, and the two candidate buckets are checked for the fingerprint f. If it is absent, the LBA is being accessed for the first time and is judged cold directly. If it is present, the value in the corresponding count table entry is examined: if the value of the leftmost K bits is greater than 0, i.e. the LBA access count is not less than 2^(N-K) - 1, the data is judged hot; otherwise it is judged cold. See Algorithm 4.
[Image BDA0003028625550000052: Algorithm 4, CCF hot/cold judgment]
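The bit test at the heart of the judgment can be sketched as below. The function names, the default values N = 4 and K = 2, and the dictionary that stands in for the fingerprint lookup in the count table are assumptions for illustration. The K most significant bits of an N-bit counter are non-zero exactly when the stored counter value is at least 2^(N-K).

```python
def is_hot_counter(counter: int, counter_bits: int, k: int) -> bool:
    """True if any of the K leftmost bits of the N-bit counter is set."""
    return (counter >> (counter_bits - k)) > 0


def judge(counters: dict, lba: int, counter_bits: int = 4, k: int = 2) -> str:
    """Classify an LBA; `counters` is a stand-in for the CCF count table lookup."""
    counter = counters.get(lba)
    if counter is None:          # fingerprint absent: first access, cold by default
        return "cold"
    return "hot" if is_hot_counter(counter, counter_bits, k) else "cold"
```

With N = 4 and K = 2, a stored counter of 3 is still cold and a stored counter of 4 is hot; raising K lowers the threshold and makes the detector classify blocks as hot sooner.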

Claims (3)

1. The patent provides a counting cuckoo filter hot-detection scheme (CCF) based on the cuckoo filter algorithm.
2. The patent designs an adaptive two-level LRU cache replacement algorithm whose chain lengths are adjusted adaptively.
3. The method of claims 1 and 2, wherein a CCF-LRU cache replacement policy is formed by combining the CCF with the adaptive two-level LRU, the policy identifying hot data using the CCF and managing the cache using the adaptive two-level LRU.
CN202110424188.2A 2021-04-20 2021-04-20 Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm Pending CN113050894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110424188.2A CN113050894A (en) 2021-04-20 2021-04-20 Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110424188.2A CN113050894A (en) 2021-04-20 2021-04-20 Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm

Publications (1)

Publication Number Publication Date
CN113050894A (en) 2021-06-29

Family

ID=76519742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110424188.2A Pending CN113050894A (en) 2021-04-20 2021-04-20 Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm

Country Status (1)

Country Link
CN (1) CN113050894A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360516A (en) * 2021-08-11 2021-09-07 成都信息工程大学 Set member management method based on first-in first-out and minimum active number strategy
CN117331860A (en) * 2023-10-16 2024-01-02 中国电子技术标准化研究院 Multi-stream solid state disk address mapping method based on bitmap and cuckoo filter

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000004452A1 (en) * 1998-07-16 2000-01-27 Intel Corporation Method and apparatus for managing temporal and non-temporal data in a single cache structure
CN102760101A (en) * 2012-05-22 2012-10-31 中国科学院计算技术研究所 SSD-based (Solid State Disk) cache management method and system
CN109542803A (en) * 2018-11-20 2019-03-29 中国石油大学(华东) A kind of mixing multi-mode dsc data cache policy based on deep learning
US20200310969A1 (en) * 2019-04-01 2020-10-01 Arm Limited Replacement of cache entries in a set-associative cache

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, YY et al.: "A path cost-based GRASP for minimum independent dominating set problem", NEURAL COMPUTING & APPLICATIONS, 19 December 2017 (2017-12-19), pages 143 *

Similar Documents

Publication Publication Date Title
CN102760101B (en) SSD-based (Solid State Disk) cache management method and system
Nam et al. Assuring demanded read performance of data deduplication storage with backup datasets
US20130173853A1 (en) Memory-efficient caching methods and systems
Cheng et al. LRU-SP: a size-adjusted and popularity-aware LRU replacement algorithm for web caching
WO2009033419A1 (en) A data caching processing method, system and data caching device
CN107066393A (en) The method for improving map information density in address mapping table
Lv et al. Operation-aware buffer management in flash-based systems
CN102314397B (en) Method for processing cache data block
CN103176754A (en) Reading and storing method for massive amounts of small files
CN110795363B (en) Hot page prediction method and page scheduling method of storage medium
CN110532200B (en) Memory system based on hybrid memory architecture
JP6711121B2 (en) Information processing apparatus, cache memory control method, and cache memory control program
CN108762671A (en) Hybrid memory system based on PCM and DRAM and management method thereof
CN113050894A (en) Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm
Wu et al. APP-LRU: A new page replacement method for PCM/DRAM-based hybrid memory systems
CN113419976B (en) Self-adaptive segmented caching method and system based on classification prediction
Park et al. A lookahead read cache: improving read performance for deduplication backup storage
WO2013075306A1 (en) Data access method and device
CN112486994A (en) Method for quickly reading data of key value storage based on log structure merging tree
CN106055679A (en) Multi-level cache sensitive indexing method
CN111506517B (en) Flash memory page level address mapping method and system based on access locality
Wang et al. CCF-LRU: hybrid storage cache replacement strategy based on counting cuckoo filter hot-probe method
CN102097128B (en) Self-adaptive buffer area replacement method based on flash memory
CN109002400B (en) Content-aware computer cache management system and method
Liu et al. ROCO: Using a solid state drive cache to improve the performance of a host-aware shingled magnetic recording drive

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination