CN111399777A - Differentiated key value data storage method based on data value classification - Google Patents
Differentiated key value data storage method based on data value classification
- Publication number
- CN111399777A
- Authority
- CN
- China
- Prior art keywords
- data
- file
- value
- ordered
- files
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0253—Garbage collection, i.e. reclamation of unreferenced memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a differentiated key-value data storage method based on data value classification. The method comprises classified storage of key-value data, hierarchical management of ordered files, optimization of the local ordering of bottom-layer data values, and a differentiated disk space reclamation strategy. Using a key-value separation structure, data values of different sizes are classified and stored with different degrees of ordering, so that the system supports efficient point queries and range queries while substantially reducing the overhead of data merging. Hierarchical management and local ordering optimization of the ordered files ensure, at the lowest possible overhead, that range queries read most ordered files in the system sequentially. The differentiated disk space reclamation strategy quickly identifies and reclaims the files with the densest invalid data, and avoids frequent reads and writes of the data index when the space of medium and small data values is reclaimed, so that storage space is reclaimed and reused efficiently. A key-value storage system designed with this method can therefore perform well in reading, writing, and space overhead.
Description
Technical Field
The invention belongs to the technical field of computer storage systems, and particularly relates to a key-value access method that classifies key-value data and applies differentiated storage within key-value storage systems (KV stores) to achieve high performance and low space overhead.
Background
According to companies such as Google and Facebook, storage systems based on the key-value model have been widely adopted in recent years to meet the storage requirements of massive unstructured and semi-structured data in new network applications and services. Google's LevelDB, Facebook's RocksDB, and Taobao's Tair all use the key-value storage model. These mainstream key-value storage systems all adopt a log-structured merge tree (LSM-Tree) as the core storage structure: key-value data is organized into multiple layers on disk, and the absolute ordering of the data on disk is guaranteed through inter-layer compaction operations. Frequent compaction, however, causes a serious write amplification problem. Two main research directions currently address this problem, one of which relaxes the intra-layer ordering requirement on the key-value pairs stored in the LSM-Tree. These approaches reduce write amplification, but they do not simultaneously guarantee high read and write performance and low space overhead.
Disclosure of Invention
The invention aims to provide, for key-value storage systems, a differentiated key-value data storage method based on data value classification that guarantees read performance at the lowest possible maintenance cost, achieves efficient space reclamation, and overcomes the inability of the prior art to guarantee high read and write performance and low space overhead at the same time.
The invention discloses a differentiated key value data storage method based on data value classification, which is characterized by comprising the following steps of:
The first step: classified storage of key-value data
Using a data value (Value) classification method, key-value data is divided into three classes (large, medium, and small) according to the size of the data value, and is stored with a key-value separation structure in three different areas on disk: an unordered data area, an ordered data area, and an index area:
the unordered data area consists of a series of unordered files and stores large data values; the data values are stored in the files in write order;
the ordered data area consists of a series of ordered files and stores medium data values; the data values are arranged within each file in the dictionary order of their keys (Key), and the key ranges of different ordered files may overlap;
the index area is structured as a log-structured merge tree (LSM-Tree) and stores the small key-value pairs and the positions of the large and medium data values in the data areas;
the data value classification method comprises: taking approximately the size of a data value's position index as SmallThreshold, the maximum data value size of a small key-value pair; setting LargeThreshold, the minimum data value size of a large key-value pair, according to the concurrent random read performance of the storage medium, such that concurrent random read performance approaches sequential read performance when the data value is larger than LargeThreshold; and limiting the size of a data value file to MaxFileSize, whose value is smaller than the size of a data file in the LSM-Tree; the following procedure is followed when a key-value pair is written:
(1) determine the class of the key-value pair; write medium and small key-value pairs directly into the write cache (MemTable) of the LSM-Tree; append a large key-value pair to the newest unordered file and, after the write succeeds, write its position index <key, value position> into the write cache;
(2) when the write cache is full and is flushed to disk, generate a new ordered file, write the medium data values into it in the dictionary order of their keys, and record the position <key, value position> of each medium data value in the file; each time the ordered file reaches MaxFileSize, generate the next file;
(3) write the small key-value pairs in the write cache and the position indexes <key, value position> of the other data values into the LSM-Tree;
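The classification rule and step (1) of the write procedure can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold values are the ones quoted for the embodiment (128 B and 8 KB), while the `Store` class, its in-memory `bytearray` standing in for the newest unordered file, and the tagged tuples in the write cache are assumptions made for the example. File rollover at MaxFileSize is omitted.

```python
from dataclasses import dataclass, field

SMALL_THRESHOLD = 128        # bytes, value used in the embodiment
LARGE_THRESHOLD = 8 * 1024   # bytes, value used in the embodiment


def classify(value: bytes) -> str:
    """Return the size class of a data value."""
    if len(value) > LARGE_THRESHOLD:
        return "large"
    if len(value) > SMALL_THRESHOLD:
        return "medium"
    return "small"


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the write path."""
    memtable: dict = field(default_factory=dict)                  # write cache
    unordered_file: bytearray = field(default_factory=bytearray)  # newest unordered file
    unordered_file_id: int = 1

    def put(self, key: bytes, value: bytes) -> None:
        if classify(value) == "large":
            # Append the large value in write order, then cache only its position.
            # (The on-disk record also carries the key and sizes; omitted here.)
            offset = len(self.unordered_file)
            self.unordered_file += value
            self.memtable[key] = ("pos", self.unordered_file_id, offset, len(value))
        else:
            # Medium and small pairs go straight into the write cache.
            self.memtable[key] = ("val", value)
```

Medium values stay in the write cache until the flush of step (2), which is why only large values touch a data file on the foreground write path.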
The second step: hierarchical management of ordered files
The ordered files are logically divided into three layers: the medium data values indexed by the last two layers of the LSM-Tree are stored only in the ordered files of layer 1 and layer 2 respectively, and the medium data values indexed by the remaining layers of the LSM-Tree are stored only in the ordered files of layer 0. A layering operation is performed whenever the LSM-Tree triggers a compaction whose output layer is one of the last two layers: the medium data values of the input layers in the compaction data are merged, sorted, and rewritten into ordered files of the corresponding output layer. The specific method is as follows:
(1) record the layer number of each ordered file when it is generated; an ordered file generated by a write cache flush belongs to layer 0;
(2) when the LSM-Tree executes a compaction whose output layer is one of the last two layers, then for each medium key-value pair traversed in order in the input data, if the layer number of the ordered file holding its data value is lower than the ordered-file layer corresponding to the output layer, or the file is marked as needing to be merged, write the data value into a newly generated ordered file of the layer corresponding to the output layer;
(3) the compaction updates the position of each rewritten data value when saving the output data;
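The rewrite decision in step (2) reduces to two small predicates, sketched below. The function names and the zero-based numbering of LSM-Tree levels are assumptions made for illustration; the rule itself (rewrite when the holding file sits in a lower ordered layer than the target, or carries the needs-merge mark) follows the text above.

```python
from typing import Optional


def ordered_layer_for(output_level: int, num_levels: int) -> Optional[int]:
    """Map an LSM-Tree compaction output level (0-based) to an ordered-file layer.

    The last level maps to layer 2, the second-to-last to layer 1.
    Any other output level triggers no layering operation (None).
    """
    if output_level == num_levels - 1:
        return 2
    if output_level == num_levels - 2:
        return 1
    return None


def must_rewrite(file_layer: int, needs_merge: bool, target_layer: int) -> bool:
    """Decide whether a medium data value is rewritten into a new ordered file."""
    return needs_merge or file_layer < target_layer
```

For example, with a seven-level tree a compaction into level 6 targets layer 2, so a value held in a layer-1 file is rewritten, while a value already in an unmarked layer-2 file is left in place.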
The third step: optimizing the local ordering of bottom-layer data values
Each time the LSM-Tree completes a bottom-layer compaction, check the range overlap among the ordered files indexed by the compaction's output data. If the number of mutually overlapping ordered files in some key interval exceeds the limit MaxOverlap, mark those files as needing to be merged; in the next compaction involving that interval, the layering operation on ordered files merges the data values in these files regardless of which layer the files belong to;
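One way to perform this overlap check is a sweep over the sorted endpoints of the files' key ranges, sketched below. The function name, the use of closed ranges given as `(lowest key, highest key)`, and the choice to mark every file that is active when the limit is exceeded are assumptions of this sketch; the text does not prescribe a particular algorithm.

```python
def files_to_merge(ranges: dict, max_overlap: int) -> set:
    """Return the files lying in any key interval covered by more than max_overlap files.

    ranges maps a file name to its closed key range (lowest key, highest key).
    """
    events = []
    for name, (lo, hi) in ranges.items():
        events.append((lo, 0, name))  # 0 sorts before 1, so a range opens
        events.append((hi, 1, name))  # before another closes at the same key
    events.sort()

    active, marked = set(), set()
    for _key, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > max_overlap:
                marked |= active      # every file covering this point is marked
        else:
            active.discard(name)
    return marked


# Hypothetical key ranges, not the ones from the figures.
example = {"f1": ("a", "d"), "f2": ("b", "e"), "f3": ("c", "h"), "f4": ("m", "o")}
marked = files_to_merge(example, max_overlap=2)
```

Here `f1`, `f2`, and `f3` all cover the interval from `c` to `d`, so with a limit of 2 all three are marked, while `f4` is untouched.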
The fourth step: differentiated disk space reclamation strategy
For data invalidated by update, overwrite, or deletion, the key, small data value, and data value position are deleted during compaction; for medium and large data values, space reclamation is scheduled and executed by monitoring the distribution of invalid data values across files during compaction. The monitoring steps are as follows:
(1) maintain a metadata table in memory that records the type (ordered file or unordered file) and the invalid data volume of each data value file;
(2) during each compaction, for each data value file D indexed by the input data, record the total size Di of the data values in D referenced by the input data and the total size Do of the data values in D referenced by the output data;
(3) when the compaction finishes, increase the invalid data volume of each file D by Di - Do; the invalid data ratio of a file is its invalid data volume divided by its total data volume;
(4) if the layering operation in the compaction generates a new ordered file, add it to the metadata table with an invalid data volume of 0;
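The bookkeeping in steps (1) to (4) can be sketched with a small in-memory table. The `FileMeta` class, the `apply_compaction` function, and the byte counts in the usage lines are hypothetical; only the update rule (add Di - Do per file, register new ordered files with zero invalid data) comes from the steps above.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    kind: str            # "ordered" or "unordered"
    total_bytes: int     # total data volume of the file
    invalid_bytes: int = 0

    @property
    def invalid_ratio(self) -> float:
        return self.invalid_bytes / self.total_bytes if self.total_bytes else 0.0


def apply_compaction(table, input_bytes, output_bytes, new_ordered_files=()):
    """Update the metadata table after one compaction.

    input_bytes[f]  is Di: bytes of values in file f referenced by the input data.
    output_bytes[f] is Do: bytes of values in file f still referenced by the output.
    new_ordered_files is an iterable of (file_id, total_bytes) created by layering.
    """
    for file_id, d_in in input_bytes.items():
        d_out = output_bytes.get(file_id, 0)
        table[file_id].invalid_bytes += d_in - d_out
    for file_id, total in new_ordered_files:
        table[file_id] = FileMeta("ordered", total)  # starts with no invalid data


table = {2: FileMeta("ordered", 1000), 4: FileMeta("unordered", 4000)}
apply_compaction(table, {2: 1000, 4: 2000}, {4: 500}, [(7, 800)])
```

After this call file 2 is fully invalid, file 4 has 1500 invalid bytes out of 4000, and the new ordered file 7 enters the table with an invalid data volume of 0.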
after the compaction finishes, space reclamation scheduling is executed for each data value file whose invalid data ratio exceeds MaxInvalidRatio, as follows:
(1) if the invalid data ratio is 1, delete the file and finish scheduling;
(2) if the file is an ordered file, mark it as needing to be merged; its valid data values are rewritten in the next layering operation, after which its invalid data ratio reaches 1 and it is deleted;
(3) if the file is an unordered file, add it to the space reclamation queue to await processing;
to process a queued unordered file, traverse the large key-value pairs stored in it and use each key to query the LSM-Tree for whether the recorded position of the data value matches the file, which determines whether the data is valid; rewrite the valid data into a new unordered file, update the positions in the LSM-Tree, and finally delete the file.
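The three scheduling cases can be sketched as a single dispatch function. The function name, the string return values, and the containers passed in are assumptions made for illustration; the 0.3 limit in the usage lines is the MaxInvalidRatio value used in the embodiment.

```python
def schedule(file_id, kind, invalid_ratio, max_invalid_ratio, needs_merge, gc_queue):
    """Apply the scheduling rules to one data value file and report the action taken."""
    if invalid_ratio <= max_invalid_ratio:
        return "keep"                 # not enough invalid data to act on
    if invalid_ratio >= 1.0:
        return "delete"               # case (1): nothing valid remains
    if kind == "ordered":
        needs_merge.add(file_id)      # case (2): rewritten by the next layering operation
        return "mark"
    gc_queue.append(file_id)          # case (3): handled by the space reclamation thread
    return "enqueue"


needs_merge, gc_queue = set(), []
actions = [
    schedule(2, "ordered", 1.0, 0.3, needs_merge, gc_queue),
    schedule(3, "ordered", 0.5, 0.3, needs_merge, gc_queue),
    schedule(4, "unordered", 0.4, 0.3, needs_merge, gc_queue),
    schedule(5, "unordered", 0.1, 0.3, needs_merge, gc_queue),
]
```

The point of the split is that ordered files never need a separate reclamation pass: marking them lets an already scheduled compaction do the rewriting.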
The differentiated key-value data storage method of the invention comprises classified storage of key-value data, hierarchical management of ordered files, optimization of the local ordering of bottom-layer data values, and a differentiated disk space reclamation strategy. Using a key-value separation structure, data values of different sizes are classified and stored with different degrees of ordering, so that the system supports efficient point queries and range queries while substantially reducing the overhead of data merging. Hierarchical management and local ordering optimization of the ordered files ensure, at the lowest possible overhead, that range queries read most ordered files in the system sequentially. The differentiated disk space reclamation strategy quickly identifies and reclaims the files with the densest invalid data, and avoids frequent reads and writes of the data index when the space of medium and small data values is reclaimed, so that storage space is reclaimed and reused efficiently. A key-value storage system designed with this method can therefore perform well in reading, writing, and space overhead.
Drawings
Fig. 1 is a schematic overall architecture diagram of a differentiated key value data storage method based on data value classification according to the present invention.
FIG. 2 is a schematic diagram of the hierarchical operation of an ordered file.
FIG. 3 is a diagram of local order maintenance for ordered files.
Fig. 4 is a schematic diagram of a monitoring method of invalid data.
FIG. 5 is a schematic diagram of an unordered file space reclamation process.
Detailed Description
The following describes the differentiated key-value data storage method based on data value classification in detail with reference to the accompanying drawings.
Example 1:
This embodiment illustrates the operation flow of the differentiated key-value data storage method based on data value classification by writing and maintaining key-value data. Fig. 1 is a schematic diagram of the overall architecture of the method in this embodiment, which comprises a write cache and a data value file metadata table in memory and data partitions (an index area, an ordered data area, and an unordered data area) on disk, and shows the classified storage process of key-value data.
In this embodiment, the system uses a Samsung 860 EVO SSD as the underlying storage medium, on which the throughput of concurrent random reads of 8 KB or more reaches three quarters of that of sequential reads, so LargeThreshold is set to 8 KB. The other relevant parameters are as follows: SmallThreshold is 128 B, so data values between 129 B and 8 KB are medium data values; MaxOverlap is usually set to 10 for better performance, but is set to 2 in this embodiment for convenience of description; MaxInvalidRatio is set to 0.3; and MaxFileSize is set to 8 MB.
The differentiated key value data storage method based on data value classification in the embodiment specifically comprises the following steps:
The first step: classified storage of key-value data
Key-value pairs are divided into three classes (large, medium, and small) according to the size of the data value (Value). Large and medium key-value pairs use key-value separated storage to keep data merging overhead low: their data values are stored in the unordered data area and the ordered data area respectively, while the keys (Key) and the position indexes of the data values are stored in the LSM-Tree in the index area. Small key-value pairs are stored directly in the LSM-Tree.
In this embodiment, when a key-value pair <key, value> is written into the system, the split storage of large key-value pairs is performed first (split storage ① of large key-value pairs in Fig. 1). The system first determines whether the value is larger than LargeThreshold, which is 8 KB in this embodiment. If the value is larger than 8 KB, the key-value pair is a large key-value pair, and the whole pair is appended to the newest file of the unordered data area in the storage format [key size, key, value size, value]. If the file size after the write reaches MaxFileSize (8 MB in this embodiment), a new unordered file is generated to store subsequent data. After the file write succeeds, the position information <key, value position> of the value in the unordered file is written into the write cache; if the unordered file written is 003, the format of the value position is [003, address of the value in 003, size of the value]. If the value is not larger than 8 KB, the key-value pair is written directly into the write cache.
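The record format [key size, key, value size, value] and the resulting value position can be sketched as follows. The 4-byte little-endian length fields are an assumption of this sketch, since the description does not give field widths, and a `bytearray` stands in for the unordered file.

```python
import struct

LEN = struct.Struct("<I")  # assumed width: 4-byte little-endian length field


def encode_record(key: bytes, value: bytes) -> bytes:
    """Serialize one large key-value pair as [key size, key, value size, value]."""
    return LEN.pack(len(key)) + key + LEN.pack(len(value)) + value


def append_record(buf: bytearray, file_id: int, key: bytes, value: bytes):
    """Append a record and return the value position [file, address, size]."""
    start = len(buf)
    buf += encode_record(key, value)
    value_offset = start + LEN.size + len(key) + LEN.size
    return (file_id, value_offset, len(value))


def read_value(buf: bytearray, position) -> bytes:
    """Fetch a value directly from its recorded position, without parsing the record."""
    _file_id, offset, size = position
    return bytes(buf[offset:offset + size])


unordered_003 = bytearray()
pos1 = append_record(unordered_003, 3, b"key", b"v" * 10)
pos2 = append_record(unordered_003, 3, b"k2", b"abc")
```

Storing the key next to the value costs a few bytes per record, but it is what later lets space reclamation check each record against the index without any extra metadata.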
When the write cache is full, it is converted into a locked write cache (Immutable MemTable) and flushed to disk in the background. During the flush, the key-value pairs <key, value> in the write cache are traversed in order and the split storage of medium key-value pairs is performed (split storage ② of medium key-value pairs in Fig. 1). The system first determines whether the value is larger than SmallThreshold, which is 128 B in this embodiment. If the value is not larger than 128 B, or the pair holds the position information of a large data value, <key, value> is written into the LSM-Tree. Otherwise, the pair is a medium key-value pair: its value is written into a newly generated layer-0 ordered file, and its position index <key, value position> is written into the LSM-Tree. Note that, unlike an unordered file, an ordered file needs to store only the value itself, because space reclamation of ordered files requires no other information (see the fourth step).
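The flush-time split can be sketched as one pass over the write cache in key order. The tagged tuples (`"val"` for an inline value, `"pos"` for a position index), the function name, and the single in-memory ordered file are assumptions of this sketch; rollover to a new ordered file at MaxFileSize is omitted.

```python
SMALL_THRESHOLD = 128  # bytes, value used in the embodiment


def flush(memtable: dict, ordered_file_id: int):
    """Split a full write cache into ordered-file bytes and LSM-Tree entries.

    memtable maps key -> ("val", value bytes) or ("pos", (file, offset, size)).
    """
    ordered_values = bytearray()
    lsm_entries = []
    for key in sorted(memtable):  # dictionary order of keys
        tag, payload = memtable[key]
        if tag == "pos" or len(payload) <= SMALL_THRESHOLD:
            # Small pairs and positions of large values go to the LSM-Tree as they are.
            lsm_entries.append((key, (tag, payload)))
        else:
            # Medium value: only the value is stored in the ordered file.
            offset = len(ordered_values)
            ordered_values += payload
            lsm_entries.append((key, ("pos", (ordered_file_id, offset, len(payload)))))
    return bytes(ordered_values), lsm_entries


cache = {
    b"b": ("val", b"m" * 200),
    b"a": ("val", b"s"),
    b"c": ("pos", (3, 11, 9000)),
}
data, entries = flush(cache, ordered_file_id=5)
```

Because the traversal is sorted, the values land in the ordered file in key order, which is what makes later range scans over that file sequential.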
On the one hand, classified storage of key-value data reduces the write amplification overhead of the LSM-Tree through the key-value separation structure, and storing data value positions in the LSM-Tree supports fast point queries. On the other hand, the system only needs to maintain the ordering of small and medium data values, which is inexpensive, and disk performance for concurrent random reads of large data values is close to that of sequential reads. The system as a whole can therefore offer high point query, range query, and write performance at the same time.
The second step: hierarchical management of ordered files
The ordered files are logically divided into three layers: the data values indexed by the last two layers of the LSM-Tree are stored only in the ordered files of layer 1 and layer 2 respectively, and the data values indexed by the other layers of the LSM-Tree are stored only in the ordered files of layer 0. A layering operation is executed whenever the LSM-Tree triggers a compaction whose output layer is one of the last two layers: the data values of the input layers in the compaction data are merged, sorted, and rewritten into ordered files of the corresponding output layer.
Fig. 2 illustrates the merging of data values in the layering operation on ordered files. The left side of the figure shows the last two layers of the LSM-Tree, whose files are not shown in detail; the right side shows some of the ordered files of the last two layers (layer 1 and layer 2), where two files in layer 2 are marked as needing to be merged and the shaded portions are the data involved in a particular compaction. The unordered data area is omitted from the figure because it is not involved in the layering operation. The compaction traverses all shaded key-value pairs in order. For each key-value pair encountered, if its data value is stored in a layer-1 ordered file, or the ordered file holding it carries the needs-merge mark, the data value is written into a new ordered file (merge ③ of ordered files in Fig. 2). After the compaction ends, the newly generated files are added to layer 2 of the ordered files. In this way the layer-1 data is re-sorted, and the data values referenced by the compaction output are all stored in layer-2 ordered files. The layering operation for the other layer is similar and is not shown.
Because adjacent layers in the LSM-Tree differ in data volume by a factor of about 10, dividing the ordered data into three layers maintains the ordering of about 99 percent of the data values in the system. Meanwhile, apart from marked files, an ordered file is merged only once each time its data moves down a layer, so the write overhead of merging is small.
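The 99 percent figure can be checked with a short calculation. The sketch below assumes an idealized tree in which every level is full and each level is exactly 10 times the size of the one above it; real trees only approximate this.

```python
def last_two_fraction(num_levels: int, fanout: int = 10) -> float:
    """Fraction of all data held by the last two levels of an idealized LSM-Tree."""
    sizes = [fanout ** i for i in range(num_levels)]  # relative level sizes
    return (sizes[-1] + sizes[-2]) / sum(sizes)


fraction = last_two_fraction(7)
```

With seven levels the last two hold 1,100,000 of 1,111,111 units, just over 99 percent, so only about 1 percent of the medium data values sit in the loosely ordered layer-0 files.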
The third step: optimizing the local ordering of data values
Hierarchical merging guarantees the overall ordering of the ordered data. However, if data in certain key intervals is written or updated frequently, the number of overlapping ordered files in those intervals becomes significantly higher than elsewhere in the system, range queries over those intervals degrade into random reads, and performance suffers, so local optimization is needed. After the LSM-Tree completes a bottom-layer compaction, the range overlap among the ordered files indexed by the compaction's output data is checked. If the number of mutually overlapping ordered files in some key interval exceeds the limit MaxOverlap, those files are marked as needing to be merged. In the next layering operation involving the interval, the data values in the marked files participate in the merge regardless of which layer the files belong to, so that the number of overlapping files in the interval is reduced to 1.
Fig. 3 is a schematic diagram of maintaining the local ordering of ordered data values. The upper part of the figure shows the ordered files holding the data values referenced by the output of one bottom-layer compaction in this embodiment; the key intervals of the data in these files are [a, b], [a, e], [c, h], [f, h], [i, j], [i, k], [m, o]. The middle part shows the overlap of these key intervals after sorting, and the lower part marks the files that need to participate in hierarchical merging. The sorted result shows that 3 ordered files overlap in the key interval [c, d], which exceeds the MaxOverlap parameter of this embodiment, so the ordered files containing that interval (i.e., the ordered files with key intervals [a, e] and [c, h]) are marked as needing to be merged. In the next layering operation involving these key intervals, the data values in these files are merged into a new ordered file, so that the number of overlapping files in the interval is reduced to 1.
Because the second step ensures that upper-layer ordered files are always merged in the layering operation, and upper-layer compactions are frequent, the local ordering optimization is executed only after bottom-layer compactions in order to reduce overhead.
The fourth step: differentiated disk space reclamation strategy
Updates and deletions of key-value pairs take effect during the compaction of the LSM-Tree. Keys, small data values, and data value positions are all stored in the LSM-Tree, so they can be deleted directly during compaction. For medium and large data values, the data value position information records the size of the data value and the file that holds it, so differentiated space reclamation scheduling can be executed for different types of files by monitoring the change in each data value file's invalid data volume before and after a compaction.
Fig. 4 is a schematic diagram of the method for monitoring invalid data in data value files. The left side of the figure shows a compaction between layer i and layer i+1 of the LSM-Tree, the middle part shows the distribution of the compaction's input and output data across the data value files, and the right side shows the update of the metadata table after the compaction finishes.
Space reclamation scheduling is executed according to the change in each data value file's invalid data volume after the compaction. In this embodiment, the invalid data ratio of ordered file 002 reaches 1, so it is deleted. The invalid data ratios of files 003 and 004 exceed MaxInvalidRatio (0.3 in this embodiment), so their space needs to be reclaimed. File 003 is an ordered file, so it is marked as needing to be merged, and its valid data values are rewritten in the next layering operation involving its key interval. Because MaxFileSize is set small and the compaction rewrites only valid data, in most cases the valid data values in an ordered file are completely rewritten in one layering operation, after which its invalid data ratio reaches 1 and it is deleted. File 004 is an unordered file; it is added to the space reclamation queue to await processing by the space reclamation thread.
The background space reclamation thread first sorts the unordered files in the queue by invalid data ratio and selects the file with the highest ratio. It traverses the file, rewrites the valid data into a new file, updates the position index in the LSM-Tree, and deletes the file once all valid data has been rewritten. Because unordered files store whole key-value pairs, the key can be used to query the LSM-Tree for whether the recorded position of the data value matches the file, which determines whether the data is valid. Fig. 5 is a schematic diagram of the space reclamation flow for unordered files. The upper part of the figure is the space reclamation queue; unordered file f, the file with the highest invalid data ratio in the queue, is being reclaimed. The key-value pair currently being traversed is <key, value>, and the subscripts in the file are address offsets. As the figure shows, the LSM-Tree is first read to determine whether the position of the value is at offset x in file f. If so, <key, value> is appended to the new file and the position of the value in the LSM-Tree is updated to the new location; otherwise the data is invalid and is not rewritten.
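The validity check and rewrite loop for one unordered file can be sketched as follows. A plain dictionary stands in for the LSM-Tree index, records are given as `(offset, key, value)` tuples, and new offsets count value bytes only; all of these, along with the function name, are simplifying assumptions of the sketch.

```python
def reclaim_unordered_file(file_id, records, index, new_file_id):
    """Rewrite the still-valid records of one unordered file and update the index.

    records: list of (value offset, key, value) read from the file being reclaimed.
    index:   dict key -> (file, offset, size), standing in for the LSM-Tree.
    Returns the records written to the new file; the caller then deletes the old file.
    """
    new_records = []
    new_offset = 0
    for offset, key, value in records:
        pos = index.get(key)
        # A record is valid only if the index still points at this exact location.
        if pos is None or pos[0] != file_id or pos[1] != offset:
            continue  # deleted, or superseded by a newer write elsewhere
        new_records.append((new_offset, key, value))
        index[key] = (new_file_id, new_offset, len(value))
        new_offset += len(value)
    return new_records


index = {b"a": (4, 0, 3), b"b": (9, 0, 2)}
old_records = [(0, b"a", b"xyz"), (3, b"b", b"old"), (6, b"c", b"del")]
kept = reclaim_unordered_file(4, old_records, index, new_file_id=10)
```

Only key `a` survives: `b` now lives in file 9 and `c` is no longer in the index, so both are skipped and their space is freed when file 4 is deleted.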
On the one hand, the differentiated disk space reclamation strategy ensures that scheduled files contain a large share of invalid data, so that more space is reclaimed and less valid data is rewritten. On the other hand, by integrating the rewriting of medium data values into the compaction process, it avoids frequent reads and writes of the LSM-Tree when reclaiming those data values; and when the space of large data values is reclaimed, the LSM-Tree is read and written relatively infrequently and at relatively low cost. Space reclamation is therefore efficient.
In this embodiment, the differentiated key-value data storage method based on data value classification reduces system write amplification through the key-value separation architecture while storing key-value data in differentiated classes; it maintains the data ordering needed for efficient reads using low-overhead hierarchical merging and local ordering optimization; and it executes differentiated space reclamation scheduling by monitoring the proportion of invalid data in each file, which effectively reduces disk space overhead. Compared with other key-value storage techniques, the method maintains efficient read and write performance across the board while keeping storage cost low.
Claims (1)
1. A differentiated key-value data storage method based on data value classification, characterized by comprising the following steps:
the first step: classified storage of key-value data
A data value classification method is adopted: key-value data is classified as large, medium, or small according to the size of the data value and, under a key-value-separated storage structure, stored respectively in three different areas on disk, namely an unordered data area, an ordered data area, and an index area:
the unordered data area consists of a series of unordered files and stores large data values, which are stored in the files in write order;
the ordered data area consists of a series of ordered files and stores medium data values, which are arranged in the files in dictionary order of their keys; the key ranges of different ordered files may overlap;
the index area is structured as a log-structured merge tree (LSM-Tree) and stores the small key-value pairs and the positions of the large and medium data values in the data areas;
the data value classification method comprises: taking approximately the size of a data value's position index as the maximum data value size SmallThreshold of a small key-value pair; setting the minimum data value size LargeThreshold of a large key-value pair according to the concurrent random read performance of the storage medium, such that concurrent random read performance approaches sequential read performance when the data value is larger than LargeThreshold; and limiting the size of a data value file to MaxFileSize, whose value is smaller than the size of a data file in the LSM-Tree; the following procedure is followed when a key-value pair is written:
(1) the type of the key-value pair is determined; medium and small key-value pairs are written directly into the write cache of the LSM-Tree; a large key-value pair is appended to the latest unordered file and, after the write succeeds, its position index <key, value position> is written into the write cache;
(2) when the write cache is full and is flushed to disk, a new ordered file is generated, the medium data values are written into it in dictionary order of their keys, and the position <key, value position> of each medium data value in the file is recorded; each time an ordered file reaches MaxFileSize, the next file is generated;
(3) the small key-value pairs in the write cache and the position indexes <key, value position> of the other data values are written into the LSM-Tree;
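The classification and write procedure of the first step can be sketched as follows. This is an illustrative model only: the three threshold values are hypothetical, the boundary comparisons (`<=` for small, `>=` for large) are one reading of "maximum" and "minimum" size, the LSM-Tree is a plain dictionary, a single unordered file named `unordered-0` is assumed, and positions are record indexes rather than byte offsets.

```python
from typing import Dict, List, Tuple

SMALL_THRESHOLD = 16     # hypothetical: roughly the size of a position index
LARGE_THRESHOLD = 4096   # hypothetical: depends on the storage medium
MAX_FILE_SIZE = 8192     # hypothetical cap on one data value file


def classify(value: bytes) -> str:
    """Classify a data value as small, medium, or large by its size."""
    if len(value) <= SMALL_THRESHOLD:
        return "small"
    if len(value) >= LARGE_THRESHOLD:
        return "large"
    return "medium"


class WritePath:
    def __init__(self) -> None:
        self.write_cache: Dict[str, tuple] = {}
        self.unordered_file: List[Tuple[str, bytes]] = []       # latest unordered file
        self.ordered_files: List[List[Tuple[str, bytes]]] = []  # one list per ordered file
        self.lsm: Dict[str, tuple] = {}                         # stand-in for the LSM-Tree

    def put(self, key: str, value: bytes) -> None:
        # Step (1): large pairs are appended to the unordered file first,
        # then only their position index enters the write cache.
        if classify(value) == "large":
            self.unordered_file.append((key, value))
            position = len(self.unordered_file) - 1
            self.write_cache[key] = ("ptr", "unordered-0", position)
        else:
            self.write_cache[key] = ("val", value)

    def flush(self) -> None:
        # Step (2): medium values go to ordered files in dictionary order of key.
        current: List[Tuple[str, bytes]] = []
        size = 0
        for key in sorted(self.write_cache):
            entry = self.write_cache[key]
            if entry[0] == "val" and classify(entry[1]) == "medium":
                value = entry[1]
                if current and size + len(value) > MAX_FILE_SIZE:
                    self.ordered_files.append(current)  # file is full, start the next
                    current, size = [], 0
                file_name = f"ordered-{len(self.ordered_files)}"
                current.append((key, value))
                size += len(value)
                self.lsm[key] = ("ptr", file_name, len(current) - 1)
            else:
                # Step (3): small pairs and existing position indexes go to the LSM-Tree.
                self.lsm[key] = entry
        if current:
            self.ordered_files.append(current)
        self.write_cache.clear()
```

After a flush, only small values and position indexes remain in the index structure, which is what keeps the LSM-Tree small in a key-value-separated design.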
the second step: hierarchical management of ordered files
The ordered files are logically divided into three layers: the medium data values of the last two levels of the LSM-Tree are stored only in the ordered files of layer 1 and layer 2 respectively, and the medium data values of the remaining levels of the LSM-Tree are stored only in the ordered files of layer 0; a layering operation is performed when the LSM-Tree triggers a compaction whose output level is one of the last two levels, in which the medium data values indexed by the compaction's input data are merged, sorted, and rewritten into ordered files of the layer corresponding to the output level; the specific method is as follows:
(1) the layer of an ordered file is recorded when the file is generated; an ordered file generated by flushing the write cache to disk belongs to layer 0;
(2) when the LSM-Tree executes a compaction whose output level is one of the last two levels, then for each medium key-value pair traversed in order in the input data, if the layer of the ordered file holding its data value is lower than the ordered file layer corresponding to the output level, or that file is marked as needing to be merged, the data value is written into a newly generated ordered file of the layer corresponding to the output level;
(3) the compaction updates the position of the data value when saving the output data;
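The decision made for each medium data value during the layering operation can be sketched as follows. This is a minimal sketch under assumptions: LSM-Tree levels are numbered from 0, the second-to-last level is taken to correspond to ordered layer 1 and the last level to layer 2, and the function names are hypothetical.

```python
def target_ordered_layer(output_level: int, num_levels: int) -> int:
    """Map a compaction output level to the ordered file layer that serves it."""
    if output_level == num_levels - 1:
        return 2  # last LSM-Tree level (assumed mapping)
    if output_level == num_levels - 2:
        return 1  # second-to-last LSM-Tree level (assumed mapping)
    return 0      # all remaining levels share ordered layer 0


def should_rewrite(
    file_layer: int, needs_merge: bool, output_level: int, num_levels: int
) -> bool:
    """Decide whether a medium data value is rewritten during compaction.

    file_layer is the layer of the ordered file currently holding the value;
    needs_merge is that file's "needs to be merged" mark.
    """
    target = target_ordered_layer(output_level, num_levels)
    if target == 0:
        return False  # no layering operation outside the last two levels
    return file_layer < target or needs_merge
```

With seven levels, for example, a value held in a layer 0 file is rewritten when compaction outputs to level 6, while a value already held in a layer 2 file stays in place unless its file has been marked.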
the third step: optimizing the local ordering of bottom-level data values
Each time the LSM-Tree completes a compaction of the bottom level, the range overlap among the ordered files indexed by the compaction's output data is checked; if the number of mutually overlapping ordered files in some key range exceeds the limit MaxOverlap, those files are marked as needing to be merged, and in the next compaction involving that range, the layering operation of the ordered files merges the data values in those files regardless of which layer they belong to;
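One way to perform the overlap check is a sweep over the sorted range endpoints, sketched below. The sweep itself is an assumption chosen for illustration; the claim does not prescribe how the overlap is counted. Key ranges are treated as closed intervals of string keys, and `files_to_merge` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def files_to_merge(
    ranges: Dict[str, Tuple[str, str]], max_overlap: int
) -> Set[str]:
    """Return the ordered files to mark as needing to be merged.

    ranges maps a file name to its (smallest key, largest key). A file is
    marked when it covers a key that more than max_overlap files cover.
    """
    events: List[Tuple[str, int, str]] = []
    for name, (low, high) in ranges.items():
        events.append((low, 0, name))   # 0 sorts a start before an end at the same key
        events.append((high, 1, name))
    events.sort()

    active: Set[str] = set()
    marked: Set[str] = set()
    for _key, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > max_overlap:
                marked |= active  # every active file overlaps at this key
        else:
            active.discard(name)
    return marked
```

Because starts sort before ends at equal keys, two files that merely touch at one key are counted as overlapping, which matches the closed-interval assumption.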
the fourth step: differentiated disk space reclamation strategy
For data invalidated by being updated, overwritten, or deleted, the key, small data value, and data value position are deleted during compaction; for medium and large data values, space recycling is scheduled and executed by monitoring the distribution of invalid data values in the files during compaction; the monitoring steps are as follows:
(1) a metadata table is maintained in memory to record, for each data value file, its type (ordered or unordered) and its invalid data volume;
(2) during each compaction, for each data value file D indexed by the input data, the size Di of the data values it stores that are referenced by the input data and the size Do of the data values it stores that are referenced by the output data are recorded;
(3) when the compaction finishes, the invalid data volume of each file D is increased by Di - Do; the invalid data proportion of a file is its invalid data volume divided by its total data volume;
(4) if the layering operation in the compaction generates a new ordered file, the new file is added to the metadata table with an invalid data volume of 0;
after the compaction finishes, space recycling scheduling is executed for each data value file whose invalid data proportion exceeds MaxInvalidRatio, as follows:
(1) if the invalid data proportion is 1, the file is deleted and scheduling ends;
(2) if the file is an ordered file, the file is marked as needing to be merged; the valid data values in the file are rewritten in the next layering operation, after which the invalid data proportion becomes 1 and the file is deleted;
(3) if the file is an unordered file, the file is added to the space recovery queue to wait for processing;
when a file in the space recovery queue is processed, the large key-value pairs stored in the file are traversed; for each pair, the key is used to query the LSM-Tree and check whether the position of the data value matches the file, so as to judge whether the data is valid; valid data is rewritten into a new unordered file and its position is updated in the LSM-Tree; finally, the file is deleted.
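The monitoring and scheduling steps of the fourth step can be sketched as follows. This is an illustration under assumptions: the value of `MAX_INVALID_RATIO` is hypothetical, the metadata table is a dictionary keyed by file name, the per-file sizes Di and Do are supplied by the caller, and the names `FileMeta`, `account_compaction`, and `schedule` are not from the source.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_INVALID_RATIO = 0.5  # hypothetical threshold


@dataclass
class FileMeta:
    kind: str                 # "ordered" or "unordered"
    total: int                # total data volume of the file
    invalid: int = 0          # invalid data volume accumulated so far
    needs_merge: bool = False

    @property
    def invalid_ratio(self) -> float:
        return self.invalid / self.total


def account_compaction(
    table: Dict[str, FileMeta], d_in: Dict[str, int], d_out: Dict[str, int]
) -> None:
    """Add Di - Do to each file's invalid data volume after one compaction."""
    for name, size_in in d_in.items():
        table[name].invalid += size_in - d_out.get(name, 0)


def schedule(table: Dict[str, FileMeta], gc_queue: List[str]) -> List[str]:
    """Apply the three scheduling rules; return the names of deleted files."""
    deleted: List[str] = []
    for name, meta in list(table.items()):
        if meta.invalid_ratio <= MAX_INVALID_RATIO:
            continue
        if meta.invalid >= meta.total:   # proportion is 1: nothing valid remains
            del table[name]
            deleted.append(name)
        elif meta.kind == "ordered":
            meta.needs_merge = True      # rewritten by the next layering operation
        elif name not in gc_queue:
            gc_queue.append(name)        # unordered: wait for the background thread
    return deleted
```

The accounting needs no extra reads: Di and Do are by-products of the entries the compaction already traverses, so monitoring adds only in-memory bookkeeping.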
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010182605.2A CN111399777B (en) | 2020-03-16 | 2020-03-16 | Differential key value data storage method based on data value classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111399777A true CN111399777A (en) | 2020-07-10 |
CN111399777B CN111399777B (en) | 2023-05-16 |
Family
ID=71430941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010182605.2A Active CN111399777B (en) | 2020-03-16 | 2020-03-16 | Differential key value data storage method based on data value classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111399777B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102214215A (en) * | 2011-06-07 | 2011-10-12 | 陆嘉恒 | Rapid reverse nearest neighbour search method based on text information |
US20160063021A1 (en) * | 2014-08-28 | 2016-03-03 | Futurewei Technologies, Inc. | Metadata Index Search in a File System |
US20160321294A1 (en) * | 2015-04-30 | 2016-11-03 | Vmware, Inc. | Distributed, Scalable Key-Value Store |
CN106708427A (en) * | 2016-11-17 | 2017-05-24 | 华中科技大学 | Storage method suitable for key value pair data |
US20180225315A1 (en) * | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Kvs tree |
US20180349095A1 (en) * | 2017-06-06 | 2018-12-06 | ScaleFlux, Inc. | Log-structured merge tree based data storage architecture |
US20190034427A1 (en) * | 2017-12-28 | 2019-01-31 | Intel Corporation | Data management system employing a hash-based and tree-based key-value data structure |
US20190065621A1 (en) * | 2017-08-31 | 2019-02-28 | David Boles | Kvs tree database |
US20190095457A1 (en) * | 2017-09-27 | 2019-03-28 | Vmware, Inc. | Write-optimized nested trees |
CN110083601A (en) * | 2019-04-04 | 2019-08-02 | 中国科学院计算技术研究所 | Index tree constructing method and system towards key assignments storage system |
CN110188108A (en) * | 2019-06-10 | 2019-08-30 | 北京平凯星辰科技发展有限公司 | Date storage method, device, system, computer equipment and storage medium |
CN110347336A (en) * | 2019-06-10 | 2019-10-18 | 华中科技大学 | A kind of key assignments storage system based on NVM with SSD mixing storage organization |
CN110389942A (en) * | 2019-06-21 | 2019-10-29 | 华中科技大学 | A kind of the key assignments separate-storage method and system of no garbage reclamation |
CN110602517A (en) * | 2019-09-17 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Live broadcast method, device and system based on virtual environment |
CN110825748A (en) * | 2019-11-05 | 2020-02-21 | 北京平凯星辰科技发展有限公司 | High-performance and easily-expandable key value storage method utilizing differential index mechanism |
-
2020
- 2020-03-16 CN CN202010182605.2A patent/CN111399777B/en active Active
Non-Patent Citations (5)
Title |
---|
FEI MEI: ""LSM-Tree Managed Storage for Large-Scale"", 《IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS》 * |
JUNG-SANG AHN: ""ForestDB_A_Fast_Key-Value_Storage_System_for_Variable-Length_String_Keys"", 《IEEE TRANSACTIONS ON COMPUTERS》 * |
张伟韬: "基于LSM-tree的KV数据库性能优化", 《中国优秀博士学位论文全文数据库 信息科技辑》 * |
林立亚: ""无垃圾回收的键值分离存储系统优化设计与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
赵恒: ""面向键值数据库应用的混合存储系统设计与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131140A (en) * | 2020-09-24 | 2020-12-25 | 北京计算机技术及应用研究所 | SSD-based key value separation storage method supporting efficient storage space management |
CN112364278A (en) * | 2020-11-23 | 2021-02-12 | 浪潮云信息技术股份公司 | Data classification optimization method based on CockroachDB bottom key values |
CN112540731A (en) * | 2020-12-22 | 2021-03-23 | 北京百度网讯科技有限公司 | Data additional writing method, device, equipment, medium and program product |
CN112540731B (en) * | 2020-12-22 | 2023-08-11 | 北京百度网讯科技有限公司 | Data append writing method, device, equipment, medium and program product |
CN112699092A (en) * | 2021-01-13 | 2021-04-23 | 浪潮云信息技术股份公司 | Method for storing big value data by RocksDB |
CN112699092B (en) * | 2021-01-13 | 2023-02-03 | 浪潮云信息技术股份公司 | Method for storing big value data by RocksDB |
CN112817530A (en) * | 2021-01-22 | 2021-05-18 | 万得信息技术股份有限公司 | Method for safely and efficiently reading and writing ordered data in multithreading manner |
CN112817530B (en) * | 2021-01-22 | 2024-06-07 | 万得信息技术股份有限公司 | Method for safely and efficiently reading and writing ordered data in multithreading manner |
US11537582B2 (en) | 2021-04-16 | 2022-12-27 | Samsung Electronics Co., Ltd. | Data access method, a data access control device, and a data access system |
CN113282854A (en) * | 2021-06-01 | 2021-08-20 | 平安国际智慧城市科技股份有限公司 | Data request response method and device, electronic equipment and storage medium |
US12033003B2 (en) | 2021-07-27 | 2024-07-09 | International Business Machines Corporation | Dynamic workload distribution for data processing |
US11513704B1 (en) | 2021-08-16 | 2022-11-29 | International Business Machines Corporation | Selectively evicting data from internal memory during record processing |
US11675513B2 (en) | 2021-08-16 | 2023-06-13 | International Business Machines Corporation | Selectively shearing data when manipulating data during record processing |
CN114115734A (en) * | 2021-11-18 | 2022-03-01 | 新华三大数据技术有限公司 | Data deduplication method, device, equipment and storage medium |
CN114398007A (en) * | 2021-12-27 | 2022-04-26 | 南京邮电大学 | LSM-tree-based cache optimization method for reading performance of KV storage system |
CN114398007B (en) * | 2021-12-27 | 2023-09-12 | 南京邮电大学 | LSM-tree-based caching optimization method for KV storage system read performance |
CN114020707B (en) * | 2022-01-06 | 2022-06-14 | 阿里云计算有限公司 | Storage space recovery method, storage medium, and program product |
CN114020707A (en) * | 2022-01-06 | 2022-02-08 | 阿里云计算有限公司 | Storage space recovery method, storage medium, and program product |
CN114896250A (en) * | 2022-05-19 | 2022-08-12 | 中国地质大学(北京) | Key value separated key value storage engine index optimization method and device |
CN114896250B (en) * | 2022-05-19 | 2023-02-03 | 中国地质大学(北京) | Key value separated key value storage engine index optimization method and device |
CN115168505A (en) * | 2022-06-21 | 2022-10-11 | 中国人民解放军国防科技大学 | Management system and method for ocean space-time data |
Also Published As
Publication number | Publication date |
---|---|
CN111399777B (en) | 2023-05-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111399777A (en) | Differentiated key value data storage method based on data value classification | |
CN105912687B (en) | Magnanimity distributed data base storage unit | |
CN102541757B (en) | Write cache method, cache synchronization method and device | |
KR102564170B1 (en) | Method and device for storing data object, and computer readable storage medium having a computer program using the same | |
CN112395212B (en) | Method and system for reducing garbage recovery and write amplification of key value separation storage system | |
US8560500B2 (en) | Method and system for removing rows from directory tables | |
US7418544B2 (en) | Method and system for log structured relational database objects | |
CN104484471B (en) | A kind of implementation method of high-performance data storage engines | |
CN107665219B (en) | Log management method and device | |
CN111475508A (en) | Efficient indexing method for optimizing leaf node merging operation | |
CN105787037A (en) | Repeated data deleting method and device | |
CN114996275B (en) | Key value storage method based on multi-tree conversion mechanism | |
CN116257523A (en) | Column type storage indexing method and device based on nonvolatile memory | |
CN113867627B (en) | Storage system performance optimization method and system | |
CN114416646A (en) | Data processing method and device of hierarchical storage system | |
CN105512325A (en) | Multi-version data index renewing, deleting and establishing method and device | |
CN114969069B (en) | Heat perception local updating method applied to key value storage system | |
CN101063976B (en) | Method and equipment for fast deletion of physically clustered data | |
CN114691041B (en) | Key value storage system and garbage recycling method | |
US20180011897A1 (en) | Data processing method having structure of cache index specified to transaction in mobile environment dbms | |
CN116204130A (en) | Key value storage system and management method thereof | |
CN110515897B (en) | Method and system for optimizing reading performance of LSM storage system | |
CN113253932A (en) | Read-write control method and system for distributed storage system | |
CN114625713A (en) | Metadata management method and device in storage system and storage system | |
CN110262755A (en) | A kind of file memory method of embedded system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20210115. Address after: Room 207, 2nd floor, C-1 building, Dongsheng Science Park, Zhongguancun, 66 xixiaokou Road, Haidian District, Beijing 100080. Applicant after: Pingkai star (Beijing) Technology Co.,Ltd. Address before: 100080 2nd floor, C-1 building, Dongsheng Science Park, 66 xixiaokou Road, Haidian District, Beijing. Applicant before: Beijing Pingkai Star Technology Development Co.,Ltd. |
| | GR01 | Patent grant | |