CN111399777A

CN111399777A - Differentiated key value data storage method based on data value classification

Info

Publication number: CN111399777A
Application number: CN202010182605.2A
Authority: CN
Inventors: 吴加禹; 崔秋; 唐刘; 吴毅
Original assignee: Beijing Pingkai Star Technology Development Co ltd
Current assignee: Pingkai Star Beijing Technology Co ltd
Priority date: 2020-03-16
Filing date: 2020-03-16
Publication date: 2020-07-10
Anticipated expiration: 2040-03-16
Also published as: CN111399777B

Abstract

The invention discloses a differentiated key-value data storage method based on data value classification, comprising classified storage of key-value data, hierarchical management of ordered files, optimization of the local ordering of bottom-layer data values, and a differentiated disk space reclamation strategy. Using a key-value separation structure, data values of different sizes are classified and stored with different degrees of ordering, so that the system supports efficient point queries and range queries while substantially reducing the overhead caused by data merging. Hierarchical management and local ordering optimization of the ordered files ensure, at the lowest possible overhead, that most ordered files in the system are read sequentially during range queries. The differentiated disk space reclamation strategy quickly identifies and reclaims the files with the densest invalid data, and avoids frequent reads and writes of the data index for medium and small data values during space reclamation, so that storage space is reclaimed and reused efficiently. A key-value storage system designed with the method of the invention can therefore perform well in read performance, write performance, and space overhead.

Description

Differentiated key value data storage method based on data value classification

Technical Field

The invention belongs to the technical field of computer storage systems, and particularly relates to a key-value access method that classifies key-value data and applies differentiated storage methods in a key-value storage system (KV store) to achieve high performance and low space overhead.

Background

According to companies such as Google and Facebook, storage systems based on the key-value model have been widely adopted in recent years to meet the storage requirements of massive unstructured and semi-structured data in new network applications and services. Google's LevelDB, Facebook's RocksDB, Taobao's Tair, and others all adopt the key-value storage model. These mainstream key-value storage systems all use the log-structured merge tree (LSM-Tree) as their core storage structure: key-value data is organized into multiple layers on disk, and the absolute ordering of the data on disk is guaranteed through inter-layer compaction operations. Frequent compaction operations, however, cause a serious write amplification problem. Two main research directions currently address this problem. One concerns the intra-layer ordering requirement imposed on key-value pairs stored in the LSM-Tree [...]

Disclosure of Invention

The invention aims to provide a differentiated key value data storage method based on data value classification for a key value storage system, which guarantees the data reading performance with the maintenance cost as low as possible, realizes efficient space recovery, and overcomes the defects that the prior art cannot simultaneously guarantee high reading and writing performance and low space cost.

The invention discloses a differentiated key value data storage method based on data value classification, which is characterized by comprising the following steps of:

the first step: classified storage of key-value data

Using a data value (Value) classification method, key-value data is divided into three types (large, medium, and small) according to the size of the data value, and is stored with a key-value separation structure in three different areas on disk, namely an unordered data area, an ordered data area, and an index area:

the unordered data area consists of a series of unordered files and stores large data values, and the data values are stored in the files in write order;

the ordered data area consists of a series of ordered files and stores medium data values, the data values are arranged in the files in the lexicographic order of their keys (Key), and the key ranges of different ordered files may overlap;

the index area is structured as a log-structured merge tree (LSM-Tree), and stores the small key-value pairs and the positions of the large and medium data values in the data areas;

the data value classification method is as follows: the approximate size of a data value's position index is taken as SmallThreshold, the maximum data value size of a small key-value pair; LargeThreshold, the minimum data value size of a large key-value pair, is set according to the concurrent random read performance of the storage medium, such that concurrent random read performance approaches sequential read performance when data values are larger than LargeThreshold; the size of a data value file is limited to MaxFileSize, whose value is smaller than the size of a data file in the LSM-Tree; the following procedure is followed when a key-value pair is written:

(1) determining the type of the key-value pair, writing medium and small key-value pairs directly into the write cache (Memtable) of the LSM-Tree, appending a large key-value pair to the newest unordered file, and writing its position index <key, value position> into the write cache after the append succeeds;

(2) when the write cache is full and is flushed to disk, generating a new ordered file, writing the medium data values into it in the lexicographic order of their keys, and recording the position <key, value position> of each medium data value in the file; generating the next file each time an ordered file reaches MaxFileSize;

(3) writing the small key-value pairs in the write cache and the position indexes <key, value position> of the other data values into the LSM-Tree;
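The size-based classification that drives this write procedure can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ValueClass` and `classify` are hypothetical, and the default thresholds of 128 B and 8 KB are simply the values used in the embodiment described later.

```python
from enum import Enum

# Assumed defaults: the embodiment's SmallThreshold (128 B) and LargeThreshold (8 KB).
SMALL_THRESHOLD = 128
LARGE_THRESHOLD = 8 * 1024


class ValueClass(Enum):
    SMALL = "small"    # key and value stay together in the LSM-Tree
    MEDIUM = "medium"  # value moves to an ordered file when the write cache is flushed
    LARGE = "large"    # value is appended to an unordered file at write time


def classify(value_size: int,
             small_threshold: int = SMALL_THRESHOLD,
             large_threshold: int = LARGE_THRESHOLD) -> ValueClass:
    """Classify a data value by its size in bytes."""
    if value_size > large_threshold:
        return ValueClass.LARGE
    if value_size > small_threshold:
        return ValueClass.MEDIUM
    return ValueClass.SMALL
```

With these defaults a 129-byte value is medium and a value of exactly 8 KB is still medium; only values strictly larger than `large_threshold` are treated as large.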

the second step: hierarchical management of ordered files

The ordered files are logically divided into three layers: the medium data values of the last two layers of the LSM-Tree are stored only in the layer-1 and layer-2 ordered files respectively, and the medium data values of the remaining layers of the LSM-Tree are stored only in the layer-0 ordered files; a layering operation is performed whenever the LSM-Tree triggers a compaction whose output layer is one of the last two layers, in which the medium data values of the compaction's input data are merged, sorted, and rewritten into ordered files of the layer corresponding to the output layer; the specific method is as follows:

(1) recording the layer of each ordered file when it is generated, an ordered file generated by flushing the write cache being in layer 0;

(2) when the LSM-Tree executes a compaction whose output layer is one of the last two layers, then for each medium key-value pair traversed in order in the input data, if the layer of the ordered file holding its data value is lower than the ordered-file layer corresponding to the output layer, or that file is marked as needing to be merged, writing the data value into a newly generated ordered file of the layer corresponding to the output layer;

(3) the compaction updates the position of the current data value when saving the output data;
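The rewrite decision in item (2) can be sketched as two small functions. This is an illustrative sketch under assumptions: LSM-Tree levels are taken to be numbered from 0 to `num_levels - 1`, and the function names are hypothetical.

```python
from typing import Optional


def ordered_layer_for_output(output_level: int, num_levels: int) -> Optional[int]:
    """Map an LSM-Tree output level to the ordered-file layer it corresponds to.

    The last level maps to layer 2 and the second-to-last to layer 1.
    Any other level returns None, meaning no layering operation runs
    and the medium values stay in their layer-0 ordered files.
    """
    if output_level == num_levels - 1:
        return 2
    if output_level == num_levels - 2:
        return 1
    return None


def must_rewrite(file_layer: int, needs_merge: bool,
                 target_layer: Optional[int]) -> bool:
    """Decide whether a medium data value is rewritten during a compaction."""
    if target_layer is None:
        return False
    return file_layer < target_layer or needs_merge
```

A value already stored in a file of the target layer is left in place unless its file carries the needs-merge mark, which is what keeps the merge cost to roughly one rewrite per downward move.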

the third step: optimizing local ordering of underlying data values

Each time the LSM-Tree completes a bottom-layer compaction, the range overlap among the ordered files referenced by the compaction's output data is checked; if the number of mutually overlapping ordered files in some key interval exceeds the limit MaxOverlap, those files are marked as needing to be merged, and in the next compaction involving that interval the data values in those files are merged by the layering operation of the ordered files regardless of which layer the files belong to;
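One way to perform this overlap check is sketched below. It is an assumption-laden illustration rather than the method fixed by the text: key ranges are treated as inclusive, the function name and the tuple layout are hypothetical, and the quadratic scan is chosen for clarity. It relies on the fact that if several inclusive ranges share any key, they also share the largest of their start keys, so only start keys need to be tested.

```python
from typing import List, Set, Tuple

# (file name, smallest key, largest key); both bounds are inclusive.
KeyRange = Tuple[str, str, str]


def files_needing_merge(ranges: List[KeyRange], max_overlap: int) -> Set[str]:
    """Return the files that cover a key where more than max_overlap files overlap."""
    marked: Set[str] = set()
    for _, point, _ in ranges:
        # Every file whose range contains this start key overlaps at that key.
        covering = [name for name, lo, hi in ranges if lo <= point <= hi]
        if len(covering) > max_overlap:
            marked.update(covering)
    return marked
```

For example, with the made-up ranges `f1 = [a, e]`, `f2 = [c, h]`, `f3 = [d, g]`, and `f4 = [i, k]` and a limit of 2, the three files covering key `d` are marked and `f4` is not.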

the fourth step: differentiated disk space reclamation strategies

For data invalidated by being updated, overwritten, or deleted, the key, the small data value, and the data value position are deleted during compaction; for medium and large data values, space reclamation is scheduled and executed by monitoring, during compaction, the distribution of invalid data values across the files; the monitoring steps are as follows:

(1) maintaining a metadata table in memory that records the type (ordered file or unordered file) and the invalid data volume of each data value file;

(2) during each compaction, for each data value file D referenced by the input data, recording the total size Di of the data values in D referenced by the input data and the total size Do of the data values in D referenced by the output data;

(3) when the compaction finishes, increasing the invalid data volume of each file D by Di - Do, the invalid data ratio of a file being its invalid data volume divided by its total data volume;

(4) if the layering operation in the compaction generates a new ordered file, adding the new ordered file to the metadata table with an invalid data volume of 0;
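The bookkeeping in steps (1) to (3) can be sketched with a small in-memory table. The class and function names are hypothetical, and sizes are assumed to be tracked in bytes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileMeta:
    ordered: bool            # True for an ordered file, False for an unordered file
    total_bytes: int         # total data volume stored in the file
    invalid_bytes: int = 0   # invalid data volume accumulated so far

    @property
    def invalid_ratio(self) -> float:
        return self.invalid_bytes / self.total_bytes if self.total_bytes else 0.0


def apply_compaction(table: Dict[str, FileMeta],
                     input_bytes: Dict[str, int],
                     output_bytes: Dict[str, int]) -> None:
    """Add Di - Do to the invalid volume of every file D referenced by the input.

    input_bytes[D] is Di, the size of values in D referenced by the input data;
    output_bytes[D] is Do, the size of values in D still referenced by the output.
    """
    for name, d_in in input_bytes.items():
        d_out = output_bytes.get(name, 0)
        table[name].invalid_bytes += d_in - d_out
```

A value that the layering operation rewrites into a new ordered file no longer counts toward Do of its old file, so the old file's invalid volume grows by the size of that value.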

after the compaction finishes, space reclamation scheduling is executed for each data value file whose invalid data ratio exceeds MaxInvalidRatio, as follows:

(1) if the invalid data ratio is 1, deleting the file and finishing scheduling;

(2) if the file is an ordered file, marking it as needing to be merged, so that the valid data values in the file are rewritten in the next layering operation, whereupon its invalid data ratio reaches 1 and it is deleted;

(3) if the file is an unordered file, adding it to a space reclamation queue to await processing;

for a file in the space reclamation queue, traversing the large key-value pairs stored in the file, using each key to query the LSM-Tree and checking whether the recorded position of the data value matches the file, so as to determine whether the data is valid, rewriting the valid data into a new unordered file, updating the positions in the LSM-Tree, and finally deleting the file.
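The three-way scheduling decision can be sketched as a single function. The function name, the string return values, and the default `max_invalid_ratio` of 0.3 (the value used in the embodiment) are illustrative assumptions.

```python
from collections import deque
from typing import Deque


def schedule_reclamation(file_id: str, ordered: bool, invalid_ratio: float,
                         gc_queue: Deque[str],
                         max_invalid_ratio: float = 0.3) -> str:
    """Return the action taken for one data value file after a compaction."""
    if invalid_ratio <= max_invalid_ratio:
        return "keep"               # not dense enough in invalid data yet
    if invalid_ratio >= 1.0:
        return "delete"             # nothing valid remains in the file
    if ordered:
        return "mark_needs_merge"   # rewritten by the next layering operation
    gc_queue.append(file_id)        # unordered file: left to the background thread
    return "enqueue"
```

Only unordered files reach the queue, so the background thread never has to query the LSM-Tree on behalf of medium data values.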

The differentiated key-value data storage method based on data value classification comprises classified storage of key-value data, hierarchical management of ordered files, optimization of the local ordering of bottom-layer data values, and a differentiated disk space reclamation strategy. Using a key-value separation structure, data values of different sizes are classified and stored with different degrees of ordering, so that the system supports efficient point queries and range queries while substantially reducing the overhead caused by data merging. Hierarchical management and local ordering optimization of the ordered files ensure, at the lowest possible overhead, that most ordered files in the system are read sequentially during range queries. The differentiated disk space reclamation strategy quickly identifies and reclaims the files with the densest invalid data, and avoids frequent reads and writes of the data index for medium and small data values during space reclamation, so that storage space is reclaimed and reused efficiently. A key-value storage system designed with the method of the invention can therefore perform well in read performance, write performance, and space overhead.

Drawings

Fig. 1 is a schematic overall architecture diagram of a differentiated key value data storage method based on data value classification according to the present invention.

FIG. 2 is a schematic diagram of the hierarchical operation of an ordered file.

FIG. 3 is a diagram of local order maintenance for ordered files.

Fig. 4 is a schematic diagram of a monitoring method of invalid data.

FIG. 5 is a schematic diagram of an unordered file space reclamation process.

Detailed Description

The following describes the differentiated key-value data storage method based on data value classification in detail with reference to the accompanying drawings.

Example 1:

This embodiment shows the operation flow of the differentiated key-value data storage method based on data value classification by writing and maintaining key-value data. Fig. 1 is a schematic diagram of the overall architecture of the method in this embodiment. It comprises the write cache and the data value file metadata table in memory and the data partitions on disk (the index area, the ordered data area, and the unordered data area), and indicates the classified storage process of key-value data.

In this embodiment, the system uses a Samsung 860 EVO SSD as the underlying storage medium. The throughput of concurrent random reads of 8 KB or more reaches three quarters of that of sequential reads, so LargeThreshold is set to 8 KB. The other relevant parameters are as follows: SmallThreshold is set to 128 B, so data values between 129 B and 8 KB are medium data values; MaxOverlap is usually set to 10 for better performance, but is set to 2 in this embodiment for convenience of description; MaxInvalidRatio is set to 0.3; and MaxFileSize is set to 8 MB.

The differentiated key value data storage method based on data value classification in the embodiment specifically comprises the following steps:

the first step: classified storage of key-value data

Key-value pairs are divided into three types (large, medium, and small) according to the size of the data value (Value). Large and medium key-value pairs use key-value separated storage to keep data merging overhead low: their data values are stored in the unordered data area and the ordered data area respectively, while the keys (Key) and the position indexes of the data values are stored in the LSM-Tree in the index area. Small key-value pairs are stored directly in the LSM-Tree.

In this embodiment, when a key-value pair <key, value> is written into the system, separated storage of large key-value pairs is performed first (separated storage of large key-value pairs ① in Fig. 1). It is first determined whether the value is larger than LargeThreshold, which is 8 KB in this embodiment. If the value is larger than 8 KB, the pair is a large key-value pair, and the whole pair is appended to the newest file of the unordered data area in the storage format [key size, key, value size, value]. If the file size after the write reaches MaxFileSize (8 MB in this embodiment), a new unordered file is generated to store subsequent data. After the file write succeeds, the position information <key, value position> of the value in the unordered file is written into the write cache; if the unordered file written is 003, the format of the value position is [003, address of the value in 003, size of the value in 003]. If the value is not larger than 8 KB, the key-value pair is written directly into the write cache.
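A sketch of appending one record in the [key size, key, value size, value] format and deriving the value position follows. The text does not specify field widths or byte order, so the 4-byte little-endian size fields used here are an assumption, as is the function name `append_record`; rollover to a new file at MaxFileSize is left out.

```python
import io
import struct
from typing import BinaryIO, Tuple

SIZE_FIELD = struct.Struct("<I")  # assumed: 4-byte little-endian unsigned size


def append_record(f: BinaryIO, file_id: str, key: bytes,
                  value: bytes) -> Tuple[str, int, int]:
    """Append [key size, key, value size, value] and return the value position.

    The returned tuple is (file id, offset of the value in the file, value size),
    which is what gets written to the write cache as the position index.
    """
    start = f.seek(0, io.SEEK_END)
    f.write(SIZE_FIELD.pack(len(key)))
    f.write(key)
    f.write(SIZE_FIELD.pack(len(value)))
    f.write(value)
    value_offset = start + SIZE_FIELD.size + len(key) + SIZE_FIELD.size
    return (file_id, value_offset, len(value))
```

Storing the key alongside the value is what later lets the space reclamation thread look each record up in the LSM-Tree while scanning the file.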

When the write cache is full, it is converted into a locked write cache (Immutable Memtable) and flushed to disk in the background. During the flush, the key-value pairs <key, value> in the write cache are traversed in order and separated storage of the key-value pairs is performed (separated storage of key-value pairs ② in Fig. 1). It is first determined whether the value is larger than SmallThreshold, which is 128 B in this embodiment. If the value is not larger than 128 B, or the key-value pair is the position information of a large data value, <key, value> is written into the LSM-Tree. Otherwise the pair is a medium key-value pair: its value is written into a newly generated layer-0 ordered file, and its position index <key, value position> is written into the LSM-Tree. Note that, unlike an unordered file, an ordered file only needs to store the value itself, because no other information is needed to reclaim the space of an ordered file (see the fourth step).
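The flush-time split can be sketched as below. This is a simplified illustration under assumptions: the write cache is modeled as a dict, a large value already written to an unordered file is represented in the cache by its position tuple, the ordered file is built in memory, and rollover at MaxFileSize is omitted. All names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Position = Tuple[str, int, int]   # (file id, offset, size)
Entry = Union[bytes, Position]    # inline value, or position of a large value


def flush(memtable: Dict[bytes, Entry], ordered_file_id: str,
          small_threshold: int = 128) -> Tuple[bytes, List[Tuple[bytes, Entry]]]:
    """Split a full write cache into ordered-file contents and LSM-Tree entries."""
    ordered_file = bytearray()
    lsm_batch: List[Tuple[bytes, Entry]] = []
    for key in sorted(memtable):              # lexicographic key order
        entry = memtable[key]
        if isinstance(entry, tuple) or len(entry) <= small_threshold:
            # Small pair, or the position index of a large value: goes to the LSM-Tree as is.
            lsm_batch.append((key, entry))
        else:
            # Medium value: only the value itself is stored in the ordered file.
            offset = len(ordered_file)
            ordered_file += entry
            lsm_batch.append((key, (ordered_file_id, offset, len(entry))))
    return bytes(ordered_file), lsm_batch
```

Because keys are visited in sorted order, the medium values land in the ordered file in key order without any extra sorting pass.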

On the one hand, classified storage of key-value data reduces the write amplification overhead of the LSM-Tree through the key-value separation structure, and storing the data value positions in the LSM-Tree supports fast point queries. On the other hand, the system only needs to maintain the ordering of small and medium data values, which is cheap, and disk performance when reading large data values concurrently and randomly is close to that of sequential reads. The system as a whole can therefore offer high point query, range query, and write performance at the same time.

the second step: hierarchical management of ordered files

The ordered files are logically divided into three layers: the data values of the last two layers of the LSM-Tree are stored only in the layer-1 and layer-2 ordered files respectively, and the data values of the other layers of the LSM-Tree are stored only in the layer-0 ordered files. A layering operation is executed whenever the LSM-Tree triggers a compaction whose output layer is one of the last two layers; in it, the data values of the compaction's input data are merged, sorted, and rewritten into ordered files of the layer corresponding to the output layer.

Fig. 2 illustrates the merging of data values in a layering operation of the ordered files. The left side of the figure shows the last two layers of the LSM-Tree, whose files are not drawn in detail; the right side shows some of the ordered files of the last two layers (layer 1 and layer 2), where two files in layer 2 are marked as needing to be merged and the shaded portions are the data involved in a particular compaction. The unordered data area is omitted from the figure because it is not involved in the layering operation. The compaction traverses all the shaded key-value pairs in order. For each key-value pair encountered, if its data value is stored in a layer-1 ordered file, or the ordered file holding it carries a needs-merge mark, the data value is written into a new ordered file (merging of ordered files ③ in Fig. 2). After the compaction ends, the newly generated files are added to the layer-2 ordered files. In this way the layer-1 data is re-sorted, and the data values referenced by the compaction's output data are all stored in layer-2 ordered files. The second layer of ordered files is handled in a similar manner and is not shown.

Since the data volume of adjacent layers in the LSM-Tree differs by a factor of about 10, dividing the ordered data into three layers maintains the ordering of about 99 percent of the data values in the system. Meanwhile, apart from marked files, an ordered file is merged only once each time it moves down a layer, so the write overhead of merging is small.
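The 99 percent figure follows from simple arithmetic if each LSM-Tree layer is assumed to hold exactly 10 times the data of the layer above it. The helper below is a hypothetical illustration of that calculation, with layer sizes expressed in arbitrary units and at least two layers assumed.

```python
def share_of_last_two_levels(num_levels: int, fanout: int = 10) -> float:
    """Fraction of all data held by the last two levels of an idealized LSM-Tree."""
    sizes = [fanout ** i for i in range(num_levels)]  # 1, 10, 100, ...
    return (sizes[-1] + sizes[-2]) / sum(sizes)
```

Under this idealization the last two layers hold 99 * 10^(n-2) / (10^n - 1) of the data for an n-layer tree, which is about 0.99 for any realistic depth; these are the layers whose medium values live in the layer-1 and layer-2 ordered files.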

the third step: optimizing the local ordering of data values

Hierarchical merging guarantees the overall ordering of the ordered data. However, if data in certain key intervals is written or updated frequently, the number of overlapping ordered files in those intervals becomes significantly higher than elsewhere in the system, range queries over those intervals degrade into random reads, and performance suffers; local optimization is therefore needed. After the LSM-Tree completes a bottom-layer compaction, the range overlap among the ordered files referenced by the compaction's output data is checked. If the number of mutually overlapping ordered files in some key interval exceeds the limit MaxOverlap, those files are given a needs-merge mark. In the next layering operation involving that interval, the data values in the marked files participate in merging regardless of which layer the files belong to, so that the number of overlapping files in the interval is reduced to 1.

Fig. 3 is a schematic diagram of maintaining the local ordering of ordered data values. The upper part of the figure shows the ordered files holding the data values referenced by the output data of a particular bottom-layer compaction in this embodiment; the key intervals of the data in these files are [a, b], [a, e], [c, h], [f, h], [i, j], [i, k], [m, o]. The middle part shows the range overlap of the files after sorting by key interval; the lower part marks the files that need to participate in hierarchical merging. From the sorted result it can be seen that 3 ordered files overlap in the key interval [c, d], which exceeds the MaxOverlap parameter of this embodiment, so the ordered files containing that interval (i.e., the ordered files with the key intervals [a, e], [c, h]) are marked as needing to be merged. In the next layering operation involving these key intervals, the data values in these files are merged into a new ordered file, so that the number of overlapping files in the interval is reduced to 1.

Since the second step ensures that upper-layer ordered files are always merged in the layering operation, and upper-layer compactions are frequent, the local ordering optimization is executed only after a bottom-layer compaction, so as to reduce overhead.

the fourth step: differentiated disk space reclamation strategy

Updates and deletions of key-value pairs take effect during compaction of the LSM-Tree. Keys, small data values, and data value positions are all stored in the LSM-Tree, so they can be deleted directly during compaction. For medium and large data values, the data value position information records the size of the data value and the file in which it is located, so by monitoring the change in the invalid data volume of each data value file before and after a compaction, differentiated space reclamation scheduling can be executed for the different types of files.

Fig. 4 is a schematic diagram of the method for monitoring the invalid data of data value files. The left side of the figure shows a compaction between layer i and layer i+1 of the LSM-Tree; the middle part shows the distribution of the compaction's input data and output data across the data value files; and the right side shows the update of the metadata table after the compaction finishes.

Space reclamation scheduling is executed according to the change in the invalid data volume of each data value file after the compaction. In this embodiment, the invalid data ratio of ordered file 002 reaches 1, so it is deleted. The invalid data ratios of files 003 and 004 exceed MaxInvalidRatio (0.3 in this embodiment), so their space needs to be reclaimed. File 003 is an ordered file, so a needs-merge mark is added to it, and its valid data values will be rewritten in the next layering operation involving its key interval. Because MaxFileSize is set small and the compaction rewrites only valid data, most of the valid data values in an ordered file can be completely rewritten in a single layering operation, whereupon its invalid data ratio reaches 1 and it is deleted. File 004 is an unordered file; it is added to the space reclamation queue to await processing by the space reclamation thread.

The background space reclamation thread first sorts the unordered files in the queue by invalid data ratio and selects the file with the highest ratio. It traverses the file, rewrites the valid data into a new file, updates the position indexes in the LSM-Tree, and deletes the file once all valid data has been rewritten. Because an unordered file stores whole key-value pairs, each key can be used to query the LSM-Tree and check whether the recorded position of the data value matches the file, and thus to determine whether the data is valid. Fig. 5 is a flow diagram of space reclamation for an unordered file. The upper part of the figure is the space reclamation queue; unordered file f, the file with the highest invalid data ratio in the queue, is being reclaimed; the key-value pair currently being traversed is <key, value>, and the subscripts in the file are address offsets. As the figure shows, the LSM-Tree is read first to determine whether the position of the value is offset x in file f; if so, <key, value> is appended to a new file and the position of the value in the LSM-Tree is updated to its position in the new file [...]
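The validity check and rewrite for one unordered file can be sketched as follows. This is a simplified illustration: the LSM-Tree index is modeled as a dict from key to position, the scanned file is given as a list of already-parsed records, offsets in the new file count value bytes only, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Position = Tuple[str, int, int]     # (file id, value offset, value size)
Record = Tuple[bytes, bytes, int]   # (key, value, value offset in the old file)


def reclaim(old_id: str, records: Iterable[Record],
            index: Dict[bytes, Position], new_id: str) -> List[Tuple[bytes, bytes]]:
    """Rewrite the still-valid records of one unordered file and update the index.

    A record is valid only if the index still points at this exact file and
    offset; otherwise the key was overwritten or deleted and the record is skipped.
    """
    survivors: List[Tuple[bytes, bytes]] = []
    new_offset = 0
    for key, value, offset in records:
        pos = index.get(key)
        if pos is not None and pos[0] == old_id and pos[1] == offset:
            survivors.append((key, value))
            index[key] = (new_id, new_offset, len(value))
            new_offset += len(value)  # simplification: headers are not counted
    return survivors
```

Once every record has been visited, the old file holds nothing the index refers to and can be deleted.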

On the one hand, the differentiated disk space reclamation strategy ensures that the files scheduled contain more invalid data, so that more space is reclaimed and less valid data is rewritten. On the other hand, by integrating the rewriting of medium data values into the compaction process, it avoids frequent reads and writes of the LSM-Tree when reclaiming those data values; and when reclaiming the space of large data values, the LSM-Tree is read and written relatively infrequently and the overhead is relatively low. Space reclamation is therefore efficient.

In this embodiment, the differentiated key-value data storage method based on data value classification reduces system write amplification through a key-value separation architecture while storing key-value data in a differentiated, classified manner; it maintains the data ordering required for efficient reads using low-overhead hierarchical merging and local ordering optimization; and it executes differentiated space reclamation scheduling by monitoring the proportion of invalid data in each file, thereby effectively reducing disk space overhead. Compared with other key-value storage technologies, the invention provides efficient read and write performance across the board while keeping storage cost low.

Claims

1. A differentiated key value data storage method based on data value classification is characterized by comprising the following steps:

the first step: classified storage of key-value data

The method comprises the following steps of adopting a data value classification method, classifying key value data into three types of large, medium and small according to the size of the data value, and respectively storing the data value in three different areas of an unordered data area, an ordered data area and an index area on a disk by a storage structure with key value separation:

the ordered data area consists of a series of ordered files, data values are stored, the data values are arranged in the files according to the dictionary order of keys, and the range overlap can exist between different ordered files;

the index area is structurally a log structure merged tree and stores the small key value pairs and the positions of the large and medium data values in the data area;

(1) determining the type of the key-value pair, writing medium and small key-value pairs directly into the write cache of the LSM-Tree, appending a large key-value pair to the newest unordered file, and writing its position index <key, value position> into the write cache after the append succeeds;

the second step: hierarchical management of ordered files

the third step: optimizing local ordering of underlying data values

the fourth step: differentiated disk space reclamation strategies

(1) maintaining a metadata table in the memory to record the type and the invalid data volume of each data value file, namely an ordered file or an unordered file;