Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Blockchains are generally divided into three types: public chains (Public Blockchain), private chains (Private Blockchain), and consortium chains (Consortium Blockchain, also referred to as alliance chains or federation chains). Combinations of these types are also possible, such as private chain + federation chain or federation chain + public chain.
Among them, the most decentralized is the public chain. Participants joining the public chain (also referred to as nodes in the blockchain) can read the data records on the chain, participate in transactions, compete for accounting rights for new blocks, and so on. Moreover, each node can freely join or leave the network and perform related operations.
Private chains are the opposite: write permission on the network is controlled by a single organization or institution, and read permission is specified by that organization. In short, a private chain may be a weakly centralized system with strict restrictions on its nodes and only a small number of nodes. This type of blockchain is better suited to internal use within a particular institution.
A federation chain lies between a public chain and a private chain and can achieve "partial decentralization". Each node in a federation chain typically corresponds to an entity institution or organization; nodes join the network by authorization, form an alliance of shared interests, and jointly maintain the operation of the blockchain.
Whatever its type, a blockchain is usually composed of a number of blocks. Each block records a timestamp corresponding to its creation time, and all blocks form a data chain strictly ordered in time by those timestamps.
Data generated outside the chain can be constructed into a standard transaction format supported by the blockchain and then published to the blockchain. The node devices in the blockchain perform consensus on the transaction; once consensus is reached, the node device acting as the accounting node packages the transaction into a block, where it is persistently stored as evidence.
In the field of blockchain, an important concept is Account (Account); in practical applications, the accounts can be generally divided into two categories, namely external accounts and contract accounts; the external account is an account directly controlled by the user and is also called as a user account; and the contract account is created by the user through an external account, the account containing the contract code (i.e. the smart contract).
For accounts in a blockchain, the account status of the account is usually maintained through a structure. When a transaction in a block is executed, the status of the account associated with the transaction in the block chain is also typically changed.
In one example, the structure of an account typically includes fields such as Balance, Nonce, Code, and Storage. Wherein:
a Balance field for maintaining the current account Balance of the account;
a Nonce field for maintaining the number of transactions of the account; this counter guarantees that each transaction can be processed only once, which effectively prevents replay attacks;
a Code field for maintaining the contract code of the account; in practical applications, only the hash value of the contract code is typically maintained in the Code field; thus, the Code field is also commonly referred to as the Codehash field;
a Storage field for maintaining the storage content of the account; for a contract account, an independent persistent storage space is generally allocated to store the contract data of that contract account; this separate storage space is often referred to as the account storage of the contract account. The storage content of the contract account is usually organized, in the form of key-value pairs, into an MPT (Merkle Patricia Trie) tree and stored in this independent storage space. An MPT tree is a logical tree structure used in the blockchain field for storing and maintaining blockchain data, and it typically includes a root node, intermediate nodes, and leaf nodes.
The MPT tree constructed from the storage content of a contract account is commonly referred to as a Storage tree. The Storage field typically maintains only the hash value of the root node of the Storage tree; therefore, the Storage field is also commonly referred to as the Storage Root hash field. For an external account, the field values of both the Code field and the Storage field are null.
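The account structure described above can be sketched as follows. This is a minimal illustration only: the class name Account, the function apply_transaction, and the rule that a transaction is accepted only when its nonce equals the account's current counter are assumptions made for the example and are not taken from this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    """Hypothetical in-memory form of the account structure."""
    balance: int = 0                      # Balance field
    nonce: int = 0                        # Nonce field: transaction counter
    code_hash: Optional[bytes] = None     # Codehash field, None for an external account
    storage_root: Optional[bytes] = None  # Storage Root field, None for an external account

    def is_contract(self) -> bool:
        # A contract account carries contract code, so its Codehash is set.
        return self.code_hash is not None


def apply_transaction(sender: Account, tx_nonce: int) -> bool:
    """Accept a transaction only if its nonce matches the sender's counter.

    Assumed rule: a replayed transaction carries an already-used nonce,
    so it no longer matches the counter and is rejected.
    """
    if tx_nonce != sender.nonce:
        return False
    sender.nonce += 1
    return True
```

Under this assumed rule, submitting the same transaction twice succeeds once and is rejected the second time, which is the replay protection that the Nonce field is described as providing.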
Most blockchain models use a Merkle tree, or a logical tree structure based on a variant of the Merkle tree data structure, to store and maintain data. For example, the MPT tree is a Merkle tree variant that incorporates the tree structure of the Trie dictionary tree and is used for storing and maintaining blockchain data.
The following description will be given taking the example of using an MPT tree to store block chain data;
in one example, blockchain data that needs to be stored and maintained in the blockchain, typically includes account status (state) data, transaction data, and receipt data; therefore, in practical applications, the above account status data, transaction data and receipt data may be organized into three MPT trees, such as an MPT status tree, an MPT transaction tree and an MPT receipt tree, in the form of key-value key value pairs, and stored and maintained respectively.
In addition to the three MPT trees, the contract data stored in the storage space corresponding to a contract account is usually constructed into an MPT storage tree (hereinafter referred to as a Storage tree). The hash value of the root node of the Storage tree is added to the Storage field in the structure of the contract account corresponding to that Storage tree.
The MPT state tree is an MPT tree organized from the account state data of all accounts (including external accounts and contract accounts) in the blockchain, in the form of key-value pairs; the MPT transaction tree is an MPT tree organized from the transaction data in the blockchain, in the form of key-value pairs; the MPT receipt tree is an MPT tree organized, in the form of key-value pairs, from the transaction receipts generated for each transaction after the transactions in a block are executed.
The hash values of the root nodes of the MPT state tree, the MPT transaction tree, and the MPT receipt tree shown above are eventually added to the block header of the corresponding block.
The MPT transaction tree and the MPT receipt tree correspond to blocks; that is, each block has its own MPT transaction tree and MPT receipt tree. The MPT state tree is a global MPT tree: it does not correspond to one specific block but covers the account state data of all accounts in the blockchain. Each time the blockchain generates a latest block, the account state of the accounts (external accounts or contract accounts) related to the executed transactions usually changes after the transactions in that block are executed.
For example, when a "transfer transaction" in a block is executed, the balances of the transferor account and the transferee account associated with the "transfer transaction" (i.e., the field values of the Balance fields of these accounts) usually change. After the transactions in the latest block are executed, the account state in the blockchain has changed, so the node device needs to construct an MPT state tree from the current account state data of all accounts in the blockchain in order to maintain the latest state of all accounts.
That is, each time a latest block is generated and the execution of its transactions changes the account state of some accounts, the node device needs to rebuild an MPT state tree based on the latest account state data of all accounts in the blockchain. In other words, each block in the blockchain has a corresponding MPT state tree, which maintains the latest account state of all accounts after the transactions in that block are executed.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an example of organizing account status data of each blockchain account in a blockchain into an MPT status tree in the form of key-value key value pairs according to this specification.
The MPT tree is an improved Merkle tree variant that combines the advantages of two tree structures, namely the Merkle tree and the Trie dictionary tree (also called a prefix tree).
Three types of nodes are typically included in the MPT tree, namely leaf nodes, extension nodes, and branch nodes. The root node of the MPT tree is usually an extension node; the intermediate nodes of the MPT tree are usually branch nodes or other extension nodes.
The extension node and the branch node may be collectively referred to as a character node, and are used for storing a character prefix portion of a character string corresponding to a key (i.e., an account address) of the account status data; for the MPT tree, the above-mentioned character prefix part usually refers to a shared character prefix; the shared character prefix refers to a prefix formed by one or more identical characters possessed by keys (namely, block chain account addresses) of all account state data. The leaf node is used for storing the Value (i.e. specific account status data) and the suffix part of the character string corresponding to the key of the block chain data.
And the extension node is used for storing one or more characters (namely, shared nibbles shown in fig. 1) in the shared character prefix of the account address and a hash value (namely, Next node shown in fig. 1) of a node at the Next layer linked with the extension node.
The branch node comprises 17 slot positions, the first 16 slot positions correspond to 16 possible hexadecimal characters in the key, one character corresponds to one nibble, each slot position in the first 16 slot positions respectively represents one character in a shared character prefix of an account address, and the slot positions are used for filling a hash value of a node at the next layer linked with the branch node. The last slot is a value slot, typically null.
A leaf node for storing a character suffix (i.e., key-end shown in fig. 1) of the account address, and a value of the account status data (i.e., the structure of the account described above); the character suffix of the account address and the shared character prefix of the account address jointly form a complete account address; the character suffix refers to a suffix composed of the last one or more characters except the shared character prefix of the account address.
Assume that the account state data that needs to be organized into an MPT state tree is shown in table 1 below:
TABLE 1
It should be noted that, in table 1, the blockchain accounts corresponding to the account addresses in the first three rows are external accounts, and their Codehash and Storage root fields are null. The blockchain account corresponding to the account address in the fourth row is a contract account; its Codehash field maintains the hash value of the contract code of the contract account, and its Storage root field maintains the hash value of the root node of the Storage tree built from the storage content of the contract account.
The MPT state tree is finally organized according to the account state data in the table 1, as shown in FIG. 1; the MPT state tree is composed of 4 leaf nodes, 2 branch nodes, and 2 extension nodes (one of which serves as a root node).
In fig. 1, the prefix field is a prefix field that the extension node and the leaf node have in common. Different field values of the prefix field may be used to indicate different node types.
For example, a prefix field value of 0 indicates an extension node containing an even number of nibbles; as previously mentioned, a nibble is a half-byte consisting of 4 binary bits, and one nibble corresponds to one character of an account address. A value of 1 indicates an extension node containing an odd number of nibbles; a value of 2 indicates a leaf node containing an even number of nibbles; a value of 3 indicates a leaf node containing an odd number of nibbles.
A branch node does not have the prefix field, because it is a character node in which single nibbles are arranged in parallel.
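The four prefix values listed above follow a simple rule: 2 is added for a leaf node, and 1 is added when the number of nibbles is odd. The sketch below illustrates this; the function name prefix_value and the representation of nibbles as a string of hexadecimal characters are assumptions made for the example.

```python
def prefix_value(is_leaf: bool, nibbles: str) -> int:
    """Return the prefix field value for an extension node or a leaf node.

    0: extension node, even number of nibbles
    1: extension node, odd number of nibbles
    2: leaf node, even number of nibbles
    3: leaf node, odd number of nibbles
    """
    odd = len(nibbles) % 2          # 1 when the nibble count is odd
    return (2 if is_leaf else 0) + odd
```

For instance, an extension node storing the two nibbles "a7" has the prefix value 0, while an extension node storing the single nibble "a" has the prefix value 1.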
The Shared nibble(s) field in an extension node corresponds to the key of the key-value pair contained in that extension node and represents a common character prefix between account addresses; for example, all account addresses in the table above have the common character prefix a7. The Next Node field is filled with the hash value (hash pointer) of the next node.
The fields for the hexadecimal characters 0 to f in a branch node correspond to the keys of the key-value pairs contained in that branch node; if the branch node is an intermediate node on the search path of an account address in the MPT tree, the Value field of the branch node may be null. Each of the 0 to f fields is filled with the hash value of a node at the next layer.
The Key-end in a leaf node, corresponding to the Key value of the Key-value pair contained in that leaf node, represents the last few characters of the account address (the character suffix of the account address). The key values of the nodes on the search path from the root node to the leaf nodes form a complete account address. Filling account state data corresponding to the account address in a Value field of the leaf node; for example, a structure composed of fields such as Balance, Nonce, Code, and storage may be encoded and filled in the Value field of the leaf node.
Further, the node on the MPT state tree shown in fig. 1 is finally stored in the database in the form of Key-Value Key Value pair;
when a node on the MPT state tree is stored in the database, a key in a key value pair of the node on the MPT state tree can be a hash value of data content contained in the node; value in the key Value pair of the node on the MPT state tree is the data content contained in the node.
When a node in the MPT state tree is stored in the database, a hash Value of data content contained in the node can be calculated (i.e., the whole node is subjected to hash calculation), the calculated hash Value is used as a Key, the data content contained in the node is used as a Value, and a Key-Value Key Value pair is generated; and then storing the generated Key-Value Key Value pair into a database.
The node on the MPT state tree is stored in a Key-value Key value pair mode; the Key may be a hash Value of data content contained in the node, and the Value may be data content contained in the node; therefore, when a node on the MPT state tree needs to be queried, content addressing can be performed as a key based on the hash value of the data content contained in the node.
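The content addressing described above can be sketched as follows. This is an illustration under stated assumptions: SHA-256 stands in for whatever hash function the blockchain actually uses, a Python dict stands in for the database, and the names put_node and get_node are hypothetical.

```python
import hashlib
from typing import Dict


def put_node(db: Dict[bytes, bytes], node_content: bytes) -> bytes:
    """Store a serialized node under the hash of its content and return the key."""
    key = hashlib.sha256(node_content).digest()  # SHA-256 is an assumption
    db[key] = node_content
    return key


def get_node(db: Dict[bytes, bytes], key: bytes) -> bytes:
    """Fetch a node by its content address and check it against the key."""
    content = db[key]
    if hashlib.sha256(content).digest() != key:
        raise ValueError("stored content does not match its key")
    return content
```

Because the key is derived from the content, writing the same node twice yields the same key, and any change to the content of a node yields a different key.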
Referring to fig. 2, fig. 2 is a schematic diagram illustrating organization of contract data stored in a storage space corresponding to a contract account into an MPT storage tree according to this specification.
With continued reference to table 1, the account with the account address "a77d397" shown in table 1 is a contract account, so the contract data stored in the storage space corresponding to this contract account is organized into a storage tree; the hash value S1 of the root node of the storage tree is added to the storage root field in the leaf node corresponding to the contract account in the MPT state tree shown in fig. 1.
Assume that the contract data stored in the storage space of the contract account is as shown in table 2 below:
TABLE 2
It should be noted that the contract data stored in the storage space of a contract account generally takes the form of state variables; when stored, the state variables can be organized, in the form of key-value pairs, into a storage tree as shown in fig. 2.
For example, in one example, a hash value of the account address of the contract account and a storage location of the state variable in the account storage of the contract account may be used as a key, and a value of a variable corresponding to the state variable may be used as a value.
The basic structure of the storage tree shown in fig. 2 is similar to that of the MPT state tree shown in fig. 1 and is not described in detail in this specification. As can be seen from the above description of fig. 1 and fig. 2, under the tree structure design of the MPT tree, a branch node stores one character of the shared character prefix of the account addresses, whereas an extension node may store one or more characters of that shared character prefix.
In practical applications, the character length of the shared character prefix of the keys of the data stored in an MPT tree is generally not fixed; moreover, when new data is written into the MPT tree, the length of the shared character prefix may change accordingly, in which case an extension node on the MPT tree is split and a new branch node is generated. That is, the condition for node splitting in the MPT tree is a change in the character length of the shared character prefix.
For example, taking the MPT state tree shown in fig. 1, assume that account state data for an account address whose first two characters are "a8" is added to the MPT state tree. The shared character prefix stored in the "Shared nibble(s)" field of the extension node serving as the root node in fig. 1 then changes from "a7" to "a". According to the splitting condition of the nodes of the MPT state tree, the extension node serving as the root node is split into an extension node that stores the shared character prefix "a" and a branch node in which the slot for the character "8" is occupied.
Once an extension node on the MPT tree is split, the number of node layers of the MPT tree may also change, so the depth of the MPT tree is not stable. Because the length of the shared character prefix of the keys stored in the MPT tree can change frequently as new data is written, the splitting manner described above leads to frequent node splits, which in turn reduces the efficiency of writing new data into the MPT tree.
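The change of the shared character prefix that triggers a split can be illustrated with a short sketch. The function name shared_prefix is hypothetical, and, apart from "a77d397", the account addresses used in the example are made-up values chosen only so that they begin with "a7" or "a8".

```python
from typing import Iterable


def shared_prefix(keys: Iterable[str]) -> str:
    """Return the longest character prefix common to all keys."""
    keys = list(keys)
    if not keys:
        return ""
    prefix = keys[0]
    for key in keys[1:]:
        i = 0
        # Advance while both strings agree, then cut the prefix there.
        while i < len(prefix) and i < len(key) and prefix[i] == key[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


existing = ["a77d397", "a7aaaa1", "a7bbbb2"]   # hypothetical addresses
before = shared_prefix(existing)                # "a7"
after = shared_prefix(existing + ["a8ccc03"])   # "a"
split_needed = len(after) != len(before)        # True: the prefix length changed
```

Here the prefix shrinks from two characters to one when the "a8" address is written, which is the condition under which the extension node serving as the root node is split.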
In view of the above, the present specification proposes a new logical tree structure design scheme.
When the logic tree structure is realized, the logic tree structure still can comprise a root node, a middle node and a leaf node; the root node and the intermediate node are used for storing characters in keys of the block chain data; the leaf node is used to store the value of the blockchain data.
Different from the MPT tree, the root node and the middle node comprise a plurality of positions for storing characters in keys of the blockchain data, and each position comprises a plurality of slot positions for storing the characters in the keys of the blockchain data; the slot position is used for storing a hash value of a next layer node linked with the node;
when the node device in the block chain stores the block chain data, the key-value key value pair of the block chain data to be stored can be obtained, and then the block chain data to be stored is converted into a root node, a middle node and a leaf node on the logical tree structure; then, encoding processing may be performed on the root node, the intermediate node, and the leaf node, and the key-value key value pairs of the root node, the intermediate node, and the leaf node after encoding processing are stored in a database; in the key-value key value pairs of the leaf node, the intermediate node and the root node, value is the storage content of the node, and key is the hash value of the storage content of the node; the coding processing comprises bitmap coding processing aiming at the root node and the intermediate node; the root node and the middle node after the bitmap coding processing comprise bitmap coding information; and the bitmap coding information indicates whether the slots in each position of the root node and the middle node are filled with hash values or not.
In the above technical solution, on one hand, each root node and each intermediate node in the logical tree structure includes a plurality of positions respectively representing different characters; each position further comprises a plurality of slot positions which respectively represent different characters; therefore, by means of the design, each root node and each intermediate node have larger data storage capacity and data carrying capacity, so that when the root nodes and the intermediate nodes in the logical tree structure are written into the database for storage; or when accessing the root node and the intermediate node in the logical tree structure stored in the database, the data storage capacity of the root node and the intermediate node can be more adaptive to the capacity of single IO read-write of the storage medium bearing the database, so that the IO read-write capability of the storage medium bearing the database can be fully utilized, and the data read-write efficiency is improved; moreover, the improvement of the data storage capacity and the data carrying capacity of the nodes on the logical tree structure will also lead to the improvement of the data storage capacity and the data carrying capacity of the whole logical tree structure, so that more block chain data can be stored on the logical tree structure;
in the second aspect, each root node and each intermediate node in the logical tree structure adopt a unified data structure; for the root node and the intermediate node in the logical tree structure, the character length of the character prefix of the key of the block chain data actually stored in the root node and the intermediate node is kept fixed; therefore, through the design, the frequent splitting of the nodes caused by the fact that the actually stored character lengths of the root node and the middle node are not fixed can be avoided, and the number of layers of the root node and the middle node contained in the logic tree structure can be ensured to be in a relatively stable state all the time;
in the third aspect, bitmap coding is executed for the root node and the middle node in the tree structure, and bitmap coding information which can indicate whether the hash value is filled in each slot position in each position of the root node and the middle node or not is added in the coded root node and the coded middle node; therefore, by the mode, when the data filled in the positions in the character nodes stored in the database needs to be inquired, the non-empty slot positions in the positions can be determined through the bitmap coding information without traversing all the slot positions, so that the searching efficiency can be improved; in addition, through the mode, when the root nodes and the intermediate nodes are stored in the database, the empty slots in the positions of the root nodes and the intermediate nodes are deleted, so that the storage space occupied when the root nodes and the intermediate nodes are stored in the database can be further saved, and the storage efficiency is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for storing blockchain data according to an exemplary embodiment. The method is applied to block chain node equipment; the method comprises the following steps:
step 302, acquiring a key-value key value pair of block chain data to be stored;
step 304, converting the key-value key value pair of the block chain data to be stored into a root node, a middle node and a leaf node on a logic tree structure; the root node and the middle node comprise a plurality of positions for storing characters in keys of the blockchain data, and each position comprises a plurality of slot positions for storing the characters in the keys of the blockchain data; the slot position is used for storing a hash value of a next layer node linked with the node;
step 306, encoding the root node, the intermediate node and the leaf node, and storing the key-value key value pairs of the root node, the intermediate node and the leaf node after encoding in a database; in the key-value key value pairs of the leaf node, the intermediate node and the root node, value is the storage content of the node, and key is the hash value of the storage content of the node; the coding processing comprises bitmap coding processing aiming at the root node and the intermediate node; the root node and the middle node after the bitmap coding processing comprise bitmap coding information; and the bitmap coding information indicates whether the slots in each position of the root node and the middle node are filled with hash values or not.
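The bitmap coding processing in step 306 can be sketched as follows. The exact layout of the bitmap coding information is not specified above, so this sketch assumes one bit per slot, numbered position by position, with only the non-empty hash values stored after the bitmap; the names bitmap_encode and lookup are hypothetical.

```python
from typing import List, Optional, Tuple

Slot = Optional[bytes]  # None marks an empty slot


def bitmap_encode(positions: List[List[Slot]]) -> Tuple[int, List[bytes]]:
    """Encode a node as (bitmap, non-empty hash values).

    Bit i of the bitmap is 1 when slot i, counted position by position,
    is filled with a hash value. Empty slots are not stored at all.
    """
    bitmap = 0
    filled: List[bytes] = []
    index = 0
    for position in positions:
        for slot in position:
            if slot is not None:
                bitmap |= 1 << index
                filled.append(slot)
            index += 1
    return bitmap, filled


def lookup(bitmap: int, filled: List[bytes], slots_per_position: int,
           position: int, slot: int) -> Slot:
    """Return the hash value in (position, slot), or None if the slot is empty."""
    index = position * slots_per_position + slot
    if not (bitmap >> index) & 1:
        return None
    # The number of filled slots before this one is its offset in `filled`.
    rank = bin(bitmap & ((1 << index) - 1)).count("1")
    return filled[rank]
```

With this layout, a query reads the bitmap to decide whether a slot is filled and where its hash value sits, without traversing the empty slots, and the stored node contains no placeholder for an empty slot.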
In this specification, in order to avoid frequent splitting of nodes, a logical tree structure is proposed in which the number of layers of character nodes, that is, the first N layers of nodes that store the characters in the keys of the blockchain data, is fixed.
The logical tree structure is a structure in which physical storage structures completely corresponding to the tree structure do not exist in the underlying physical storage of the database, but only physical data of each node on the tree structure and link relation data between the nodes are stored in the database, so that the tree structure can be logically restored based on the physical data and the link relation data of each node stored in the database.
The logical tree structure may specifically include a root node, a middle node, and a leaf node; the nodes on the logic tree structure can be linked with the nodes on the upper layer through the hash value of the nodes. The root node and the intermediate node are specifically used for storing at least one character in a key-value key value pair of the block chain data; the leaf node is specifically used for storing the value of the blockchain data (i.e., the specific content of the blockchain data). The number of layers of the intermediate node may be one or more, and is not particularly limited in this specification.
For example, in one example, the key of the above blockchain data may still include a character prefix portion (Shared nibble(s)) and a character suffix portion (key-end); in this case, the root node and the intermediate nodes may be used to store the characters in the character prefix, and the leaf node may be used to store the character suffix and the value of the blockchain data.
On the one hand, because the root node and the intermediate nodes in the logical tree structure store the characters in the keys of the blockchain data, the logical tree structure has the characteristics of a Trie dictionary tree; on the other hand, because nodes in the logical tree structure are linked to nodes at the upper layer through their own hash values, the logical tree structure also has the characteristics of a Merkle tree. In summary, the logical tree structure described in this specification is, like the MPT tree, a Merkle tree variant that incorporates the tree structure of the Trie dictionary tree.
Referring to fig. 4, fig. 4 is a tree structure diagram of an FDMT (Fixed Depth Merkle Tree) tree shown in this specification. The FDMT tree is a Merkle tree variant that incorporates the tree structure of the Trie dictionary tree.
As shown in fig. 4, in the Tree structure of the FDMT Tree shown in this specification, the Tree node of the first N layers (3 layers are shown in fig. 4, which is only schematic) and the Leaf node (i.e., Leaf node) of the last layer are included. Among the Tree nodes of the first N layers, the first layer is the Tree node serving as the root node, and the other layers are the Tree nodes serving as the intermediate nodes.
Unlike the MPT tree described above, the Tree nodes (i.e., the root node and the intermediate nodes) in the first N layers of the FDMT tree adopt a uniform data structure, which avoids node splitting caused by newly written data changing the length of the stored characters.
As shown in fig. 4, the Tree nodes in the first N layers of the FDMT Tree may each include a plurality of blocks respectively representing different characters; the block is the "location" of the character in the key used to store the block chain data. Each block may further comprise a plurality of slots respectively representing different characters; the slot is also used for storing characters in keys of the blockchain data. For example, fig. 4 shows that each Tree node includes N blocks; each block further comprises N slots. Among the nodes in each layer of the FDMT tree, the nodes may still be linked by filling the nodes in the previous layer with the hash value (hash pointer) of the nodes in the next layer. That is, the nodes in the above-described FDMT tree are linked to the nodes of the upper layer by their own hash values. Correspondingly, the slot may be specifically used to fill a hash value of a next node linked by the current Tree node. The node at the next layer of the Tree node may be the Tree node or a Leaf node.
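One possible reading of this structure is that each Tree node consumes two characters of the key: the first selects a block and the second selects a slot inside that block. The sketch below illustrates that reading only; the two-characters-per-layer mapping, the use of hexadecimal characters, and the name split_key are assumptions that are not stated in this specification.

```python
from typing import List, Tuple

HEX = "0123456789abcdef"


def split_key(key: str, tree_layers: int) -> Tuple[List[Tuple[int, int]], str]:
    """Split a hex key into per-layer (block, slot) indices and a key-end.

    Assumption: every Tree node layer consumes exactly two characters,
    so the prefix length is fixed at 2 * tree_layers characters.
    """
    needed = 2 * tree_layers
    if len(key) < needed:
        raise ValueError("key is shorter than the fixed prefix length")
    path = [(HEX.index(key[i]), HEX.index(key[i + 1]))
            for i in range(0, needed, 2)]
    return path, key[needed:]  # the remainder is stored in the Leaf node
```

Under this assumption, the path through the Tree nodes always has the same length regardless of which other keys are stored, so writing a new key never changes the depth of the tree.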
In practical application, after the hash value filled in any slot in any block of the Tree node in the FDMT Tree is updated, the hash value of the Tree node is usually recalculated, and the node in the previous layer is linked again according to the calculated hash value.
When the hash value of the Tree node is calculated, all data filled in the Tree node is generally required to be used as calculation parameters for hash calculation; therefore, when the Tree node adopts the data structure shown in fig. 4, if the hash value of the Tree node needs to be calculated, the hash of each block included in the Tree node needs to be calculated first, then the hashes of the blocks need to be concatenated, and then secondary hash calculation is performed on the concatenated hash.
With this approach, when the hash value filled in any slot of any block in a Tree node on the FDMT tree is updated and the hash value of the Tree node is recalculated, the other blocks in the Tree node still have to participate in the hash calculation as calculation parameters even if their data has not been updated, so the amount of hash computation is clearly large.
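The hash calculation for a Tree node with the data structure of fig. 4 can be sketched as follows. SHA-256, the 32 zero bytes used as a placeholder for an empty slot, and the names block_hash and tree_node_hash are assumptions made for illustration.

```python
import hashlib
from typing import List, Optional

EMPTY = b"\x00" * 32  # assumed placeholder for an empty slot


def block_hash(slots: List[Optional[bytes]]) -> bytes:
    """Hash one block: concatenate its slot contents, then hash."""
    data = b"".join(s if s is not None else EMPTY for s in slots)
    return hashlib.sha256(data).digest()


def tree_node_hash(blocks: List[List[Optional[bytes]]]) -> bytes:
    """Hash a Tree node: hash every block, concatenate the hashes, hash again."""
    return hashlib.sha256(b"".join(block_hash(b) for b in blocks)).digest()
```

In this sketch every call to tree_node_hash hashes all blocks again, including those whose slots did not change, which is the computational cost pointed out above.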
In view of this, in the present specification, in order to reduce the amount of computation in computing the hash of the Tree node, the Tree node may specifically adopt a data structure of a main block (i.e., a main position) and a plurality of sub-blocks (i.e., sub-positions).
Referring to fig. 5, fig. 5 is a tree structure diagram of another FDMT tree shown in the present specification.
As shown in fig. 5, in the Tree structure of the FDMT Tree shown in this specification, the Tree nodes of the first N layers and the leaf nodes of the last layer are still included.
Unlike the structure of the FDMT Tree shown in fig. 4, the Tree nodes of the first N layers on the FDMT Tree shown in fig. 5 may each include a main block (i.e., a root block shown in fig. 5) and a plurality of sub-blocks respectively representing different characters; each block can further comprise a plurality of slot positions; for example, fig. 5 shows that each Tree node includes one main block and N sub-blocks; and each sub-block further comprises N slots.
The functions of the slot positions contained in the main block and the sub-block can be different; as shown in fig. 5, for the main block, a plurality of slots respectively corresponding to the sub-blocks may be included, and each slot may be specifically used to fill a hash of data content stored in the corresponding sub-block;
for example, when calculating the hash of any sub-block, the hash values filled in the slots in the sub-block may be specifically spliced together, and then secondary hash calculation is performed on the spliced hash to obtain the hash of the sub-block.
For the subblock, a plurality of slots respectively representing different characters may be included, and each slot may be specifically used to fill a hash value of a node on a next layer of the Tree node link.
In this specification, according to the structure of the Tree node shown in fig. 5, the hash value of the main block in the Tree node may be used to represent the hash value of the Tree node. In this case, after the hash value filled in a slot of any sub-block in a Tree node on the FDMT Tree is updated, recalculating the hash value of the Tree node only requires the following: the hash values filled in the slots of the updated sub-block are spliced, and the spliced value is used as a calculation parameter for hash calculation; the calculated hash value is filled into the slot corresponding to that sub-block in the main block of the Tree node; and the hashes filled in the slots of the main block are then spliced and used as a calculation parameter for a further hash calculation, which yields the hash of the Tree node.
In the whole process, the hash values filled in the slots in other subblocks without data updating are not needed to be spliced and then are used as calculation parameters to participate in hash calculation, so that the hash calculation amount and the calculation time length when the hash value of the Tree node is recalculated can be reduced, and the calculation efficiency of the hash calculation is improved.
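The incremental recalculation for the main-block/sub-block structure can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 as the hash function, 32-byte hash values, an all-zero placeholder for empty slots, and 16 sub-blocks of 16 slots; the class `TreeNode` and its counter `hashed_bytes` are hypothetical names introduced here.

```python
import hashlib

HASH_LEN = 32
EMPTY = b"\x00" * HASH_LEN   # assumed placeholder for an empty slot


def h(data: bytes) -> bytes:
    """SHA-256 stands in for the unspecified hash function."""
    return hashlib.sha256(data).digest()


class TreeNode:
    """Fig. 5 style node: one main block plus n sub-blocks of n slots."""

    def __init__(self, n: int = 16):
        self.hashed_bytes = 0  # instrumentation only, counts bytes hashed on updates
        self.sub_blocks = [[EMPTY] * n for _ in range(n)]
        # Main block: one slot per sub-block, holding that sub-block's hash.
        self.main_block = [h(b"".join(s)) for s in self.sub_blocks]

    def node_hash(self) -> bytes:
        """The hash of the main block represents the hash of the node."""
        data = b"".join(self.main_block)
        self.hashed_bytes += len(data)
        return h(data)

    def update_slot(self, block_idx: int, slot_idx: int, child_hash: bytes) -> bytes:
        """Fill one slot, rehash only its sub-block, then rehash the main block."""
        self.sub_blocks[block_idx][slot_idx] = child_hash
        data = b"".join(self.sub_blocks[block_idx])
        self.hashed_bytes += len(data)
        self.main_block[block_idx] = h(data)
        return self.node_hash()


node = TreeNode()
node_hash = node.update_slot(6, 4, h(b"hypothetical child node"))
print(node.hashed_bytes)  # 1024
```

In this sketch a one-slot update hashes 512 bytes for the affected sub-block and 512 bytes for the main block, whereas rehashing every block of the same node would process 8704 bytes.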
It should be noted that the link relationships between the nodes in the above-mentioned FDMT tree layers shown in fig. 4 and fig. 5 are only schematic, and do not refer to a specific limitation on the link relationships between the nodes in the above-mentioned FDMT tree layers.
With continued reference to fig. 4 and 5, each Tree node on the FDMT Tree shown in fig. 4 and 5 may be used to store at least a portion of the characters in the key of the blockchain data.
For each Tree node on the FDMT Tree shown in fig. 4, the actually stored characters may specifically be a character string generated by splicing the character represented by a block in the Tree node (that is, a non-empty block with at least one slot filled with a hash value) with the character represented by a slot in that block filled with a hash value (that is, a non-empty slot).
For each Tree node on the FDMT Tree shown in fig. 5, the actually stored characters may specifically be a character string generated by splicing the character represented by a non-empty sub-block in the Tree node with the character represented by a slot filled with a hash value in that sub-block.
It should be noted that, in practical applications, each block and each slot in the Tree node may represent only one character; that is, based on the storage format of the Tree node shown in fig. 4 and 5, each group of characters from the character prefix of the key of the blockchain data actually stored by the Tree node is a character string 2 characters long.
For example, please refer to fig. 6, fig. 6 is a structural diagram of a Tree node shown in the present specification;
as shown in fig. 6, the Tree node contains 16 sub-blocks representing different 16-ary characters; each sub-block further comprises 16 slots each representing a different 16-ary character (only the 16 slots contained in block6 are shown in fig. 6); assuming that a sub-block 6 (representing 16-system character 6) in the Tree node is a non-empty block, and a slot4 (representing 16-system character 4), a slot6 (representing 16-system character 6) and a slot9 (representing 16-system character 9) in the sub-block are non-empty slots which fill hash values of nodes at the next layer of the Tree node link; the partial characters in the character prefix of the key of the above block chain data stored by the Tree node are 16-ary character strings "64", "66" and "69", respectively.
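The fig. 6 example can be reproduced with a short sketch. It assumes that a node is modelled as 16 lists of 16 slots, that `None` marks an empty slot, and that lowercase hexadecimal digits name the blocks and slots; `stored_strings` is a hypothetical helper.

```python
HEX = "0123456789abcdef"


def stored_strings(sub_blocks):
    """Return the 2-character strings represented by every non-empty slot."""
    out = []
    for block_idx, slots in enumerate(sub_blocks):
        for slot_idx, value in enumerate(slots):
            if value is not None:
                out.append(HEX[block_idx] + HEX[slot_idx])
    return out


node = [[None] * 16 for _ in range(16)]
for slot in (4, 6, 9):
    node[6][slot] = b"\x01" * 32   # stand-in for the hash of a next-layer node

print(stored_strings(node))  # ['64', '66', '69']
```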
The number of sub-blocks included in the Tree node and the number of slots included in each sub-block are not particularly limited in this specification; in practical applications, both the number of sub-blocks included in the Tree node and the number of slots contained in each sub-block may be determined based on the number of types of character elements included in the character string corresponding to the key of the blockchain data;
for example, assume that the key corresponding to the blockchain data is a 16-ary character string, and at this time, the number of types of character elements included in the character string corresponding to the key of the blockchain data is 16; the number of sub-blocks included in the Tree node and the number of slots included in each sub-block may be 16.
In practical applications, the number of sub-blocks included in the Tree node and the number of slots included in the sub-blocks may be the same;
for example, in an example, taking the character string corresponding to the key of the block chain data as a 16-ary character string, in this case, the Tree nodes of the first N layers may each include 16 sub-blocks respectively representing different 16-ary characters; and each sub-block may further comprise 16 slots each representing a different 16-ary character.
In this way, the single Tree node in each of the first N layers of the FDMT Tree may have 16 × 16=256 slots, and it is obvious that the Tree node in the FDMT Tree shown in fig. 4 has a larger storage capacity than the node for storing the character prefix in the MPT Tree shown in fig. 1.
Of course, in practical applications, the number of sub-blocks included in the Tree node and the number of slots included in the sub-blocks may also be different;
for example, in practical applications, the character string corresponding to the key of the blockchain data may be composed of characters of two different numeral systems; for example, the character string corresponding to the key of the blockchain data may specifically be a character string formed by mixing 16-ary characters and 10-ary characters. In this case, the Tree nodes of the first N layers may each include 16 sub-blocks respectively representing different 16-ary characters, and each sub-block may further comprise 10 slots respectively representing different 10-ary characters; or, the Tree nodes of the first N layers may each include 10 sub-blocks respectively representing different 10-ary characters, and each sub-block may further comprise 16 slots each representing a different 16-ary character.
In an embodiment shown, the number of layers of the Tree node included in the FDMT Tree may be a fixed value; in practical application, the value of N may be an integer greater than or equal to 1; that is, the FDMT Tree may be a Merkle Tree which contains at least one layer of Tree nodes and the number of layers of the Tree nodes is relatively fixed.
For example, in an example, taking the key of the above blockchain data as the blockchain account address, assume that the blockchain account addresses supported by the blockchain system are designed such that the first 6 address characters may be shared; in this case, the first 6 address characters of the blockchain account address can be used as the character prefix of the blockchain account address. Moreover, each Tree node stores 2 characters of the character prefix of the blockchain account address; thus, the above-described FDMT Tree can be designed to have a Tree structure including three layers of Tree nodes (6 / 2 = 3).
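A minimal sketch of how such a 6-character prefix could be mapped onto three layers of Tree nodes is given below. It assumes a hexadecimal address string and that each layer consumes one (block, slot) character pair; the function `tree_path` and the sample address are hypothetical.

```python
def tree_path(address: str, tree_layers: int = 3):
    """Split a hex address into per-layer (block, slot) indices and a suffix."""
    prefix_len = 2 * tree_layers          # each Tree node stores 2 characters
    if len(address) < prefix_len:
        raise ValueError("address is shorter than the character prefix")
    prefix, suffix = address[:prefix_len], address[prefix_len:]
    path = [
        (int(prefix[i], 16), int(prefix[i + 1], 16))
        for i in range(0, prefix_len, 2)
    ]
    return path, suffix


path, suffix = tree_path("6a4f09c3b2")
print(path)    # [(6, 10), (4, 15), (0, 9)]
print(suffix)  # c3b2
```

Each tuple selects the block and the slot to follow in one layer; the remaining characters form the character suffix kept in the leaf node.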
Based on the Tree structure design of the FDMT Tree shown in fig. 4 and 5, on one hand, each Tree node in the FDMT Tree includes a plurality of blocks respectively representing different characters; each block further comprises a plurality of slot positions which respectively represent different characters; therefore, by the design, each Tree node has larger data storage capacity and data carrying capacity, so that when the Tree nodes in the FDMT Tree are written into a database for storage; or, when the Tree node in the FDMT Tree stored in the database is accessed, the data storage capacity of the Tree node can be more adaptive to the capacity of single IO read-write of the storage medium bearing the database, so that the IO read-write capability of the storage medium bearing the database can be fully utilized, and the data read-write efficiency is improved; moreover, the improvement of the data storage capacity and the data carrying capacity of a single Tree node in the FDMT Tree will also lead to the improvement of the data storage capacity and the data carrying capacity of the whole FDMT Tree, so that more block chain data can be stored in the FDMT Tree;
for example, taking a storage medium carrying the database as a disk with a single physical sector of 4 KB as an example, assume that the capacity of one IO read or write of the disk is 4 KB (one sector). For one Branch Node on the MPT tree described in fig. 1, assuming that all 16 fields of the Branch Node are filled with 32-byte hash values, the data storage capacity of the Branch Node is about 32 bytes × 16 = 512 bytes; obviously, a single IO read of a Branch Node can fetch only about 512 bytes, which is much smaller than the 4 KB that the disk can deliver in one IO read. The IO read capability of the disk therefore cannot be fully utilized, and performance is seriously wasted.
However, if the Tree structure design of the FDMT Tree shown in fig. 4 and 5 is adopted, assume that the Tree nodes of the first N layers each include 16 blocks respectively representing different 16-ary characters, and that each block further includes 16 slots each representing a different 16-ary character, so that a single Tree node in each of the first N layers of the FDMT has 16 × 16 = 256 slots; assuming that each slot is filled with a hash value of 32 bytes, the maximum storage capacity of a Tree node when all slots are fully loaded is 256 × 32 bytes = 8192 bytes = 8 KB, which is exactly the capacity of two sectors. Obviously, the data storage capacity of each Tree node in the Tree structure of the FDMT shown in fig. 4 and 5 is better matched to the capacity of a single IO read or write of the disk, so that the IO read/write capability of the disk can be fully utilized and the data read/write efficiency is improved.
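The arithmetic of this comparison can be checked with a few lines. The constant names are hypothetical; the figures (4 KB sector, 32-byte hash values, 16 fields per Branch Node, 16 × 16 slots per Tree node) are the ones assumed in the example above.

```python
SECTOR_BYTES = 4 * 1024   # one physical sector, also one IO read
HASH_BYTES = 32           # size of one hash value

mpt_branch_bytes = 16 * HASH_BYTES        # 16 fields per Branch Node
fdmt_node_bytes = 16 * 16 * HASH_BYTES    # 16 blocks x 16 slots per Tree node

print(mpt_branch_bytes, mpt_branch_bytes / SECTOR_BYTES)   # 512 0.125
print(fdmt_node_bytes, fdmt_node_bytes // SECTOR_BYTES)    # 8192 2
```

A fully loaded Branch Node thus fills one eighth of a sector, while a fully loaded Tree node fills exactly two sectors.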
Moreover, the improvement of the data storage capacity and the data carrying capacity of a single Tree node on the FDMT will also lead to the improvement of the overall data storage capacity and the data carrying capacity of the FDMT, so that more block chain data can be stored on the FDMT;
for example, assuming that the Tree structure of the FDMT shown in fig. 4 and 5 includes 3 layers of Tree nodes, the Tree nodes of each layer may each include 16 sub-blocks respectively representing different 16-ary characters, and each sub-block may further comprise 16 slots each representing a different 16-ary character; a single Tree node in each layer may then have 16 × 16 = 256 slots. The three layers of Tree nodes can then carry 256 × 256 × 256 = 16,777,216 (about 16.77M) character combinations, so about 16.77M buckets can be linked in total; assuming that a user defines that each bucket can carry 16 data records, the whole FDMT tree can carry at most about 16.77M × 16 (roughly 268M) pieces of blockchain data. It is obvious that the above-described FDMT shown in fig. 4 and 5 can store more blockchain data and has a larger data carrying capacity than the MPT tree shown in fig. 1.
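The capacity figures in this example follow directly from the layer count and the slot count, as the short calculation below shows. The variable names are hypothetical.

```python
slots_per_node = 16 * 16     # 16 sub-blocks x 16 slots
tree_layers = 3              # fixed number of Tree node layers in the example
records_per_bucket = 16      # user-defined in the example

buckets = slots_per_node ** tree_layers
records = buckets * records_per_bucket

print(buckets)   # 16777216
print(records)   # 268435456
```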
In the second aspect, each Tree node in the FDMT adopts a uniform data structure; for the Tree nodes of each layer in the above FDMT, the character length of the key character prefix of the block chain data actually stored in the Tree nodes will also be kept fixed; therefore, through the design, the frequent splitting of the nodes caused by the fact that the actually stored character length of each layer of the Tree node is not fixed can be avoided, and the number of the Tree nodes contained in the Tree structure of the FDMT can be ensured to be always in a relatively stable state.
In the third aspect, each Tree node has a larger data storage capacity and data carrying capacity owing to the above design, and the number of layers of Tree nodes included in the Tree structure of the FDMT is relatively stable; together, these two properties ensure to some extent that the FDMT has fewer layers of Tree nodes. On this basis, when the system is cold-started and the Tree nodes of the first N layers of the FDMT, as data that needs to be read frequently, must be loaded into memory from the storage medium bearing the database, both the number of IO reads and the overall time needed to load those Tree nodes into memory can be significantly reduced, which in turn substantially shortens the startup delay of a cold start.
For example, take the FDMT tree shown in fig. 4 and 5 as having a fixed 3 layers of Tree nodes, where each Tree node includes 16 blocks representing different 16-ary characters and each block further comprises 16 slots respectively representing different 16-ary characters; when all the slots are fully loaded, the maximum storage capacity of one Tree node is 256 × 32 bytes = 8192 bytes = 8 KB. For the MPT tree shown in fig. 1, the number of layers is not fixed because frequent node splitting is required; moreover, the storage capacity of a single Branch Node is 512 bytes, which is much smaller than that of a Tree node on the FDMT tree shown in fig. 4 and 5, so the MPT tree necessarily has a larger number of layers (for example, the MPT tree can reach 64 layers at most, far more than 3 layers). When the system is cold-started and the nodes of the first layers of the tree are read into the memory, the data is generally read layer by layer; therefore, the MPT tree shown in fig. 1 obviously requires more reads.
Moreover, the storage capacity of a single Branch Node on the MPT tree is 512 bytes, which is only one eighth of 4KB of a single physical sector carrying the database, and the reading efficiency is very low; therefore, even if the MPT tree of fig. 1 and the FDMT tree shown in fig. 4 and 5 store the same data, the number of IO reads for the MPT tree may be at least 8 times the number of IO reads for the FDMT tree shown in fig. 4 and 5 when the system is cold-started.
Obviously, the number of IO reads for the above FDMT tree shown in fig. 4 and 5 will be much smaller than the number of IO reads for the MPT tree shown in fig. 1 when the system is in cold start; therefore, the tree structure design of the above-described FDMT tree shown in fig. 4 and 5 will be more friendly to system cold start.
In an embodiment shown, the character string corresponding to the key of the above block chain data may still include a character prefix and a character suffix; in this case, the Tree node may be configured to store a character in a character prefix of the key of the blockchain data; the leaf node may be configured to store a character suffix of the key of the blockchain data and a Value of the blockchain data.
It should be noted that, because the data actually stored by the leaf node generally has a larger data capacity than the Tree node; for example, the value of the block chain data actually stored by the leaf node is usually the original content of the block chain data, and the original content of the block chain data occupies a larger storage space compared to the character prefix of the block chain data; therefore, in this specification, in order to ensure that the leaf node can have a larger data capacity, the leaf node may store data in the form of a large data block.
The specific form and storage structure of the data block are not particularly limited in this specification;
in an embodiment shown, the leaf node may be in the form of a bucket; the bucket may be a container or a storage space for storing data.
Referring to fig. 7, fig. 7 is a structural diagram of a bucket shown in this specification;
as shown in fig. 7, in the above bucket (i.e. the bucket node shown in fig. 7), several data records may be included; it should be noted that the plurality of data records contained in the bucket may not be a whole in logic, but may be a plurality of different data records separated in logic. Each data record corresponds to a block chain data respectively and is used for storing the value of the block chain data; that is, a data record refers to a stored record whose stored data content includes the value of the blockchain data.
It should be noted that the statement that the plurality of data records included in the bucket are not logically a whole means that each data record may correspond to an independent query key value (key), so that each data record in the bucket can be accurately queried based on its own query key value, without reading all data records stored in the bucket as a whole.
The specific form and the specific content of the query key value corresponding to each data record are not particularly limited in this specification, and may be any form of character string that can be used as a query index for each data record.
In an illustrated embodiment, the query key value corresponding to each data record may specifically be a hash value of data content included in each data record; the data records contained in the bucket data bucket may specifically include a key-value key value pair composed of a hash value of the blockchain data and data content corresponding to the value of the blockchain data.
Of course, in practical applications, the query key value corresponding to each data record may also be a character string that can be used as a query index for each data record, except for a hash value, and is not particularly limited in this description; for example, in an example, the query key value corresponding to each data record may specifically be a unique identifier (such as a number) set by the node device for each data record.
In one embodiment, if the Tree node is used to store a character in the character prefix of the key of the blockchain data; the leaf node is configured to store the suffix of the key of the blockchain data and the Value of the blockchain data;
accordingly, a data record included in the bucket may be a key-value pair whose key is the hash value obtained by hashing, as a whole, the data content corresponding to the value of the blockchain data together with the character suffix of the key of the blockchain data, and whose value is that data content together with that character suffix.
In practical applications, the data records may be in other types of data forms besides key-value key value pairs, and are not listed in this specification.
With the above embodiment, the several data records contained in a leaf node in the FDMT tree are no longer logically a whole, but are several logically separate key-value pairs; therefore, the data contained in a leaf node in the FDMT tree is no longer accessed in the database as a single access unit, and each key-value pair contained in the leaf node instead serves as an independent access unit in the database, so that data access to the database is more flexible;
for example, when the value of the blockchain data is the latest account status data corresponding to the blockchain account in the blockchain, and the key of the blockchain data is the blockchain account address, in this case, the key of the key-value pair corresponding to the data record contained in the bucket may be a hash value of the data content of the two parts, i.e., the character suffix of the blockchain account address and the corresponding account status data; the value of the key-value key value pair may be the data content of both the character suffix of the blockchain account address and the corresponding account status data.
Assuming that a character suffix of a specific account address and corresponding account status data contained in the bucket data bucket need to be read, each key-value key value pair in the bucket data bucket is an independent access unit; therefore, content addressing is carried out on the database only based on the hash values of the data contents of the character suffix of the specific account address and the corresponding account state data, all the data contents contained in the leaf node do not need to be read into the memory from the database, and the character suffix of the account address to be read and the corresponding account state data are further searched in the memory;
correspondingly, if a character suffix of a new account address and corresponding account state data need to be written into the bucket data bucket, or account state data corresponding to a specific account address contained in the bucket data bucket is updated, a key-value key value pair can be directly constructed according to the character suffix of the new account address and the corresponding account state data, and the key-value key value pair is written into the bucket data bucket; or based on the hash value of the data content of the two parts, namely the character suffix of the specific account address and the corresponding account state data, performing content addressing in the database, searching the corresponding key-value key value pair, then writing the updated account state data corresponding to the specific account address, and updating the original value of the key-value key value pair.
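The record-level access described above can be sketched as follows. This is an illustration only: a Python dict stands in for the key-value database, SHA-256 stands in for the unspecified hash function, and the names `record_key` and `Bucket` as well as the sample suffixes and state data are hypothetical.

```python
import hashlib


def record_key(suffix: str, state: bytes) -> bytes:
    """Query key: hash over the address suffix plus the account state data."""
    return hashlib.sha256(suffix.encode() + state).digest()


class Bucket:
    """Leaf node whose records are independent key-value pairs in the database."""

    def __init__(self, db: dict):
        self.db = db       # dict stand-in for the key-value database
        self.keys = []     # query keys of the records belonging to this bucket

    def put(self, suffix: str, state: bytes) -> bytes:
        """Write one record without touching the other records of the bucket."""
        key = record_key(suffix, state)
        self.db[key] = (suffix, state)
        self.keys.append(key)
        return key

    def get(self, key: bytes):
        """Content-address a single record instead of loading the whole bucket."""
        return self.db[key]


db = {}
bucket = Bucket(db)
k1 = bucket.put("c3b2", b"balance=10")
k2 = bucket.put("ffee", b"balance=7")
print(bucket.get(k1))  # ('c3b2', b'balance=10')
```

Reading the record for suffix "c3b2" touches one database entry; the record for "ffee" is never loaded into memory.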
The number of data records contained in the bucket is not particularly limited in this specification; in an implementation manner shown, the number of data records contained in the bucket may be specifically configured by a user in a customized manner.
For example, taking the blockchain data as the latest account status data corresponding to the blockchain account in the blockchain, and taking the key of the blockchain data as the blockchain account address as an example, in this case, each data record in the bucket corresponds to the account status of one blockchain account; the number of data records in the bucket actually represents the account carrying capacity of the bucket for accommodating the blockchain account; therefore, the user can flexibly customize the account bearing capacity of the bucket by customizing the number of data records which can be contained in the bucket; for example, in one example, the number of data records contained in the above-mentioned bucket may be configured by the user into 16 or 64, so that a single bucket may carry status data of 16 or 64 blockchain accounts.
It should be noted that, the total storage capacity of the data records stored in the bucket is not particularly limited in this specification; in an implementation manner shown, the total storage capacity of the data records stored in the bucket may be specifically configured by a user in a customized manner.
For example, in implementation, taking the storage medium carrying the database as a disk with a single physical sector of 4KB size as an example, in this case, the user may set the maximum storage capacity of the data records stored in the bucket to 4 KB; or an integer multiple of 4KB, such that the maximum storage capacity of the bucket can be adapted to the storage capacity of a single physical sector of the storage medium.
In the present specification, as described above, the Tree nodes in the top N layer of the FDMT Tree adopt a uniform data structure; therefore, node splitting will not occur for each Tree node in the top N layers of the FDMT Tree. The leaf nodes in the FDMT tree can still be split when a certain splitting condition is satisfied.
Unlike the MPT tree, in the present specification, when a leaf node on the FDMT tree is split into nodes, the splitting may be performed according to the storage capacity of the leaf node instead of the length of the character sharing the character prefix. When the storage capacity of a leaf node on the above-mentioned FDMT tree satisfies the node splitting condition, an intermediate node (i.e., an extended node shown in fig. 5) is split from the leaf node. It should be noted that the split intermediate node is an upper node of the leaf node. Multiple splits may be performed for the same leaf node.
The node splitting condition may include any type of condition related to the storage capacity of the leaf node, and is not particularly limited in this specification;
in an embodiment shown, the node splitting condition may be specifically any one of conditions 1 and 2 shown below; alternatively, a combination of conditions 1 and 2 shown below is also possible:
condition 1: the total number of data records stored in the leaf node is greater than a threshold value;
condition 2: the total storage capacity of the data records stored in the leaf nodes is greater than a threshold.
For example, in one example, the node splitting condition may be set such that an intermediate node is split from the leaf node when the total number of data records stored in the leaf node is greater than 64, and/or when the total storage capacity of the data records stored in the leaf node is greater than 4 KB.
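A minimal sketch of such a splitting check is shown below, using the thresholds of the example (64 records, 4 KB). It assumes each data record is available as a bytes object; the function `should_split` and its `require_both` flag, which selects between the "or" and "and" combinations of conditions 1 and 2, are hypothetical.

```python
def should_split(records, max_records=64, max_bytes=4 * 1024, require_both=False):
    """Evaluate condition 1 (record count) and condition 2 (total size)."""
    too_many = len(records) > max_records
    too_large = sum(len(r) for r in records) > max_bytes
    if require_both:
        return too_many and too_large
    return too_many or too_large


print(should_split([b"x" * 10] * 65))                     # True  (condition 1)
print(should_split([b"x" * 3000] * 2))                    # True  (condition 2)
print(should_split([b"x" * 10] * 64))                     # False
print(should_split([b"x" * 10] * 65, require_both=True))  # False
```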
In this specification, the extended node, which is also an intermediate node of the FDMT Tree, may adopt a data structure identical to the Tree node described above, and is specifically configured to store characters split from a character suffix stored in the leaf node;
for example, referring to fig. 4 and 5, an extended node split from a leaf node may also include a plurality of blocks respectively representing different characters; each block may further comprise a plurality of slots respectively representing different characters; the slot may be specifically used to fill a hash value of a node of a next level linked to the node. For example, for an extended node, the next node is the leaf node; therefore, the slot contained in the block in the extended node can be used to fill the hash value of the next leaf node linked by the extended node.
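One possible way to split a leaf node into an extended node and new leaf nodes is sketched below. The specification does not fix the details, so everything here is an assumption made for illustration: records are kept as a mapping from hexadecimal character suffix to value, the first two suffix characters are moved into the extended node, SHA-256 stands in for the hash function, and `leaf_hash` and `split_leaf` are hypothetical helpers.

```python
import hashlib
from collections import defaultdict


def leaf_hash(records: dict) -> bytes:
    """Assumed leaf hash: SHA-256 over the sorted suffix/value pairs."""
    data = b"".join(s.encode() + v for s, v in sorted(records.items()))
    return hashlib.sha256(data).digest()


def split_leaf(records: dict, n: int = 16):
    """Move the first two suffix characters of each record into an extended node.

    Returns the extended node (n blocks x n slots) and the new leaves,
    keyed by the two characters that were split off.
    """
    groups = defaultdict(dict)
    for suffix, value in records.items():
        if len(suffix) < 2:
            raise ValueError("suffix too short to split")
        groups[suffix[:2]][suffix[2:]] = value

    extended = [[None] * n for _ in range(n)]
    leaves = {}
    for head, recs in groups.items():
        block_idx, slot_idx = int(head[0], 16), int(head[1], 16)
        extended[block_idx][slot_idx] = leaf_hash(recs)   # link to the new leaf
        leaves[head] = recs
    return extended, leaves


extended, leaves = split_leaf({"a1ff": b"x", "a1c0": b"y", "07aa": b"z"})
print(sorted(leaves))  # ['07', 'a1']
```

Records sharing the same two leading suffix characters end up in the same new leaf node, and the corresponding slot of the extended node is filled with that leaf's hash value.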
In this specification, for an extended node split from the above leaf nodes, merging with the leaf node of the lower layer linked thereto may be performed according to the storage capacity of the extended node. When the storage capacity of an extended node on the above-mentioned FDMT tree satisfies the node merging condition, the extended node can be merged to a leaf node of a lower layer linked thereto.
The merging of an extended node to a lower leaf node linked thereto means writing a character stored in the extended node as the character suffix to one or more lower leaf nodes linked to the extended node.
The node merge condition may include any type of condition related to the storage capacity of the extended node, and is not particularly limited in this specification;
in an embodiment shown, the node merge condition may be specifically any one of conditions 3 and 4 shown below; alternatively, a combination of conditions 3 and 4 shown below is also possible:
condition 3: the total number of characters stored in the extended node is less than or equal to a threshold; the total number of characters here refers to the total number of non-empty slots (i.e., slots filled with a hash value) across the blocks in the extended node;
condition 4: the total storage capacity of the characters stored in the extended node is less than or equal to a threshold; the total storage capacity here refers to the total storage capacity of the hash values filled in the non-empty slots across the blocks in the extended node;
for example, in one example, the node merging condition may be set such that the characters stored in the extended node are written, as a character suffix, into the one or more lower-layer leaf nodes linked by the extended node when the total number of characters stored in the extended node is less than or equal to 1, and/or when the total storage capacity of the characters stored in the extended node is less than or equal to 1 KB.
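The merge check can be sketched in the same spirit, using the thresholds of the example (1 character, 1 KB). It assumes an extended node is modelled as lists of slots in which `None` marks an empty slot and a filled slot holds a 32-byte hash value; `should_merge` and its `require_both` flag are hypothetical.

```python
def should_merge(blocks, max_chars=1, max_bytes=1024, require_both=False):
    """Check the two merge conditions on an extended node.

    The count condition uses the number of non-empty slots; the size
    condition uses the total size of the hash values in those slots.
    """
    filled = [v for slots in blocks for v in slots if v is not None]
    few_chars = len(filled) <= max_chars
    small = sum(len(v) for v in filled) <= max_bytes
    if require_both:
        return few_chars and small
    return few_chars or small


def make_node(filled_slots, n=16):
    """Build an n x n extended node with the first `filled_slots` slots filled."""
    flat = [b"\x01" * 32] * filled_slots + [None] * (n * n - filled_slots)
    return [flat[i * n:(i + 1) * n] for i in range(n)]


print(should_merge(make_node(1)))                     # True
print(should_merge(make_node(40)))                    # False (40 slots, 1280 bytes)
print(should_merge(make_node(3)))                     # True  (96 bytes <= 1 KB)
print(should_merge(make_node(3), require_both=True))  # False (3 slots > 1)
```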
As can be seen from the above description, an extended node described in this specification may be a dynamically scalable intermediate node on the above-mentioned FDMT tree; when the storage capacity of any leaf node on the FDMT tree meets the node splitting condition, at least one extended node and at least one leaf node can be split from the leaf node; when the storage capacity of any extended node satisfies the node merge condition, the extended node can be merged to the leaf node of the lower layer linked thereto.
In this specification, the block chain data to be stored may specifically include at least the following four types of data:
transactions included in blocks; transaction receipts corresponding to the transactions in a block, generated after those transactions are executed; the latest account state data corresponding to blockchain accounts in the blockchain after the transactions in a block are executed; and the storage content of smart contract accounts;
accordingly, the FDMT tree may specifically include:
a transaction tree for storing transactions included in blocks; a receipt tree for storing transaction receipts corresponding to transactions included in the block; a state tree for storing the latest account state data corresponding to blockchain accounts in the blockchain; and a storage tree for storing the storage content of smart contract accounts.
Of course, in practical applications, the tree structure of the FDMT tree may be used to store only some of the four types of data; for example, the FDMT tree may be used only to store the latest account status data corresponding to the blockchain accounts, while the other types of data are stored using other forms of Merkle trees (e.g., MPT trees).
Wherein, the hash value of the root nodes of the transaction tree, the receipt tree and the state tree can be stored in the block header; the hash value of the root node of the Storage tree may be stored in a Storage field in a structure of the contract account corresponding to the Storage tree.
The key of the block chain data may specifically refer to a corresponding search key value of the block chain data in a database; correspondingly, the Value of the blockchain data may be the original content of the blockchain data;
in practical application, the search key value may be a character string corresponding to the blockchain data; when the block chain data are different types of data, the corresponding keys have certain differences;
for example, when the blockchain data is a transaction included in a block, or a transaction receipt that corresponds to a transaction included in the block and is generated after that transaction has been executed, the key corresponding to the blockchain data may specifically be the serial number of the transaction in the block, or another form of transaction identifier.
When the blockchain data is the latest account status data corresponding to the blockchain account in the blockchain after the transaction in the block is completed, the key corresponding to the blockchain data may be specifically the account address of the blockchain account.
When the blockchain data is the storage content of a smart contract account, the key corresponding to the blockchain data may specifically be the account address of the contract account together with a hash value of the storage location of the storage content in the account storage of the contract account; for example, in one example, the storage content is typically a state variable, in which case the account address of the contract account and the hash value of the storage location of the state variable in the account storage of the contract account may be used as the key.
In this specification, when a node device in a blockchain stores blockchain data, a key-value pair of the blockchain data to be stored may be obtained first; for example, in one example, the node device may process the blockchain data to be stored into a key-value pair when acquiring the blockchain data;
after the key-value pair of the blockchain data to be stored is obtained, the key-value pair may be converted into a root node, intermediate nodes and leaf nodes on the logical tree structure;
for example, in one example, the node device may carry a storage interface or a storage service corresponding to the FDMT tree, where the storage interface or the storage service may be specifically configured to convert key-value pairs of blockchain data to be stored into nodes in the FDMT tree; after acquiring the key-value pair of the blockchain data to be stored, the node device may call the storage interface or the storage service to convert the key-value pair into the root node, intermediate nodes and leaf nodes of fig. 4 or fig. 5, according to the tree structure of the FDMT tree shown in fig. 4 or fig. 5.
After the key-value pairs of the blockchain data to be stored have been converted into the root node, the intermediate nodes and the leaf nodes on the logical tree structure, these nodes may be stored in the database in the form of key-value pairs;
for example, the database is usually stored in a persistent storage medium (such as a storage disk) mounted on the node device, and the storage medium is the physical storage corresponding to the database; when the FDMT tree is stored in the database, a commit command may be executed to write the nodes of the FDMT tree, held in the memory of the node device in the form of key-value pairs, into the storage medium carrying the database.
The specific type of the database is not particularly limited in this specification, and those skilled in the art can flexibly select the database based on actual needs;
in one implementation, the database may be a key-value database; for example, in one example, the database may be a LevelDB database with a multi-level storage structure, or a database based on the LevelDB architecture; for example, the RocksDB database is a typical database based on the LevelDB architecture.
It should be noted that, when the nodes in the FDMT tree are stored in the database in the form of key-value pairs, the key of each key-value pair may specifically be the node ID of the corresponding node in the FDMT tree;
the node ID may specifically include identification information that can uniquely identify a node in the FDMT tree;
for example, in one implementation, the node ID may specifically be a hash value of data content contained in a node in the FDMT tree; in this case, when the node on the above-mentioned FDMT tree needs to be queried, content addressing can be performed as a key based on the hash value of the data content contained in the node.
In another implementation, the node ID may specifically include path information of a node in the FDMT tree; that is, the path information of the node in the FDMT tree is used as the node ID of the node; the path information may specifically include any form of information that can describe a link relationship between a node and another node, and a position of the node on the FDMT tree;
for example, referring to fig. 8, fig. 8 is a schematic diagram illustrating setting node IDs for nodes in an FDMT tree according to the present disclosure; for the FDMT tree including three layers of Tree nodes as shown in fig. 8, assuming that the node ID of the Tree node serving as the root node is represented by 0x00, the node ID of the bucket node shown in fig. 8 may be represented by 0x00123456.
Here, 0x00 is the node ID of the root node; 123456 denotes the path information of the bucket node from the root node to the bucket node on the FDMT tree; 12 represents the 2nd slot of the 1st block of the first-layer Tree node; 34 represents the 4th slot of the 3rd block of the second-layer Tree node; 56 represents the 6th slot of the 5th block of the third-layer Tree node.
Based on the node ID, the link relationship between the bucket node and other nodes, and the specific position of the bucket node on the FDMT tree, can be determined; for example, based on the node ID, it can be determined that the bucket node is linked to the 6th slot of the 5th block of the third-layer Tree node; the third-layer Tree node is linked to the 4th slot of the 3rd block of the second-layer Tree node; and the second-layer Tree node is in turn linked to the 2nd slot of the 1st block of the first-layer Tree node serving as the root node.
In this manner, a particular slot in the FDMT tree at which a node is located can be precisely located when the node ID is used to retrieve the node stored on the FDMT tree.
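As a non-limiting illustration, the path-style node ID described above can be decoded as in the following minimal sketch. The function name `parse_path_node_id` is hypothetical, and the sketch assumes that each layer contributes exactly two hexadecimal characters (block number, then slot number), as in the 0x00123456 example.

```python
def parse_path_node_id(node_id: str, root_id: str = "0x00"):
    """Split a path-style node ID into one (block, slot) pair per layer.

    Assumption: two hex characters per layer, block first, slot second.
    """
    if not node_id.startswith(root_id):
        raise ValueError("node ID does not start with the root node ID")
    path = node_id[len(root_id):]
    if len(path) % 2 != 0:
        raise ValueError("path must contain two characters per layer")
    return [(int(path[i], 16), int(path[i + 1], 16))
            for i in range(0, len(path), 2)]


# Walk the path from the root layer down to the layer holding the bucket node.
for layer, (block, slot) in enumerate(parse_path_node_id("0x00123456"), start=1):
    print(f"layer {layer}: block {block}, slot {slot}")
```

For the example ID, the sketch yields block 1 / slot 2, block 3 / slot 4 and block 5 / slot 6 for the first, second and third layers respectively.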
In another implementation manner, the node ID may specifically include a relative position of a node in the FDMT tree, and a hash value of data content included in the node (in this case, the node ID may also be used as a hash identifier of the node); that is, the relative position of the node in the FDMT tree and the hash value of the data content included in the node are used as the node ID of the node.
For example, in implementation, a character string generated by splicing the relative position of a node in the FDMT tree with the hash value of the data content included in the node may be used as the node ID of the node; of course, in practical applications, other types of information besides the relative position and the hash value may also be introduced during the splicing to generate the node ID; these are not enumerated in this specification.
In this way, in addition to content addressing based on the hash value of the data content contained by the node as a key, the specific slot where the node is located in the FDMT tree can be precisely located when the node ID is used to retrieve the node stored in the FDMT tree.
It should be noted that the storage medium used for bearing the database on the node device may specifically be a persistent storage medium; for example, it may be a disk, a memory or other forms of storage media capable of persistently storing data, and these are not listed in this specification.
In this specification, before the key-value pairs of the nodes in the FDMT tree are written into the database, the nodes in the FDMT tree may first be encoded, and the key-value pairs of the encoded nodes may then be stored in the database.
The Tree nodes and the extended nodes on the FDMT tree both adopt a data structure that comprises a plurality of blocks, each block further comprising a plurality of slots; therefore, in this specification, when the FDMT tree is encoded, bitmap encoding may be performed on the blocks in the Tree nodes and extended nodes on the FDMT tree.
It should be noted that, when bitmap encoding is performed on a block (which may be a main block or a sub-block) in a Tree node or an extended node in the FDMT tree, it may specifically be determined whether each slot in the block is filled with a hash value; the result is then represented using a dedicated encoding string to obtain bitmap encoding information; and finally the bitmap encoding information is added to the block to complete the bitmap encoding process for the block;
for example, in one example, the bitmap encoding information may specifically be a hexadecimal character string used to identify whether each slot in the block is filled with a hash value; assume that a block to be encoded (which may be a main block or a sub-block) contains 16 slots in total, and that slots 0, 8, 9, 10 and 11 of the 16 slots are each filled with a hash value; then the result of whether each slot in the block is filled with a hash value can be represented by the binary string 0000111100000001, in which the leftmost bit is the highest bit (bit 15) and the rightmost bit is the lowest bit (bit 0); finally, the binary string may be converted into the hexadecimal string 0x0f01, which is the bitmap encoding information of the block.
It should be noted that, when bitmap encoding is performed on the blocks in the Tree nodes and extended nodes in the FDMT tree, the bitmap encoding procedure is exactly the same for main blocks and sub-blocks.
In this way, each block contained in the encoded Tree nodes and extended nodes in the FDMT tree carries bitmap encoding information indicating whether each slot in the block is filled with a hash value; therefore, when the data filled in a block of a Tree node stored in the database needs to be queried, the non-empty slots in the block can be determined directly from the bitmap encoding information in the block, without traversing every slot in the block, so that the search efficiency can be improved.
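As a non-limiting illustration, the bitmap encoding of a single block can be sketched as follows. The names `encode_bitmap` and `non_empty_slots` are hypothetical, and the block width of 16 slots is taken from the example above rather than being a requirement.

```python
SLOTS_PER_BLOCK = 16  # assumed block width, matching the example in the text


def encode_bitmap(filled_slots):
    """Return the bitmap of a block as a hexadecimal string.

    Bit i (bit 0 is the lowest bit) is set when slot i holds a hash value.
    """
    bitmap = 0
    for index in filled_slots:
        if not 0 <= index < SLOTS_PER_BLOCK:
            raise ValueError(f"slot index out of range: {index}")
        bitmap |= 1 << index
    return f"0x{bitmap:04x}"


def non_empty_slots(bitmap_hex):
    """Recover the indices of the filled slots from the bitmap string."""
    bitmap = int(bitmap_hex, 16)
    return [i for i in range(SLOTS_PER_BLOCK) if (bitmap >> i) & 1]


encoded = encode_bitmap([0, 8, 9, 10, 11])
print(encoded)                   # 0x0f01
print(non_empty_slots(encoded))  # [0, 8, 9, 10, 11]
```

Reading the bitmap in this way identifies the filled slots with a fixed number of bit tests, without inspecting the slot contents themselves.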
Wherein, the leaf node on the FDMT Tree adopts a data structure completely different from the Tree node and extended node; therefore, when encoding leaf nodes in the FDMT Tree, a completely different encoding scheme from the Tree node and extended node may be used.
In the illustrated embodiment, when encoding is performed on the leaf nodes on the above-mentioned FDMT tree, RLP (Recursive Length Prefix) encoding may be still specifically employed; the RLP coding is a coding method commonly used in the MPT tree shown in fig. 1, and a specific coding process thereof will not be described in detail in this specification.
Of course, in practical applications, the encoding method used for encoding the leaf nodes on the FDMT tree may be other encoding methods besides RLP encoding, and in practical applications, the encoding method can be flexibly selected, and is not described in this specification.
In this specification, after encoding is completed for the nodes on the FDMT tree, the key-value pairs of the nodes on the FDMT tree may be stored in the database.
In an embodiment shown, for the Tree nodes and extended nodes in the FDMT tree, before the key-value pairs of the encoded Tree nodes and extended nodes are stored in the database, the empty slots in each block, that is, the slots not filled with a hash value, may be determined according to the bitmap encoding information added to the block, and the determined empty slots may then be deleted from the block.
In this way, when the key-value pairs of the Tree nodes and extended nodes are stored in the database, the empty slots in the blocks are deleted, so that the storage space occupied when the Tree nodes and extended nodes are stored in the database can be further reduced and the storage efficiency improved.
It should be noted that, for an encoded block, when the hash value of the block is calculated, the hash calculation may be performed on the bitmap encoding information contained in the encoded block and the hash values filled in the non-empty slots, taken as a whole; of course, the bitmap encoding information may also be excluded from the hash calculation.
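As a non-limiting illustration, deleting the empty slots before storage and hashing the encoded block as a whole can be sketched as follows. The function names, the 2-byte big-endian bitmap layout, the 32-byte hash length and the use of SHA-256 are all assumptions made for this sketch; the specification does not fix any of them.

```python
import hashlib

SLOTS = 16      # assumed number of slots per block
HASH_LEN = 32   # assumed length in bytes of each slot hash


def pack_block(slots):
    """Serialize a block as bitmap + filled slot hashes; empty slots are dropped."""
    bitmap = sum(1 << i for i, h in enumerate(slots) if h is not None)
    filled = [h for h in slots if h is not None]
    return bitmap.to_bytes(2, "big") + b"".join(filled)


def unpack_block(data):
    """Rebuild the full slot list, using the bitmap to place each stored hash."""
    bitmap = int.from_bytes(data[:2], "big")
    slots, offset = [None] * SLOTS, 2
    for i in range(SLOTS):
        if (bitmap >> i) & 1:
            slots[i] = data[offset:offset + HASH_LEN]
            offset += HASH_LEN
    return slots


def block_hash(slots):
    """Hash the bitmap and the non-empty slot hashes taken as a whole."""
    return hashlib.sha256(pack_block(slots)).digest()


block = [None] * SLOTS
for i in (0, 8, 9, 10, 11):
    block[i] = bytes([i]) * HASH_LEN  # placeholder hash values
packed = pack_block(block)
print(len(packed))  # 2 bitmap bytes + 5 stored hashes of 32 bytes each
```

Under these assumptions, a block with five filled slots occupies 162 bytes instead of the 512 bytes that sixteen fixed-width slots would require.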
In this specification, after the key-value pairs of the nodes in the FDMT tree are written into the database, if the value corresponding to a piece of blockchain data written in the FDMT tree is updated, the leaf node of the FDMT tree that stores the character suffix and the value of the blockchain data, the extended nodes split from that leaf node, and the Tree nodes that store the character prefix of the key of the blockchain data may all need to be updated.
Of course, if the leaf node storing the character suffix and the value of the blockchain data has not been split, only that leaf node and the Tree nodes storing the character prefix of the key of the blockchain data need to be updated.
In this case, the node device may search the database for the nodes corresponding to the blockchain data to be updated (the specific search method is not described again), read the found nodes from the database into memory, modify and update the nodes in memory, and then write the updated nodes back into the database to replace the original nodes.
For example, updating the leaf node on which the data update occurs in the FDMT tree may specifically include updating the value of the blockchain data stored in the leaf node. After the update is completed, the hash value of the leaf node may be recalculated, and the node on the previous layer of the node is re-linked based on the hash value.
Updating an extended node or a Tree node on an upper layer of a leaf node on the FDMT Tree where data update occurs, which may specifically include updating a slot in the extended node or the Tree node filled with a hash value of the leaf node; after the update is completed, the hash value of the extended node or the Tree node can be recalculated, and the relinking with the node on the layer above the node is continued based on the hash value.
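As a non-limiting illustration, the upward propagation described above (update the leaf, recalculate its hash, refill the corresponding slot in the node of the upper layer, and repeat) can be sketched as follows. The `Node` class, its fields and the use of SHA-256 are hypothetical simplifications; blocks and slots are reduced to a mapping from a (block, slot) position to a child hash.

```python
import hashlib


class Node:
    """Simplified node: a leaf carries a payload, an intermediate node carries slots."""

    def __init__(self, payload=b"", parent=None, parent_slot=None):
        self.payload = payload          # stand-in for suffix + value in a leaf
        self.slots = {}                 # (block, slot) -> hash of the linked child
        self.parent = parent
        self.parent_slot = parent_slot  # position of this node's hash in the parent

    def node_hash(self):
        digest = hashlib.sha256(self.payload)
        for position in sorted(self.slots):
            digest.update(repr(position).encode() + self.slots[position])
        return digest.digest()


def update_leaf(leaf, new_payload):
    """Update a leaf, then re-link every ancestor; return the new root hash."""
    leaf.payload = new_payload
    node = leaf
    while node.parent is not None:
        # Refill the parent's slot with the recalculated hash of the child.
        node.parent.slots[node.parent_slot] = node.node_hash()
        node = node.parent
    return node.node_hash()


root = Node()
middle = Node(parent=root, parent_slot=(1, 2))
leaf = Node(payload=b"balance=1", parent=middle, parent_slot=(3, 4))
old_root_hash = update_leaf(leaf, b"balance=1")
new_root_hash = update_leaf(leaf, b"balance=2")
print(old_root_hash != new_root_hash)  # True
```

Only the nodes on the path from the updated leaf to the root are rehashed in this sketch; sibling subtrees are left untouched.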
In an embodiment shown, the splitting judgment and the specific node splitting operation for the leaf node in the FDMT Tree may be performed at a stage of recalculating the hash value of the leaf node and re-linking with a node (which may be a Tree node or an extended node) at a layer above the leaf node based on the hash value.
In this case, the node device may determine whether data update occurs to a leaf node on the FDMT Tree, and if data update occurs to any leaf node on the FDMT Tree, the node device may further determine whether the storage capacity of the leaf node satisfies a node splitting condition before recalculating a hash value of the leaf node and re-linking with a Tree node or an extended node on an upper layer of the leaf node based on the hash value; if the storage capacity of the leaf node satisfies the node splitting condition, at least one extended node can be further split from the leaf node.
For example, in one example, when the storage capacity of a leaf node satisfies the node splitting condition, in order to rapidly reduce the storage capacity of the leaf node, the leaf node may be split into one extended node and a plurality of leaf nodes.
The splitting policy used to split extended nodes from a leaf node is not particularly limited in this specification; in practical applications, a user may customize the policy according to specific splitting requirements;
in an embodiment shown, the splitting policy may specifically include: splitting the latest data records written to the leaf node in the current blocking period (that is, the period in which the current block is generated) from the leaf node, and additionally creating an extended node and a leaf node based on those latest data records.
In this case, when node splitting is performed on any target leaf node, the latest data records written into the target leaf node in the current blocking period may be determined; the latest data records are deleted from the target leaf node, and a character prefix is split from the character suffixes contained in the latest data records;
for example, in one example, the first two characters of the character suffixes of these latest data records may be split out by default; alternatively, the shared character prefix may be split out when a shared character prefix exists among the character suffixes contained in the latest data records and its length reaches two characters.
Then, at least one extended node for storing the split character prefix and at least one leaf node for storing the latest data record after the character prefix is split are further created.
In this way, the latest data records written into the target leaf node in the current blocking period can be deleted from the target leaf node, and at least one extended node and at least one leaf node are created as the split-out nodes based on the deleted latest data records; therefore, after the target leaf node performs node splitting, the data records actually stored by the target leaf node are completely consistent with the data records it stored in the last blocking period, and the hash value of the target leaf node does not change at all relative to the last blocking period. Consequently, when new blockchain data is written to the target leaf node in the current blocking period and the storage capacity of the target leaf node meets the node splitting condition, only the newly written data records need to be split out to create additional extended nodes and leaf nodes, and the hash value of the target leaf node does not need to be recalculated in order to re-link it with the node of the previous layer; with this splitting manner, the hash value of a leaf node meeting the node splitting condition remains stable, and the number of times the hash value of the leaf node is recalculated can be reduced.
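As a non-limiting illustration, the splitting policy described above can be sketched as follows. The function `split_latest_records` is hypothetical; a leaf node is modelled as a mapping from character suffix to value, the set of suffixes written in the current blocking period is assumed to be known, and the first two characters are split out by default.

```python
from collections import defaultdict


def split_latest_records(leaf_records, latest_suffixes, prefix_len=2):
    """Move the records written in the current period out of the leaf.

    Returns (remaining, new_leaves): `remaining` is the leaf content left in
    place, and `new_leaves` maps each split-out prefix (to be stored in an
    extended node) to the content of the new leaf node linked below it.
    """
    remaining = {s: v for s, v in leaf_records.items() if s not in latest_suffixes}
    new_leaves = defaultdict(dict)
    for suffix in sorted(latest_suffixes):
        prefix, rest = suffix[:prefix_len], suffix[prefix_len:]
        new_leaves[prefix][rest] = leaf_records[suffix]
    return remaining, dict(new_leaves)


leaf = {"ab12": 1, "cd34": 2, "ef56": 3, "ef78": 4}
remaining, new_leaves = split_latest_records(leaf, {"ef56", "ef78"})
print(remaining)   # {'ab12': 1, 'cd34': 2}
print(new_leaves)  # {'ef': {'56': 3, '78': 4}}
```

In this sketch the records left in the original leaf are exactly those present before the current period, which is why its previously computed hash value can be reused.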
In another illustrated embodiment, the splitting policy may specifically include: splitting from the leaf node a character prefix shared among the character suffixes contained in a plurality of data records stored in the leaf node, and additionally creating an extended node based on the shared character prefix.
In this case, when node splitting is performed on any target leaf node, it may be determined whether a shared character prefix exists among the character suffixes contained in the plurality of data records stored by the target leaf node; if such a shared character prefix exists, the shared character prefix can be deleted from the plurality of data records, and at least one extended node for storing the shared character prefix is created.
For example, since the extended node may adopt the same data structure as the Tree node, the characters stored by the extended node may likewise be a character string with a length of two characters, formed by splicing the character represented by a block with the character represented by the slot in that block that is filled with a hash value; accordingly, when a shared character prefix exists among the character suffixes contained in the data records stored in the target leaf node and the length of the shared character prefix reaches two characters, the shared character prefix can be split from the data records.
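As a non-limiting illustration, detecting and splitting a shared character prefix can be sketched as follows. The function `split_shared_prefix` is hypothetical, and the sketch assumes that exactly two characters are split out per extended node (one for the block, one for the slot), even when the shared prefix is longer.

```python
import os


def split_shared_prefix(suffixes, prefix_len=2):
    """Split a shared prefix of `prefix_len` characters from all suffixes.

    Returns (prefix, shortened_suffixes); prefix is None when the suffixes
    do not share a prefix of at least `prefix_len` characters.
    """
    suffixes = list(suffixes)
    # os.path.commonprefix compares character by character, which is what is needed here.
    shared = os.path.commonprefix(suffixes)
    if len(shared) < prefix_len:
        return None, suffixes
    return shared[:prefix_len], [s[prefix_len:] for s in suffixes]


print(split_shared_prefix(["ab12", "ab34", "ab56"]))  # ('ab', ['12', '34', '56'])
print(split_shared_prefix(["ab12", "cd34"]))          # (None, ['ab12', 'cd34'])
```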
In an embodiment shown, the merge judgment and the specific node merge operation for an extended node on the FDMT tree may be performed at the stage of recalculating the hash value of the extended node and re-linking it with the node on the layer above the extended node based on the hash value.
In this case, the node device may determine whether a data update has occurred on an extended node on the FDMT tree, and if a data update has occurred on any extended node split from a leaf node on the FDMT tree, the node device may further determine whether the storage capacity of the extended node satisfies the node merge condition before recalculating the hash value of the extended node and re-linking it with the node on the layer above the extended node based on the hash value;
if the storage capacity of the extended node satisfies the node merge condition, the extended node may be further merged into the leaf node of the next layer linked with the extended node.
For example, in implementation, the leaf node of the next layer linked to the extended node may be determined first; for example, that leaf node may be determined according to the hash value filled in the non-empty slot of the block included in the extended node; then, the characters stored in the extended node may be written into the determined leaf node as a part of the character suffix.
Of course, in practical applications, when a leaf node has undergone multiple node splits, multiple layers of extended nodes may have been split from the leaf node; in this case, when any extended node satisfies the node merge condition, if the node on the layer below it is still an extended node rather than a leaf node, the extended node cannot be merged for the time being; instead, after the extended node on the lower layer has been merged with the leaf node below it to form a new leaf node, the extended node may then be merged with that new leaf node in a second merge, and the specific merging manner is not described again.
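As a non-limiting illustration, merging one or more layers of extended nodes back into the leaf node below them can be sketched as follows. The function `merge_chain` is hypothetical; the check of the node merge condition is omitted, and each extended node is reduced to the characters it stores.

```python
def merge_chain(extended_chain, leaf_records):
    """Merge a chain of extended nodes into the leaf below, bottom layer first.

    `extended_chain` lists the characters stored by each extended node from
    the top layer to the bottom layer; `leaf_records` maps suffix to value.
    """
    for chars in reversed(extended_chain):
        # The characters of the extended node become part of each character suffix.
        leaf_records = {chars + suffix: value
                        for suffix, value in leaf_records.items()}
    return leaf_records


print(merge_chain(["ab", "cd"], {"12": 1, "34": 2}))
# {'abcd12': 1, 'abcd34': 2}
```

Processing the chain in reverse order mirrors the text: the lowest extended node merges with the leaf first, and the resulting new leaf is then merged with the extended node above it.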
In this specification, since the Tree node on the FDMT Tree employs a data structure including a plurality of blocks, each block further includes a plurality of slots; therefore, the Tree node on the FDMT Tree will have a larger data storage capacity.
However, since the Tree node in the FDMT Tree has a larger data storage capacity, when the FDMT Tree is used to store the blockchain data, the blockchain data stored in the FDMT Tree is inevitably "sparse"; for example, when the block chain data is stored in the FDMT Tree, the situations that only one block of a single Tree node is filled with data and only one slot of the single block is filled with data may occur at high frequency, which causes a problem that the interval between the slots where the block chain data stored in the FDMT Tree is located is large, resulting in that the block chain data stored in the FDMT Tree is logically sparse.
The block chain data stored in the FDMT tree is too sparse, and when the FDMT tree is stored in the database, a large number of empty slots are stored, which inevitably causes waste of storage space and cannot fully utilize the storage space of the database.
In view of this, in this specification, during the storage of the key-value pairs of the nodes in the FDMT tree in the database, or after that storage is completed, a block having exactly one non-empty slot in a Tree node may be compressed into the block of the previous stage.
It should be noted that the timing of compressing the blocks having exactly one non-empty slot may be during the process of storing the FDMT tree in the database or after the storage is completed, and is not particularly limited in this specification. During or after the storage of the key-value pairs of the Tree nodes or extended nodes serving as intermediate nodes on the FDMT tree, it can be determined whether the number of non-empty slots of each block in those Tree nodes or extended nodes is 1; if the number of non-empty slots of any target block in any Tree node or extended node is 1, the target block is compressed into the block of the previous stage, or into the corresponding block in the node of the previous stage;
in one embodiment shown, the Tree node employs the data structure shown in fig. 4; namely, the Tree node comprises a plurality of blocks respectively representing different characters, and each block further comprises a plurality of slots respectively representing different characters; in this case, if the number of non-empty slots of any target block of any Tree node or extended node serving as an intermediate node is 1, the target slot that is used for filling the hash value of the Tree node or extended node, in the previous-layer node linked with the Tree node or extended node, may first be determined.
Then, the hash identifier of the target block is filled into the target slot as the hash value of the Tree node or the extended node, and the Tree node or the extended node is deleted.
The hash identifier may specifically refer to identification information capable of uniquely identifying an intermediate node on the FDMT tree, or a specific block in the intermediate node; for example, when the hash identifier is used to identify an intermediate node, the hash identifier may be used as the node ID of the intermediate node.
In this specification, the hash identifier may specifically include the hash value filled in the unique non-empty slot and the compression information corresponding to the non-empty slot; that is, the hash identifier is a two-tuple formed by splicing the hash value filled in the unique non-empty slot with the compression information corresponding to the non-empty slot.
The compression information corresponding to the non-empty slot may specifically include the number of times the non-empty slot has been compressed and the relative position of the non-empty slot in the Tree node or extended node serving as the intermediate node. The relative position may specifically be any information capable of locating the non-empty slot on the intermediate node; for example, the relative position may be a hexadecimal string, or a string in another base, formed by splicing the number of the block in which the non-empty slot is located with the number of the slot.
In another embodiment shown, the Tree node employs the data structure shown in fig. 5; namely, the Tree node comprises a main block and a plurality of sub-blocks which respectively represent different characters, and each block further comprises a plurality of slots:
on one hand, if the number of non-empty slots of any target sub-block in any Tree node or extended node as an intermediate node is 1, a target slot in a previous stage main block linked with the target sub-block and used for filling a hash value of storage content in the target sub-block can be determined, the hash identifier of the target sub-block is used as the hash value of the storage content in the target sub-block to fill the target slot, and the target sub-block is deleted.
On the other hand, if the number of the non-empty slots of any main block in any Tree node or extended node as the intermediate node is 1, a target slot in the previous layer node linked with the Tree node or extended node for filling the hash value of the Tree node or extended node may be determined, the hash identifier of the main block may be used as the hash value of the Tree node or extended node to fill the target slot, and the Tree node or extended node may be deleted.
It should be noted that, when there is a target block in which only one slot is filled with data, the above compression process for the non-empty slot may specifically be a recursive compression process; recursive compression means that, after the non-empty slot of a target block has been compressed into the previous-stage block of that block, the previous-stage block may be in the same situation, with only one slot filled with data; therefore, the same compression process needs to be executed for the previous-stage block, so that the non-empty slot of the target block is compressed stage by stage into higher-level blocks.
The above recursive compression process is described in detail by a specific example, taking the above Tree node or extended node as an example to adopt the data structure shown in fig. 4.
Referring to fig. 9, in the FDMT tree shown in fig. 9 and including three layers of blocks, the hash value of the leaf node having a node ID of 0x00123456 is initially stored in the 6th slot of the 5th block in Tree nodeC of the third layer; assume that only one slot is filled with data in each of the three layers of blocks of the FDMT tree: the 4th slot of the 3rd block of Tree nodeB of the second layer is filled with data, and the 2nd slot of the 1st block of Tree nodeA of the first layer is filled with data;
then, in the above-described recursive compression manner, the 6th slot of the 5th block of Tree nodeC of the third layer is filled with the hash value of the leaf node;
because only the 6th slot of the 5th block of Tree nodeC is filled with data, the 6th slot is compressed into the 3rd block of Tree nodeB of the previous layer, and the hash value of the 5th block that was filled in the 4th slot of the 3rd block is replaced by the hash-ID of the 5th block. At this time, the hash-ID may be a two-tuple formed by splicing the compression information "0x01, 0x65" of the 5th block with the hash value filled in the 6th slot of the 5th block;
wherein "0x01" indicates that the number of times of compression is 1, and "0x65" indicates the relative position of the slot in Tree nodeC, i.e. the 6th slot in the 5th block; "0x01, 0x65" thus indicates that the first compression is of the 6th slot of the 5th block.
Because only the 4th slot of the 3rd block of Tree nodeB is filled with data, the same compression process needs to be executed again: the 4th slot can be compressed into the 1st block of Tree nodeA of the upper layer, and the hash value of the 3rd block that was filled in the 2nd slot of the 1st block can be replaced by the hash-ID of the 3rd block. At this time, the hash-ID may specifically be a two-tuple formed by splicing the compression information "0x02, 0x43, 0x65" of the 3rd block with the hash value filled in the 4th slot of the 3rd block;
wherein "0x02" indicates that the number of times of compression is 2; "0x43" indicates the relative position of the slot in Tree nodeB, namely the 4th slot in the 3rd block; "0x02, 0x43, 0x65" thus indicates that the first compression is of the 6th slot of the 5th block, and the second compression is of the 4th slot of the 3rd block.
It should be noted that the above-mentioned representation manners for representing the position information of the slot and the number of times of compression are only exemplary, and are not used to limit the solution of the present specification; in practical applications, those skilled in the art can flexibly define the position information of the slot and the expression of the compression times, which are not illustrated in this specification. For example, in practical applications, in order to further reduce the number of bits occupied by the compressed information, the number of times of compression and the position information in the compressed information may be encoded into a single value.
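As a non-limiting illustration, the construction of the compression information in the example above can be sketched as follows. The functions `compress_once` and `format_info` are hypothetical; the sketch assumes the exemplary layout used in the text, in which one byte holds the compression count and each position byte carries the slot number in its high four bits and the block number in its low four bits.

```python
def compress_once(info, block, slot):
    """Return the compression information after one more compression step.

    `info` is a list [count, position, ...] from earlier steps (empty before
    the first compression). The newest position is placed directly after the
    count, ahead of the positions recorded by earlier compressions.
    """
    count = info[0] if info else 0
    earlier_positions = info[1:] if info else []
    position = (slot << 4) | block
    return [count + 1, position] + earlier_positions


def format_info(info):
    """Render the compression information as comma-separated hex bytes."""
    return ", ".join(f"0x{value:02x}" for value in info)


# First compression: 6th slot of the 5th block of Tree nodeC.
info = compress_once([], block=5, slot=6)
print(format_info(info))  # 0x01, 0x65

# Second compression: 4th slot of the 3rd block of Tree nodeB.
info = compress_once(info, block=3, slot=4)
print(format_info(info))  # 0x02, 0x43, 0x65
```

The hash identifier filled into the upper-level slot would then be the pair of this compression information and the hash value carried up from the original non-empty slot.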
It should be noted that, the above example is described by taking the Tree node as an example to adopt the data structure shown in fig. 4, and in practical applications, when the Tree node adopts the data structure shown in fig. 5, the process of the recursive compression is similar to the process described in the above example;
for example, when the Tree node adopts the data structure shown in fig. 5, the 5th block of Tree node C and the 3rd block of Tree node B shown in fig. 9 are sub-blocks; in this case, the previous block of the 5th block of Tree node C is the main block of Tree node C, and the previous block of the 3rd block of Tree node B is the main block of Tree node B. The 6th slot of the 5th block (sub-block) of Tree node C may therefore first be compressed into the main block of Tree node C, the main block of Tree node C may then be compressed into the 4th slot of the 3rd block (sub-block) of Tree node B, and the above process may then be repeated. For the manner of compressing a sub-block into the main block of the previous level, reference may be made to the above example, and details are not repeated here.
The following describes in detail a specific process of writing blockchain data into the FDMT tree shown in fig. 4 and fig. 5 in the form of key-value pairs, taking as an example the case in which the blockchain data is the latest account status data corresponding to a blockchain account and the key of the blockchain data is the blockchain account address.
In implementation, a user client accessing the blockchain can package transaction data into a standard transaction format supported by the blockchain and then publish it to the blockchain; a node device in the blockchain can, together with other node devices and based on the consensus algorithm it carries, reach consensus on the transactions published to the blockchain by user clients, so as to generate the latest block for the blockchain; the specific consensus process is not described here.
After the node devices in the blockchain execute the transactions in a target block, the account status of the target accounts related to the executed transactions usually changes; therefore, after the transactions in the target block have been executed, the node device may acquire the latest account status data of each updated target account (i.e., the account status after the transactions in the target block are executed), and process the acquired latest account status data of the target account into a key-value pair; wherein the key of the key-value pair is the account address of the target account, and the value of the key-value pair is the latest account status of the target account.
After the acquired latest account status data of the target account is processed into a key-value pair, the key-value pair may be converted into a Tree node, an extension node, and a Leaf node on the FDMT tree, and the key-value pairs of the Tree node, the extension node, and the Leaf node may be stored in a database.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating a process of writing account status data into an FDMT state tree according to the present specification;
in the FDMT state tree shown in fig. 10, the first three layers are Tree nodes and the last layer is a leaf node; the leaf node may adopt the bucket storage structure described above. Each Tree node layer comprises a main block and 16 sub-blocks, which respectively represent the 16 hexadecimal characters 0-f; and each sub-block further comprises 16 slots, which likewise respectively represent the 16 hexadecimal characters 0-f.
Assume that the account address of the target account is "a71135125" and that the latest account status data is "state 1"; in this case, the character prefix (Shared neighbor) of the account address is "a71135", and the character suffix (key-end) is "125".
When the account status data "state 1" of the account address "a71135125" is written into the FDMT tree in the form of a key-value pair, the Tree node serving as the root node of the FDMT tree may first be located in the database (for example, according to the hash value of the root node filled in the block header); if data is being written for the first time and the root node has not yet been created, the root node may be created at this point. Then, starting from the first-layer root node Tree node 1, the character slots corresponding to the character prefix "a71135" of the account address are determined in the three layers of Tree nodes in sequence. As shown in fig. 10, the character slots corresponding to the character prefix "a71135" are: the 8th slot (representing character 7) in the 11th sub-block (representing character a) of Tree node 1; the 2nd slot (representing character 1) in the 2nd sub-block (representing character 1) of Tree node 2; and the 6th slot (representing character 5) in the 4th sub-block (representing character 3) of Tree node 3.
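The slot lookup just described can be sketched as follows. This is a minimal illustration, assuming that each Tree node layer consumes two characters of the prefix (the first selects the sub-block, the second selects the slot) and that hexadecimal characters 0 through f map to ordinals 1 through 16, which is consistent with the positions listed for fig. 10; the function name `locate_slots` is hypothetical.

```python
def locate_slots(prefix: str) -> list:
    """Return one (sub_block_ordinal, slot_ordinal) pair per Tree node layer."""
    if len(prefix) % 2:
        raise ValueError("prefix must hold two characters per layer")
    path = []
    for i in range(0, len(prefix), 2):
        sub_block = int(prefix[i], 16) + 1      # character 'a' -> 11th sub-block
        slot = int(prefix[i + 1], 16) + 1       # character '7' -> 8th slot
        path.append((sub_block, slot))
    return path


address = "a71135125"
prefix, suffix = address[:6], address[6:]       # three Tree node layers, then the key-end
print(locate_slots(prefix))                     # [(11, 8), (2, 2), (4, 6)]
```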
After the above slots are determined, a data record composed of the character suffix "125" and the state data "state 1" may be written into the bucket node linked to the 6th slot of the 4th sub-block of Tree node 3; of course, if a data record corresponding to the character suffix "125" already exists in the bucket node, this indicates that historical account status data for the account address "a71135125" has already been written into the FDMT tree; in this case, the Value stored in that data record may be updated to "state 1".
After the data writing is completed, the hash value of the data content contained in the bucket node may be recalculated and filled into the 6th slot of the 4th sub-block of Tree node 3, replacing the original hash value in that slot.
Further, after the original hash value in the 6th slot of the 4th sub-block of Tree node 3 is updated, the hash value of the data content contained in the main block of Tree node 3 is recalculated and filled into the 2nd slot of the 2nd sub-block of Tree node 2, replacing the original hash value in that slot.
Then, after the original hash value in the 2nd slot of the 2nd sub-block of Tree node 2 is updated, the hash value of the data content contained in the main block of Tree node 2 is recalculated and filled into the 8th slot of the 11th sub-block of Tree node 1 (i.e., the root node), replacing the original hash value in that slot.
After the original hash value in the 8th slot of the 11th sub-block of the root node Tree node 1 is updated, the hash value of the data content contained in the main block of the root node Tree node 1 is recalculated, and the hash value of the root node of the FDMT state tree stored in the block header is updated based on that hash value. Once the hash value of the root node stored in the block header has been updated, the update of the FDMT state tree is complete, and the key-value pair formed by the account address "a71135125" and the corresponding account status data "state 1" has been successfully written into the FDMT state tree.
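The write-then-rehash sequence above can be sketched as follows. This is a simplified illustration rather than the encoding used by the specification: it assumes SHA-256 over a sorted JSON serialization as the hash, models each Tree node on the path as a flat mapping from a two-character (sub-block, slot) pair to a child hash, and omits the separate main block; the names `digest` and `write_state` are hypothetical.

```python
import hashlib
import json


def digest(content: dict) -> str:
    """Hash a node's content (SHA-256 over sorted JSON is an assumption)."""
    return hashlib.sha256(json.dumps(content, sort_keys=True).encode()).hexdigest()


def write_state(path_nodes: list, bucket: dict, address: str, state: str) -> str:
    """Write one record and re-hash from the bucket up to the root.

    path_nodes[0] is the root; each node maps a two-character
    (sub-block, slot) pair to the hash of the linked child.
    """
    depth = len(path_nodes)
    prefix, suffix = address[:2 * depth], address[2 * depth:]
    bucket[suffix] = state                      # insert, or overwrite the old Value
    child_hash = digest(bucket)
    for level in range(depth - 1, -1, -1):      # Tree node 3, then 2, then 1
        pair = prefix[2 * level:2 * level + 2]
        path_nodes[level][pair] = child_hash    # refill the slot with the new hash
        child_hash = digest(path_nodes[level])
    return child_hash                           # new root hash for the block header


nodes, bucket = [{}, {}, {}], {}
root_hash = write_state(nodes, bucket, "a71135125", "state 1")
print(bucket, sorted(nodes[2]), root_hash[:8])
```

Because every slot on the path is refilled with a hash of the layer below, any change to the stored Value yields a different root hash.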
Referring to fig. 11, assume that the bucket node shown in fig. 10 satisfies the node splitting condition described above, and that the character suffixes in the data records contained in the bucket node have a shared character prefix "12"; the bucket node shown in fig. 10 may then further split off an extension node for storing the shared character prefix "12", as shown in fig. 11. The specific splitting process is not described in detail here.
It should be emphasized that taking the blockchain data to be the latest account status data corresponding to a blockchain account is only exemplary; in practical applications, when the blockchain data is any other of the 4 types of blockchain data described above, the specific process of writing the blockchain data into the corresponding FDMT tree is similar to the implementation process described above and is not described in detail in this specification.
Corresponding to the above method embodiments, the present specification also provides an embodiment of a blockchain data storage device.
The embodiments of the block chain data storage apparatus of the present specification can be applied to electronic devices. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device, the device is formed by reading, by a processor of the electronic device where the device is located, a corresponding computer program instruction in the nonvolatile memory into the memory for operation.
In terms of hardware, fig. 12 shows a hardware structure diagram of the electronic device in which the blockchain data storage device of this specification is located; in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 12, the electronic device in which the device is located may also include other hardware according to its actual function, which is not described again.
Fig. 12 is a block diagram of a blockchain data storage device shown in an exemplary embodiment of the present description.
Referring to fig. 12, the blockchain data storage device 120 can be applied to the electronic device shown in fig. 12, and the device 120 includes:
an obtaining module 1201, configured to obtain a key-value pair of blockchain data to be stored;
a conversion module 1202, configured to convert the key-value pair of the blockchain data to be stored into a root node, an intermediate node, and a leaf node on a logical tree structure; the root node and the intermediate node comprise a plurality of positions for storing characters in the key of the blockchain data, and each position comprises a plurality of slots for storing characters in the key of the blockchain data; each slot is used for storing the hash value of the next-layer node linked to the node;
a storage module 1203, configured to perform encoding processing on the root node, the intermediate node, and the leaf node, and to store the key-value pairs of the encoded root node, intermediate node, and leaf node in a database; in the key-value pairs of the leaf node, the intermediate node, and the root node, the value is the storage content of the node, and the key is the hash value of the storage content of the node; the encoding processing comprises bitmap encoding processing for the root node and the intermediate node; the root node and the intermediate node after the bitmap encoding processing comprise bitmap encoding information; and the bitmap encoding information indicates whether the slots in each position of the root node and the intermediate node are filled with hash values.
In this embodiment, the character string corresponding to the key of the blockchain data includes a character prefix and a character suffix; the root node and the intermediate node are used for storing characters in the character prefix; the leaf node is used for storing the character suffix and Value of the blockchain data.
In this embodiment, the apparatus 120 further includes:
a splitting module 1204 (not shown in FIG. 12), configured to determine whether the storage capacity of a leaf node on the tree structure satisfies a node splitting condition;
and, if the storage capacity of the leaf node satisfies the node splitting condition, to split at least one intermediate node from the leaf node; the split intermediate node is used for storing the characters split from the character suffixes stored in the leaf node.
In this embodiment, the root node and the intermediate node each include a main position and a plurality of sub positions for storing characters in keys of the blockchain data; the main position comprises a plurality of slot positions which respectively correspond to the sub positions and are used for storing hash values of the storage contents in the sub positions; the sub-position comprises a plurality of slot positions for storing characters in keys of the block chain data; the slot position in the sub position is used for storing the hash value of the next layer node linked with the node; and the hash values of the root node and the intermediate node are the hash of the storage contents in the main positions of the root node and the intermediate node.
In this embodiment, the characters stored in the root node and the intermediate node are a character string generated by splicing the character represented by each sub-position in the root node and the intermediate node with the character represented by the slot filled with a hash value in that sub-position.
In this embodiment, the leaf node is a bucket; the bucket comprises a plurality of data records; the data content stored in each data record includes the character suffix and the Value of the blockchain data.
In this embodiment, the splitting module 1204:
determining whether a data update occurs at a leaf node on the tree structure;
and if a data update occurs at any leaf node on the tree structure, further determining whether the storage capacity of the leaf node satisfies the node splitting condition before recalculating the hash value of the leaf node and relinking the leaf node to the node on the layer above it based on the hash value.
In this embodiment, the apparatus 120 further includes:
a merging module 1205 (not shown in FIG. 12), configured to determine whether a data update occurs at an extension node on the tree structure;
if a data update occurs at any extension node on the tree structure, determining, before recalculating the hash value of the extension node and relinking the extension node to the node on the layer above it based on the hash value, whether the storage capacity of the extension node satisfies a node merging condition;
and if the storage capacity of the extension node satisfies the node merging condition, further merging the extension node into the leaf node on the next layer linked to the extension node.
In this embodiment, the node splitting condition includes:
the total number of data records stored in the leaf node is greater than a threshold; and/or the total storage capacity of the data records stored in the leaf node is greater than a threshold;
correspondingly, the node merging condition includes:
the total number of characters stored in the extension node is less than or equal to a threshold; and/or the total storage capacity of the characters stored in the extension node is less than or equal to a threshold.
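The two conditions can be expressed as simple predicates. The sketch below is only illustrative: the specification does not give threshold values, so `max_records`, `max_chars`, and `max_bytes` are hypothetical parameters, storage capacity is approximated as the UTF-8 byte length of the stored strings, and the "and/or" is implemented as a logical "or".

```python
def should_split(records: dict, max_records: int, max_bytes: int) -> bool:
    """Node splitting condition for a leaf (bucket) node."""
    total_bytes = sum(len(k.encode()) + len(v.encode()) for k, v in records.items())
    return len(records) > max_records or total_bytes > max_bytes


def should_merge(ext_chars: str, max_chars: int, max_bytes: int) -> bool:
    """Node merging condition for an extension node."""
    return len(ext_chars) <= max_chars or len(ext_chars.encode()) <= max_bytes


print(should_split({"125": "state 1", "12f": "state 2"}, max_records=1, max_bytes=1024))
print(should_merge("12", max_chars=2, max_bytes=0))
```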
In this embodiment, the splitting module 1204:
determining the latest data record written into the leaf node in the current blocking period;
deleting the latest data record from the leaf node; and
splitting a character prefix from the character suffix contained in the latest data record, and creating at least one extension node for storing the split character prefix and at least one leaf node for storing the latest data record after the character prefix has been split off.
In this embodiment, the splitting module 1204:
determining shared character prefixes among character suffixes contained in a plurality of data records stored by the leaf node;
deleting the shared character prefix from the data records, and creating at least one extension node for storing the shared character prefix.
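Splitting on a shared character prefix can be sketched as follows. This is a minimal illustration in which the bucket is modeled as a mapping from character suffix to Value; the function name `split_shared_prefix` and the second record "12f" are hypothetical, and the case where one suffix equals the whole shared prefix is not handled specially.

```python
import os


def split_shared_prefix(records: dict) -> tuple:
    """Return (shared_prefix, records_with_prefix_removed).

    os.path.commonprefix compares strings character by character,
    so it works on arbitrary strings and not only on file paths.
    """
    shared = os.path.commonprefix(list(records))
    trimmed = {suffix[len(shared):]: value for suffix, value in records.items()}
    return shared, trimmed


bucket = {"125": "state 1", "12f": "state 2"}
extension_chars, new_bucket = split_shared_prefix(bucket)
print(extension_chars, new_bucket)   # 12 {'5': 'state 1', 'f': 'state 2'}
```

The returned prefix is what the new extension node would store, and the trimmed records remain in the leaf node.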
In this embodiment, the merging module 1205:
determining a leaf node of a next layer linked with the extension node;
and writing the character stored by the extension node into the leaf node as the character suffix.
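Merging can be viewed as the inverse of splitting. The sketch below assumes that the characters held by the extension node are prepended to each character suffix in the linked leaf node, which is one reading of the step above; the function name `merge_extension` is hypothetical.

```python
def merge_extension(ext_chars: str, records: dict) -> dict:
    """Fold the extension node's characters back into the leaf's suffixes."""
    return {ext_chars + suffix: value for suffix, value in records.items()}


merged = merge_extension("12", {"5": "state 1", "f": "state 2"})
print(merged)   # {'125': 'state 1', '12f': 'state 2'}
```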
In this embodiment, the storage module 1203 further:
before the encoded root node, intermediate node, and leaf node are stored in the database, determining, according to the bitmap encoding information added to the positions of the encoded root node and intermediate node, the empty slots in those positions that are not filled with a hash value, and deleting the empty slots from those positions.
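Bitmap encoding with removal of empty slots can be sketched as follows. This is a minimal illustration, assuming a 16-slot position, `None` for an empty slot, and bit i of the bitmap (least significant bit first) standing for the (i+1)-th slot; the specification does not fix the bit order, and the names `encode_position` and `decode_position` are hypothetical.

```python
def encode_position(slots: list) -> tuple:
    """Return (bitmap, filled_hashes) with the empty slots dropped."""
    bitmap, filled = 0, []
    for i, slot_hash in enumerate(slots):
        if slot_hash is not None:
            bitmap |= 1 << i            # mark the slot as filled
            filled.append(slot_hash)
    return bitmap, filled


def decode_position(bitmap: int, filled: list, size: int = 16) -> list:
    """Rebuild the full slot list from the bitmap and the packed hashes."""
    hashes = iter(filled)
    return [next(hashes) if (bitmap >> i) & 1 else None for i in range(size)]


slots = [None] * 16
slots[5] = b"hash of bucket"            # only the 6th slot is filled
bitmap, packed = encode_position(slots)
print(bin(bitmap), packed)
```

Only the bitmap and the filled hashes need to be stored; the full slot layout can be recovered from them when the node is read back.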
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at the time of ..." or "when ..." or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.