Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art that is already known to a person of ordinary skill in the art.
A blockchain is a chain of blocks linked one after another. Each block stores certain information, and the blocks are connected into a chain in the order in which they were generated. The chain is stored on all servers, and the blockchain as a whole remains safe as long as at least one server in the system is working. These servers, referred to as nodes in the blockchain system, provide storage space and computational support for the entire system. To modify information in the blockchain, more than half of the nodes must approve the change and the information on all nodes must be modified; because the nodes are usually held by different parties, tampering with the information in the blockchain is extremely difficult. The information recorded in a blockchain is therefore more authentic and reliable, which helps solve the problem of mutual distrust between parties.
During actual operation of a blockchain network, problems such as network jitter and disk faults can occur, causing some nodes to execute more slowly than the majority. If the number of lagging nodes exceeds the fault-tolerance upper limit, the whole blockchain network becomes unavailable, which is unacceptable in a production environment. Conventional blockchain networks based on the Practical Byzantine Fault Tolerance (PBFT) consensus mechanism have two recovery methods. First, when the master node has a problem or does not respond before a timeout, the view change (viewchange) flow is triggered, and once the view change is completed the master role passes to the next node. Second, at a checkpoint, if a node finds that its blocks have fallen behind, a recovery process is triggered and the data before the stable checkpoint is pulled from other nodes. The following problems remain in such a recovery mechanism:
(1) When the network is unstable, the view change process is easily triggered, but a view change can complete successfully only when the network is stable; in practice, view changes therefore fail because of network problems, and the whole blockchain network cannot process transactions normally;
(2) Recovery at a checkpoint is triggered passively, only when the checkpoint process executes, and in a real environment checkpoints are usually separated by a certain time interval, so a lagging node may be unable to process transactions normally for a long time;
(3) If a lagging node finds that it has fallen behind the stable checkpoint, it can recover only up to the latest stable checkpoint and cannot obtain the consensus information generated after that checkpoint, so it never truly participates in consensus.
Disclosure of Invention
In order to solve the above problems, the present invention provides a blockchain node fault recovery method and a blockchain system. By adding a node block lag discovery mechanism to consensus, a node discovers that its blocks are lagging and actively synchronizes, which avoids the problem of node unavailability caused by lagging node blocks.
In order to achieve the above object, the present invention mainly includes the following aspects:
In a first aspect, an embodiment of the present invention provides a blockchain node fault recovery method, including:
detecting the node state of a blockchain node, wherein the node states comprise a normal state and a recovery state;
if the blockchain node is in the normal state, verifying and voting on the generated block by using a Byzantine consensus mechanism, and, when the block or the view of the blockchain node is found to be lagging, switching the node state from the normal state to the recovery state and sending a recovery request message;
receiving the returned recovery response message, and recovering the blockchain node to the latest block and view according to the content carried in the recovery response message.
In a possible implementation, a target block height of the blockchain node is obtained, and if a preset number of consensus messages whose block heights are greater than the target block height are received, the block of the blockchain node is determined to be lagging.
A target view height of the blockchain node is obtained, and if a preset number of consensus messages whose view heights are greater than the target view height are received, the view of the blockchain node is determined to be lagging.
Further, the method also comprises: when the block of the blockchain node is found to be lagging during checkpoint execution, switching the node state from the normal state to the recovery state and sending a recovery request message.
After recovering the blockchain node to the latest block and view, the method further comprises: switching the node state to the normal state to process transaction requests.
Further, the node states also include a view switching state; when the blockchain node is in the normal state, if a transaction request is not processed before a timeout or the view is found to be lagging, the node state is switched from the normal state to the view switching state so as to switch the view.
If view switching confirmation messages are received, the view of the blockchain node is determined to be lagging.
When the blockchain node is in the view switching state, view switching is performed, and when the view switching is completed, the node state is switched from the view switching state to the normal state.
During view switching, if the master node in the view does not respond, a master node is reselected to carry out the next round of view switching.
In a second aspect, an embodiment of the present invention further provides a blockchain system, which includes a plurality of blockchain nodes, where each blockchain node performs fault recovery by using the method for recovering a fault of a blockchain node according to the first aspect.
The above one or more technical solutions have the following beneficial effects:
(1) The invention provides a blockchain node fault recovery method in which a block lag discovery mechanism is added to consensus: a node discovers that its blocks are lagging, actively synchronizes, recovers to the latest block and view, and processes transaction messages in time, which avoids the problem of node unavailability caused by lagging node blocks;
(2) By adding a node view lag discovery mechanism, a node can still discover that its view is lagging and synchronize it in time even when the network is unstable and the node has not received a transaction message or a view switching message, which avoids the problem of node unavailability caused by a lagging node view.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example one
Referring to Fig. 1, the present embodiment provides a blockchain node fault recovery method, which includes the following steps:
S101: detecting the node state of a blockchain node, wherein the node states comprise a normal state and a recovery state;
S102: if the blockchain node is in the normal state, verifying and voting on the generated block by using a Byzantine consensus mechanism, and, when the block or the view of the blockchain node is found to be lagging, switching the node state from the normal state to the recovery state and sending a recovery request message;
S103: receiving the returned recovery response message, and recovering the blockchain node to the latest block and view according to the content carried in the recovery response message.
The consensus mechanism determines how all blockchain nodes reach agreement on the validity of a record; it serves both as a means of validation and as a means of preventing tampering. Blockchains offer several different consensus mechanisms suited to different application scenarios, each balancing efficiency and security. PBFT (Practical Byzantine Fault Tolerance) is a consistency algorithm based on message passing that reaches agreement through three phases (pre-prepare, prepare, and commit). PBFT consensus is efficient and can meet the requirements of commercial real-time processing and high-frequency trading; blockchain systems based on PBFT consensus are widely used commercially, mainly by banks, insurers, securities firms, business associations, enterprise groups and the like.
The existing PBFT Byzantine consensus mechanism does not implement active recovery. When some nodes execute more slowly than the majority and the number of lagging nodes exceeds the fault-tolerance upper limit, the whole blockchain network becomes unavailable.
Based on this, the embodiment of the present invention provides an active recovery mechanism for blockchain nodes. The mechanism can be triggered actively so that a lagging node quickly synchronizes its data and returns to a normal state, allowing it to participate normally in subsequent consensus and ensuring stable operation of the whole blockchain network. In this embodiment, blockchain nodes are assigned different node states, including a normal state and a recovery state: a blockchain node in the normal state can process any message, whereas a blockchain node in the recovery state processes only recovery messages.
A blockchain node defaults to the normal state on startup. While in the normal state, it verifies and votes on the generated block by using the Byzantine consensus mechanism; when the block or the view of the blockchain node is found to be lagging, the node state is switched to the recovery state and a recovery request message is sent. After receiving the recovery response messages, the node compares their contents and recovers to the latest block and view. Thus, by adding a block lag discovery mechanism to consensus, a node discovers that its blocks are lagging, actively synchronizes, recovers to the latest block and view, and processes transaction messages in time, which avoids the problem of node unavailability caused by lagging node blocks.
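The comparison of recovery responses can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: it assumes each responding node returns one response, that a response carries a block height, a block hash and a view height (these field names are hypothetical), and that content is accepted once 2f+1 responses agree on it, where f is the number of Byzantine faulty nodes tolerated, in line with the state switching rules of this embodiment.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One recovery response: (block_height, block_hash, view_height).
# The field layout is a hypothetical choice made for this sketch.
Response = Tuple[int, str, int]


def select_recovery_target(responses: List[Response], f: int) -> Optional[Response]:
    """Return the content confirmed by at least 2f+1 responses, or None.

    Assumes `responses` holds at most one response per responding node.
    """
    if not responses:
        return None
    # Find the most frequently reported content and how many nodes reported it.
    content, count = Counter(responses).most_common(1)[0]
    return content if count >= 2 * f + 1 else None


# Usage: with f = 1, three matching responses are enough to recover.
latest = (10, "hash-10", 3)
stale = (9, "hash-9", 3)
print(select_recovery_target([latest, latest, stale, latest], f=1))
print(select_recovery_target([latest, stale], f=1))
```

When no content reaches the threshold, the function returns None and the node would stay in the recovery state until more responses arrive or the recovery times out.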
As an optional implementation, the block or the view of a blockchain node is found to be lagging in the following cases:
(1) The target block height of the blockchain node is obtained, and if a preset number of consensus messages whose block heights are greater than the target block height are received, the block of the blockchain node is determined to be lagging.
In practical application, the total number of blockchain nodes is 3f+1, where f is the number of Byzantine faulty nodes. Each blockchain node stores its own block height, and each consensus message contains the block height of the node that sent it. If a blockchain node receives 2f+1 consensus messages whose block height is greater than its own block height, the block of that node is lagging. For example, node 0 has block height 1 and receives a consensus message from node 1 carrying block height 2; if node 0 receives 2f+1 consensus messages whose block height is 2, the block of node 0 is lagging.
(2) The target view height of the blockchain node is obtained, and if a preset number of consensus messages whose view heights are greater than the target view height are received, the view of the blockchain node is determined to be lagging.
In practical application, each node stores a view height, and each consensus message contains the view height of the node that sent it. If a node receives 2f+1 consensus messages whose view is greater than its own view, the view of that blockchain node is determined to be lagging. For example, node 0 has view height 1 and receives a consensus message from node 1 carrying view height 2; if node 0 receives 2f+1 consensus messages whose view height is 2, the view of node 0 is lagging.
(3) When the block of the blockchain node is found to be lagging during checkpoint execution, the node state is switched to the recovery state and a recovery request message is sent.
After the blockchain node has been recovered to the latest block and view, the node state is switched to the normal state to process transaction requests.
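The lag checks in cases (1) and (2) can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the `ConsensusMessage` type and its field names are assumptions, and messages are grouped by sender so that one node cannot be counted twice toward the 2f+1 threshold, a detail the description does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ConsensusMessage:
    sender: int        # id of the message source node
    block_height: int  # block height stored by the sender
    view_height: int   # view height stored by the sender


def is_lagging(local_value: int, reported: Dict[int, int], f: int) -> bool:
    """True if at least 2f+1 distinct senders report a value above local_value."""
    ahead = sum(1 for value in reported.values() if value > local_value)
    return ahead >= 2 * f + 1


def detect_lag(local_block: int, local_view: int,
               messages: List[ConsensusMessage], f: int) -> Tuple[bool, bool]:
    """Return (block_lagging, view_lagging) for the local node."""
    blocks: Dict[int, int] = {}
    views: Dict[int, int] = {}
    for m in messages:
        # Keep only the highest value seen from each sender.
        blocks[m.sender] = max(blocks.get(m.sender, 0), m.block_height)
        views[m.sender] = max(views.get(m.sender, 0), m.view_height)
    return is_lagging(local_block, blocks, f), is_lagging(local_view, views, f)


# Usage, mirroring the example above: node 0 has block height 1 and view
# height 1, f = 1, and nodes 1 to 3 each report block height 2 in view 1.
msgs = [ConsensusMessage(sender=i, block_height=2, view_height=1) for i in (1, 2, 3)]
print(detect_lag(local_block=1, local_view=1, messages=msgs, f=1))  # (True, False)
```

With only two such messages the threshold of 2f+1 = 3 is not reached, so node 0 would remain in the normal state.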
As an optional implementation, the node states further include a view switching state. When the blockchain node is in the normal state, if a transaction request is not processed before a timeout or the view is found to be lagging, the node state is switched from the normal state to the view switching state so as to switch the view.
Optionally, if 2f+1 view switching confirmation messages are received, the view of the blockchain node is determined to be lagging. When the blockchain node is in the view switching state, view switching is performed, and when the view switching is completed, the node state is switched from the view switching state to the normal state. During view switching, if the master node in the view does not respond, a master node is reselected to carry out the next round of view switching.
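Master node reselection during view switching can be sketched as follows. The description does not state how the next master node is chosen; this sketch assumes the round-robin rule of standard PBFT, in which the master node of view v is node v mod n. The function names and the `responsive` set are hypothetical.

```python
from typing import Set, Tuple


def master_for_view(view: int, n: int) -> int:
    """Round-robin master selection (assumed rule): node id = view mod n."""
    return view % n


def next_view_with_responsive_master(view: int, n: int,
                                     responsive: Set[int]) -> Tuple[int, int]:
    """Advance the view round by round until its master node responds.

    `responsive` is the set of node ids currently answering; in a real node
    this would be learned through timeouts rather than passed in.
    """
    for step in range(1, n + 1):
        candidate = view + step
        master = master_for_view(candidate, n)
        if master in responsive:
            return candidate, master
    raise RuntimeError("no responsive master node found")


# Usage: 4 nodes, current view 0, node 1 is down, so view 1 is skipped.
print(next_view_with_responsive_master(view=0, n=4, responsive={0, 2, 3}))  # (2, 2)
```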
In a specific implementation, the block chain node states are first classified into the following three categories:
(1) Ready state: the Ready state is the normal state, in which the node can process any message;
(2) Viewchange state: in the Viewchange state the node has entered view switching and does not process consensus messages;
(3) Catchup state: in the Catchup state the node has entered recovery and processes only recovery messages.
Further, the node state switching mechanism is as follows:
(1) The block chain node is started and defaults to a Ready state;
(2) When the blockchain node is in the Ready state, the node is switched to the Viewchange state when either of the following occurs:
a) A transaction is not processed before the timeout;
b) The current view is found to be lagging during view message processing, that is, the node receives 2f+1 view switching confirmation messages;
(3) When the blockchain node is in the Viewchange state, the node is switched to the Ready state when either of the following occurs:
a) The view switching is completed;
b) The view switching times out;
(4) When the blockchain node is in the Viewchange state, if the master node in the view does not respond, a master node is reselected to carry out the next round of view switching;
(5) When the blockchain node is in the Ready state, the node is switched to the Catchup state and sends a recovery request message to the other nodes when any of the following occurs:
a) The node's block is found to be lagging during checkpoint execution;
b) The node's block is found to be lagging in real time during consensus, that is, the node receives 2f+1 consensus messages whose block height is greater than the node's own block height;
c) The node's view is found to be lagging in real time during consensus, that is, the node receives 2f+1 consensus messages whose view is greater than the node's own view;
(6) When the blockchain node is in the Catchup state, the node is switched to the Ready state when either of the following occurs:
a) After receiving the recovery response messages, the node compares their contents, and if 2f+1 nodes confirm the content of a block, that block is recovered, so that the node recovers to the latest block and view;
b) The node's recovery times out.
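The state switching rules (1) to (6) above can be sketched as a small transition table. This is a minimal sketch under stated assumptions: the event names, the `Node` class and the `outbox` list are hypothetical, and the detection of each event (timeouts, 2f+1 message counts) is assumed to happen elsewhere.

```python
from enum import Enum, auto


class NodeState(Enum):
    READY = auto()
    VIEWCHANGE = auto()
    CATCHUP = auto()


class Event(Enum):
    TRANSACTION_TIMEOUT = auto()   # rule (2) a
    VIEW_CHANGE_QUORUM = auto()    # rule (2) b: 2f+1 view switching confirmations
    VIEW_CHANGE_DONE = auto()      # rule (3) a
    VIEW_CHANGE_TIMEOUT = auto()   # rule (3) b
    MASTER_UNRESPONSIVE = auto()   # rule (4): reselect master, next round
    CHECKPOINT_BLOCK_LAG = auto()  # rule (5) a
    CONSENSUS_BLOCK_LAG = auto()   # rule (5) b
    CONSENSUS_VIEW_LAG = auto()    # rule (5) c
    RECOVERY_DONE = auto()         # rule (6) a
    RECOVERY_TIMEOUT = auto()      # rule (6) b


TRANSITIONS = {
    (NodeState.READY, Event.TRANSACTION_TIMEOUT): NodeState.VIEWCHANGE,
    (NodeState.READY, Event.VIEW_CHANGE_QUORUM): NodeState.VIEWCHANGE,
    (NodeState.VIEWCHANGE, Event.VIEW_CHANGE_DONE): NodeState.READY,
    (NodeState.VIEWCHANGE, Event.VIEW_CHANGE_TIMEOUT): NodeState.READY,
    (NodeState.VIEWCHANGE, Event.MASTER_UNRESPONSIVE): NodeState.VIEWCHANGE,
    (NodeState.READY, Event.CHECKPOINT_BLOCK_LAG): NodeState.CATCHUP,
    (NodeState.READY, Event.CONSENSUS_BLOCK_LAG): NodeState.CATCHUP,
    (NodeState.READY, Event.CONSENSUS_VIEW_LAG): NodeState.CATCHUP,
    (NodeState.CATCHUP, Event.RECOVERY_DONE): NodeState.READY,
    (NodeState.CATCHUP, Event.RECOVERY_TIMEOUT): NodeState.READY,
}


class Node:
    def __init__(self) -> None:
        self.state = NodeState.READY  # rule (1): default state on startup
        self.outbox = []              # outgoing messages, recorded for illustration

    def handle(self, event: Event) -> NodeState:
        """Apply one event; events not listed for the current state are ignored."""
        target = TRANSITIONS.get((self.state, event))
        if target is None:
            return self.state
        if target is NodeState.CATCHUP:
            # Entering recovery: ask the other nodes for the latest block and view.
            self.outbox.append("recovery_request")
        self.state = target
        return self.state


# Usage: a Ready node that detects block lag enters Catchup, then returns to Ready.
node = Node()
node.handle(Event.CONSENSUS_BLOCK_LAG)
node.handle(Event.RECOVERY_DONE)
print(node.state, node.outbox)
```

Keeping the rules in a table makes it easy to check that a node in the Catchup state ignores everything except recovery events, as the state definitions require.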
By adding a node view lag discovery mechanism, a node can still discover that its view is lagging and synchronize it in time even when the network is unstable and the node has not received a transaction message or a view switching message, which avoids the problem of node unavailability caused by a lagging node view.
Example two
The embodiment of the invention also provides a blockchain system comprising a plurality of blockchain nodes, wherein each blockchain node performs fault recovery by using the blockchain node fault recovery method described above.
For details of this embodiment, refer to the foregoing embodiment of the blockchain node fault recovery method; they are not repeated here.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.