discv4: Fix Kademlia crash when trying to sync #342
Merged
Fixes #341, fixes status-im/nimbus-eth1#489.
When using discv4 (Kademlia) to find peers, there is a crash after a few minutes. It occurs for most of us on Eth1 mainnet, and everyone on Ropsten.
The cause is `findNodes` being called twice in succession for the same peer, within about 5 seconds of each other. ("About" 5 seconds, because Chronos does not guarantee to run the timeout branch at a particular time, due to queuing and clock reading delays.)
Then `findNodes` sends a duplicate message to the peer and calls `waitNeighbours` to listen for the reply. There's already a `waitNeighbours` callback in a shared table, so that function hits an assert failure.

Ignoring the assert would be wrong as it would break timeout logic, and sending `FindNodes` twice in rapid succession also makes us a bad peer.

As a simple workaround, just skip `findNodes` in this state and return a fake empty `Neighbours` reply. This is a bit of a hack as `findNodes` should not be called like this; there's a logic error at a higher level. But it works.

Tested for about 4 days of constant operation on Ropsten. The crash which occurred every few minutes no longer occurs, and discv4 keeps working.
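
For readers who want to see the shape of the problem, here is a minimal sketch in Python `asyncio`. It is a toy model of the behaviour described above, not the Nim code changed in this PR. The `Discovery` class, its `pending` table, the `on_neighbours` hook and the short timeout are all assumptions made for illustration; only the names `findNodes`, `waitNeighbours` and `Neighbours` come from the description.

```python
import asyncio


class Discovery:
    """Toy model of the request/reply bookkeeping (hypothetical, not nim-eth)."""

    REPLY_TIMEOUT = 0.05  # assumption: stands in for the roughly 5 second wait

    def __init__(self):
        self.pending = {}  # peer id -> future awaiting a Neighbours reply
        self.sent = []     # peers we "sent" a FindNodes message to

    async def wait_neighbours(self, peer):
        # Models the assert from the description: one callback per peer at a time.
        assert peer not in self.pending, "duplicate waitNeighbours callback"
        fut = asyncio.get_running_loop().create_future()
        self.pending[peer] = fut
        try:
            return await asyncio.wait_for(fut, self.REPLY_TIMEOUT)
        except asyncio.TimeoutError:
            return []  # no reply arrived in time
        finally:
            del self.pending[peer]

    async def find_nodes(self, peer):
        if peer in self.pending:
            # Workaround: a request is already in flight, so skip the duplicate
            # message and return an empty Neighbours result instead.
            return []
        self.sent.append(peer)
        return await self.wait_neighbours(peer)

    def on_neighbours(self, peer, nodes):
        # Called when a Neighbours reply arrives for a peer we are waiting on.
        fut = self.pending.get(peer)
        if fut is not None and not fut.done():
            fut.set_result(nodes)


async def demo():
    d = Discovery()
    first = asyncio.create_task(d.find_nodes("peer-a"))
    await asyncio.sleep(0)                 # let the first call register its callback
    second = await d.find_nodes("peer-a")  # overlapping call: skipped
    d.on_neighbours("peer-a", ["n1", "n2"])
    return second, await first, d.sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model, removing the `if peer in self.pending` check makes the second call trip the assertion in `wait_neighbours`, which corresponds to the crash. With the check in place, only one message is recorded in `sent` and the first call still receives its reply.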