
RLPx out of bounds crash when downloading from Goerli #767

Closed
jlokier opened this issue Jul 24, 2021 · 3 comments · Fixed by status-im/nim-eth#438
Labels
Networking Security Security vulnerability or security related

Comments

@jlokier
Contributor

jlokier commented Jul 24, 2021

Reported by @mjfh, occurring regularly with Nimbus-eth1 on the Goerli network:

Can confirm that on my Goerli block collection test. Reproducible event:

/home/jordan/nimbus-eth1/nimbus/nimbus.nim(264) nimbus
/home/jordan/nimbus-eth1/nimbus/nimbus.nim(226) process
/home/jordan/nimbus-eth1/vendor/nim-chronos/chronos/asyncloop.nim(279) poll
/home/jordan/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jordan/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(619) dispatchMessages
/home/jordan/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(23) raiseIndexError2
/home/jordan/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(49) sysFatal
[[reraised from:
/home/jordan/nimbus-eth1/nimbus/nimbus.nim(264) nimbus
/home/jordan/nimbus-eth1/nimbus/nimbus.nim(226) process
/home/jordan/nimbus-eth1/vendor/nim-chronos/chronos/asyncloop.nim(279) poll
/home/jordan/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(96) colonanonymous
]]
Error: unhandled exception: index 33 not in 0 .. 32 [IndexError]

with always the same index but varying time until the error occurs (the best run collected 500k blocks, which was enough for my replay).

@jlokier
Contributor Author

jlokier commented Jul 24, 2021

I haven't seen this when testing on Goerli. Perhaps they are rare events, so they tend not to be seen when testing on a fast-syncing network.

The out of bounds index is msgId in if peer.awaitedMessages[msgId] != nil:. That's processing incoming messages.

Message 33 is the eth protocol message Receipts. I.e., it's a valid message.

(If it was an invalid message, it would be caught by invokeThunk first, which checks if msgId is valid.)
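To illustrate the failure mode, here is a hypothetical Python sketch (not the Nim source): reading the awaited-messages table with an unchecked id one past the end raises exactly the kind of error in the trace above. The names `read_awaited` and `waiters` are illustrative.

```python
# Hypothetical sketch (not the Nim source) of the crash mechanism:
# reading the awaited-messages table with an unchecked msgId.
def read_awaited(awaited_messages, msg_id):
    # No bounds check before the read, mirroring
    # "index 33 not in 0 .. 32 [IndexError]".
    return awaited_messages[msg_id]

waiters = [None] * 33      # valid indices are 0 .. 32
try:
    read_awaited(waiters, 33)
except IndexError as exc:
    print("crash:", exc)   # Python's analogue of Nim's IndexError
```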

There is a bug (easily fixed) which is that awaitedMessages size is off by 1.

However, there is something else interesting going on here. Receipts should only ever be received in response to a GetReceipts request we have sent, and we don't send those currently. So why did we get a Receipts message, and which client is sending those unsolicited?

@jlokier
Contributor Author

jlokier commented Jul 29, 2021

Alright, after more testing:

  • The reason I didn't see this on Goerli is it doesn't happen when running with eth/65.

  • The culprit is a client called dangnn/v1.9.10-stable/windows-amd64/go1.16.2.
    @mjfh, next time when reporting a bug, the trace output shortly before a stacktrace event would be helpful.
    In this case it's:

    TRC 2021-07-24 04:17:52.264+01:00 Received Hello                             topics="rlpx" tid=1027594 file=rlpx.nim:1208 version=5 id=dangnn/v1.9.10-stable/windows-amd64/go1.16.2
    TRC 2021-07-24 04:17:52.264+01:00 devp2p handshake completed                 topics="rlpx" tid=1027594 file=rlpx.nim:1219 peer=Node[REDACTED:6709] clientId=dangnn/v1.9.10-stable/windows-amd64/go1.16.2
    TRC 2021-07-24 04:17:52.264+01:00 DevP2P local and remote protocols          topics="rlpx" tid=1027594 file=rlpx.nim:209 selected=[eth/63] localOffer=[eth/64] remoteOffer="[eth/63, eth/64]"
    TRC 2021-07-24 04:17:52.264+01:00 >> Sending eth.Status (0x00) [eth/63]      tid=1027594 file=eth65.nim:52 peer=Node[REDACTED:6709] td=2 bestHash=e9dc478239532458976ae9286ef84f3ebe345c05cc5e74d67f0c49ab37dc4805 networkId=5 genesis=bf7e331f7f7c1dd2e05159666b3bf8bc7a8a3a9eb1d518969eab529dd9b88c1a
    TRC 2021-07-24 04:17:52.474+01:00 Waiting for more peers                     tid=1027594 file=p2p.nim:115 peers=0
    WRN 2021-07-24 04:17:52.538+01:00 Error while handling RLPx message          topics="rlpx" tid=1027594 file=rlpx.nim:620 peer=Node[210.91.82.155:6709] msg=33 err="RLPx message with an invalid id 33 on a connection supporting eth"
    
  • Amazingly, the RLPx specification is wrong (PR sent).

  • I was wrong about message 33 being Receipts, due to being misled by the RLPx spec error. 33 is invalid. It's actually out of range.

  • It's not a bug in awaitedMessages size: the size is correct. The bug is that the range check in the earlier invokeThunk doesn't cause an early return, and awaitedMessages is then read with an out-of-range msgId.

  • dangnn/v1.9.10 is sending its eth.Status message with incorrect msgId 33 instead of 16. It's buggy software on the network. It also uses 33 if we negotiate eth/64 instead of eth/63.

  • Even though it's a buggy peer, it's a networking bug in Nimbus that RLPx can be crashed by an out-of-range msgId. The cause: when the range check in the earlier invokeThunk detects a bad message, execution falls through to read awaitedMessages out of range. It should probably terminate the dispatch loop instead.
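The control flow described above can be sketched in Python (illustrative only, not the Nim code): the early range check raises, the exception is downgraded to a warning, and execution falls through to an unchecked read. All names here are hypothetical.

```python
# Hypothetical sketch of the flawed flow: invoke_thunk detects the bad
# id and raises, but the dispatcher logs it as a warning and continues.
def invoke_thunk(msg_id, max_id):
    if msg_id > max_id:
        raise ValueError(f"RLPx message with an invalid id {msg_id}")

def dispatch_buggy(awaited, msg_id):
    try:
        invoke_thunk(msg_id, len(awaited) - 1)
    except ValueError as exc:
        print("WRN Error while handling RLPx message:", exc)
    # Bug: no early return, so a bad msgId reaches this read and crashes.
    return awaited[msg_id]

def dispatch_fixed(awaited, msg_id):
    try:
        invoke_thunk(msg_id, len(awaited) - 1)
    except ValueError as exc:
        print("WRN Error while handling RLPx message:", exc)
        return None   # stop handling this message instead of falling through
    return awaited[msg_id]
```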

@zah
Contributor

zah commented Jul 29, 2021

Regarding the RLPx problem, indeed, disconnecting the peer with BreachOfProtocol seems the right way to go.
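That suggestion can be sketched as follows, using the devp2p disconnect reason code 0x02 ("Breach of protocol"). The `Peer` class and `on_message` helper below are hypothetical stand-ins, not the real nim-eth API.

```python
# Sketch: on an out-of-range msgId, disconnect the offending peer with
# the devp2p "Breach of protocol" reason instead of continuing dispatch.
from enum import Enum

class DisconnectReason(Enum):
    BREACH_OF_PROTOCOL = 0x02   # devp2p disconnect reason code

class Peer:                      # hypothetical stand-in for the peer object
    def __init__(self):
        self.connected = True
        self.reason = None

    def disconnect(self, reason):
        self.connected = False
        self.reason = reason

def on_message(peer, msg_id, max_id):
    if not (0 <= msg_id <= max_id):
        peer.disconnect(DisconnectReason.BREACH_OF_PROTOCOL)
        return False             # terminate dispatch for this peer
    return True                  # continue normal message handling
```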

jlokier added a commit to status-im/nim-eth that referenced this issue Nov 30, 2021
Closes [nimbus-eth1#767](status-im/nimbus-eth1#767).

Crashes occur when certain invalid RLPx messages are received from a peer.
Specifically, `msgId` out of range.  Because any peer can easily trigger this
crash, we'd consider it a DoS vulnerability if Nimbus-eth1 were in general use.

We noticed when syncing to Goerli, there were some rare crashes with this
exception.  It turned out one peer with custom code, perhaps malfunctioning,
was sending these messages if we were unlucky enough to connect to it.

`invokeThunk` is called from `dispatchMessages` and checks the range of
`msgId`.  It correctly recognises that it's out of range, raises an exception
and produces a message.  Job done.

Except the code in `dispatchMessages` treats all that as a warning instead of
an error, and continues to process the message.  A bit lower down, `msgId` is
used again without a range check.

The trivial fix is to check array bounds before access.

--

ps. Here's the stack trace ("reraised" sections hidden):

```
WRN 2021-11-08 21:29:33.238+00:00 Error while handling RLPx message          topics="rlpx" tid=2003472 file=rlpx.nim:607 peer=Node[<IP>:45456] msg=45 err="RLPx message with an invalid id 45 on a connection supporting eth,snap"
/home/jamie/Status/nimbus-eth1/nimbus/p2p/chain/chain_desc.nim(437) main
/home/jamie/Status/nimbus-eth1/nimbus/p2p/chain/chain_desc.nim(430) NimMain
/home/jamie/Status/nimbus-eth1/nimbus/nimbus.nim(258) process
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncloop.nim(279) poll
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(1218) rlpxAccept
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(101) postHelloSteps
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(985) postHelloSteps
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(101) dispatchMessages
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(77) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(614) dispatchMessages
/home/jamie/Status/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(23) raiseIndexError2
/home/jamie/Status/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(49) sysFatal
[[reraised from: ... ]]
[[reraised from: ... ]]
[[reraised from: ... ]]
[[reraised from: ... ]]
Error: unhandled exception: index 45 not in 0 .. 40 [IndexError]
```

Signed-off-by: Jamie Lokier <[email protected]>
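The bounds-check fix the commit message describes can be sketched like this (a minimal Python illustration, not the Nim patch; the function name is hypothetical):

```python
# Minimal sketch of the trivial fix: guard the table access itself,
# regardless of what earlier checks did or logged.
def awaited_message(awaited, msg_id):
    if 0 <= msg_id < len(awaited):   # bounds check before access
        return awaited[msg_id]
    return None                      # out-of-range id: nothing awaited
```

With a 41-slot table (indices 0 .. 40, as in the trace above), msgId 45 now yields None instead of an unhandled IndexError.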
zah pushed a commit to status-im/nim-eth that referenced this issue Nov 30, 2021