
Many peers disconnect when lots of transactions in a block are processed #345

Closed
kdeme opened this issue Jun 28, 2019 · 4 comments
Labels: Sync (Prevents or affects sync with Ethereum network)

Comments

kdeme (Contributor) commented Jun 28, 2019

Problem as mentioned before; ideally we can offload this work. I did not debug this in depth, but I assume it is because we simply do not respond to certain requests, so the peers flag us as useless.
Created an issue here to remain aware that this happens.

zah (Member) commented Jun 28, 2019

Our current plan is to solve this by introducing a separate thread for the block processing. The network I/O thread will communicate with it through an AsyncChannel (currently being developed in Chronos).
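
For context, a minimal sketch of that split, with a plain Nim `Channel` standing in for the Chronos `AsyncChannel` that is still under development; `BlockBody` and the worker are hypothetical placeholders, not Nimbus code:

```nim
# Hypothetical sketch only: a plain Nim Channel stands in for the Chronos
# AsyncChannel mentioned above, and BlockBody is a made-up placeholder.
# Compile with --threads:on.

type
  BlockBody = object
    number: uint64
    txCount: int

var workQueue: Channel[BlockBody]   # network I/O thread -> block-processing thread

proc blockWorker() {.thread.} =
  ## Runs on its own thread so heavy block processing cannot starve network I/O.
  while true:
    let blk = workQueue.recv()      # blocks until the I/O side sends work
    if blk.txCount < 0: break       # sentinel used to shut the worker down
    # executeTransactions(blk)      # hypothetical: the expensive EVM work goes here
    echo "processed block ", blk.number, " with ", blk.txCount, " txs"

when isMainModule:
  workQueue.open()
  var worker: Thread[void]
  createThread(worker, blockWorker)
  # The network I/O loop stays responsive: it only enqueues blocks and keeps
  # answering peer requests while the worker grinds through transactions.
  workQueue.send(BlockBody(number: 1, txCount: 200))
  workQueue.send(BlockBody(txCount: -1))
  joinThread(worker)
  workQueue.close()
```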

stefantalpalaru (Contributor) commented:

> a separate thread for the block processing

This is an embarrassingly parallel scenario, so it's worth using workers in a thread pool.
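
A minimal sketch of that suggestion, using Nim's standard `threadpool` module; `executeBlock` is a made-up stand-in for the per-block work, and (as the next comment points out) the real workload is not this cleanly independent:

```nim
# Hypothetical sketch of the thread-pool suggestion using Nim's std threadpool;
# executeBlock is a made-up stand-in for per-block work. Compile with --threads:on.

import std/threadpool

proc executeBlock(number: int): int =
  # Pretend this runs all the transactions of one block and returns gas used.
  number * 21_000

when isMainModule:
  var results: seq[FlowVar[int]]
  for n in 1 .. 8:
    results.add spawn executeBlock(n)   # each block is handed to a pool worker
  for fv in results:
    echo ^fv                            # blocks until that worker's result is ready
  sync()
```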

jlokier added the Sync (Prevents or affects sync with Ethereum network) label on May 11, 2021
jlokier (Contributor) commented May 11, 2021

It's not trivially embarrassingly parallel, because one of the things EVM executions spend most of their time on is fetching and updating account states, contract code, and storage slots. That's only parallel if the database accesses are parallel, and although the database APIs can be called from multiple threads, there's a lot of synchronisation going on that we don't see.

Still, it's possible to cache temporarily written states locally in a block-processing thread, so only database reads need to run in parallel in this scheme.
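
A rough sketch of that write-caching idea; `DbBackend` and `StateOverlay` are hypothetical types, not the Nimbus database API. Writes stay in a block-local overlay, so only reads ever touch the shared database while a block is being processed:

```nim
# Hypothetical sketch: DbBackend and StateOverlay are made-up types, not the
# Nimbus database API.

import std/[tables, options]

type
  DbBackend = object                  # stands in for the shared, synchronised database
    data: Table[string, string]

  StateOverlay = object
    db: ptr DbBackend                 # read-only from this thread's point of view
    writes: Table[string, string]     # block-local writes, flushed only at commit

proc get(db: DbBackend, key: string): Option[string] =
  if key in db.data: some(db.data[key]) else: none(string)

proc read(ov: StateOverlay, key: string): Option[string] =
  ## Prefer the local overlay; fall back to a (parallel-friendly) database read.
  if key in ov.writes: some(ov.writes[key])
  else: ov.db[].get(key)

proc write(ov: var StateOverlay, key, value: string) =
  ## Buffer the write locally; nothing shared is touched mid-block.
  ov.writes[key] = value

proc commit(ov: var StateOverlay) =
  ## Applied once after the block, under whatever synchronisation the DB needs.
  for k, v in ov.writes: ov.db[].data[k] = v
  ov.writes.clear()

when isMainModule:
  var db = DbBackend(data: {"acct/balance": "100"}.toTable)
  var ov = StateOverlay(db: addr db)
  ov.write("acct/balance", "90")
  echo ov.read("acct/balance").get()  # "90", served from the overlay
  echo db.data["acct/balance"]        # still "100" until commit
  ov.commit()
  echo db.data["acct/balance"]        # now "90"
```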

jlokier (Contributor) commented May 11, 2021

The plan has changed a little, perhaps away from using a threadpool.

The Nimbus EVM is now almost at the point where we can run it in an async manner under Chronos. The same mechanism used to reduce stack usage, i.e. a continuation, is used to suspend EVM execution when the host says it is waiting for data.

The async pragma adds a significant speed penalty, so it isn't actually used in the EVM, which doesn't even know about Chronos.

Because all the suspension points occur outside the EVM (on the "host" side of EVMC), all the suspension logic is confined to those points as well; the only thing the EVM itself needs is the ability to be told it must pause on certain operations and return to its caller with a continuation.

In the stack-reduction patches, the EVM always uses a continuation for nested calls. In the async flavour it only uses a continuation when an EVMC host operation replies with a "suspend now" flag. This is a bit different from Chronos, where async functions always return a Future[T]. The difference matters for fetch-from-cache operations, which many of the EVMC host calls are: when there are a lot of cache hits, it's best if the caller just carries on along the fast path in the cache-hit case. It also matters for an interpreter loop in Nim, when composing heavy functions that might suspend (like SLOAD and CALL) alongside tiny, fast functions that never suspend, like ADD.
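
A rough sketch (not the Nimbus implementation) of that shape: a host call answered from cache returns a value directly on the fast path, with no Future, while a miss hands the caller a continuation to resume once the data has been fetched. All names below are made up for illustration:

```nim
# Hypothetical sketch, not the Nimbus code.

import std/tables

type
  EvmOutcome = object
    done: bool                                             # true => value is the result
    value: string
    resume: proc (fetched: string): EvmOutcome {.closure.} # set only when suspended

  Host = object
    storageCache: Table[string, string]

proc sload(host: Host, key: string): EvmOutcome =
  if key in host.storageCache:
    # Cache hit: the interpreter keeps going, no suspension machinery involved.
    EvmOutcome(done: true, value: host.storageCache[key])
  else:
    # Cache miss: pause here and hand the caller a continuation to run later.
    EvmOutcome(done: false,
               resume: proc (fetched: string): EvmOutcome =
                 EvmOutcome(done: true, value: fetched))

when isMainModule:
  let host = Host(storageCache: {"hot-slot": "0x2a"}.toTable)
  let hit = host.sload("hot-slot")            # fast path, no Future
  echo hit.value
  let miss = host.sload("cold-slot")          # slow path: EVM suspended
  if not miss.done:
    # The EVMC host side would fetch the slot (e.g. over the network) and then
    # resume the EVM with the data; here we just resume immediately.
    let cont = miss.resume
    echo cont("0x00").value
```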

By itself, this is enough to support concurrent EVMs without using threads, and it might be that EVMs always make enough calls to fetch storage data that we can add extra suspensions even when the data is in memory, just to allow the network processes to keep up.

However, if they don't pause often enough, the next step is to make the EVM pre-emptive as well, so it always checks whether enough time has passed for a suspension.

A timer may be needed to set this flag, but the current idea is to maintain an artificially capped gas limit which, when exhausted, triggers a pre-emption time check or simply yields, and is then replenished from the real gas limit. Modifying the gas limit in this way provides the check for free in the interpreter loop, and because gas is designed to reflect performance, it may well be a good enough approximation to a pre-emption timer without requiring a timer thread or signal.
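
A hedged sketch of that capped-gas idea, with made-up names and an assumed slice size; the point is that the interpreter's existing out-of-gas branch doubles as the yield check, so pre-emption costs nothing extra on the hot path:

```nim
# Hypothetical sketch; the names and the slice size are assumptions.

const preemptionSlice = 100_000       # assumed: gas granted between yield checks

type
  GasState = object
    realRemaining: int                # the transaction's true remaining gas
    sliceRemaining: int               # artificially capped allowance

proc refillSlice(g: var GasState) =
  g.sliceRemaining = min(g.realRemaining, preemptionSlice)

proc consume(g: var GasState, cost: int): bool =
  ## Returns false when the artificial slice is exhausted. Real out-of-gas
  ## handling (realRemaining hitting zero) is omitted from this sketch.
  g.realRemaining -= cost
  g.sliceRemaining -= cost
  g.sliceRemaining > 0

proc interpret(g: var GasState) =
  g.refillSlice()
  while g.realRemaining > 0:
    let cost = 3                      # pretend every opcode costs 3 gas
    if not g.consume(cost):
      # Same branch the interpreter already takes for "out of gas", so the
      # check is free; here the EVM would yield to the network loop (or check
      # a timer) before topping the slice back up from the real gas limit.
      g.refillSlice()

when isMainModule:
  var g = GasState(realRemaining: 1_000_000)
  g.interpret()
  echo "done; yielded roughly every ", preemptionSlice, " gas"
```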

The preference for trying an async EVM instead of running it in a threadpool came out of discussions a few weeks ago with @zah and @arnetheduck. It's not obvious this is better than threads (and moving the database and state cache to threads, etc.), not least because threads can use spare CPU cores. However, there are advantages to either approach, and "small devices", with Nimbus confined to a single core for CPU work, was a motivation for having the async option to try out.

Having said that, there is still a thread pool in the plan. We need a threadpool to run concurrent third-party EVMs (as a plugin via EVMC). That means that, as a side-effect of supporting third-party EVMs for Berlin HF sync, we will still end up, eventually, with a threadpool option for the Nimbus EVM. But with async falling out nicely from the current architecture, and a threadpool EVM being a bit more complicated than spawning parallel threads due to the database aspect, async is probably the mode that will be working first.

jlokier added a commit that referenced this issue Aug 10, 2021
This combines two things, a C stack usage change with EVM nested calls
via EVMC, and changes to host call tracing.

Feature-wise, the tracing is improved:

- Storage keys and values make more sense.
- The message/result/context objects are shown with all relevant fields.
- `call` trace is split into entry/exit, so these can be shown around the
  called contract's operations, instead of only showing the `call` parameters
  after the nested call is finished.
- Nested calls are indented, which helps to highlight the flow.
- C stack usage considerably reduced in nested calls when more functionality
  is enabled (either tracing here, or other things to come).

This will seem like a minor patch, but C stack usage was the real motivation,
after plenty of time in the debugger.

Nobody cares about stack usage when `showTxCalls` is enabled (you can just use a big stack when
debugging).  But these subtle changes around the `call` path were found to be
necessary for passing all tests when the EVMC nested call code is completed,
and that's a prerequisite for many things: async EVM, dynamic EVM, Beam Sync,
and to fix #345.

Signed-off-by: Jamie Lokier <[email protected]>
jangko closed this as completed in 3047c83 on Aug 12, 2021