
Many peers disconnect when lots of transactions in a block are processed #345

Closed
kdeme opened this issue Jun 28, 2019 · 4 comments
Labels: Sync (Prevents or affects sync with Ethereum network)

Comments

kdeme (Contributor) commented Jun 28, 2019

Problem as mentioned before; ideally we can offload this work. I did not debug this in depth, but I assume it is because we simply do not respond to certain requests, so the peers flag us as useless.
Created an issue here to remain aware that this happens.

zah (Member) commented Jun 28, 2019

Our current plan is to solve this by introducing a separate thread for the block processing. The network I/O thread will communicate with it through an AsyncChannel (currently being developed in Chronos).
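
For context, a minimal sketch of that split, with a plain Nim `Channel` standing in for the Chronos `AsyncChannel` that is still under development; `BlockBody` and the worker are hypothetical placeholders, not Nimbus code:

```nim
# Hypothetical sketch only: a plain Nim Channel stands in for the Chronos
# AsyncChannel mentioned above, and BlockBody is a made-up placeholder.
# Compile with --threads:on.

type
  BlockBody = object
    number: uint64
    txCount: int

var workQueue: Channel[BlockBody]   # network I/O thread -> block-processing thread

proc blockWorker() {.thread.} =
  ## Runs on its own thread so heavy block processing cannot starve network I/O.
  while true:
    let blk = workQueue.recv()      # blocks until the I/O side sends work
    if blk.txCount < 0: break       # sentinel used to shut the worker down
    # executeTransactions(blk)      # hypothetical: the expensive EVM work goes here
    echo "processed block ", blk.number, " with ", blk.txCount, " txs"

when isMainModule:
  workQueue.open()
  var worker: Thread[void]
  createThread(worker, blockWorker)
  # The network I/O loop stays responsive: it only enqueues blocks and keeps
  # answering peer requests while the worker grinds through transactions.
  workQueue.send(BlockBody(number: 1, txCount: 200))
  workQueue.send(BlockBody(txCount: -1))
  joinThread(worker)
  workQueue.close()
```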

stefantalpalaru (Contributor) commented:

> a separate thread for the block processing

This is an embarrassingly parallel scenario, so it's worth using workers in a thread pool.
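
A minimal sketch of that suggestion, using Nim's standard `threadpool` module; `executeBlock` is a made-up stand-in for the per-block work, and (as the next comment points out) the real workload is not this cleanly independent:

```nim
# Hypothetical sketch of the thread-pool suggestion using Nim's std threadpool;
# executeBlock is a made-up stand-in for per-block work. Compile with --threads:on.

import std/threadpool

proc executeBlock(number: int): int =
  # Pretend this runs all the transactions of one block and returns gas used.
  number * 21_000

when isMainModule:
  var results: seq[FlowVar[int]]
  for n in 1 .. 8:
    results.add spawn executeBlock(n)   # each block is handed to a pool worker
  for fv in results:
    echo ^fv                            # blocks until that worker's result is ready
  sync()
```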

jlokier added the Sync (Prevents or affects sync with Ethereum network) label on May 11, 2021
jlokier (Contributor) commented May 11, 2021

It's not trivially embarrassingly parallel, because one of the things EVM executions spend most of their time on is fetching and updating account states, contract code, and storage slots. That's only parallel if the database accesses are parallel, and although the database APIs can be called from multiple threads, there's a lot of synchronisation going on that we don't see.

Still, it's possible to cache temporarily written states locally in a block-processing thread, so only database reads need to run in parallel in this scheme.
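
A rough sketch of that write-caching idea; `DbBackend` and `StateOverlay` are hypothetical types, not the Nimbus database API. Writes stay in a block-local overlay, so only reads ever touch the shared database while a block is being processed:

```nim
# Hypothetical sketch: DbBackend and StateOverlay are made-up types, not the
# Nimbus database API.

import std/[tables, options]

type
  DbBackend = object                  # stands in for the shared, synchronised database
    data: Table[string, string]

  StateOverlay = object
    db: ptr DbBackend                 # read-only from this thread's point of view
    writes: Table[string, string]     # block-local writes, flushed only at commit

proc get(db: DbBackend, key: string): Option[string] =
  if key in db.data: some(db.data[key]) else: none(string)

proc read(ov: StateOverlay, key: string): Option[string] =
  ## Prefer the local overlay; fall back to a (parallel-friendly) database read.
  if key in ov.writes: some(ov.writes[key])
  else: ov.db[].get(key)

proc write(ov: var StateOverlay, key, value: string) =
  ## Buffer the write locally; nothing shared is touched mid-block.
  ov.writes[key] = value

proc commit(ov: var StateOverlay) =
  ## Applied once after the block, under whatever synchronisation the DB needs.
  for k, v in ov.writes: ov.db[].data[k] = v
  ov.writes.clear()

when isMainModule:
  var db = DbBackend(data: {"acct/balance": "100"}.toTable)
  var ov = StateOverlay(db: addr db)
  ov.write("acct/balance", "90")
  echo ov.read("acct/balance").get()  # "90", served from the overlay
  echo db.data["acct/balance"]        # still "100" until commit
  ov.commit()
  echo db.data["acct/balance"]        # now "90"
```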

jlokier (Contributor) commented May 11, 2021

The plan has changed a little, perhaps away from using a threadpool.

The Nimbus EVM is now almost at the point where we can run it in an async manner under Chronos. The same mechanism used to reduce stack usage, i.e. a continuation, is used to suspend EVM execution when the host says it is waiting for data.

The async pragma adds a significant speed penalty, so it isn't actually used in the EVM, which doesn't even know about Chronos.

Because all the suspension points occur outside the EVM (on the "host" side of EVMC), all the suspension logic is confined to those points as well; the only thing the EVM itself needs is the ability to be told it must pause on certain operations and return to its caller with a continuation.

In the stack-reduction patches, the EVM always uses a continuation for nested calls. In the async flavour it only uses a continuation when an EVMC host operation replies with a "suspend now" flag. This is a bit different from Chronos, where async functions always return a Future[T]. The difference matters for fetch-from-cache operations, which many of the EVMC host calls are: when there are a lot of cache hits, it's best if the caller just carries on along the fast path in the cache-hit case. It also matters for an interpreter loop in Nim, when composing heavy functions that might suspend (like SLOAD and CALL) alongside tiny, fast functions that never suspend, like ADD.
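
A rough sketch (not the Nimbus implementation) of that shape: a host call answered from cache returns a value directly on the fast path, with no Future, while a miss hands the caller a continuation to resume once the data has been fetched. All names below are made up for illustration:

```nim
# Hypothetical sketch, not the Nimbus code.

import std/tables

type
  EvmOutcome = object
    done: bool                                             # true => value is the result
    value: string
    resume: proc (fetched: string): EvmOutcome {.closure.} # set only when suspended

  Host = object
    storageCache: Table[string, string]

proc sload(host: Host, key: string): EvmOutcome =
  if key in host.storageCache:
    # Cache hit: the interpreter keeps going, no suspension machinery involved.
    EvmOutcome(done: true, value: host.storageCache[key])
  else:
    # Cache miss: pause here and hand the caller a continuation to run later.
    EvmOutcome(done: false,
               resume: proc (fetched: string): EvmOutcome =
                 EvmOutcome(done: true, value: fetched))

when isMainModule:
  let host = Host(storageCache: {"hot-slot": "0x2a"}.toTable)
  let hit = host.sload("hot-slot")            # fast path, no Future
  echo hit.value
  let miss = host.sload("cold-slot")          # slow path: EVM suspended
  if not miss.done:
    # The EVMC host side would fetch the slot (e.g. over the network) and then
    # resume the EVM with the data; here we just resume immediately.
    let cont = miss.resume
    echo cont("0x00").value
```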

By itself, this is enough to support concurrent EVMs without using threads, and it might be that EVMs always make enough calls to fetch storage data that we can add extra suspensions even when the data is in memory, just to allow the network processes to keep up.

However, if they don't pause often enough, the next step is to make the EVM pre-emptive as well, so it always checks whether enough time has passed for a suspension.

A timer may be needed to set this flag, but the current idea is to maintain an artificially capped gas limit which, when exhausted, triggers a pre-emption time check or simply yields, and is then replenished from the real gas limit. Modifying the gas limit in this way provides the check for free in the interpreter loop, and because gas is designed to reflect performance, it may well be a good enough approximation to a pre-emption timer without requiring a timer thread or signal.
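
A hedged sketch of that capped-gas idea, with made-up names and an assumed slice size; the point is that the interpreter's existing out-of-gas branch doubles as the yield check, so pre-emption costs nothing extra on the hot path:

```nim
# Hypothetical sketch; the names and the slice size are assumptions.

const preemptionSlice = 100_000       # assumed: gas granted between yield checks

type
  GasState = object
    realRemaining: int                # the transaction's true remaining gas
    sliceRemaining: int               # artificially capped allowance

proc refillSlice(g: var GasState) =
  g.sliceRemaining = min(g.realRemaining, preemptionSlice)

proc consume(g: var GasState, cost: int): bool =
  ## Returns false when the artificial slice is exhausted. Real out-of-gas
  ## handling (realRemaining hitting zero) is omitted from this sketch.
  g.realRemaining -= cost
  g.sliceRemaining -= cost
  g.sliceRemaining > 0

proc interpret(g: var GasState) =
  g.refillSlice()
  while g.realRemaining > 0:
    let cost = 3                      # pretend every opcode costs 3 gas
    if not g.consume(cost):
      # Same branch the interpreter already takes for "out of gas", so the
      # check is free; here the EVM would yield to the network loop (or check
      # a timer) before topping the slice back up from the real gas limit.
      g.refillSlice()

when isMainModule:
  var g = GasState(realRemaining: 1_000_000)
  g.interpret()
  echo "done; yielded roughly every ", preemptionSlice, " gas"
```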

The preference for trying an async EVM instead of running it in a threadpool came out of discussions a few weeks ago with @zah and @arnetheduck. It's not obvious this is better than threads (and moving the database and state cache to threads, etc.), not least because threads can use spare CPU cores. However, there are advantages to either approach, and "small devices", with Nimbus confined to a single core for CPU work, was a motivation for having the async option to try out.

Having said that, there is still a thread pool in the plan. We need a threadpool to run concurrent third-party EVMs (as a plugin via EVMC). That means that, as a side-effect of supporting third-party EVMs for Berlin HF sync, we will still end up, eventually, with a threadpool option for the Nimbus EVM. But with async falling out nicely from the current architecture, and a threadpool EVM being a bit more complicated than spawning parallel threads due to the database aspect, async is probably the mode that will be working first.

jlokier added a commit that referenced this issue Aug 10, 2021
This combines two things, a C stack usage change with EVM nested calls
via EVMC, and changes to host call tracing.

Feature-wise, the tracing is improved:

- Storage keys and values make more sense.
- The message/result/context objects are shown with all relevant fields.
- `call` trace is split into entry/exit, so these can be shown around the
  called contract's operations, instead of only showing the `call` parameters
  after the nested call is finished.
- Nested calls are indented, which helps to highlight the flow.
- C stack usage considerably reduced in nested calls when more functionality
  is enabled (either tracing here, or other things to come).

This will seem like a minor patch, but C stack usage was the real motivation,
after plenty of time in the debugger.

Nobody cares about stack usage when `showTxCalls` is enabled (you can just use a big stack when
debugging).  But these subtle changes around the `call` path were found to be
necessary for passing all tests when the EVMC nested call code is completed,
and that's a prerequisite for many things: async EVM, dynamic EVM, Beam Sync,
and to fix #345.

Signed-off-by: Jamie Lokier <[email protected]>
jangko closed this as completed in 3047c83 on Aug 12, 2021