Cache code and invalid jump destination tables (fixes #2268) #2404

arnetheduck · 2024-06-20T19:28:10Z

It is common for many accounts to share the same code. At the database level, code is stored by hash, so only one copy exists per unique program, but when loaded into memory, a copy is made for each account.

Further, every time we execute the code, it must be scanned for invalid jump destinations, which slows down EVM execution.
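To illustrate the scan that is being repeated, here is a minimal Python sketch (not the code from this PR). It assumes the usual EVM rule that a `0x5b` byte sitting inside the immediate data of a PUSH1..PUSH32 instruction is not a valid jump target. The function name `invalid_jump_destinations` is hypothetical.

```python
JUMPDEST = 0x5B
PUSH1 = 0x60
PUSH32 = 0x7F


def invalid_jump_destinations(code: bytes) -> set:
    """Return positions that hold a JUMPDEST byte but lie inside PUSH data."""
    invalid = set()
    i = 0
    while i < len(code):
        op = code[i]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1  # number of immediate data bytes
            # Data may be truncated at the end of the code, hence the min().
            for j in range(i + 1, min(i + 1 + n, len(code))):
                if code[j] == JUMPDEST:
                    invalid.add(j)
            i += 1 + n  # skip the opcode and its data
        else:
            i += 1
    return invalid


# PUSH1 0x5b; JUMPDEST; STOP
example = bytes([0x60, 0x5B, 0x5B, 0x00])
print(invalid_jump_destinations(example))
```

The scan is linear in the code size, so repeating it on every execution of the same contract is pure overhead once the result can be kept around.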

Finally, the EXTCODESIZE opcode causes the code to be loaded even if only the size is needed.

This PR improves on all these points by introducing a shared `CodeBytesRef` type whose code section is immutable and that can be shared between accounts. Further, a dedicated `len` API call is added so that the EXTCODESIZE opcode can operate without polluting the GC and the code cache when only the size is requested. In this case, rocksdb caches the code itself in its row cache, so a later lookup of the code remains fast when the length is asked for first.
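The idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the Nim implementation from this PR: `SharedCode` and `CodeStore` are hypothetical names, a plain `dict` stands in for the database, SHA-256 stands in for the Keccak-256 code hash, and keeping the jump table on the shared object is an assumption based on the PR title.

```python
import hashlib

JUMPDEST, PUSH1, PUSH32 = 0x5B, 0x60, 0x7F


class SharedCode:
    """Immutable bytecode plus a lazily computed, cached jump table."""

    def __init__(self, code: bytes) -> None:
        self._code = bytes(code)
        self._invalid = None  # filled in on the first jump check

    def __len__(self) -> int:
        return len(self._code)

    def _scan(self) -> frozenset:
        invalid, i = set(), 0
        while i < len(self._code):
            op = self._code[i]
            if PUSH1 <= op <= PUSH32:
                n = op - PUSH1 + 1
                end = min(i + 1 + n, len(self._code))
                invalid.update(
                    j for j in range(i + 1, end) if self._code[j] == JUMPDEST
                )
                i += 1 + n
            else:
                i += 1
        return frozenset(invalid)

    def is_valid_jump(self, pos: int) -> bool:
        if self._invalid is None:
            self._invalid = self._scan()  # scanned once, reused afterwards
        return (
            0 <= pos < len(self._code)
            and self._code[pos] == JUMPDEST
            and pos not in self._invalid
        )


class CodeStore:
    def __init__(self, db: dict) -> None:
        self._db = db      # code hash -> raw bytes (stand-in for the database)
        self._cache = {}   # code hash -> SharedCode, shared by all accounts

    def get_code(self, code_hash: bytes) -> SharedCode:
        shared = self._cache.get(code_hash)
        if shared is None:
            shared = SharedCode(self._db[code_hash])
            self._cache[code_hash] = shared
        return shared

    def code_size(self, code_hash: bytes) -> int:
        # Size-only path: answer from the cache if present, otherwise
        # read the length without creating a cache entry.
        shared = self._cache.get(code_hash)
        if shared is not None:
            return len(shared)
        return len(self._db[code_hash])


code = bytes([0x60, 0x5B, 0x5B, 0x00])  # PUSH1 0x5b; JUMPDEST; STOP
code_hash = hashlib.sha256(code).digest()  # stand-in for Keccak-256
store = CodeStore({code_hash: code})
```

In this sketch, two accounts that look up the same hash receive the same `SharedCode` object, and a size query on an uncached hash leaves the cache untouched.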

With 16k code entries, the cache has a 90% hit rate, which goes up to 99% during the 2.3M attack. The cache significantly lowers memory consumption and execution time, not only during this event but across the board.
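For readers who want to see how such a hit rate could be measured, below is a small Python sketch of a bounded cache with hit and miss counters. The least-recently-used eviction policy and the `LruCache` name are assumptions made for illustration. The PR text only states the number of entries and the observed hit rates.

```python
from collections import OrderedDict


class LruCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = load(key)  # e.g. read the code from the database
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the oldest entry
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LruCache(capacity=2)  # tiny capacity to keep the example short
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, load=lambda k: k.upper())
print(cache.hit_rate())
```

A workload that keeps calling a small set of contracts would mostly take the hit path, which is consistent with the higher hit rate reported during the attack.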


arnetheduck added 6 commits June 20, 2024 21:28

add comparator

c03fe69

fixes

1ddf16a

copyright

024cc62

evmc fixes

caa0044

restore decompile

e141783

arnetheduck merged commit 768307d into master Jun 21, 2024
26 checks passed

arnetheduck deleted the code-cache branch June 21, 2024 07:44
