resource usage during sync #262

Closed
stefantalpalaru opened this issue Mar 5, 2019 · 18 comments
Labels
Sync Prevents or affects sync with Ethereum network

Comments

@stefantalpalaru
Contributor

stefantalpalaru commented Mar 5, 2019

SVG graph with per-process statistics provided by pidstat (missing the network activity, for now, but still interesting):
SVG graph

That CPU usage over 100% must come from the multi-threaded rocksdb library.
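
The graph above was produced with pidstat (the script was later published in this thread). Just to illustrate the kind of per-process sampling involved, here is a minimal Python sketch using psutil instead of pidstat, illustrative only and not the tooling actually used; it records CPU, RSS and disk I/O once per second as CSV:

  # Illustrative sketch: sample CPU, RSS and disk I/O for one PID, roughly
  # what pidstat -u -r -d reports. Requires the third-party psutil package.
  import sys
  import time
  import psutil

  def sample(pid: int, interval: float = 1.0) -> None:
      proc = psutil.Process(pid)
      print("timestamp,cpu_percent,rss_bytes,read_bytes,write_bytes")
      proc.cpu_percent(None)  # prime the counter; the first call returns 0.0
      while proc.is_running():
          time.sleep(interval)
          with proc.oneshot():  # batch the /proc reads
              cpu = proc.cpu_percent(None)   # can exceed 100% for multi-threaded processes
              rss = proc.memory_info().rss   # resident set size, in bytes
              io = proc.io_counters()        # cumulative disk I/O (Linux)
          print(f"{time.time():.0f},{cpu:.1f},{rss},{io.read_bytes},{io.write_bytes}")

  if __name__ == "__main__":
      sample(int(sys.argv[1]))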

@jangko
Contributor

jangko commented Mar 5, 2019

Interesting. The disk write pattern shows there is still room for improvement.

@stefantalpalaru
Contributor Author

Here is another run of ./build/nimbus --prune:archive --port:30304, this time with network traffic added (using nethogs), better colours, and some variables drawn as areas instead of lines.

Full dataset, one second per pixel:

nimbus3-long.svg

Five seconds per pixel, to better see the memory leak:

nimbus3-short.svg

Except for the short RocksDB spikes every 6-7 minutes or so, most of the time is spent waiting for data from the network or maxing out a CPU core while processing that data. It all looks very serialised, which means it will benefit from parallelisation.

The average download speed is extremely low, at 7.83 kB/s. Disk I/O is a non-issue right now on the SSD I'm using.

@jangko
Contributor

jangko commented Mar 6, 2019

Is the steadily climbing red RSS line an indicator of a memory leak? If so, that is very bad.

@stefantalpalaru
Contributor Author

stefantalpalaru commented Mar 6, 2019

Yep: https://en.wikipedia.org/wiki/Resident_set_size
When it drops, that's a garbage collection. I don't see a legitimate reason to hang on to so much data in RAM during execution, so my guess is that the upward trend is due to a memory leak.

What's weird is that even the stack keeps growing, albeit much more slowly.

@jangko
Contributor

jangko commented Mar 6, 2019

Which region of blocks are you syncing? I mean the block numbers. I noticed that from block 600K to 700K memory consumption is very high, then it is stable from block 800K to 900K.
I think I will do some measurements myself to improve block sync speed.

@stefantalpalaru: can you share the script with me? How did you produce that SVG?

@stefantalpalaru
Contributor Author

Which region of blocks are you syncing? I mean the block numbers. I noticed that from block 600K to 700K memory consumption is very high, then it is stable from block 800K to 900K.

I started with an empty db and let it run until it crashed due to an assert in transaction rollback (vendor/nim-eth/eth/trie/db.nim:145 - "doAssert t.db.mostInnerTransaction == t and t.state == Pending").

I don't see block numbers in the output log, because those are logged at the TRACE level, which is not included by default.

@stefantalpalaru: can you share the script with me? How did you produce that SVG?

Freshly published: https://github.com/status-im/process-stats

@jangko
Contributor

jangko commented Mar 6, 2019

Thank you very much.

@jangko
Contributor

jangko commented Mar 19, 2019

The backend database contributes significantly to block syncing speed.
Once the database size reaches 20 GB+, syncing becomes slower and slower because RocksDB is doing background compaction.
Writing to the database does not seem to slow down, thanks to the WAL (write-ahead log) mechanism, but reading from the database can be really, really slow when it competes with compaction.

At 50 GB+ (900K blocks), it becomes very slow. My current workaround: I create separate databases on separate physical drives. Every time I have synced around 20 GB, I move the database to drive A and open it as a read-only database.

When a database is opened read-only on drive A, its compaction finishes faster because it does not have to compete with the regular read/write operations on drive B.

Without this poor man's sharding, drive activity is always at 100%; with this simple sharding, disk activity on both drive A and drive B stays below 30%.

For comparison, using a single db, syncing 1.4M blocks takes many hours, but using several 20 GB dbs it takes less than one hour.
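
To illustrate the access pattern, here is a small Python sketch assuming the python-rocksdb bindings (Nimbus itself goes through nim-rocksdb, and the paths and shard layout below are made up for the example): new data goes to the active database on one drive, while the frozen ~20 GB shards on the other drive are opened read-only and only consulted on a miss.

  # Sketch of "poor man's sharding": one writable DB plus read-only shards.
  from typing import Optional
  import rocksdb

  class ShardedDB:
      def __init__(self, active_path, frozen_paths):
          self.active = rocksdb.DB(active_path, rocksdb.Options(create_if_missing=True))
          # Frozen shards live on another drive and are opened read-only, so
          # their compaction does not compete with the live read/write traffic.
          self.frozen = [rocksdb.DB(p, rocksdb.Options(), read_only=True)
                         for p in frozen_paths]

      def put(self, key: bytes, value: bytes) -> None:
          self.active.put(key, value)      # all new data goes to the active shard

      def get(self, key: bytes) -> Optional[bytes]:
          value = self.active.get(key)
          if value is not None:
              return value
          for shard in self.frozen:        # fall back to older shards, newest first
              value = shard.get(key)
              if value is not None:
                  return value
          return None

  db = ShardedDB("/driveB/active", ["/driveA/shard-02", "/driveA/shard-01"])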

@zah
Member

zah commented Mar 19, 2019

Thanks for sharing this, @jangko. BTW, how does the lmdb performance compare to rocksdb?

@jangko
Contributor

jangko commented Mar 19, 2019

I stopped using it because it was slower than RocksDB even when syncing below 100K blocks; I don't know how it performs once it contains more data.

@Swader
Contributor

Swader commented Mar 20, 2019

Would it be possible to actually use this "poor man's sharding" approach as a solution? Maybe divide the data into 10 GB snapshots, where each snapshot is one such shard, i.e. one RocksDB database, and then use those same snapshots to retrieve data across the network for faster sync among Nimbus clients?

@arnetheduck
Member

https://www.zeroknowledge.fm/9 - interview with one of the parity devs about how they're tuning rocksdb

@stefantalpalaru
Contributor Author

stefantalpalaru commented Mar 31, 2019

A look at allocated RAM (RSS) versus heap usage according to the GC:

nimbus4.svg

heap.svg

To get these heap stats, I added at the end of persistBlocks(), in nimbus/p2p/chain.nim:

  dumpNumberOfInstances()
  echo "===", getTime().toUnix()

(and an import times above the function)

Nimbus compile flags:
make NIMFLAGS="--opt:speed -d:nimTypeNames" nimbus

I ran Nimbus like this:
rm -rf ~/.cache/nimbus/db; ./build/nimbus --prune:archive --maxpeers:250 --log-level:trace --log-file:output6.log > heap.txt

I processed "heap.txt" using this quick and dirty script: https://gist.github.com/stefantalpalaru/0b502def452591aaca289ec8fc119e8b
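
As a rough sketch of what such a script has to do (the "===" separators come from the echo added above; the per-type lines printed by dumpNumberOfInstances() are kept raw here, since their exact format depends on the -d:nimTypeNames output):

  # Split the captured stdout into per-sample chunks, using the "===<unix time>"
  # separator printed after each dumpNumberOfInstances() call.
  def parse_heap_log(path="heap.txt"):
      samples = []          # list of (timestamp, raw_lines) tuples
      current = []
      with open(path) as f:
          for line in f:
              line = line.rstrip("\n")
              if line.startswith("==="):
                  ts = int(line[3:])     # unix time from getTime().toUnix()
                  samples.append((ts, current))
                  current = []
              else:
                  current.append(line)
      return samples

  for ts, lines in parse_heap_log():
      print(ts, len(lines), "heap lines")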


This looks like memory fragmentation to me, with the RSS growing from 47 to 219 MiB in 37 minutes.

The memory leak is extremely small in comparison, with the used heap minimum going from about 5 to about 10 MiB.

@jangko
Contributor

jangko commented Apr 15, 2019

Currently, our RocksDB uses the default configuration:

  • target_file_size_base=64MB
  • target_file_size_multiplier=1
  • filter_policy=null

If we change some of these settings:

  • target_file_size_base=64MB
  • target_file_size_multiplier=4 or 8 -> reduces the number of files and file descriptors, giving faster file access.
  • filter_policy=10-bit bloom filter -> speeds up random reads when an account is not in the state trie.

(A sketch of these settings follows below.)
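
For illustration, here are the proposed settings expressed through the python-rocksdb bindings (the real change would go through nim-rocksdb, where the option names may differ slightly):

  # Sketch of the proposed tuning: larger target file sizes per level plus a
  # 10-bit bloom filter, matching the bullet points above.
  import rocksdb

  opts = rocksdb.Options(create_if_missing=True)
  opts.target_file_size_base = 64 * 1024 * 1024   # 64 MB base file size (the default)
  opts.target_file_size_multiplier = 4            # or 8: fewer, larger files per level
  opts.table_factory = rocksdb.BlockBasedTableFactory(
      filter_policy=rocksdb.BloomFilterPolicy(10)  # ~10 bits/key: cheap negative lookups
  )

  db = rocksdb.DB("nimbus.db", opts)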

@zah
Member

zah commented Apr 25, 2019

@jangko, can we use Premix's regress tool as a benchmarking utility when deciding whether to go for these RocksDB tweaks? It would be nice if we could create a database of blocks that can be distributed efficiently to multiple machines with various hardware configurations; then we could use regress to obtain statistics telling us which settings work best.

@jangko
Contributor

jangko commented Apr 25, 2019

regress is too complicated. From what I have observed, the bottleneck of database operations comes from building the state trie.

Here is what I have done:
block 4,174,280 already contains 5,819,335 accounts (~24.9 GB), and it took almost 19 hours to move those 5.8M accounts from one SSD to another SSD.

We can use the hexary trie to tweak and benchmark the database. Both the hexary trie and the database need more optimization.
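
Back-of-the-envelope arithmetic on those figures:

  # Rough throughput implied by the measurement above: ~5.8M accounts in ~19 hours.
  accounts = 5_819_335
  hours = 19
  print(f"{accounts / (hours * 3600):.0f} accounts/s")   # roughly 85 accounts per second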

@jlokier jlokier added the Sync Prevents or affects sync with Ethereum network label May 11, 2021
@SjonHortensius

Apparently this is still an issue: syncing a fresh Nimbus instance on a high-performance machine results in mediocre sync performance (less than 10 blocks/s), with one thread pegged at 100% and all the other cores below 10% CPU usage.
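
One way to confirm the single hot thread (a sketch using psutil, not something taken from this issue) is to sample per-thread CPU times twice and compare:

  # Show which threads of a process accumulated CPU time over an interval.
  # Pass the nimbus PID on the command line; requires psutil.
  import sys
  import time
  import psutil

  def hot_threads(pid: int, interval: float = 5.0) -> None:
      proc = psutil.Process(pid)
      before = {t.id: t.user_time + t.system_time for t in proc.threads()}
      time.sleep(interval)
      after = {t.id: t.user_time + t.system_time for t in proc.threads()}
      busiest = sorted(after, key=lambda tid: after[tid] - before.get(tid, 0.0), reverse=True)
      for tid in busiest:
          busy = after[tid] - before.get(tid, 0.0)
          print(f"thread {tid}: {100.0 * busy / interval:.0f}% CPU")

  if __name__ == "__main__":
      hot_threads(int(sys.argv[1]))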

@arnetheduck
Member

Obsoleted by aristo - will need to be re-run
