
Massive spike in DB size on Goerli #1159

Closed
jakubgs opened this issue Jul 11, 2022 · 13 comments


jakubgs commented Jul 11, 2022

I've identified a very big spike in disk usage by our nimbus-eth1 node running on Goerli:

[image: graph of disk usage showing the spike]

It appears the version running at the time was 7f0bc71:

Jun 29 00:00:05 metal-01.he-eu-hel1.nimbus.eth1 build.sh[2044102]: From https://github.com/status-im/nimbus-eth1
Jun 29 00:00:05 metal-01.he-eu-hel1.nimbus.eth1 build.sh[2044102]:  + 6e9eacb3...694d1c45 getLogs-endpoint -> origin/getLogs-endpoint  (forced update)
Jun 29 00:00:05 metal-01.he-eu-hel1.nimbus.eth1 build.sh[2044114]: HEAD is now at 7f0bc71b add invalidMissingAncestorReOrg test case

This caused the DB to grow up to 1.6 TB:

[email protected]:~ % sudo du -hs /data/nimbus-eth1-goerli-master/data
1.6T    /data/nimbus-eth1-goerli-master/data

That's over 10x more than a fully synced Geth node on Goerli:

[email protected]:~ % sudo du -hs /docker/nimbus-goerli/node/data 
131G    /docker/nimbus-goerli/node/data

jakubgs commented Jul 11, 2022

It doesn't seem like sst files are the issue:

[email protected]:.../nimbus/data % ls | wc -l
25932
[email protected]:.../nimbus/data % ls *.sst | wc -l
25753
[email protected]:.../nimbus/data % ls *.sst | xargs du -hsc | tail -n1
202G	total
[email protected]:.../nimbus/data % du -hs .  
1.6T	.


jakubgs commented Jul 11, 2022

But ncdu doesn't seem to show any massive files:

65M	/data/nimbus-eth1-goerli-master/data/shared_goerli_0/nimbus/data/1094254.sst
ncdu 1.14.1 ~ Use the arrow keys to navigate, press ? for help                                                                                                                                                                     
--- /data/nimbus-eth1-goerli-master/data/shared_goerli_0/nimbus/data ----------------------------------
  460.8 MiB [##########]  LOG.old.1656979293418180                                                                                                                                                                                 
  300.4 MiB [######    ]  LOG.old.1656720086681351
  122.1 MiB [##        ]  LOG.old.1657324897762741
   90.0 MiB [#         ]  LOG                     
   70.4 MiB [#         ]  1095853.log
   64.3 MiB [#         ]  1004708.sst
   64.3 MiB [#         ]  999379.sst 
   64.3 MiB [#         ]  1038531.sst
   64.3 MiB [#         ]  1038543.sst


jakubgs commented Jul 11, 2022

The ls command shows essentially the same thing:

[email protected]:.../nimbus/data % ls -lS | head 
total 1648954516
-rw-r--r-- 1 nimbus staff  461M Jul  5 00:01 LOG.old.1656979293418180
-rw-r--r-- 1 nimbus staff  301M Jul  1 23:59 LOG.old.1656720086681351
-rw-r--r-- 1 nimbus staff  123M Jul  8 23:59 LOG.old.1657324897762741
-rw-r--r-- 1 nimbus staff   91M Jul 11 15:57 LOG
-rw-r--r-- 1 nimbus staff   65M Jul  8 23:56 1004708.sst
-rw-r--r-- 1 nimbus staff   65M Jul  8 20:43 999379.sst
-rw-r--r-- 1 nimbus staff   65M Jul 10 00:58 1038531.sst
-rw-r--r-- 1 nimbus staff   65M Jul 10 00:58 1038543.sst
-rw-r--r-- 1 nimbus staff   65M Jul 10 04:07 1046388.sst
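
The LOG and LOG.old.* files at the top appear to be RocksDB info logs rather than table data, so at under 1 GB combined they don't account for the missing space either. A quick sketch to total just those files (assuming the standard RocksDB layout):

du -chs LOG LOG.old.* | tail -n1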


jakubgs commented Jul 11, 2022

Interestingly, we can also see a big spike in open files at the time:

[image: graph of open file descriptors showing a spike]


jakubgs commented Jul 11, 2022

Running find shows only 395 GB out of the 1.6 TB:

[email protected]:.../nimbus/data % find . | xargs du -hsc | tail
65M	./1095797.sst
61M	./1095798.sst
10M	./1095799.sst
65M	./1095942.sst
65M	./1095943.sst
37M	./1095944.sst
65M	./1095945.sst
65M	./1095946.sst
26M	./1095947.sst
395G	total

This is very weird.
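
One possible explanation (an assumption on my part, not confirmed here): with ~26k file names on stdin, xargs splits the argument list across several du invocations, each printing its own total line, and tail only shows the last one. A quick sketch to check for that:

# More than one match means du ran several times, so the 395G "total" only covers the last batch
find . | xargs du -hsc | grep -c 'total$'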


jakubgs commented Jul 12, 2022

If I try to copy the contents to the root volume, I can see it takes up more than 395 GB:

sudo rsync --progress -aur /data/nimbus-eth1-goerli-master /data_new/
[email protected]:/data_new/.../data % find | xargs du -hsc
424G	.
424G	total

So there's something off with how I'm counting the file sizes.


jakubgs commented Jul 12, 2022

Actually, if I manually add up the sizes of all files using bc, it does add up to 1.69 TB:

[email protected]:.../data % echo "scale=2;($(find . -printf '%s+')0)/(10^9)" | bc
1694.15

So there's something wrong with how du and ncdu calculate the total size.
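
If the batching above is indeed the issue, the pipeline rather than du itself is the likely culprit, since du -hs . and the bc sum are in the same ballpark (~1.6-1.7 TB) while the 395G figure is the odd one out. A sketch of two cross-checks that avoid the pitfall, assuming GNU coreutils:

# Single du invocation fed NUL-separated names on stdin, so only one grand total is printed
find . -type f -print0 | du -ch --files0-from=- | tail -n1

# Apparent size (sum of file lengths) vs. allocated disk blocks for the same tree
du -sh --apparent-size .
du -sh .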


mjfh commented Jul 21, 2022

My suspicion has been that the persistent storage of consensus snapshots could be causing problems. By default, a snapshot is persisted every 1k blocks (I changed that to 4k for the upcoming pull request). I added some snapshot storage logging in TRACE mode and ran it against the goerli-shadow-5 network.

It seems the impact of reducing the number of persistent snapshots to roughly a quarter (caching every 4k blocks instead of every 1k) is negligible for the first 2.2m blocks. The disk storage size showed little difference between the samples.

Here are the statistics for syncing ~2.2m blocks.

Caching every 1k blocks:

[..]
blockNumber=2214912 nSnaps=2236 snapsTotal=1.14m
blockNumber=2215936 nSnaps=2237 snapsTotal=1.14m
[..]
Persisting blocks fromBlock=2216449 toBlock=2216640
36458496 datadir-nimbus-goerlish/data/nimbus/

Caching every 1k blocks after replacing legacy LRU handler:

[..]
blockNumber=2234368 nSnaps=2259 snapsTotal=1.15m
blockNumber=2235392 nSnaps=2260 snapsTotal=1.15m
[..]
Persisting blocks fromBlock=2235649 toBlock=2235840
37627288 datadir-nimbus-goerlish/data/nimbus/

Caching every 4k blocks after replacing legacy LRU handler:

[..]
blockNumber=2232320 nSnaps=620 snapsTotal=0.30m
blockNumber=2236416 nSnaps=621 snapsTotal=0.30m
[..]
Persisting blocks fromBlock=2237185 toBlock=2237376
37627288 datadir-nimbus-goerlish/data/nimbus/

Legend:

  • nSnaps -- number of persistently cached snapshots
  • snapsTotal -- accumulated snapshot payload (without DB metadata)
  • Persisting blocks -- logging from persistBlocks() method
  • datadir-nimbus-goerlish/data/nimbus/ -- storage directory, size (in kibibytes) calculated with du -sk
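
For comparison with the figures earlier in this issue, the du -sk values above work out to roughly 35-36 GiB in each run, i.e. essentially no difference (a quick conversion, reusing the bc approach from earlier in the thread):

echo "scale=2; 36458496/1024^2" | bc   # 34.76 GiB (caching every 1k blocks)
echo "scale=2; 37627288/1024^2" | bc   # 35.88 GiB (the two runs after the LRU handler change)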


jakubgs commented Aug 8, 2022

I'm running out of space on the host. Unless you have objections, @mjfh, I will purge the node data to let it re-sync.


mjfh commented Aug 8, 2022

No objections :)


jakubgs commented Aug 8, 2022

Done. Let's see how it looks after resyncing.


jakubgs commented Aug 18, 2022

We got re-synced (I think?) and we're back to 1.6 TB:

[email protected]:~ % df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme2n1    1.9T  1.6T  243G  87% /data

I'm not actually sure if we are synced because the RPC call times out:

[email protected]:~ % /data/nimbus-eth1-goerli-master/rpc.sh eth_syncing
curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received
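
For reference, rpc.sh is a local wrapper script not shown here; my assumption is that the equivalent raw JSON-RPC request looks roughly like this (default HTTP RPC port assumed):

curl -s -m 10 -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://127.0.0.1:8545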


tersec commented May 25, 2024

tersec closed this as completed May 25, 2024