Checkpoint sync fails with "Attempt to download checkpoint state timed out" on holesky on all endpoints #6333

Closed
ibhagwan opened this issue Jun 8, 2024 · 7 comments

Comments

@ibhagwan

ibhagwan commented Jun 8, 2024

Describe the bug
Unable to checkpoint sync on holesky. All endpoints (EPs) fail with the error "Attempt to download checkpoint state timed out". It doesn't seem to be a connection issue, as I can curl the EPs as shown in the screenshot below.

To Reproduce
Steps to reproduce the behavior:

  1. Platform details (OS, architecture): Void Linux
  2. Branch/commit used: Nimbus beacon node v24.5.1-d2a075-stateofus
  3. Commands being executed:

run-trusted.sh

#!/bin/sh
sudo -u nimbus sh -c '/usr/local/bin/nimbus_beacon_node trustedNodeSync --config-file=/var/lib/nimbus/holesky/config-trusted.toml'

config-trusted.toml
Tried with all EPs below

network = "holesky"
data-dir = "/data/nimbus/holesky/beacon"

[trustedNodeSync]
backfill = false
# trusted-node-url = "https://holesky.beaconstate.info"
# trusted-node-url = "https://beaconstate-holesky.chainsafe.io"
# trusted-node-url = "https://holesky.beaconstate.ethstaker.cc"
# trusted-node-url = "https://checkpoint-sync.holesky.ethpandaops.io"
trusted-node-url = "https://holesky-checkpoint-sync.stakely.io"
  4. Relevant log lines:
INF 2024-06-08 12:27:49.488-07:00 Obtaining genesis state                    topics="beacnde" sourceUrl=https://github.com/status-im/nimbus-eth2/releases/download/v23.9.1/holesky-genesis.ssz.sz
NTC 2024-06-08 12:29:04.122-07:00 Starting trusted node sync                 databaseDir=/data/nimbus/holesky/beacon/db backfill=false reindex=false syncTarget=finalized restUrl=https://holesky.beaconstate.info
NTC 2024-06-08 12:29:04.517-07:00 Downloading checkpoint state               syncTarget=finalized restUrl=https://holesky.beaconstate.info stateId=finalized
ERR 2024-06-08 12:30:04.519-07:00 Attempt to download checkpoint state timed out syncTarget=finalized restUrl=https://holesky.beaconstate.info stateId=finalized

Screenshots
image

Additional context
This same machine used to work (since holesky genesis). It was down for a while, so I removed the DB and attempted checkpoint sync.

@tersec
Contributor

tersec commented Jun 15, 2024

Those curl commands are not fetching checkpoint states either; they're just requesting the front pages of those sites.

If you use the /eth/v2/debug/beacon/states/finalized REST endpoint shown by https://nimbus.guide/trusted-node-sync.html#sync-from-checkpoint-files directly via curl, what happens?

# Obtain a state and a block from a Beacon API - these must be in SSZ format:
curl -o state.finalized.ssz \
  -H 'Accept: application/octet-stream' \
  https://localhost:5052/eth/v2/debug/beacon/states/finalized

# Start the beacon node using the downloaded state as starting point
./run-mainnet-beacon-node.sh \
  --finalized-checkpoint-state=state.finalized.ssz

where localhost:5052 is replaced by one of the hosts from https://eth-clients.github.io/checkpoint-sync-endpoints/
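
For anyone adapting the commands above to a public endpoint, here is a minimal sketch of the download step. The helper names state_url and fetch_state are hypothetical, and the 180-second default for curl's --max-time is an assumed value chosen only to be longer than the download time seen in this thread. It also assumes the chosen host serves the debug state endpoint.

```shell
#!/bin/sh
# Sketch only: state_url and fetch_state are hypothetical helper names,
# not part of Nimbus or of any checkpoint sync provider.

# Build the finalized-state URL for a checkpoint sync host,
# tolerating a trailing slash on the base URL.
state_url() {
    printf '%s/eth/v2/debug/beacon/states/finalized\n' "${1%/}"
}

# Download the finalized state in SSZ format.
#   $1 = endpoint base URL
#   $2 = output file
#   $3 = maximum seconds for the whole transfer (default 180, an assumption)
fetch_state() {
    curl --fail --silent --show-error \
        --max-time "${3:-180}" \
        -H 'Accept: application/octet-stream' \
        -o "$2" \
        "$(state_url "$1")"
}
```

Usage would look like `fetch_state https://holesky.beaconstate.info state.finalized.ssz`, followed by starting the beacon node with --finalized-checkpoint-state=state.finalized.ssz as in the commands above.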

@ibhagwan
Author

Ty @tersec, the manual workaround worked flawlessly. Still unsure why the trusted-node-url fails on all official holesky endpoints while the manual curl to the exact same EP works.

Downloading the state:
image

Beacon node started successfully and is syncing:
image

@tersec
Contributor

tersec commented Jun 15, 2024

curl required 1m7s to download the state, while the timeout for that request in Nimbus is 60s:

largeRequestsTimeout = 60.seconds # Downloading large items such as states.

so likely trusted node sync's download attempt timed out while almost complete.

There's no strongly principled reason it must be exactly 60s, though. A machine which requires 10 minutes to download the state probably can't keep up with the ongoing Ethereum gossip anyway, but a more modest increase is reasonable. Created #6363 to increase this timeout to 90s, which would have sufficed here.
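
The arithmetic behind that can be sketched as a quick check: 1m7s is 67 seconds, which is over the 60s limit but under the proposed 90s one. The function name fits_timeout is hypothetical and only illustrates the comparison; it is not how Nimbus implements its timeout.

```shell
#!/bin/sh
# Sketch only: fits_timeout is a hypothetical helper for illustration.

# Succeeds (exit status 0) when a download that took $1 seconds
# finishes before a $2-second timeout expires.
fits_timeout() {
    [ "$1" -lt "$2" ]
}

# Compare the observed 67s curl download against the old and new timeouts.
for timeout in 60 90; do
    if fits_timeout 67 "$timeout"; then
        echo "67s download fits within a ${timeout}s timeout"
    else
        echo "67s download exceeds a ${timeout}s timeout"
    fi
done
```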

@ibhagwan
Author

> Created #6363 to increase this timeout to 90s, which would have sufficed here.

Ty @tersec for the help!

IMHO, 120s would be a better timeout, perhaps even 180s? Network issues can occur even when you have a decent connection.

Question regarding this:

# Start the beacon node using the downloaded state as starting point
./run-mainnet-beacon-node.sh \
  --finalized-checkpoint-state=state.finalized.ssz

I'm assuming I can drop the --finalized-checkpoint-state=state.finalized.ssz part on the next restart?

@tersec
Contributor

tersec commented Jun 21, 2024

Well, 90s is at least longer, so I merged that. It can be changed later. Especially as the mainnet and Holesky states grow larger over time ("state bloat" has been a long-discussed issue in Ethereum), the default might have to increase.

And, yes, you can drop --finalized-checkpoint-state=state.finalized.ssz next time.

@tersec
Contributor

tersec commented Jun 21, 2024

Closing because the issue per se is resolved, but that doesn't prevent further comments.

tersec closed this as completed Jun 21, 2024
@ibhagwan
Author

Ty @tersec for all your help!
