Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BN: Disable genesis sync via long-range-sync argument. #6361

Merged
merged 4 commits into from
Jun 20, 2024

Conversation

cheatfate
Copy link
Contributor

No description provided.

@tersec
Copy link
Contributor

tersec commented Jun 15, 2024

One quality-of-implementation thing here is that while it's true that to catch the general case, it has to live in the sync manager, because of the limitation of TNS not to work with existing databases, this means that this currently will, in a new --data-dir:

  • create the database, with some genesis state
  • start trying to sync
  • discover that it can't sync
  • alert the user and stop syncing

From a UX flow perspective this is fine in isolation for now, but the trouble is that the next step is made harder by requiring the user to manually delete the database it pointlessly created which now does nothing but hinder the usage of TNS. Once light client sync exists, and if/when we can get an Altair or newer finalized state always available (already true for Sepolia and Holesky, not true for mainnet yet), this won't be a concern, but it's currently sort of UX jank.

It already kind of exists now, and this PR doesn't really make it worse, but it's also probably the right place to fix it, because it's already the PR adding the weak subjectivity period detection. The additional check would be at the beginning, checking as a special case if slot 0 (genesis) is within the WSP of the wall slot, the same way as it currently checks whether the head state is. The difference is that the head state conceptually can't exist until the database does, while the check can be performed specially for GENESIS_SLOT even before the database is created.

Copy link

github-actions bot commented Jun 15, 2024

Unit Test Results

         9 files  ±0    1 328 suites  ±0   26m 46s ⏱️ - 6m 26s
  4 998 tests ±0    4 650 ✔️ ±0  348 💤 ±0  0 ±0 
20 850 runs  ±0  20 446 ✔️ ±0  404 💤 ±0  0 ±0 

Results for commit 3877e99. ± Comparison against base commit 597f473.

♻️ This comment has been updated with latest results.

@tersec
Copy link
Contributor

tersec commented Jun 15, 2024

References #6306

@tersec
Copy link
Contributor

tersec commented Jun 17, 2024

It already kind of exists now, and this PR doesn't really make it worse, but it's also probably the right place to fix it, because it's already the PR adding the weak subjectivity period detection. The additional check would be at the beginning, checking as a special case if slot 0 (genesis) is within the WSP of the wall slot, the same way as it currently checks whether the head state is. The difference is that the head state conceptually can't exist until the database does, while the check can be performed specially for GENESIS_SLOT even before the database is created.

Done in 228a3b2

@tersec tersec merged commit 61610fd into unstable Jun 20, 2024
15 checks passed
@tersec tersec deleted the long-range-sync branch June 20, 2024 18:57
@arnetheduck
Copy link
Member

So with long-range-sync=light, the default mode, we should be performing a light client sync automatically if we have sync committees available - does that happen now?

@tersec
Copy link
Contributor

tersec commented Jun 21, 2024

So with long-range-sync=light, the default mode, we should be performing a light client sync automatically if we have sync committees available - does that happen now?

No. Light client sync does not actually exist in Nimbus right now in a way integrated with the rest of sync.

There are some options of how to approach, including but not limited to

  • treat the whole thing as a beta, --debug-foo, hidden, not documented, and set the default back to allowing long-range sync, but keep the mechanism (which is fine and needed regardless of light client sync);
  • remove the command-line option and hardwire it to allowing long-range (don't even expose as beta, just keep the mechanism);
  • keep as is, current defaults, etc

I don't think it makes sense to remove it.

Part of the point from my perspective is, right now, regardless of light client sync, we repeatedly have people come to help channels having inadvertently triggered genesis sync because it's default. Putting WSP concerns aside even, just a end-user UX it's bad, because people mostly don't want this, but they show up a day or two after genesis sync has started. It's ok to allow genesis sync, and I don't actually think the trusted node sync idea of separate commands is that bad, but the combination of:

  • default genesis sync; and
  • trusted node sync which depends on --data-dir lining up; and
  • the run-foo.sh scripts and bare binaries using different default --data-dirs

predictably, repeatedly, reliably creates scenarios where someone tries to do a TNS, falls into a genesis sync, and just thinks things are a bit slower than they should be.

So, it's better regardless to make sure that genesis sync is what people want. The people who want archive modes, and therefore currently need genesis sync (though in theory one could adjust backfill to get archive mode and 100% deprecate this genesis sync, this discussion has been had, so, sure) to create archive nodes already have to specify custom settings for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants