Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest/ledgerbackend: Differentiate between isPrepared and isClosed in captive core #4088

Merged
merged 9 commits into from
Dec 8, 2021

Conversation

erika-sdf
Copy link
Contributor

@erika-sdf erika-sdf commented Nov 18, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

CaptiveCoreBackend should be closed when the last ledger the core is prepared for has been consumed, or if it is explicitly closed. After the core is closed, and GetLedger is called on it, it returns the error session is closed, call PrepareRange first.

Why

The caller should not be able to prepare a core after it is closed.

Known limitations

N/A

return xdr.LedgerCloseMeta{}, errors.New("session is closed")
}

if c.prepared == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the right thing to do here is to use c.prepared or c.isPrepared(BoundedRange(sequence, sequence)).

It seems like the range checks follow this line would yield a more accurate error (eg. requested ledger ## is behind captive core stream vs session not prepared).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that error detail if available, sounds valuable when reading the logs to get that narrow(er) context of the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use isPrepared. Using c.prepared alone is not enough because it just means the backend is prepared for any range. As an example, let's say backend was prepared for bounded range 100-200 but someone is requesting 201.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we can't use isPrepared() here, because this is called from PrepareRange, so the range may not be prepared yet. A better approach would be to have a separate awaitRange() function that PrepareRange could use, but that's a much bigger change here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Paul said. I'm leaving this check as c.prepared

@erika-sdf erika-sdf requested a review from a team November 18, 2021 19:00
Copy link
Contributor

@sreuland sreuland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I'm probably not qualified to approve yet as still pretty new on the uptake of code.

ingest/CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@bartekn bartekn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most important thing we need to fix as explained in #3705 (and also a comment for CaptiveStellarCore.cancel) is that once Close() is called is unusable but we don't enforce it leaving users with cryptic errors and/or not working backend.

We need to distinguish 3 states of the backend:

  • NOT PREPARED - initial state, user need to call PrepareRange to do anything.
  • PREPARING/PREPARED - this is de facto the same state because the backend is not thread-safe and is blocking when calling PrepareRange or GetLedger.
  • CLOSED - backend closed.

ingest/ledgerbackend/captive_core_backend.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/captive_core_backend.go Outdated Show resolved Hide resolved
return xdr.LedgerCloseMeta{}, errors.New("session is closed")
}

if c.prepared == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use isPrepared. Using c.prepared alone is not enough because it just means the backend is prepared for any range. As an example, let's say backend was prepared for bounded range 100-200 but someone is requesting 201.

ingest/ledgerbackend/captive_core_backend.go Show resolved Hide resolved
ingest/ledgerbackend/captive_core_backend.go Outdated Show resolved Hide resolved
@@ -600,11 +608,11 @@ func (c *CaptiveStellarCore) GetLatestLedgerSequence(ctx context.Context) (uint3
}

func (c *CaptiveStellarCore) isClosed() bool {
return c.prepared == nil || c.stellarCoreRunner == nil || c.stellarCoreRunner.context().Err() != nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check c.closed here only and because isClosed is only used internally we can remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it might not be intentionally closed. c.closed only gets set when we call Close(), but it could also be closed due to a timeout or some other issue with the c.stellarCoreRunner.context(), no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly and this is confusing. closed means that user called Close() and CaptiveStellarCore can no longer be used. However in case of stellarCoreRunner issues, crashes, etc. it just isn't prepared and can be started again.

Copy link
Contributor Author

@erika-sdf erika-sdf Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this discussion, it sounds like you expect the errored state to be the same as the initial NOT PREPARED state. This is counter to the behavior in a couple of the tests that check for the core to be closed after these two scenarios:

  1. the core gets an error getting a ledger
  2. after the last ledger has been consumed.

I've pulled out the second half of this check for a core error, and prompt the user to call PrepareRange when there is one. I've also updated the tests to reflect that we now expect stellar-core to not be closed in these scenarios.

Copy link
Contributor

@bartekn bartekn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one comment: we don't need coreHasError helper.

ingest/ledgerbackend/captive_core_backend.go Outdated Show resolved Hide resolved
@erika-sdf erika-sdf merged commit 6d63189 into stellar:master Dec 8, 2021
bartekn added a commit that referenced this pull request Jan 28, 2022
…lled (#4192)

After refactoring in #4088 (and as a result of my wrong comment:
#4088#discussion_r764394296) the `CaptiveCoreBackend.isPrepared` method returned
`true` if `stellarCoreRunner` process was shutdown without calling `close()` -
so in case of binary update but also in case of Stellar-Core crash.

This commit fixes this bug by checking if `stellarCoreRunner` context was
cancelled (meaning Stellar-Core is closed or closing but not as a result of
`close` call). I also removed `isClose` method because it was simply checking
`close` variable. All test changes are only adding an extra `context()` call
mocks because `isPrepared` now calls it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CaptiveCoreBackend.isClosed can be confused with isPrepared
5 participants