-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix manifest_number_ point to invalid file #12882
Fix manifest_number_ point to invalid file #12882
Conversation
db/version_set.cc
Outdated
@@ -5768,7 +5770,7 @@ Status VersionSet::ProcessManifestWrites( | |||
for (auto v : versions) { | |||
delete v; | |||
} | |||
if (manifest_io_status.ok()) { | |||
if (manifest_io_status.ok() && !current_file_io_status.ok()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if s is set to non-ok after successfully creating the CURRENT file? Would it make more sense to just record if a new MANIFEST is created and check that flag here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, this keeps the manifest_number_
unchanged, to point to the existing manifest and not updating its size. If we crashed right after this and CURRENT points to the new manifest file, the new manifest file should be around to be used since below logic only remove the new manifest file if manifest write IO itself encountered issues. If we didn't crash, this divergence will be fixed by the next manifest write attempt when it sees descriptor_log_
is nullptr.
Anyways, in these error scenario, manifest_number_
no longer points to what descriptor_log_
points to, I wonder should we err towards keeping it point to the old manifest number (which is removing this code block altogether), or switch to the new manifest number, and the criteria for doing that, the criteria is the diff between this approach and the flag approach I think:
- this approach : new manifest is in a good state, because manifest_io_status is ok, but current file creation may have some issues
- flag approach: a new manifest file is successfully created and written to.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either is fine. I found checking current_file_io_status harder to reason about since we are updating the manifest_file_number when CURRENT creation failed, but not updating it when CURRENT Is created successfully.
wonder should we err towards keeping it point to the old manifest number (which is removing this code block altogether)
Yeah, something probably went wrong with the new MANIFEST. Maybe backup should read the old MANIFEST that's valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point that in this approach, manifest number is updated when current file could encounter issues, looks very unintuitive. I have removed this block altogether.
7113a67
to
5c146af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang merged this pull request in 9883b5f. |
Worth a history update since users could have observed "Failure in BackupEngine::CreateNewBackup..."? @jowlyzhang |
This PR fix
VersionSet
'smanifest_number_
could be pointing to an invalid number intermediately. This happens when a new manifest roll is attempted but fast failed after loading table handlers and before the new manifest file creation/writing is actually attempted.In theory, a later manifest roll effort will overthrow this intermediate invalid in memory state. There is on harm when the DB crashes in this invalid state either. But efforts that takes a file snapshot of the DB like backup will incorrectly try to copy a non existing manifest file.