Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Clear backlogged deletions for spilled objects #25742

Merged

Conversation

stephanie-wang
Copy link
Contributor

Why are these changes needed?

When objects that are spilled or being spilled are freed, we queue a request to delete them. We only clear this queue when additional objects get freed, so GC for spilled objects can fall behind when there is a lot of concurrent spilling and deletion (as in Datasets push-based shuffle). This PR fixes this to clear the deletion queue if the backlog is greater than the configured batch size. Also fixes some accounting bugs for how many bytes are pinned vs pending spill.

Related issue number

Closes #25467.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Member

@mwtian mwtian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though I don't have the full context here.

AssertNoLeaks();
}

TEST_F(LocalObjectManagerTest, TestConcurrentSpillAndDelete2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on the difference between the two tests?

Copy link
Contributor

@scv119 scv119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@stephanie-wang stephanie-wang merged commit 1bd5b93 into ray-project:master Jun 16, 2022
@stephanie-wang stephanie-wang deleted the delete-spilled-objects branch June 16, 2022 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[core] GC of spilled objects falls behind under load, causing out-of-disk errors
3 participants