[Data] Improve filesystem retry coverage #46685

Merged: bveeramani merged 10 commits into ray-project:master from improve-retry-coverage on Jul 19, 2024

bveeramani (Member) commented Jul 17, 2024

Why are these changes needed?

See #43803 (comment).

Related issue number

Fixes #43803

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Balaji Veeramani <[email protected]>
Comment on lines 97 to 104
DEFAULT_RETRIED_FILESYSTEM_ERRORS = (
    "AWS Error INTERNAL_FAILURE",
    "AWS Error NETWORK_CONNECTION",
    "AWS Error SLOW_DOWN",
    "AWS Error UNKNOWN (HTTP status 503)",
    "AWS Error ACCESS_DENIED",
    "AWS Error NETWORK_CONNECTION",
)
bveeramani (Member, Author) commented:

Consolidating all of the transient errors in one place so users don't need to configure retries for opening files/reading files/writing files separately.
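
To illustrate the idea of one shared list of transient errors, here is a minimal sketch of how a single tuple of message substrings could drive retries for any filesystem call (open, read, or write). This is not the code from this PR: `RETRIED_ERRORS`, `is_retryable`, `call_with_retry_sketch`, and the backoff parameters are hypothetical names chosen for illustration, and the substring matching is an assumption about how such a list might be used.

```python
import time

# Error substrings copied from the snippet above (duplicate entry dropped).
# Everything else in this block is a hypothetical illustration.
RETRIED_ERRORS = (
    "AWS Error INTERNAL_FAILURE",
    "AWS Error NETWORK_CONNECTION",
    "AWS Error SLOW_DOWN",
    "AWS Error UNKNOWN (HTTP status 503)",
    "AWS Error ACCESS_DENIED",
)


def is_retryable(error, patterns=RETRIED_ERRORS):
    """Return True if the error message contains any of the patterns."""
    message = str(error)
    return any(pattern in message for pattern in patterns)


def call_with_retry_sketch(fn, patterns=RETRIED_ERRORS, max_attempts=3,
                           base_delay_s=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on matching errors.

    Errors that match no pattern, or that occur on the final attempt,
    are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as error:
            is_last_attempt = attempt == max_attempts - 1
            if is_last_attempt or not is_retryable(error, patterns):
                raise
            sleep(base_delay_s * (2 ** attempt))


# Usage: a fake "open" that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_open():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("AWS Error SLOW_DOWN during read")
    return "file contents"


result = call_with_retry_sketch(flaky_open, base_delay_s=0)
```

Because the same wrapper and the same pattern tuple can wrap any callable, a user would only need to adjust one list to change retry behavior for opening, reading, and writing alike.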

Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani changed the title [Data] Improve retry coverage [Data] Improve filesystem retry coverage Jul 17, 2024
@bveeramani bveeramani marked this pull request as ready for review July 17, 2024 23:44
Signed-off-by: Balaji Veeramani <[email protected]>
Review thread on python/ray/data/context.py (resolved)
@bveeramani bveeramani enabled auto-merge (squash) July 18, 2024 17:52
@github-actions github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Jul 18, 2024
Signed-off-by: Balaji Veeramani <[email protected]>
@github-actions github-actions bot disabled auto-merge July 18, 2024 21:38
Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani merged commit 6b1fc0a into ray-project:master Jul 19, 2024
5 checks passed
@bveeramani bveeramani deleted the improve-retry-coverage branch July 19, 2024 01:06
anyscalesam (Collaborator) commented:

@raulchen @bveeramani shouldn't we also add coverage for GCP-specific errors?

scottjlee added a commit that referenced this pull request Sep 3, 2024
## Why are these changes needed?

#46685 didn't include handling
for webdatasets

## Related issue number

Fixes #43803 for webdatasets

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [x] This PR is not tested :(

---------

Signed-off-by: Eric Meier <[email protected]>
Co-authored-by: Scott Lee <[email protected]>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 12, 2024
…6892)

ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…6892)

Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
3 participants