[data] Improve str/repr of lazy Datasets #31417
Labels
data
Ray Data-related issues
enhancement
Request for new feature and/or capability
P1
Issue that should be fixed within a few weeks
Comments
ericl
added
enhancement
Request for new feature and/or capability
P1
Issue that should be fixed within a few weeks
data
Ray Data-related issues
labels
Jan 3, 2023
Agreed, we should make str/repr clearer. Printing out the execution plan sounds good to me.
ericl
pushed a commit
that referenced
this issue
Jan 6, 2023
This PR enables lazy execution by default. See ray-project/enhancements#19 for motivation. The change includes:
* Change the `Dataset` constructor to `Dataset.__init__(lazy: bool = True)`. Also remove the `defer_execution` field, as it's no longer needed.
* `read_api.py:read_datasource()` returns a lazy `Dataset` while still eagerly computing the first input block.
* Add `ds.fully_executed()` calls to the unit tests that require them, so that they pass.

TODO:
- [x] Fix all unit tests
- [x] #31459
- [x] #31460
- [ ] Remove the behavior to eagerly compute first block for read
- [ ] #31417
- [ ] Update documentation
AmeerHajAli
pushed a commit
that referenced
this issue
Jan 12, 2023
tamohannes
pushed a commit
to ju2ez/ray
that referenced
this issue
Jan 16, 2023
tamohannes
pushed a commit
to ju2ez/ray
that referenced
this issue
Jan 25, 2023
Description
Currently, a lazy Dataset's string repr shows something like this:
This doesn't provide a lot of useful information, and is also confusing to the user. We could improve this to something like this:
Or even add a verbose form, like:
We should implement this prior to making lazy execution the default.
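To make the idea concrete, here is a minimal sketch of a repr that prints the pending execution plan, plus a separate verbose form. It is an illustration only: `LazyPipeline`, `then`, and `describe` are hypothetical names invented for this sketch, not Ray's `Dataset` API, and the output format is an assumption rather than the format proposed above.

```python
class LazyPipeline:
    """Hypothetical stand-in for a lazy dataset (NOT Ray's Dataset class)."""

    def __init__(self, source, stages=()):
        self.source = source          # name of the read operation
        self.stages = tuple(stages)   # names of the recorded, not-yet-run stages

    def then(self, stage_name):
        # Record the stage instead of executing it; return a new pipeline.
        return LazyPipeline(self.source, self.stages + (stage_name,))

    def _plan(self):
        # The full plan is the source followed by every recorded stage.
        return (self.source,) + self.stages

    def __repr__(self):
        # Short form: a single line showing the pending plan.
        return f"LazyPipeline(pending: {' -> '.join(self._plan())})"

    def describe(self):
        # Verbose form: one numbered line per step of the plan.
        lines = ["LazyPipeline (not yet executed)"]
        for index, step in enumerate(self._plan()):
            lines.append(f"  {index}: {step}")
        return "\n".join(lines)


ds = LazyPipeline("read_csv").then("map_batches").then("filter")
print(repr(ds))
print(ds.describe())
```

The short form stays on one line so it remains readable in logs and interactive sessions, while the verbose form is opt-in for users who want the full plan.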
Use case
No response