[Datasets] Make Dataset lazy-only #31639
The Dataset is lazy by default with #31286.

There are still some issues with keeping eager execution as an option, such as the memory tracking for blocks (complicated right now, and making the transition to the new execution backend difficult: #30903) and the in-place conversion from eager to lazy (`ds.lazy()`), which has confusing semantics. Making Dataset lazy-only will make the execution semantics clearer and enable us to clean up the complexities around handling block GC.

In particular, for the memory model we'll simply rely on whether the blocks are "owned" by the consumer: if they are, we can eagerly release them. Blocks are not owned in these cases:

- `from_XXX`
- `split()`
- `fully_executed()`
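To illustrate the ownership rule, here is a minimal sketch in plain Python. `SimpleBlockList`, its `owned_by_consumer` flag, and `consume()` are hypothetical names used only for illustration; they are not Ray Data APIs (a real block list holds object references rather than Python lists).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SimpleBlockList:
    """Hypothetical stand-in for a block list (not Ray's internal BlockList)."""
    blocks: List[Optional[List[Any]]]
    # True when the consumer may free these blocks after use;
    # False for blocks coming from from_XXX, split(), or fully_executed().
    owned_by_consumer: bool


def consume(block_list: SimpleBlockList, fn: Callable[[List[Any]], Any]) -> List[Any]:
    """Apply fn to every block, eagerly releasing each block only if owned."""
    results = []
    for i, block in enumerate(block_list.blocks):
        results.append(fn(block))
        if block_list.owned_by_consumer:
            # Owned: nobody else needs this block, so drop the reference now.
            block_list.blocks[i] = None
    return results


owned = SimpleBlockList([[1, 2], [3]], owned_by_consumer=True)
print(consume(owned, sum), owned.blocks)    # [3, 3] [None, None]

shared = SimpleBlockList([[1, 2], [3]], owned_by_consumer=False)
print(consume(shared, sum), shared.blocks)  # [3, 3] [[1, 2], [3]]
```

The only decision the consumer makes is the ownership check, which is the simplification this issue is after: no separate bookkeeping for eager vs. lazy datasets.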
Key items:

- Make the `from_XXX` APIs lazy: currently they create an eager dataset, since they take an in-memory blocklist. We will handle `from_XXX()` and `split()` in a unified way, i.e., by creating a lazy dataset that takes in a materialized blocklist that is NOT owned (cannot be eagerly released after use).
- `fully_executed()` and `split()` produce blocklists that are NOT owned by the consumer (cannot be eagerly released after use).
- `.lazy()` API: there will be no eager dataset, so this API will be obsolete.
- `run_by_consumer` arg: it indicates whether the blocklists are produced by consumption APIs (if yes, the blocks can be eagerly released after use). With lazy-only, `run_by_consumer` should always be True, so it is no longer needed.
- `allow_clear_input_blocks` arg: this is also used to determine eager memory release. With lazy-only, it should also always be True, so it is no longer needed.

@ericl @clarkzinzow @c21
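The key items can be sketched as a toy lazy-only dataset in plain Python. `LazyDataset` and its methods are hypothetical stand-ins that only mirror names from this issue (`from_items()` as one `from_XXX` API, `split()`, `fully_executed()`); this is not Ray's implementation. Transforms are recorded and run only when a consumption method (`take_all()` here) is called; source blocks are never mutated (not owned), while intermediate blocks are dropped as soon as the next transform replaces them. Because execution only happens at consumption time, the sketch needs no `.lazy()` method and no `run_by_consumer` / `allow_clear_input_blocks` flags.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple

Block = List[Any]


class LazyDataset:
    """Hypothetical lazy-only dataset (illustration only, not Ray's API)."""

    def __init__(self, source_blocks: Sequence[Block],
                 ops: Tuple[Callable[[Any], Any], ...] = ()) -> None:
        # A materialized block list this dataset does NOT own:
        # it is read during execution but never cleared.
        self._source_blocks = source_blocks
        # Per-row transforms recorded lazily; nothing runs until consumption.
        self._ops = ops

    @classmethod
    def from_items(cls, items: List[Any], num_blocks: int = 2) -> "LazyDataset":
        # Wrap in-memory data as a lazy dataset over not-owned blocks.
        size = max(1, -(-len(items) // num_blocks))  # ceiling division
        blocks = [items[i:i + size] for i in range(0, len(items), size)]
        return cls(blocks)

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Record the transform; the new dataset shares the same source blocks.
        return LazyDataset(self._source_blocks, self._ops + (fn,))

    def _iter_output_blocks(self) -> Iterator[Block]:
        for block in self._source_blocks:
            for fn in self._ops:
                # Intermediate blocks are owned by this execution: rebinding
                # `block` drops the previous one so it can be freed right away.
                # Source blocks stay referenced by self._source_blocks.
                block = [fn(x) for x in block]
            yield block

    def take_all(self) -> List[Any]:
        # Consumption API: the only place execution is triggered.
        return [x for block in self._iter_output_blocks() for x in block]

    def fully_executed(self) -> "LazyDataset":
        # Materialize once; the result wraps blocks its consumers do NOT own.
        return LazyDataset(list(self._iter_output_blocks()))

    def split(self, n: int) -> List["LazyDataset"]:
        # Materialize, then hand out round-robin shards of not-owned blocks.
        blocks = list(self._iter_output_blocks())
        return [LazyDataset(blocks[i::n]) for i in range(n)]


ds = LazyDataset.from_items(list(range(6)), num_blocks=3)
scaled = ds.map(lambda x: x * 10)     # nothing executed yet
print(scaled.take_all())              # [0, 10, 20, 30, 40, 50]
a, b = scaled.split(2)
print(a.take_all(), b.take_all())     # [0, 10, 40, 50] [20, 30]
```

Note that `from_items()`, `split()`, and `fully_executed()` all end up on the same code path (a lazy dataset over a materialized, not-owned block list), which is the unification proposed in the first key item.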