[data] Add take_batch API for collecting data in the same format as iter_batches and map_batches #34217

Merged 9 commits on Apr 11, 2023

@ericl (Contributor) commented Apr 10, 2023

Why are these changes needed?

There is currently no convenient way to take just a single batch, which is confusing. This PR introduces ds.take_batch(n, batch_format="default"), which returns a batch of n records in the same way that next(ds.iter_batches(batch_size=n, batch_format="default")) would.
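To illustrate the relationship described above, here is a minimal sketch using a hypothetical in-memory `ToyDataset` class. It is not Ray's implementation: the class, its columnar dict-of-lists "default" batch format, and the empty-dataset error are all assumptions made for this example. It only shows the idea that `take_batch(n)` can be expressed as the first item of `iter_batches(batch_size=n)`, whereas `take(n)` returns a list of rows.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]
Batch = Dict[str, List[Any]]


class ToyDataset:
    """Hypothetical stand-in for a dataset; NOT Ray's implementation."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)

    def iter_batches(
        self, batch_size: int, batch_format: str = "default"
    ) -> Iterator[Batch]:
        # Assumption for this sketch: "default" means a columnar dict of lists.
        if batch_format != "default":
            raise ValueError(f"unsupported batch_format: {batch_format!r}")
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        for start in range(0, len(self._rows), batch_size):
            chunk = self._rows[start:start + batch_size]
            yield {key: [row[key] for row in chunk] for key in chunk[0]}

    def take(self, limit: int) -> List[Row]:
        # Row-oriented: a list of per-record dicts.
        return self._rows[:limit]

    def take_batch(self, batch_size: int, batch_format: str = "default") -> Batch:
        # Same format as iter_batches: just return its first batch.
        batches = self.iter_batches(batch_size=batch_size, batch_format=batch_format)
        try:
            return next(batches)
        except StopIteration:
            # Toy behavior chosen for this sketch only.
            raise ValueError("dataset is empty") from None


ds = ToyDataset([{"id": i, "sq": i * i} for i in range(5)])

batch = ds.take_batch(3)  # {"id": [0, 1, 2], "sq": [0, 1, 4]}
rows = ds.take(3)         # [{"id": 0, "sq": 0}, {"id": 1, "sq": 1}, {"id": 2, "sq": 4}]
```

In this sketch, `take_batch` delegates to `iter_batches` so the two cannot drift apart in format, which is the property the PR description asks for.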

TODO:

  • Update docs

Closes #34116

Signed-off-by: Eric Liang <[email protected]>
@c21 (Contributor) left a comment

code LGTM.

Signed-off-by: Eric Liang <[email protected]>
@ericl ericl requested review from maxpumperla and a team as code owners April 10, 2023 22:18
Signed-off-by: Eric Liang <[email protected]>
@ericl changed the title from "[WIP] Add take_batch API for collecting data in the same format as iter_batches and map_batches" to "[data] Add take_batch API for collecting data in the same format as iter_batches and map_batches" on Apr 10, 2023
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
@c21 (Contributor) commented Apr 10, 2023

Let's also update the API documentation (https://github.com/ray-project/ray/blob/master/doc/source/data/api/dataset.rst?plain=1#L87). Thanks.

Signed-off-by: Eric Liang <[email protected]>
@ericl ericl merged commit fba9d15 into ray-project:master Apr 11, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ter_batches and map_batches (ray-project#34217)

Signed-off-by: elliottower <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ter_batches and map_batches (ray-project#34217)

Signed-off-by: Jack He <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Data] iter_batches/map_batches return different format than take
3 participants