Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data] Add iterator batch_format=None support, which will yield batches in the current batch format with zero copies #33562

Merged
merged 7 commits into from
Mar 22, 2023

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Mar 21, 2023

Why are these changes needed?

This PR is a cleanup of #33536

It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.

Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
@@ -379,7 +379,7 @@ def map_batches(
*,
batch_size: Optional[Union[int, Literal["default"]]] = "default",
compute: Optional[Union[str, ComputeStrategy]] = None,
batch_format: Literal["default", "pandas", "pyarrow", "numpy"] = "default",
batch_format: Optional[str] = "default",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it as Optional[Literal] for full explicitness of supported batch formats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that is hard to maintain (per the inconsistencies in the code already), so opted to go unify on the shorter signature.

@amogkam
Copy link
Contributor

amogkam commented Mar 21, 2023

@ericl ericl mentioned this pull request Mar 22, 2023
8 tasks
Signed-off-by: Eric Liang <[email protected]>
@ericl ericl requested review from maxpumperla and a team as code owners March 22, 2023 02:14
@ericl
Copy link
Contributor Author

ericl commented Mar 22, 2023

@@ -540,7 +540,9 @@ def map_batches(
(promotes tables to Pandas and tensors to NumPy), ``"pandas"`` to select
``pandas.DataFrame``, "pyarrow" to select ``pyarrow.Table``, or
``"numpy"`` to select ``numpy.ndarray`` for tensor datasets and
``Dict[str, numpy.ndarray]`` for tabular datasets. Default is "default".
``Dict[str, numpy.ndarray]`` for tabular datasets, or None to return
the underlying block exactly as is with no additional formatting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like batch_size=None a good bit more than adding another literal string!

@ericl ericl merged commit 68afa43 into ray-project:master Mar 22, 2023
ericl pushed a commit that referenced this pull request Mar 23, 2023
#33601)

The failure in rllib should have been fixed by #33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.
scottsun94 pushed a commit to scottsun94/ray that referenced this pull request Mar 28, 2023
…es in the current batch format with zero copies (ray-project#33562)

This PR is a cleanup of ray-project#33536

It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.
scottsun94 pushed a commit to scottsun94/ray that referenced this pull request Mar 28, 2023
…project#324… (ray-project#33601)

The failure in rllib should have been fixed by ray-project#33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.
cassidylaidlaw pushed a commit to cassidylaidlaw/ray that referenced this pull request Mar 28, 2023
…es in the current batch format with zero copies (ray-project#33562)

This PR is a cleanup of ray-project#33536

It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.
cassidylaidlaw pushed a commit to cassidylaidlaw/ray that referenced this pull request Mar 28, 2023
…project#324… (ray-project#33601)

The failure in rllib should have been fixed by ray-project#33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.
brycehuang30 pushed a commit to brycehuang30/ray that referenced this pull request Mar 29, 2023
…project#324… (ray-project#33601)

The failure in rllib should have been fixed by ray-project#33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.

Signed-off-by: bhuang <[email protected]>
joncarter1 pushed a commit to joncarter1/ray that referenced this pull request Apr 2, 2023
…es in the current batch format with zero copies (ray-project#33562)

This PR is a cleanup of ray-project#33536

It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.

Signed-off-by: Jonathan Carter <[email protected]>
joncarter1 pushed a commit to joncarter1/ray that referenced this pull request Apr 2, 2023
…project#324… (ray-project#33601)

The failure in rllib should have been fixed by ray-project#33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.

Signed-off-by: Jonathan Carter <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…es in the current batch format with zero copies (ray-project#33562)

This PR is a cleanup of ray-project#33536

It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.

Signed-off-by: elliottower <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…project#324… (ray-project#33601)

The failure in rllib should have been fixed by ray-project#33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.

Signed-off-by: elliottower <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…es in the current batch format with zero copies (ray-project#33562)

This PR is a cleanup of ray-project#33536

It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.

Signed-off-by: Jack He <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…project#324… (ray-project#33601)

The failure in rllib should have been fixed by ray-project#33562

Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.

Signed-off-by: Jack He <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants