[data] Refactor all to all op implementations into a separate file #26585

ericl · 2022-07-14T22:46:11Z

Why are these changes needed?

Pull the all-to-all op implementations into separate classes in a separate file for readability. This change also simplifies the implementation of #25708.

Signed-off-by: Eric Liang <[email protected]>

ericl · 2022-07-14T22:46:26Z

python/ray/data/_internal/plan.py

@@ -194,6 +194,8 @@ def schema(
         Returns:
             The schema of the output dataset.
         """
+        from ray.data._internal.stage_impl import RandomizeBlocksStage


This import has to be local to the function because of circular import problems.
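For readers unfamiliar with the pattern, here is a minimal sketch of how a function-local import sidesteps a circular import. The module names `demo_plan` and `demo_stage_impl` are hypothetical stand-ins written to a temporary directory; they are not Ray's modules, and only the class names echo the diff above.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def build_demo_modules(root: Path) -> None:
    """Write two hypothetical modules that depend on each other."""
    (root / "demo_plan.py").write_text(textwrap.dedent('''
        class ExecutionPlan:
            def schema(self):
                # Deferred import: resolved when schema() is called, by which
                # time demo_plan has finished loading.
                from demo_stage_impl import RandomizeBlocksStage
                return RandomizeBlocksStage.__name__
    '''))
    (root / "demo_stage_impl.py").write_text(textwrap.dedent('''
        # Top-level import back into demo_plan. This would fail if demo_plan
        # also imported this module at top level while still half-initialized.
        from demo_plan import ExecutionPlan

        class RandomizeBlocksStage:
            plan_cls = ExecutionPlan
    '''))


def run_demo() -> str:
    """Import the demo modules and call the method with the deferred import."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_modules(root)
        sys.path.insert(0, str(root))
        try:
            plan = importlib.import_module("demo_plan")
            return plan.ExecutionPlan().schema()
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(str(root))
            sys.modules.pop("demo_plan", None)
            sys.modules.pop("demo_stage_impl", None)


if __name__ == "__main__":
    print(run_demo())
```

The trade-off is that the import cost and any import error move from module load time to the first call of the function.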

ericl · 2022-07-14T22:46:41Z

python/ray/data/_internal/stage_impl.py

+    from ray.data import Dataset
+
+
+class RepartitionStage(AllToAllStage):


All code moved verbatim except when commented.

ericl · 2022-07-14T22:47:08Z

python/ray/data/_internal/stage_impl.py

+            context = DatasetContext.get_current()
+            if context.use_push_based_shuffle:
+                if output_num_blocks is not None:
+                    raise NotImplementedError(


@stephanie-wang FYI, there was a bug where we ignored the num_blocks parameter for random_shuffle. Passing it appears to crash push-based shuffle, so I disabled that combination for now (it raises NotImplementedError).
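The guard in the snippet above can be sketched in isolation as follows. This is an illustration only: `ShuffleConfig`, `choose_shuffle_impl`, the returned labels, and the error message are hypothetical, standing in for `DatasetContext` and the real shuffle code paths, which are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShuffleConfig:
    """Hypothetical stand-in for the context object holding the flag."""

    use_push_based_shuffle: bool = False


def choose_shuffle_impl(config: ShuffleConfig, output_num_blocks: Optional[int]) -> str:
    """Pick a shuffle path, rejecting the unsupported combination up front."""
    if config.use_push_based_shuffle:
        if output_num_blocks is not None:
            # Fail loudly instead of silently ignoring the parameter.
            raise NotImplementedError(
                "push-based shuffle with an explicit output block count "
                "is not supported in this sketch"
            )
        return "push_based"
    return "simple"


if __name__ == "__main__":
    print(choose_shuffle_impl(ShuffleConfig(), output_num_blocks=8))
    print(choose_shuffle_impl(ShuffleConfig(use_push_based_shuffle=True), None))
```

Raising early keeps the previously ignored parameter from being dropped without notice, at the cost of rejecting calls that used to run.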

Signed-off-by: Eric Liang <[email protected]>

jianoaix · 2022-07-15T22:52:04Z

python/ray/data/_internal/stage_impl.py

+
+if TYPE_CHECKING:
+    from ray.data import Dataset
+


Nice refactoring. Shall we also pull out the one-to-one stages?

ericl · 2022-07-16

I was going to do that later. OneToOne stages are basically all the same, so having them as explicit, separate classes is not that useful for optimization purposes.
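A rough sketch of the distinction being discussed, under loud assumptions: the classes below only borrow names from this thread, use plain Python lists as blocks, and do not reflect the real signatures in `plan.py` or `stage_impl.py`. One generic one-to-one stage covers every per-block transform, while each all-to-all op needs its own logic across blocks and so gets its own class.

```python
from typing import Callable, List

Block = List[int]  # hypothetical block type: a plain list of rows


class OneToOneStage:
    """One generic class suffices: stages differ only in the per-block function."""

    def __init__(self, name: str, block_fn: Callable[[Block], Block]):
        self.name = name
        self.block_fn = block_fn

    def __call__(self, blocks: List[Block]) -> List[Block]:
        # Each output block depends on exactly one input block.
        return [self.block_fn(block) for block in blocks]


class AllToAllStage:
    """Base class for stages whose outputs may depend on every input block."""

    def __init__(self, name: str, fn: Callable[[List[Block]], List[Block]]):
        self.name = name
        self.fn = fn

    def __call__(self, blocks: List[Block]) -> List[Block]:
        return self.fn(blocks)


class RepartitionStage(AllToAllStage):
    """A dedicated subclass: an optimizer can recognize it by type."""

    def __init__(self, num_blocks: int):
        def do_repartition(blocks: List[Block]) -> List[Block]:
            rows = [row for block in blocks for row in block]
            # Round-robin the rows into the requested number of blocks.
            return [rows[i::num_blocks] for i in range(num_blocks)]

        super().__init__("repartition", do_repartition)
        self.num_blocks = num_blocks


if __name__ == "__main__":
    double = OneToOneStage("map", lambda block: [x * 2 for x in block])
    print(double([[1, 2], [3]]))
    print(RepartitionStage(2)([[1, 2], [3]]))
```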

ericl added 3 commits July 14, 2022 13:53

update

0d97ef5

wip

7e45e2f

Signed-off-by: Eric Liang <[email protected]>

wip

18e15f5

Signed-off-by: Eric Liang <[email protected]>

ericl requested a review from scv119 as a code owner July 14, 2022 22:46

ericl assigned c21 Jul 14, 2022

ericl requested review from clarkzinzow, jjyao and jianoaix as code owners July 14, 2022 22:46

ericl assigned clarkzinzow and jianoaix Jul 14, 2022

ericl commented Jul 14, 2022

ericl added 2 commits July 14, 2022 15:48

fix

dcb4978

Signed-off-by: Eric Liang <[email protected]>

fix

4c54e48

Signed-off-by: Eric Liang <[email protected]>

ericl added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) Jul 15, 2022

jianoaix approved these changes Jul 15, 2022

ericl merged commit cf980c3 into ray-project:master Jul 16, 2022

truelegion47 pushed a commit to truelegion47/ray that referenced this pull request Jul 16, 2022

[data] Refactor all to all op implementations into a separate file (r…

8c23556

…ay-project#26585) Signed-off-by: Your Name <[email protected]>

xwjiang2010 pushed a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022

[data] Refactor all to all op implementations into a separate file (r…

44d6200

…ay-project#26585) Signed-off-by: Xiaowei Jiang <[email protected]>

avnishn pushed a commit to smorad/ray that referenced this pull request Jul 20, 2022

[data] Refactor all to all op implementations into a separate file (r…

9cd0667

…ay-project#26585) Signed-off-by: Avnish <[email protected]>

klwuibm pushed a commit to yuanchi2807/ray that referenced this pull request Jul 27, 2022

[data] Refactor all to all op implementations into a separate file (r…

6a84e5e

…ay-project#26585) Signed-off-by: klwuibm <[email protected]>

franklsf95 pushed a commit to franklsf95/ray that referenced this pull request Aug 2, 2022

[data] Refactor all to all op implementations into a separate file (r…

e1d5633

…ay-project#26585) Signed-off-by: Frank Luan <[email protected]>

gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022

[data] Refactor all to all op implementations into a separate file (r…

41fbd87

…ay-project#26585) Signed-off-by: Scott Graham <[email protected]>

gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022

[data] Refactor all to all op implementations into a separate file (r…

b461378

…ay-project#26585)

Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022

[data] Refactor all to all op implementations into a separate file (r…

8133a69

…ay-project#26585) Signed-off-by: Stefan van der Kleij <[email protected]>
