[WIP] Bulk executor initial implementation #30903

ericl · 2022-12-05T22:38:38Z

Why are these changes needed?

Initial implementation of ray-project/enhancements#18

Original prototype: https://github.com/ray-project/ray/pull/30222/files

TODO:

Merge plan:

interfaces.py
memory debug utils
operator implementations with unit tests
bulk executor implementation with unit tests
legacy integration and flag for enabling
debug nightly test / regressions
enable by default

Signed-off-by: Eric Liang <[email protected]>

ericl · 2022-12-14T00:20:18Z

Some progress: now passing a good portion of dataset tests:

FAILED test_dataset.py::test_bulk_lazy_eval_split_mode[False] - AssertionError: (ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001000000), BlockMetadata(num_rows=None, size_bytes=10, schema...
FAILED test_dataset.py::test_bulk_lazy_eval_split_mode[True] - AssertionError: (ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000009000000), BlockMetadata(num_rows=None, size_bytes=10, schema=...
FAILED test_dataset.py::test_basic_actors[False] - NotImplementedError
FAILED test_dataset.py::test_basic_actors[True] - NotImplementedError
FAILED test_dataset.py::test_callable_classes - NotImplementedError
FAILED test_dataset.py::test_convert_to_pyarrow - ModuleNotFoundError: No module named 'dask.dataframe'
FAILED test_dataset.py::test_iter_batches_local_shuffle[arrow-True] - ray.exceptions.RayTaskError: ray::_map_task() (pid=177449, ip=10.103.244.198)
FAILED test_dataset.py::test_iter_batches_local_shuffle[pandas-True] - ray.exceptions.ObjectFreedError: Failed to retrieve object 7f90397ff370237bffffffffffffffffffffffff0100000002000000. To see infor...
FAILED test_dataset.py::test_iter_batches_local_shuffle[simple-True] - ray.exceptions.RayTaskError: ray::_map_task() (pid=179727, ip=10.103.244.198)
FAILED test_dataset.py::test_map_batches_extra_args - ray.exceptions.RayTaskError(AssertionError): ray::_map_task() (pid=180287, ip=10.103.244.198)
FAILED test_dataset.py::test_map_batches_actors_preserves_order - NotImplementedError
FAILED test_dataset.py::test_map_batches_block_bundling_auto[1-2] - assert 10 == 5
FAILED test_dataset.py::test_map_batches_block_bundling_auto[1-3] - assert 10 == 4
FAILED test_dataset.py::test_map_batches_block_bundling_auto[1-4] - assert 10 == 3
FAILED test_dataset.py::test_map_batches_block_bundling_auto[2-4] - assert 10 == 5
FAILED test_dataset.py::test_map_batches_block_bundling_auto[1-5] - assert 10 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_auto[2-5] - assert 10 == 5
FAILED test_dataset.py::test_map_batches_block_bundling_auto[1-6] - assert 12 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_auto[2-6] - assert 10 == 4
FAILED test_dataset.py::test_map_batches_block_bundling_auto[3-6] - assert 10 == 5
FAILED test_dataset.py::test_map_batches_block_bundling_auto[1-7] - assert 14 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_auto[2-7] - assert 10 == 4
FAILED test_dataset.py::test_map_batches_block_bundling_auto[3-7] - assert 10 == 5
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_manual[block_sizes0-3-1] - assert 2 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_manual[block_sizes1-3-2] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_manual[block_sizes2-4-3] - assert 4 == 3
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_manual[block_sizes3-4-2] - assert 4 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_manual[block_sizes5-4-1] - assert 4 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_manual[block_sizes6-4-2] - assert 4 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes6-2] - assert 2 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes28-3] - assert 2 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes29-3] - assert 2 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes34-3] - assert 2 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes64-3] - assert 3 == 1
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes65-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes66-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes67-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes68-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes69-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes70-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes71-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes72-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes73-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes74-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes75-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes100-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes101-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes102-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes103-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes104-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes105-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes106-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes136-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes137-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes142-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes172-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes173-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes178-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes208-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes209-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes214-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes244-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes245-3] - assert 3 == 2
FAILED test_dataset.py::test_map_batches_block_bundling_skewed_auto[block_sizes250-3] - assert 3 == 2
FAILED test_dataset.py::test_from_dask - ModuleNotFoundError: No module named 'dask.dataframe'
FAILED test_dataset.py::test_to_dask[pandas] - ModuleNotFoundError: No module named 'dask.core'
FAILED test_dataset.py::test_to_dask[arrow] - ModuleNotFoundError: No module named 'dask.core'
FAILED test_dataset.py::test_to_dask_tensor_column_cast_pandas - ModuleNotFoundError: No module named 'dask.dataframe'
FAILED test_dataset.py::test_to_dask_tensor_column_cast_arrow - ModuleNotFoundError: No module named 'dask.dataframe'
FAILED test_dataset.py::test_from_modin - ImportError: cannot import name 'FilePathOrBuffer' from 'pandas._typing' (/home/eric/.local/lib/python3.8/site-packages/pandas/_typing.py)
FAILED test_dataset.py::test_to_modin - ImportError: cannot import name 'FilePathOrBuffer' from 'pandas._typing' (/home/eric/.local/lib/python3.8/site-packages/pandas/_typing.py)
FAILED test_dataset.py::test_map_batches_combine_empty_blocks - assert 30 == 3
FAILED test_dataset.py::test_random_shuffle[True-True] - ray.exceptions.RayTaskError: ray::_map_task() (pid=184295, ip=10.103.244.198)
FAILED test_dataset.py::test_random_shuffle[False-True] - ray.exceptions.RayTaskError: ray::_map_task() (pid=185254, ip=10.103.244.198)
FAILED test_dataset.py::test_dataset_retry_exceptions - AssertionError: (ObjectRef(00ffffffffffffffffffffffffffffffffffffff01000000ad6a0000), BlockMetadata(num_rows=None, size_bytes=20, schema=None, i...
FAILED test_dataset.py::test_split_is_not_disruptive - ray.exceptions.ObjectFreedError: Failed to retrieve object b6aef734ef5f822bffffffffffffffffffffffff0100000002000000. To see information about whe...
FAILED test_dataset.py::test_actor_pool_strategy_apply_interrupt - AssertionError: Legacy backend off
FAILED test_dataset.py::test_actor_pool_strategy_default_num_actors - NotImplementedError
================================================================== 78 failed, 539 passed, 3 skipped, 5010 warnings in 431.51s (0:07:11) ===================================================================

Signed-off-by: Eric Liang <[email protected]>

jianoaix · 2023-01-23T23:38:56Z

python/ray/data/_internal/execution/operators/actor_pool_submitter.py

@@ -28,8 +28,6 @@ def __init__(
 ray_remote_args: Remote arguments for the Ray actors to be created.
 pool_size: The size of the actor pool.
 """
- if "num_cpus" not in ray_remote_args:


@ericl I reverted this, since it doesn't look compatible with the requirement that "num_gpus and num_cpus cannot be both specified" (a user may want to run actors on GPUs, and then they won't be able to specify CPUs).
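
For illustration only, one way a default could coexist with that requirement is to apply it only when neither resource is requested. This is a hypothetical sketch, not what the PR does (the change was simply reverted); the helper name `with_default_cpus` and the default of one CPU are assumptions.

```python
from typing import Any, Dict


def with_default_cpus(
    ray_remote_args: Dict[str, Any], default_cpus: int = 1
) -> Dict[str, Any]:
    """Return a copy of the args, adding num_cpus only if no CPU/GPU request exists.

    Hypothetical helper: the name and the default value are assumptions.
    """
    args = dict(ray_remote_args)  # avoid mutating the caller's dict
    if "num_cpus" not in args and "num_gpus" not in args:
        args["num_cpus"] = default_cpus
    return args


print(with_default_cpus({}))               # default applied
print(with_default_cpus({"num_gpus": 1}))  # GPU request left untouched
```

With this shape, a user who passes only `num_gpus` never ends up with both keys set.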

jianoaix · 2023-01-24T23:31:22Z

All CI tests are passing now. However, we seem to have a release test failure that may be relevant: https://buildkite.com/ray-project/release-tests-pr/builds/26265#0185e158-7979-4c94-be61-22da62d208ca

jianoaix · 2023-01-24T23:40:54Z

All CI tests are passing now. However, we seem to have a release test failure that may be relevant: https://buildkite.com/ray-project/release-tests-pr/builds/26265#0185e158-7979-4c94-be61-22da62d208ca

This is likely due to the lack of autoscaling for the actor pool, as it had just one actor throughout the entire run, @clarkzinzow.

Since this is the same issue as the TODO in the unit test, shall we note it as debt to fix soon and merge this PR? @ericl

2023-01-23 20:47:03,018	INFO bulk_executor.py:39 -- Executing DAG InputDataBuffer[Input] -> MapOperator[map_batches]
0%|          | 0/363 [00:00<?, ?it/s]
map_batches:   0%|          | 0/363 [00:00<?, ?it/s]
map_batches:   0%|          | 1/363 [00:18<1:52:33, 18.66s/it]
map_batches, 1 actors:   0%|          | 1/363 [00:18<1:52:33, 18.66s/it]
(raylet, ip=172.31.203.20) Spilled 34054 MiB, 110 objects, write throughput 637 MiB/s.
map_batches, 1 actors:   1%|          | 2/363 [00:37<1:54:07, 18.97s/it]
map_batches, 1 actors:   1%|          | 3/363 [01:28<3:21:07, 33.52s/it]
map_batches, 1 actors:   1%|          | 4/363 [01:42<2:35:06, 25.92s/it]
map_batches, 1 actors:   1%|▏         | 5/363 [01:44<1:42:34, 17.19s/it]
map_batches, 1 actors:   2%|▏         | 6/363 [01:45<1:10:09, 11.79s/it]
map_batches, 1 actors:   2%|▏         | 7/363 [01:48<51:34,  8.69s/it]
map_batches, 1 actors:   2%|▏         | 8/363 [01:49<37:42,  6.37s/it]
map_batches, 1 actors:   2%|▏         | 9/363 [01:51<28:59,  4.91s/it]
map_batches, 1 actors:   3%|▎         | 10/363 [01:52<22:21,  3.80s/it]
map_batches, 1 actors:   3%|▎         | 11/363 [01:54<19:05,  3.25s/it]
map_batches, 1 actors:   3%|▎         | 12/363 [01:56<15:44,  2.69s/it]
map_batches, 1 actors:   4%|▎         | 13/363 [01:58<14:29,  2.49s/it]
map_batches, 1 actors:   4%|▍         | 14/363 [01:59<12:23,  2.13s/it]
map_batches, 1 actors:   4%|▍         | 15/363 [02:01<11:58,  2.06s/it]
map_batches, 1 actors:   4%|▍         | 16/363 [02:02<10:36,  1.84s/it]
map_batches, 1 actors:   5%|▍         | 17/363 [02:04<10:42,  1.86s/it]
map_batches, 1 actors:   5%|▍         | 18/363 [02:05<09:43,  1.69s/it]
map_batches, 1 actors:   5%|▌         | 19/363 [02:07<10:04,  1.76s/it]
map_batches, 1 actors:   6%|▌         | 20/363 [02:09<09:26,  1.65s/it]
map_batches, 1 actors:   6%|▌         | 21/363 [02:11<09:51,  1.73s/it]
map_batches, 1 actors:   6%|▌         | 22/363 [02:12<09:17,  1.64s/it]
map_batches, 1 actors:   6%|▋         | 23/363 [02:14<09:54,  1.75s/it]
map_batches, 1 actors:   7%|▋         | 24/363 [02:15<09:07,  1.62s/it]
map_batches, 1 actors:   7%|▋         | 25/363 [02:17<08:55,  1.58s/it]
map_batches, 1 actors:   7%|▋         | 26/363 [02:18<08:56,  1.59s/it]
map_batches, 1 actors:   7%|▋         | 27/363 [02:20<08:05,  1.45s/it]
map_batches, 1 actors:   8%|▊         | 28/363 [02:21<08:40,  1.55s/it]
map_batches, 1 actors:   8%|▊         | 29/363 [02:23<08:14,  1.48s/it]
map_batches, 1 actors:   8%|▊         | 30/363 [02:25<08:55,  1.61s/it]
map_batches, 1 actors:   9%|▊         | 31/363 [02:26<08:24,  1.52s/it]
map_batches, 1 actors:   9%|▉         | 32/363 [02:28<09:01,  1.64s/it]
map_batches, 1 actors:   9%|▉         | 33/363 [02:29<08:17,  1.51s/it]
map_batches, 1 actors:   9%|▉         | 34/363 [02:30<08:05,  1.48s/it]
......
map_batches, 1 actors:  90%|████████▉ | 325/363 [09:53<00:43,  1.15s/it]
map_batches, 1 actors:  90%|████████▉ | 326/363 [09:54<00:38,  1.05s/it]
map_batches, 1 actors:  90%|█████████ | 327/363 [09:55<00:36,  1.01s/it]
map_batches, 1 actors:  90%|█████████ | 328/363 [09:56<00:37,  1.06s/it]
map_batches, 1 actors:  91%|█████████ | 329/363 [09:57<00:37,  1.11s/it]
map_batches, 1 actors:  91%|█████████ | 330/363 [09:58<00:33,  1.02s/it]
map_batches, 1 actors:  91%|█████████ | 331/363 [09:59<00:30,  1.05it/s]
map_batches, 1 actors:  91%|█████████▏| 332/363 [10:00<00:35,  1.15s/it]
map_batches, 1 actors:  92%|█████████▏| 333/363 [10:01<00:31,  1.04s/it]
map_batches, 1 actors:  92%|█████████▏| 334/363 [10:02<00:29,  1.00s/it]
map_batches, 1 actors:  92%|█████████▏| 335/363 [10:04<00:33,  1.18s/it]
map_batches, 1 actors:  93%|█████████▎| 336/363 [10:05<00:28,  1.07s/it]
map_batches, 1 actors:  93%|█████████▎| 337/363 [10:05<00:25,  1.01it/s]
map_batches, 1 actors:  93%|█████████▎| 338/363 [10:06<00:24,  1.04it/s]
map_batches, 1 actors:  93%|█████████▎| 339/363 [10:08<00:27,  1.13s/it]
map_batches, 1 actors:  94%|█████████▎| 340/363 [10:09<00:23,  1.03s/it]
map_batches, 1 actors:  94%|█████████▍| 341/363 [10:09<00:21,  1.01it/s]
map_batches, 1 actors:  94%|█████████▍| 342/363 [10:11<00:25,  1.20s/it]
map_batches, 1 actors:  94%|█████████▍| 343/363 [10:12<00:21,  1.08s/it]
map_batches, 1 actors:  95%|█████████▍| 344/363 [10:13<00:19,  1.03s/it]
map_batches, 1 actors:  95%|█████████▌| 345/363 [10:15<00:21,  1.20s/it]
map_batches, 1 actors:  95%|█████████▌| 346/363 [10:15<00:18,  1.08s/it]
map_batches, 1 actors:  96%|█████████▌| 347/363 [10:16<00:15,  1.00it/s]
map_batches, 1 actors:  96%|█████████▌| 348/363 [10:17<00:14,  1.03it/s]
map_batches, 1 actors:  96%|█████████▌| 349/363 [10:19<00:16,  1.16s/it]
map_batches, 1 actors:  96%|█████████▋| 350/363 [10:19<00:13,  1.05s/it]
map_batches, 1 actors:  97%|█████████▋| 351/363 [10:20<00:11,  1.02it/s]
map_batches, 1 actors:  97%|█████████▋| 352/363 [10:22<00:13,  1.20s/it]
map_batches, 1 actors:  97%|█████████▋| 353/363 [10:23<00:10,  1.08s/it]
map_batches, 1 actors:  98%|█████████▊| 354/363 [10:24<00:08,  1.01it/s]
map_batches, 1 actors:  98%|█████████▊| 355/363 [10:25<00:08,  1.03s/it]
map_batches, 1 actors:  98%|█████████▊| 356/363 [10:26<00:07,  1.11s/it]
map_batches, 1 actors:  98%|█████████▊| 357/363 [10:27<00:06,  1.02s/it]
map_batches, 1 actors:  99%|█████████▊| 358/363 [10:28<00:04,  1.02it/s]
map_batches, 1 actors:  99%|█████████▉| 359/363 [10:29<00:04,  1.17s/it]
map_batches, 1 actors:  99%|█████████▉| 360/363 [10:30<00:03,  1.06s/it]
map_batches, 1 actors:  99%|█████████▉| 361/363 [10:31<00:02,  1.01s/it]
map_batches, 1 actors: 100%|█████████▉| 362/363 [10:32<00:01,  1.16s/it]
map_batches, 1 actors: 100%|██████████| 363/363 [10:33<00:00,  1.04s/it]
map_batches, 0 actors: 100%|██████████| 363/363 [10:33<00:00,  1.04s/it]
map_batches, 0 actors: 100%|██████████| 363/363 [10:33<00:00,  1.75s/it]
run_xgboost_prediction takes 671.7315916329999 seconds.
Results: {'training_time': 781.2928511609999, 'prediction_time': 671.7315916329999}
Traceback (most recent call last):
File "workloads/xgboost_benchmark.py", line 175, in <module>
main(args)
File "workloads/xgboost_benchmark.py", line 155, in main
f"Batch prediction on XGBoost is taking {prediction_time} seconds, "
RuntimeError: Batch prediction on XGBoost is taking 671.7315916329999 seconds, which is longer than expected (450 seconds).

ericl · 2023-01-25T00:17:31Z

Alright. Let's leave the TODO to fix this test (or we can fix this test by increasing the min pool size).
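
As a rough back-of-the-envelope check of how large the minimum pool would need to be, the figures from the release test log (363 blocks, about 671.7 seconds with one actor, a 450 second limit) can be plugged into an idealized model. The sketch assumes blocks are spread evenly and that actors scale linearly with no contention or spilling overhead, which the real run may not satisfy; `estimated_time` is a hypothetical helper.

```python
import math

# Figures taken from the release test log in this thread.
prediction_time_s = 671.7315916329999  # measured with a single actor
num_blocks = 363                       # total map_batches tasks
limit_s = 450.0                        # threshold enforced by the benchmark

seconds_per_block = prediction_time_s / num_blocks


def estimated_time(num_actors: int) -> float:
    """Idealized estimate: even block distribution, no contention (assumption)."""
    return math.ceil(num_blocks / num_actors) * seconds_per_block


# Smallest pool size whose idealized estimate fits under the limit.
min_actors = next(
    n for n in range(1, num_blocks + 1) if estimated_time(n) < limit_s
)
print(f"{seconds_per_block:.2f} s/block, need at least {min_actors} actors")
```

Under these assumptions even a small increase in the minimum pool size would bring the test back under its limit, although the slow first few blocks in the log suggest the per-block average overstates steady-state cost.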

…-project#31283) Add a utility class for tracing object allocation / freeing. This makes it a lot easier to debug memory allocation / freeing issues. This is split out from ray-project#30903 Signed-off-by: tmynn <[email protected]>
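
To make the idea of tracing allocation and freeing concrete, here is a minimal standalone sketch. It is not Ray's actual utility: the class name `AllocationTracer`, its methods, and the string object IDs are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List


class AllocationTracer:
    """Toy tracer (hypothetical): records allocation sites and reports unfreed objects."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}   # object id -> allocation site
        self._freed: Dict[str, str] = {}  # object id -> site that freed it

    def trace_alloc(self, obj_id: str, site: str) -> None:
        self._live[obj_id] = site

    def trace_free(self, obj_id: str, site: str) -> None:
        # Frees of unknown objects are ignored rather than raising.
        if obj_id in self._live:
            del self._live[obj_id]
            self._freed[obj_id] = site

    def leak_report(self) -> List[str]:
        """Summarize objects that were allocated but never freed, grouped by site."""
        by_site = Counter(self._live.values())
        return [
            f"{site}: {count} object(s) not freed"
            for site, count in sorted(by_site.items())
        ]


tracer = AllocationTracer()
tracer.trace_alloc("obj-1", "map_task")
tracer.trace_alloc("obj-2", "map_task")
tracer.trace_free("obj-1", "executor")
print(tracer.leak_report())
```

Grouping the report by allocation site is a design choice made here so that a leak points at the code path that created the object.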

…oject#31305) Add the initial operator implementations. This is split out from ray-project#30903 Signed-off-by: tmynn <[email protected]>

…oject#31443) Add the basic bulk executor. This is split out from ray-project#30903 Signed-off-by: tmynn <[email protected]>

…work (ray-project#31825) To enable the new bulk execution backend: ray-project#30903 Based on the most recent test (https://buildkite.com/ray-project/oss-ci-build-pr/builds/9947#_), this should be the last issue to fix. (Note: the Dataset test failure is not real, since all tests are passing; it is an issue with the bazel test.)

Initial implementation of ray-project/enhancements#18 Original prototype: https://github.com/ray-project/ray/pull/30222/files Co-authored-by: Clark Zinzow <[email protected]> Co-authored-by: jianoaix <[email protected]>

…ecutor (ray-project#31579) Initial implementation of ray-project/enhancements#18, dependent on ray-project#30903 Streaming execution can be toggled with the following env var: RAY_DATASET_USE_STREAMING_EXECUTOR=0|1.
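
A minimal sketch of toggling that variable from Python follows. Only the variable name and its 0|1 values come from the text above; the helper `streaming_executor_enabled` is hypothetical, and how Ray itself parses the variable is not shown here.

```python
import os


def streaming_executor_enabled(default: bool = False) -> bool:
    """Interpret RAY_DATASET_USE_STREAMING_EXECUTOR as a 0/1 flag.

    Hypothetical helper for illustration; not part of Ray.
    """
    raw = os.environ.get("RAY_DATASET_USE_STREAMING_EXECUTOR")
    if raw is None:
        return default
    return raw.strip() == "1"


# Set the variable before any code that reads it is imported or run.
os.environ["RAY_DATASET_USE_STREAMING_EXECUTOR"] = "1"
print(streaming_executor_enabled())
```

Setting the variable in the shell before launching the script achieves the same effect without touching the code.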

This fixes the lazy fanout failure in the new bulk execution backend (ray-project#30903). It turns out that lazy fanout can be fixed with a smaller change than making Dataset lazy-only (ray-project#31668).

To pass tests to enable bulk execution backend (ray-project#30903).

ericl added 7 commits December 5, 2022 14:31

copy prototype

9e4451e

cleanup

8924a89

wip compatibility

44578ce

Signed-off-by: Eric Liang <[email protected]>

add basic wiring

e0a346a

Signed-off-by: Eric Liang <[email protected]>

works

22504c0

Signed-off-by: Eric Liang <[email protected]>

fix up split handling

0b26570

Signed-off-by: Eric Liang <[email protected]>

refactor legacy compat package

3f0e0cb

Signed-off-by: Eric Liang <[email protected]>

ericl force-pushed the bulk-executor branch from bac87d6 to 3f0e0cb Compare December 6, 2022 22:57

ericl added 3 commits December 6, 2022 16:27

todo move operators fully

eaa46b0

Signed-off-by: Eric Liang <[email protected]>

reorganize opeators

3162f44

stub out actors impl

2136170

ericl force-pushed the bulk-executor branch from 94a64b9 to 2136170 Compare December 7, 2022 01:44

ericl added 3 commits December 6, 2022 17:56

improve legacy integration

38ae324

add str

9f24555

add own block propagation

f33c772

Signed-off-by: Eric Liang <[email protected]>

ericl assigned ericl and unassigned ericl Dec 7, 2022

ericl added 2 commits December 6, 2022 22:04

rename to tasks

bf5288f

Signed-off-by: Eric Liang <[email protected]>

add basic stats

f5efe2c

ericl force-pushed the bulk-executor branch from 4ec0bc4 to f5efe2c Compare December 13, 2022 21:01

ericl added 2 commits December 13, 2022 15:56

implement alltoall

e5790dc

Signed-off-by: Eric Liang <[email protected]>

revert format change

5c7e490

Signed-off-by: Eric Liang <[email protected]>

ericl added 7 commits December 13, 2022 16:22

Merge remote-tracking branch 'upstream/master' into bulk-executor

d6bee3c

fixme

1eb5519

Signed-off-by: Eric Liang <[email protected]>

fix

ec66fd0

Signed-off-by: Eric Liang <[email protected]>

fix own propagation

5aa082b

add debug mem metrics

c8f8c79

Signed-off-by: Eric Liang <[email protected]>

fix block clearing for datasetpipeline

5b2f7ec

add config

00025f5

Signed-off-by: Eric Liang <[email protected]>

jianoaix reviewed Jan 23, 2023

jianoaix added 2 commits January 24, 2023 00:43

fix bazel test

06b1ad7

Merge branch 'master' of https://github.com/ray-project/ray into bulk…

3867061

…executorimpl

jianoaix mentioned this pull request Jan 24, 2023

Fixes for the new bulk execution backend #31884

Merged

7 tasks

jianoaix added 3 commits January 24, 2023 19:38

Merge branch 'master' of https://github.com/ray-project/ray into bulk…

7097973

…executorimpl

minimize dif

0b74edf

less diff

a9a66ab

jianoaix marked this pull request as ready for review January 24, 2023 19:43

jianoaix requested review from scv119, jjyao and c21 as code owners January 24, 2023 19:43

disable incremental take test

a265437

jianoaix approved these changes Jan 24, 2023

ericl merged commit 877770e into ray-project:master Jan 25, 2023

cassidylaidlaw pushed a commit to cassidylaidlaw/ray that referenced this pull request Mar 28, 2023

Bulk executor impl fixes for tests (ray-project#31781)

97d13e4

To pass tests to enable bulk execution backend (ray-project#30903).
