[runtime env] support eager_install in runtime env #17949
Conversation
```cpp
bool successful, const std::string &serialized_runtime_env_context) {
  if (successful) {
    start_worker_process_fn(task_spec, state, {}, false,
                            task_spec.SerializedRuntimeEnv(),
                            serialized_runtime_env_context, callback);
  } else {
    RAY_LOG(WARNING) << "Couldn't create a runtime environment for task "
                     << task_spec.TaskId() << ". The runtime environment was "
```
It looks like the task-specific failure message here was replaced by a job-specific error message below. Should we pass the task ID through here? If it would make the code messy, I don't feel strongly about it; the most useful thing is the serialized runtime env, which is still being printed.
Good point. Added a log here: ce76c5c#diff-4c19d6ebd8575189e9439fa5646b567546dc54f64d576a959ce1bb43902d1a03R1019
```python
def test_job_config_conda_env_eagerly(conda_envs, shutdown_only):
    runtime_env = {"conda": f"package-{REQUEST_VERSIONS[0]}"}
    job_config = ray.job_config.JobConfig(
        runtime_env=runtime_env, prepare_runtime_env_eagerly=True)
```
I think we would like to move `runtime_env` out of `JobConfig` and just have it specified in `ray.init()`. `runtime_env` is already supported as a keyword argument in `ray.init()`.
Is it possible to make this feature work with `ray.init(runtime_env=runtime_env, job_config=ray.job_config.JobConfig(prepare_runtime_env_eagerly=True))`? Or even simpler, `ray.init(runtime_env=runtime_env, prepare_runtime_env_eagerly=True)`? cc @edoakes, what are your thoughts on the API here?
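Concretely, the two call shapes under discussion would look something like this (a sketch only; `prepare_runtime_env_eagerly` is the flag name used in this PR and may not be the final API):

```python
import ray

runtime_env = {"conda": "my-env"}  # illustrative env name

# Shape 1: runtime_env passed to ray.init(), eager flag kept on JobConfig.
ray.init(
    runtime_env=runtime_env,
    job_config=ray.job_config.JobConfig(prepare_runtime_env_eagerly=True))

# Shape 2 (simpler): everything as top-level ray.init() keyword arguments.
# ray.init(runtime_env=runtime_env, prepare_runtime_env_eagerly=True)
```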
Yep, I'm also confused that there are two fields for setting `runtime_env`. I agree that we should keep only one. Let's make that clear.
@edoakes @SongGuyang what's the long-term plan for this feature? Should we have it on by default?
@ericl It's a trade-off. Eager mode is much friendlier for large-scale jobs or latency-sensitive jobs, but it can bring a download storm to the cluster, so we should decide how to set this flag for different scenarios. At AntGroup, most jobs will use eager mode because we have divided the nodes into different namespaces, so the downloads only happen within a single namespace. But recently we have taken on some small-scale jobs, and I think we also need lazy mode to reduce the downloads.
Hmm, I see. If we want to support both eager and on-demand loading, does it make sense to just add an "install runtime_env" command to Ray? I find setting this as a job config kind of confusing, since a job can have multiple runtime envs within it, and it's not clear if all of them are eagerly loaded or just the top-level one. In that case an API like the following seems more explicit:
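(The concrete snippet isn't preserved in this excerpt; a hypothetical sketch of such an explicit command, with a made-up function name, could look like the following.)

```python
# Hypothetical sketch only: no such command exists in Ray today.
def install_runtime_env(runtime_env: dict) -> None:
    """Eagerly install a runtime env on the cluster, decoupled from any task or actor."""
    ...


# Usage would make eager vs. on-demand installation an explicit choice:
install_runtime_env({"conda": "my-env"})
```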
But I don't have any objection to this PR as is.
@ericl I think it would likely make sense to turn this on by default at the job level in the (near) future. Agree that it's a bit weird for individual tasks/actors, but maybe we could just have a flag at that level too to eagerly install it when defined? If that's what we want to do, maybe this should be a property of the env itself (maybe a bit weird but it'd work):

```python
ray.init(runtime_env={"conda": {...}, "eager_install": True})
Actor.options(runtime_env={"conda": {...}, "eager_install": True}).remote()
```

@SongGuyang does this handle the case when a new node is added to the cluster (i.e., via autoscaling) while the job is running? Will we eagerly install in that case? Not sure how the `HandleJobStarted` callback works here.
Got it, eager install for job level makes sense.
@edoakes The way of …
There seems to be a relevant test failure: https://buildkite.com/ray-project/ray-builders-pr/builds/11915#1091a7fe-f8bb-42a7-9d02-cf9b5e9878b2/6-5919
@architkulkarni @edoakes Sorry for my late update. Please review the current PR again.
Looks great @SongGuyang! Just one question: it looks like this PR introduces a cache of serialized_runtime_env -> RuntimeEnvContext inside the WorkerPool, but we already have the same kind of cache inside the RuntimeEnvAgent:

```python
self._env_cache: Dict[str, CreatedEnvResult] = dict()
```
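For reference, a minimal sketch of the kind of cache being discussed, with a simplified stand-in for `CreatedEnvResult` (the real agent code differs; everything except the `_env_cache` name is illustrative):

```python
from typing import Dict, NamedTuple


class CreatedEnvResult(NamedTuple):
    # Simplified stand-in for the agent's real result type.
    success: bool
    # Serialized RuntimeEnvContext on success, or an error message on failure.
    result: str


class RuntimeEnvCache:
    """Maps a serialized runtime_env string to its creation result, so
    repeated setup requests for the same env can skip re-installation."""

    def __init__(self) -> None:
        self._env_cache: Dict[str, CreatedEnvResult] = dict()

    def get_or_create(self, serialized_runtime_env: str) -> CreatedEnvResult:
        cached = self._env_cache.get(serialized_runtime_env)
        if cached is not None:
            return cached
        # In the real agent this step would set up conda/pip/working_dir, etc.
        result = CreatedEnvResult(success=True, result=serialized_runtime_env)
        self._env_cache[serialized_runtime_env] = result
        return result
```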
@ericl could we please get codeowner approval from you? (Asking you because the other codeowners might not have as much context)
@architkulkarni Yep, the original idea was to reduce RPCs from the raylet to the agent. But if we consider the GC and cache strategies, it may bring some new issues. Let me try to remove it and add it back in the future if needed.
@ericl Can you help approve and merge this PR?
proto changes look compatible