
Call ray.put in ray.init() to speed up first object store access. #5685

Merged (1 commit) on Sep 15, 2019

Conversation

@robertnishihara (Collaborator)

One problem that users often run into is that the first time they do something with Ray, it is very slow; the second time is much faster. This applies to the first usage of any worker, not just the driver. The reason is that the first time a driver or worker accesses the object store (e.g., by calling ray.put or by completing a task), it takes about half a second (presumably to memory-map a large file).

Note that I also tried calling plasma_client.put instead of ray.put; however, that only sped things up by a factor of 2.

import ray

ray.init()

%time ray.get(ray.put(1))

Before this PR: The timed line takes 500+ milliseconds.
Using plasma_client.put instead of ray.put: The timed line takes 250+ milliseconds (I'm not 100% sure why).
After this PR: The timed line takes 700 microseconds.

Note that ray.init() gets slower with this PR.

It's important that this happens on workers as well as the driver.
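
For reference, the %time measurement above can be reproduced without IPython; this is a minimal sketch (absolute numbers will vary by machine and Ray version):

import time

import ray

ray.init()

# First ray.put/ray.get on this driver: before this PR it pays the one-time
# object store setup cost; after this PR it should already be warm.
start = time.time()
ray.get(ray.put(1))
print("first put/get: %.6f s" % (time.time() - start))

# The second access is warm in either case.
start = time.time()
ray.get(ray.put(1))
print("second put/get: %.6f s" % (time.time() - start))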

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16967/
Test FAILed.

@edoakes (Contributor) left a comment

LGTM. The behavior with the plasma client is odd, though.

@robertnishihara (Collaborator, Author)

Jenkins, retest this please.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16977/
Test FAILed.

@robertnishihara (Collaborator, Author)

Jenkins, retest this please.

@ericl (Contributor) commented Sep 11, 2019

Should we also do

import multiprocessing

@ray.remote
def f():
    return 1

# Launch one trivial task per CPU to start the workers.
ray.get([f.remote() for _ in range(multiprocessing.cpu_count())])

to warm up the workers? That might be overkill, but otherwise we spend a lot of time waiting for workers to start as well.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16983/
Test FAILed.

@ericl (Contributor) commented Sep 12, 2019

Re: plasma put, maybe that's because ray.put also does a get afterwards to pin the object.

@robertnishihara (Collaborator, Author)

@ericl good question; the ray.put introduced in this PR should happen on all of the workers as well (though I should verify that before merging). There might also be other things that need to be warmed up (e.g., the raylet-to-object-store connection).

@robertnishihara (Collaborator, Author) commented Sep 13, 2019

Let's hold off on merging; some initial commands are still taking longer than expected, and I need to investigate that.

EDIT: The issue was just that I was testing on the wrong branch.

@robertnishihara (Collaborator, Author)

@ericl I confirmed that this PR also warms up the workers.
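
For readers following along, one rough way to check the worker warm-up (a sketch, not the PR's actual test; first_put_latency is a hypothetical name) is to time the first object store access inside a remote task:

import multiprocessing
import time

import ray

ray.init()

@ray.remote
def first_put_latency():
    # Time the first ray.put/ray.get performed by this worker process. With the
    # warm-up in place this should take well under a millisecond rather than
    # hundreds of milliseconds.
    start = time.time()
    ray.get(ray.put(1))
    return time.time() - start

# One task per CPU; these typically land on distinct workers, though that is
# not guaranteed.
print(ray.get([first_put_latency.remote() for _ in range(multiprocessing.cpu_count())]))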

@pcmoritz merged commit 74a34b7 into ray-project:master on Sep 15, 2019
@robertnishihara deleted the touchplasmastore branch on September 15, 2019 at 04:31