[doc][clusters] add doc for setting up Ray and K8s #39408
Merged
15 commits
1b59b50 add doc for setting up Ray and K8s (angelinalg)
3690570 updated diagrams and addressed some feedback (angelinalg)
bd184a1 update-text (richardliaw)
efbea48 update-text (richardliaw)
9355ce1 update (richardliaw)
586d584 update (richardliaw)
888adb1 up (richardliaw)
0da9c87 update diagrams (angelinalg)
9f20513 Merge branch 'master' into dep-doc (angelinalg)
e42afc9 Merge branch 'dep-doc' of github.com:angelinalg/ray into dep-doc (richardliaw)
7839486 rename (richardliaw)
75f14be copy edit (angelinalg)
c858fa6 address feedback and copy edit (angelinalg)
63e246e add to the User Guide index page (angelinalg)
ebea6b0 Merge branch 'master' into dep-doc (angelinalg)
Could you help me refresh this once more?
done
@@ -0,0 +1,83 @@
(kuberay-storage)=

# Storage and dependencies best practices with KubeRay

This document contains recommendations for setting up storage and handling application dependencies for your Ray deployment on Kubernetes.

When you set up Ray on Kubernetes, the [KubeRay documentation](kuberay-quickstart) provides an overview of how to configure the operator to execute and manage the Ray cluster lifecycle.
However, administrators may still have questions about actual user workflows. For example:

* How do I ship or run code on the Ray cluster?
* What type of storage system should I set up for artifacts?
* How do I handle package dependencies for my application?

The answers to these questions vary between development and production. This table summarizes the recommended setup for both situations:

|   | Interactive Development | Production |
|---|---|---|
| Cluster Configuration | KubeRay YAML | KubeRay YAML |
| Code | Run driver or Jupyter notebook on head node | Bake code into Docker image |
| Artifact Storage | Set up an EFS <br /> or <br /> Cloud Storage (S3, GCS) | Set up an EFS <br /> or <br /> Cloud Storage (S3, GCS) |
| Package Dependencies | Install onto NFS <br /> or <br /> Use runtime environments | Bake into Docker image |

Table 1: Recommended setup for development and production.

## Interactive development

To provide an interactive development environment for data scientists and ML practitioners, we recommend setting up the code, storage, and dependencies in a way that reduces context switches for developers and shortens iteration times.

```{eval-rst}
.. image:: ../images/interactive-dev.png
    :align: center
..
    Find the source document here (https://whimsical.com/clusters-P5Y6R23riCuNb6xwXVXN72)
```

### Storage

Use one of these two standard solutions for artifact and log storage during the development process, depending on your use case:

* POSIX-compliant network file storage (like NFS and EFS): This approach is useful when you want artifacts or dependencies to be accessible across different nodes with low latency. For example, you can keep the experiment logs of different models trained in different Ray tasks in one shared location.
* Cloud storage (like AWS S3 or Google Cloud Storage): This approach is useful for large artifacts or datasets that you need to access with high throughput.

Ray's AI libraries such as Ray Data, Ray Train, and Ray Tune come with out-of-the-box capabilities to read from and write to cloud storage and local or networked storage.

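As a rough illustration of the network file storage option, the sketch below adds a shared volume to a pod spec, such as the one under `headGroupSpec.template.spec` or a worker group's `template.spec` in a KubeRay cluster YAML. The claim name `ray-shared-pvc`, the volume name `shared-storage`, the mount path `/mnt/shared`, and the image tag are assumptions chosen for this example, and the PersistentVolumeClaim is assumed to already exist and to be backed by NFS or EFS.

```python
import copy
import json

# Hypothetical names for illustration only; replace with your own values.
SHARED_CLAIM = "ray-shared-pvc"  # existing PVC, assumed to be backed by NFS or EFS
SHARED_MOUNT = "/mnt/shared"     # same path on the head and worker pods


def with_shared_volume(pod_spec, claim=SHARED_CLAIM, mount_path=SHARED_MOUNT):
    """Return a copy of a pod spec with the shared volume mounted in every container."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's dict untouched
    spec.setdefault("volumes", []).append(
        {"name": "shared-storage", "persistentVolumeClaim": {"claimName": claim}}
    )
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": "shared-storage", "mountPath": mount_path}
        )
    return spec


# Placeholder pod spec; a real one carries resources, ports, and so on.
head_pod_spec = {"containers": [{"name": "ray-head", "image": "rayproject/ray:latest"}]}
patched = with_shared_volume(head_pod_spec)
print(json.dumps(patched, indent=2))
```

Applying the same helper to every worker group keeps the mount path identical on all nodes, so a path written by one task can be read by another.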
### Driver script

Run the main (driver) script on the head node of the cluster. Ray Core and library programs often assume that the driver is located on the head node and take advantage of the local storage. For example, Ray Tune generates log files on the head node by default.

A typical workflow can look like one of these:

* Start a Jupyter server on the head node
* SSH onto the head node and run the driver script or application there
* Use the Ray Job Submission client to submit code from a local machine onto a cluster

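The third workflow can be sketched with the Ray Job Submission SDK as follows. This is a minimal sketch, assuming the head node's dashboard port (8265 by default) is reachable from the local machine, for example through a port-forward. The helper names and the script name `my_script.py` are hypothetical.

```python
def dashboard_address(host, port=8265):
    """Build the HTTP address of the Ray dashboard, which serves the Jobs API."""
    return f"http://{host}:{port}"


def submit_driver(address, entrypoint):
    """Submit a driver command to the cluster and return the job ID."""
    # Imported lazily so the helper above can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(entrypoint=entrypoint)


# Assumption: the dashboard is forwarded to localhost.
address = dashboard_address("127.0.0.1")
print(address)
```

Calling `submit_driver(address, "python my_script.py")` then runs the entrypoint command on the cluster rather than on the local machine.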
### Dependencies

For local dependencies (for example, if you’re working in a mono-repo), or external dependencies (like a pip package), use one of the following options:

* Put the code and install the packages onto your NFS. The benefit is that you can quickly interact with the rest of the codebase and dependencies without shipping them across a cluster every time.
* Use the `runtime_env` with the [Ray Job Submission Client](ray.job_submission.JobSubmissionClient), which can pull down code from S3 or ship code from your local working directory onto the remote cluster.
* Bake remote and local dependencies into a published Docker image that all nodes will use ([guide](serve-custom-docker-images)). This is the most common way to deploy applications onto [Kubernetes](https://kube.academy/courses/building-applications-for-kubernetes), but it is also the highest-friction option.

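For the `runtime_env` option, a development setup might look like the sketch below. It uses the `working_dir` and `pip` fields of the runtime environment; the helper name `build_runtime_env` and the example package pin are assumptions made for illustration.

```python
from pathlib import Path


def build_runtime_env(working_dir, pip_packages):
    """Build a runtime_env dict that ships a local directory and installs pip packages."""
    if not Path(working_dir).is_dir():
        raise ValueError(f"working_dir is not a directory: {working_dir}")
    return {"working_dir": working_dir, "pip": list(pip_packages)}


# Ship the current directory and install one example package on the cluster.
runtime_env = build_runtime_env(".", ["requests==2.31.0"])
print(runtime_env)

# The dict can then be passed to the Job Submission client, for example:
#   client.submit_job(entrypoint="python my_script.py", runtime_env=runtime_env)
```

Checking the directory before submission surfaces a wrong path locally instead of at job start.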
## Production

For production, the recommendations align more closely with standard Kubernetes best practices. We suggest the following configuration.

```{eval-rst}
.. image:: ../images/production.png
    :align: center
..
    Find the source document here (https://whimsical.com/clusters-P5Y6R23riCuNb6xwXVXN72)
```

### Storage

The choice of storage system remains the same across development and production.

### Code and Dependencies

Bake your code and your remote and local dependencies into a published Docker image for all nodes in the cluster. This is the most common way to deploy applications onto [Kubernetes](https://kube.academy/courses/building-applications-for-kubernetes). Here is a [guide](serve-custom-docker-images) for doing so.

Using cloud storage and the `runtime_env` is less preferred because it may not be as reproducible as the container path, but it is still viable. In this case, use the runtime environment option to download zip files containing code and other private modules from cloud storage, in addition to specifying the pip packages needed to run your application.
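
The cloud storage path described above could be sketched as follows. The bucket name, object keys, helper name, and the exact-pin check are assumptions for this example; the check is one possible way to narrow the reproducibility gap and is not something Ray enforces.

```python
import re

# Exact pins such as "pkg==1.2.3" help keep installs repeatable across runs.
_PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\],]+==[A-Za-z0-9_.\-+!]+$")


def build_production_runtime_env(code_zip_uri, module_zip_uris, pip_packages):
    """Build a runtime_env that downloads zipped code and modules from cloud storage."""
    for uri in [code_zip_uri, *module_zip_uris]:
        if not uri.endswith(".zip"):
            raise ValueError(f"expected a URI to a zip file, got: {uri}")
    unpinned = [pkg for pkg in pip_packages if not _PINNED.match(pkg)]
    if unpinned:
        raise ValueError(f"pin exact versions for: {unpinned}")
    return {
        "working_dir": code_zip_uri,          # zip containing the application code
        "py_modules": list(module_zip_uris),  # zips containing private modules
        "pip": list(pip_packages),
    }


# Hypothetical bucket and object names.
production_env = build_production_runtime_env(
    "s3://example-bucket/app/code.zip",
    ["s3://example-bucket/app/private_module.zip"],
    ["requests==2.31.0"],
)
print(production_env)
```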
could you help me refresh this one as well?
done