[doc][clusters] add doc for setting up Ray and K8s #39408
Conversation
Signed-off-by: angelinalg <[email protected]>
# Set up a Ray + Kubernetes cluster
This document contains recommendations for setting up a Ray + Kubernetes cluster for your organization.
The Ray and Kubernetes ecosystem encompasses various aspects. Could you specify which setup instructions are covered by this document?
It seems to be covered by:
This guide covers best practices for these deployment considerations:
* Where to ship or run your code on the Ray cluster
* Choosing a storage system for artifacts
* Package dependencies for your application
should be addressed
### Storage
Use one of these two standard solutions for artifact and log storage during the development process:
It is inconsistent with the table above. We only mention NFS/EFS in the table under the 'interactive development' column. However, here we reference both NFS/EFS and S3/GS.
updated
```{eval-rst}
.. image:: ../images/prod.png
```
This image is inconsistent with the table above. We only mention S3/GS in the table under the 'production' column. However, here we only reference NFS/EFS.
updated
### Storage
Reading and writing data and artifacts to cloud storage is the most reliable and observable option for production Ray deployments.
Why?
updated
Bake your code, remote, and local dependencies into a published Docker image for the workers. This is the most common way to deploy applications onto [Kubernetes](https://kube.academy/courses/building-applications-for-kubernetes).
Using cloud storage and the `runtime_env` is a less preferred method. In this case, use the runtime environment option to download zip files containing code and other private modules from cloud storage, in addition to specifying the pip packages needed to run your application.
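As a sketch of the runtime environment option described above (the bucket path and package names are illustrative placeholders, not from this PR):

```python
# Hypothetical runtime_env that downloads zipped application code from
# cloud storage and installs pip packages on every node. The S3 path
# and package list are placeholders.
runtime_env = {
    # Ray downloads and unpacks this archive as the job's working directory.
    "working_dir": "s3://example-bucket/my-app.zip",
    # Installed into a per-job environment on each node.
    "pip": ["requests", "pandas"],
}

# Applying it to a whole job (requires a running Ray cluster):
#   import ray
#   ray.init(address="auto", runtime_env=runtime_env)
```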
Add a sentence to explain why `runtime_env` is a less preferred method for production.
Looks good to me, just minor comments
This document contains recommendations for setting up a Ray + Kubernetes cluster for your organization.
When you set up Ray on Kubernetes, the KubeRay documentation provides an overview of how to configure the operator to execute and manage the Ray cluster lifecycle. This guide complements the KubeRay documentation by providing best practices for effectively using Ray deployments in your organization.
Please link to KubeRay doc
Done! Good point.
| Artifact Storage | Set up an EFS | Cloud storage (S3, GS) |
| Package Dependencies | Install onto NFS <br /> or <br /> Use runtime environments | Bake into docker image |
Maybe spell out EFS, NFS, S3, GS the first time you use them, and/or add links for them
| Artifact Storage | Set up an EFS | Cloud storage (S3, GS) |
| Package Dependencies | Install onto NFS <br /> or <br /> Use runtime environments | Bake into Docker image |
Done. Thanks for catching!
* Start a Jupyter server on the head node
* SSH onto the head node and run the driver script or application there
* Use the Ray Job Submission client to submit code from a local machine onto a cluster
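A minimal sketch of the Job Submission option in the last bullet, assuming the cluster's dashboard is reachable locally (the helper name, address, and entrypoint are placeholders):

```python
def submit_ray_job(dashboard_url: str, entrypoint: str, working_dir: str) -> str:
    """Submit an entrypoint command to a Ray cluster via the Job Submission API.

    The import is deferred so this sketch reads standalone; actually
    running it requires `pip install "ray[default]"` and a live cluster.
    """
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    # runtime_env ships the local working directory to the cluster
    # alongside the job.
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env={"working_dir": working_dir},
    )

# Usage, e.g. after port-forwarding the head service's dashboard port 8265:
#   job_id = submit_ray_job("http://127.0.0.1:8265", "python script.py", "./")
```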
It's not clear what these are examples of. I thought of "Here are some examples of ways to run a driver script on the head node", but that doesn't seem to fit well with the first bullet about Jupyter.
should be addressed
## Production
For production, we suggest the following configuration.
Add a motivating comment here for recommendations
should be addressed
This document contains recommendations for setting up a Ray + Kubernetes cluster for your organization.
When you set up Ray on Kubernetes, the KubeRay documentation provides an overview of how to configure the operator to execute and manage the Ray cluster lifecycle. This guide complements the KubeRay documentation by providing best practices for effectively using Ray deployments in your organization.
Add a bit more clarity as to why this doc matters
| | Interactive Development | Production |
|---|---|---|
| Cluster Configuration | KubeRay YAML | KubeRay YAML |
| Code | Run driver or Jupyter notebook on head node | S3 + runtime envs <br /> OR <br /> Bake code into Docker image (link) |
Remove this "(link)" placeholder. Also, do we need to say more about the Docker image setup, or is that common knowledge?
I removed the word "link".
Building a Ray image from scratch is not easy, and our image-building CI pipelines are pretty complex. It will be helpful to have a doc in the future.
https://docs.ray.io/en/master/serve/production-guide/docker.html => This is not enough. For example, some users are sensitive to security and want to build the image with different Linux distributions.
### Code and Dependencies
Bake your code, remote, and local dependencies into a published Docker image for the workers. This is the most common way to deploy applications onto [Kubernetes](https://kube.academy/courses/building-applications-for-kubernetes).
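One way to sketch this baking step (the base-image tag, requirements file, and app directory are placeholders, not from this PR):

```dockerfile
# Hypothetical Dockerfile: start from an official Ray base image and
# bake in the application code and its dependencies.
FROM rayproject/ray:2.7.0

# Install the application's pip dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code itself into the image.
COPY my_app/ ./my_app/
```

Workers started from this image then have the code and dependencies available without any runtime download.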
Do you also want to add a link to how to build it into the docker image? -> https://docs.ray.io/en/master/serve/production-guide/docker.html
done
Could you help me refresh this once more?
done
could you help me refresh this one as well?
done
I am not familiar with NFS/EFS. Could you explain why NFS is inside the "Ray cluster" in the interactive-dev.png but outside the "Ray cluster" in the production.png?
(#39510)
* Update metrics.md (#38512): (1) there are 3 dashboards in the folder now, so refer to the folder instead of only 1 dashboard; (2) include "Copy" since people need to copy this from the head node to the Grafana server.
* Polish observability (o11y) docs (#39069)
* [Doc] Unbold "Use Cases" in sidebar (#39295)
* [docs] Cleanup for other AIR concepts (#39400)
* [doc][clusters] add doc for setting up Ray and K8s (#39408)

Co-authored-by: Huaiwei Sun, matthewdeng, Peyton Murray, Richard Liaw
Fill the content gap by providing best practices for two flavors of deployments:
cc: @richardliaw
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.