[docs] create advanced guides directory; rename files 1/N (ray-project#36297)

Implement new directory structure per plan. This is one of many to come. I wanted to start with just the directory structure changes and get that merged asap to head off merge conflicts.

@edoakes, @akshay-anyscale and I discussed removing the User Guides directory. With the restructuring, only 5-6 non-Advanced guides remained, and we thought they would be more discoverable at a higher level. Take a look and let us know what you think, please.
angelinalg committed Jun 14, 2023
1 parent 6f99d16 commit 5fa92cc
Showing 32 changed files with 622 additions and 636 deletions.
2 changes: 2 additions & 0 deletions doc/source/_static/js/custom.js
@@ -45,6 +45,8 @@ document.addEventListener("DOMContentLoaded", function() {
"Ray Train", "Ray Train API",
"Ray Tune", "Ray Tune Examples", "Ray Tune API",
"Ray Serve", "Ray Serve API",
"Production Guide", "Advanced Guides",
"Deploy Many Models",
"Ray RLlib", "Ray RLlib API",
"More Libraries", "Ray Workflows (Alpha)",
"Monitoring and Debugging",
47 changes: 27 additions & 20 deletions doc/source/_toc.yml
@@ -256,27 +256,34 @@ parts:
sections:
- file: serve/getting_started
- file: serve/key-concepts
- file: serve/user-guide
- file: serve/model_composition
- file: serve/deploy-many-models/index
sections:
- file: serve/http-guide
- file: serve/scaling-and-resource-allocation
- file: serve/model_composition
- file: serve/dev-workflow
- file: serve/app-builder-guide
- file: serve/multi-app
- file: serve/production-guide/index
sections:
- file: serve/production-guide/config
- file: serve/production-guide/deploy-vm
- file: serve/production-guide/kubernetes
- file: serve/production-guide/monitoring
- file: serve/production-guide/fault-tolerance
- file: serve/performance
- file: serve/handling-dependencies
- file: serve/managing-java-deployments
- file: serve/migration
- file: serve/direct-ingress
- file: serve/model-multiplexing
- file: serve/deploy-many-models/multi-app
- file: serve/deploy-many-models/model-multiplexing
- file: serve/http-guide
- file: serve/production-guide/index
title: Production Guide
sections:
- file: serve/production-guide/config
- file: serve/production-guide/kubernetes
- file: serve/production-guide/fault-tolerance
- file: serve/production-guide/handling-dependencies
- file: serve/production-guide/best-practices
- file: serve/monitoring
- file: serve/scaling-and-resource-allocation
- file: serve/advanced-guides/index
sections:
- file: serve/advanced-guides/app-builder-guide
- file: serve/advanced-guides/performance
- file: serve/advanced-guides/dyn-req-batch
- file: serve/advanced-guides/inplace-updates
- file: serve/advanced-guides/dev-workflow
- file: serve/advanced-guides/deployment-graphs
- file: serve/advanced-guides/direct-ingress
- file: serve/advanced-guides/managing-java-deployments
- file: serve/advanced-guides/migration
- file: serve/advanced-guides/deploy-vm
- file: serve/architecture
- file: serve/tutorials/index
sections:
doc/source/serve/advanced-guides/app-builder-guide.md (renamed from doc/source/serve/app-builder-guide.md)
@@ -1,4 +1,5 @@
# Passing Arguments to Applications
(serve-app-builder-guide)=
# Pass Arguments to Applications

This section describes how to pass arguments to your applications using an application builder function.

@@ -11,7 +12,7 @@ This pattern allows you to configure deployments using ordinary Python code b

To pass arguments without changing the code, define an "application builder" function that takes an arguments dictionary (or [Pydantic object](typed-app-builders)) and returns the built application to be run.
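
As a rough, minimal sketch of this pattern (the `Greeter` deployment and the `greeting` argument here are hypothetical, not the names used in the `app_builder.py` example included below), a builder might look like this:

```python
from typing import Dict

from ray import serve


@serve.deployment
class Greeter:
    def __init__(self, greeting: str):
        self._greeting = greeting

    def __call__(self, request) -> str:
        return self._greeting


def app_builder(args: Dict[str, str]) -> serve.Application:
    # `args` holds the plain string arguments supplied in the config file's
    # `args` section (or on the command line); parse them here as needed.
    return Greeter.bind(args.get("greeting", "Hello from config"))
```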

```{literalinclude} ../serve/doc_code/app_builder.py
```{literalinclude} ../doc_code/app_builder.py
:start-after: __begin_untyped_builder__
:end-before: __end_untyped_builder__
:language: python
@@ -79,7 +80,7 @@ Notice that the "Hello from config" message is printed from within the deploymen
To avoid writing logic to parse and validate the arguments by hand, define a [Pydantic model](https://pydantic-docs.helpmanual.io/usage/models/) as the single input parameter's type to your application builder function (the parameter must be type annotated).
Arguments are passed the same way, but the resulting dictionary is used to construct the Pydantic model using `model.parse_obj(args_dict)`.
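
As a rough, self-contained sketch (again with hypothetical names rather than the ones in `app_builder.py`), a typed builder might look like this:

```python
from pydantic import BaseModel

from ray import serve


class HelloArgs(BaseModel):
    message: str = "Hello from config"
    repeat: int = 1


@serve.deployment
class Hello:
    def __init__(self, message: str):
        self._message = message

    def __call__(self, request) -> str:
        return self._message


def typed_app_builder(args: HelloArgs) -> serve.Application:
    # Serve constructs the HelloArgs model from the arguments dictionary before
    # calling this function, so the fields arrive already parsed and validated.
    return Hello.bind(" ".join([args.message] * args.repeat))
```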

```{literalinclude} ../serve/doc_code/app_builder.py
```{literalinclude} ../doc_code/app_builder.py
:start-after: __begin_typed_builder__
:end-before: __end_typed_builder__
:language: python
@@ -125,7 +126,7 @@ applications:
You can use the arguments passed to an application builder to configure multiple deployments in a single application.
For example, a model composition application might take weights for two different models, as follows:

```{literalinclude} ../serve/doc_code/app_builder.py
```{literalinclude} ../doc_code/app_builder.py
:start-after: __begin_composed_builder__
:end-before: __end_composed_builder__
:language: python
doc/source/serve/advanced-guides/deploy-vm.md (renamed from doc/source/serve/production-guide/deploy-vm.md)
@@ -1,6 +1,6 @@
(serve-in-production-deploying)=

# Deploying on VMs
# Deploy on VM

You can deploy your Serve application to production on a Ray cluster using the Ray Serve CLI.
`serve deploy` takes a config file path and deploys that config to a Ray cluster over HTTP.
@@ -188,134 +188,3 @@ deployment_statuses:
```

`serve status` can also be used with KubeRay ({ref}`kuberay-index`), a Kubernetes operator for Ray Serve, to help deploy your Serve applications with Kubernetes. Work is also in progress to integrate some of the features from this document, like `serve status`, more closely with Kubernetes to provide a clearer Serve deployment story.

(serve-in-production-updating)=

## Updating the Serve application

You can update your Serve applications once they're in production by updating the settings in your config file and redeploying it using the `serve deploy` command. In the redeployed config file, you can add new deployment settings or remove old deployment settings. This is because `serve deploy` is **idempotent**, meaning your Serve application's config always matches (or honors) the latest config you deployed successfully – regardless of what config files you deployed before that.

(serve-in-production-lightweight-update)=

### Lightweight Config Updates

Lightweight config updates modify running deployment replicas without tearing them down and restarting them, so there's less downtime as the deployments update. For each deployment, modifying `num_replicas`, `autoscaling_config`, and/or `user_config` is considered a lightweight config update, and won't tear down the replicas for that deployment.

:::{note}
Lightweight config updates are only possible for deployments that are included as entries under `deployments` in the config file. If a deployment is not included in the config file, replicas of that deployment will be torn down and brought up again each time you redeploy with `serve deploy`.
:::

#### Updating User Config
Let's use the `FruitStand` deployment graph [from an earlier section](fruit-config-yaml) as an example. All the individual fruit deployments contain a `reconfigure()` method. This method allows us to issue lightweight updates to our deployments by updating the `user_config`.
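
As a reminder of what that looks like, here is a simplified sketch in the spirit of the fruit example (not the exact tutorial code): a deployment opts into lightweight `user_config` updates by defining a `reconfigure()` method.

```python
from ray import serve


@serve.deployment(user_config={"price": 3})
class MangoStand:
    DEFAULT_PRICE = 1

    def __init__(self):
        self.price = self.DEFAULT_PRICE

    def reconfigure(self, config: dict):
        # Serve calls this with the deployment's `user_config` when a replica
        # starts and again whenever the `user_config` in the config file is
        # updated, so price changes don't restart the replica.
        self.price = config.get("price", self.DEFAULT_PRICE)

    def check_price(self, amount: int) -> float:
        return self.price * amount
```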

First, let's deploy the graph. For this example, make sure to stop any previous Ray cluster using the CLI command `ray stop`:

```console
$ ray start --head
$ serve deploy fruit_config.yaml
...
$ python
>>> import requests
>>> requests.post("https://localhost:8000/", json=["MANGO", 2]).json()
6
```

Now, let's update the price of mangos in our deployment. We can change the `price` attribute in the `MangoStand` deployment to `5` in our config file:

```yaml
import_path: fruit:deployment_graph
runtime_env: {}
deployments:
- name: MangoStand
  num_replicas: 2
  route_prefix: null
  max_concurrent_queries: 100
  user_config:
    # price: 3 (Outdated price)
    price: 5
  autoscaling_config: null
  graceful_shutdown_wait_loop_s: 2.0
  graceful_shutdown_timeout_s: 20.0
  health_check_period_s: 10.0
  health_check_timeout_s: 30.0
  ray_actor_options: null
...
```

Without stopping the Ray cluster, we can redeploy our graph using `serve deploy`:

```console
$ serve deploy fruit_config.yaml
...
```

We can inspect our deployments with `serve status`. Once the `app_status`'s `status` returns to `"RUNNING"`, we can try our requests one more time:

```console
$ serve status
app_status:
  status: RUNNING
  message: ''
  deployment_timestamp: 1655776483.457707
deployment_statuses:
- name: MangoStand
  status: HEALTHY
  message: ''
- name: OrangeStand
  status: HEALTHY
  message: ''
- name: PearStand
  status: HEALTHY
  message: ''
- name: FruitMarket
  status: HEALTHY
  message: ''
- name: DAGDriver
  status: HEALTHY
  message: ''
$ python
>>> import requests
>>> requests.post("http://localhost:8000/", json=["MANGO", 2]).json()
10
```

The price has updated! The same request now returns `10` instead of `6`, reflecting the new price.

### Code Updates

Similarly, you can update any other setting in any deployment in the config file. If a deployment setting other than `num_replicas`, `autoscaling_config`, or `user_config` is changed, it is considered a code update, and the deployment replicas will be restarted. Note that the following modifications are all considered "changes", and will trigger a teardown of replicas:
* changing an existing setting
* adding an override setting that was previously not present in the config file
* removing a setting from the config file

Note also that changing `import_path` or `runtime_env` is considered a code update for all deployments, and will tear down all running deployments and restart them.

:::{warning}
Although you can update your Serve application by deploying an entirely new deployment graph using a different `import_path` and a different `runtime_env`, this is NOT recommended in production.

The best practice for large-scale code updates is to start a new Ray cluster, deploy the updated code to it using `serve deploy`, and then switch traffic from your old cluster to the new one.
:::

## Best practices

This section summarizes the best practices when deploying to production using the Serve CLI:

* Use `serve run` to manually test and improve your deployment graph locally.
* Use `serve build` to create a Serve config file for your deployment graph.
* Put your deployment graph's code in a remote repository and manually configure the `working_dir` or `py_modules` fields in your Serve config file's `runtime_env` to point to that repository.
* Use `serve deploy` to deploy your graph and its deployments to your Ray cluster. After the deployment is finished, you can start serving traffic from your cluster.
* Use `serve status` to track your Serve application's health and deployment progress.
* Use `serve config` to check the latest config that your Serve application received. This is its goal state.
* Make lightweight configuration updates (e.g. `num_replicas` or `user_config` changes) by modifying your Serve config file and redeploying it with `serve deploy`.
* Make heavyweight code updates (e.g. `runtime_env` changes) by starting a new Ray cluster, updating your Serve config file, and deploying the file with `serve deploy` to the new cluster. Once the new deployment is finished, switch your traffic to the new cluster.
