
[Serve][Doc] Update the doc code to use new api (ray-project#27689)
Co-authored-by: Archit Kulkarni <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
2 people authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 8eeafe6 commit 6e2111f
Showing 17 changed files with 258 additions and 215 deletions.
69 changes: 15 additions & 54 deletions doc/source/serve/deploying-serve.md
@@ -15,21 +15,15 @@ This section should help you:

## Lifetime of a Ray Serve Instance

Ray Serve instances run on top of Ray clusters and are started using {mod}`serve.start <ray.serve.start>`.
Once {mod}`serve.start <ray.serve.start>` has been called, further API calls can be used to create and update the deployments that will be used to serve your Python code (including ML models).
Ray Serve instances run on top of Ray clusters and are started using {mod}`serve.run <ray.serve.run>`.
Once {mod}`serve.run <ray.serve.run>` is called, a Serve instance is created automatically.
The Serve instance will be torn down when the script exits.

When running on a long-lived Ray cluster (e.g., one started using `ray start`),
you can also deploy a Ray Serve instance as a long-running
service using `serve.start(detached=True)`. In this case, the Serve instance will continue to
run on the Ray cluster even after the script that calls it exits. To update the Serve instance, you can run another script that connects to the same Ray cluster and makes further API calls (e.g., to create, update, or delete a deployment). Note that there can only be one detached Serve instance on each Ray cluster.

:::{note}
All Serve actors, including the Serve controller, the HTTP proxies, and the deployment replicas, run in the `"serve"` namespace, even if the Ray driver namespace is different.
:::

If `serve.start()` is called again in a process in which there is already a running Serve instance, Serve will re-connect to the existing instance (regardless of whether the original instance was detached or not). To reconnect to a Serve instance that exists in the Ray cluster but not in the current process, connect to the cluster and run `serve.start()`.
If `serve.run()` is called again in a process in which there is already a running Serve instance, Serve will re-connect to the existing instance (regardless of whether the original instance was detached or not). To reconnect to a Serve instance that exists in the Ray cluster but not in the current process, connect to the cluster with `ray.init(address=...)` and run `serve.run()`.
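For illustration, a later script might reconnect and redeploy along these lines (a minimal sketch, assuming a local cluster started with `ray start --head`):

```python
import ray
from ray import serve

# Connect to the existing Ray cluster; Serve picks up the running instance.
ray.init(address="auto", namespace="serve")


@serve.deployment
def my_func(request):
    return "hello"


serve.run(my_func.bind())
```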

## Deploying on a Single Node

@@ -39,24 +33,10 @@ In general, **Option 2 is recommended for most users** because it allows you to

1. Start Ray and deploy with Ray Serve all in a single Python file.

```python
import ray
from ray import serve
import time

# This will start Ray locally and start Serve on top of it.
serve.start()

@serve.deployment
def my_func(request):
    return "hello"

my_func.deploy()

# Serve will be shut down once the script exits, so keep it alive manually.
while True:
    time.sleep(5)
    print(serve.list_deployments())
```
```{literalinclude} ../serve/doc_code/deploying_serve_example.py
:start-after: __deploy_in_single_file_1_start__
:end-before: __deploy_in_single_file_1_end__
:language: python
```

2. First run `ray start --head` on the machine, then connect to the running local Ray cluster using `ray.init(address="auto")` in your Serve script(s). You can run multiple scripts to update your deployments over time; a sketch of such an update follows the example below.
@@ -66,18 +46,10 @@ ray start --head # Start local Ray cluster.
serve start # Start Serve on the local Ray cluster.
```

```python
import ray
from ray import serve

# This will connect to the running Ray cluster.
ray.init(address="auto", namespace="serve")

@serve.deployment
def my_func(request):
    return "hello"

my_func.deploy()
```
```{literalinclude} ../serve/doc_code/deploying_serve_example.py
:start-after: __deploy_in_single_file_2_start__
:end-before: __deploy_in_single_file_2_end__
:language: python
```
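
For illustration, a follow-up script along these lines (a minimal sketch, assuming the deployment above is already running on the local cluster) could update the deployment in place, for example to change its replica count:

```python
import ray
from ray import serve

# Reconnect to the same running Ray cluster.
ray.init(address="auto", namespace="serve")


@serve.deployment
def my_func(request):
    return "hello"


# Re-running with updated options modifies the existing deployment.
serve.run(my_func.options(num_replicas=2).bind())
```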

(deploying-serve-on-kubernetes)=
@@ -168,21 +140,10 @@ $ kubectl -n ray describe service ray-head

With the cluster now running, we can run a simple script to start Ray Serve and deploy a "hello world" deployment:

> ```python
> import ray
> from ray import serve
>
> # Connect to the running Ray cluster.
> ray.init(address="auto")
> # Bind on 0.0.0.0 to expose the HTTP server on external IPs.
> serve.start(detached=True, http_options={"host": "0.0.0.0"})
>
>
> @serve.deployment(route_prefix="/hello")
> def hello(request):
>     return "hello world"
>
> hello.deploy()
> ```
> ```{literalinclude} ../serve/doc_code/deploying_serve_example.py
> :start-after: __deploy_in_k8s_start__
> :end-before: __deploy_in_k8s_end__
> :language: python
> ```
Save this script locally as `deploy.py` and run it on the head node using `ray submit`:
52 changes: 52 additions & 0 deletions doc/source/serve/doc_code/deploying_serve_example.py
@@ -0,0 +1,52 @@
import subprocess

# __deploy_in_single_file_1_start__
import ray
from ray import serve


@serve.deployment
def my_func(request):
    return "hello"


serve.run(my_func.bind())
# __deploy_in_single_file_1_end__

serve.shutdown()
ray.shutdown()
subprocess.check_output(["ray", "stop", "--force"])
subprocess.check_output(["ray", "start", "--head"])

# __deploy_in_single_file_2_start__
# This will connect to the running Ray cluster.
ray.init(address="auto", namespace="serve")


@serve.deployment
def my_func(request):
    return "hello"


serve.run(my_func.bind())
# __deploy_in_single_file_2_end__

serve.shutdown()
ray.shutdown()
subprocess.check_output(["ray", "stop", "--force"])
subprocess.check_output(["ray", "start", "--head"])

# __deploy_in_k8s_start__
# Connect to the running Ray cluster.
ray.init(address="auto")


@serve.deployment(route_prefix="/hello")
def hello(request):
    return "hello world"


serve.run(hello.bind())
# __deploy_in_k8s_end__

subprocess.check_output(["ray", "stop", "--force"])
4 changes: 0 additions & 4 deletions doc/source/serve/doc_code/deployment_graph_dag_http.py
@@ -7,10 +7,6 @@
from ray.dag.input_node import InputNode


ray.init()
serve.start()


class ModelInputData(BaseModel):
    model_input1: int
    model_input2: str
3 changes: 1 addition & 2 deletions doc/source/serve/doc_code/fastapi_example.py
@@ -16,8 +16,7 @@ def say_hello(self, name: str) -> str:


# 2: Deploy the deployment.
serve.start()
FastAPIDeployment.deploy()
serve.run(FastAPIDeployment.bind())

# 3: Query the deployment and print the result.
print(requests.get("https://localhost:8000/hello", params={"name": "Theodore"}).json())
44 changes: 44 additions & 0 deletions doc/source/serve/doc_code/handle_guide.py
@@ -0,0 +1,44 @@
import ray
from ray import serve
import requests


# __basic_example_start__
@serve.deployment
class Deployment:
    def method1(self, arg):
        return f"Method1: {arg}"

    def __call__(self, arg):
        return f"__call__: {arg}"


handle = serve.run(Deployment.bind())

ray.get(handle.remote("hi")) # Defaults to calling the __call__ method.
ray.get(handle.method1.remote("hi")) # Call a different method.
# __basic_example_end__


# __async_handle_start__
@serve.deployment(route_prefix="/api")
class Deployment:
    def say_hello(self, name: str):
        return f"Hello {name}!"

    def __call__(self, request):
        return self.say_hello(request.query_params["name"])


handle = serve.run(Deployment.bind())

# __async_handle_end__


# __async_handle_print_start__
print(requests.get("https://localhost:8000/api?name=Alice"))
# Hello Alice!

print(ray.get(handle.say_hello.remote("Alice")))
# Hello Alice!
# __async_handle_print_end__
79 changes: 79 additions & 0 deletions doc/source/serve/doc_code/managing_deployments.py
@@ -0,0 +1,79 @@
from ray import serve
import time
import os


# __updating_a_deployment_start__
@serve.deployment(name="my_deployment", num_replicas=1)
class SimpleDeployment:
    pass


# Creates one initial replica.
serve.run(SimpleDeployment.bind())


# Re-deploys, creating an additional replica.
# This could be the SAME Python script, modified and re-run.
@serve.deployment(name="my_deployment", num_replicas=2)
class SimpleDeployment:
    pass


serve.run(SimpleDeployment.bind())

# You can also use Deployment.options() to change options without redefining
# the class. This is useful for programmatically updating deployments.
serve.run(SimpleDeployment.options(num_replicas=2).bind())
# __updating_a_deployment_end__


# __scaling_out_start__
# Create with a single replica.
@serve.deployment(num_replicas=1)
def func(*args):
    pass


serve.run(func.bind())

# Scale up to 3 replicas.
serve.run(func.options(num_replicas=3).bind())

# Scale back down to 1 replica.
serve.run(func.options(num_replicas=1).bind())
# __scaling_out_end__


# __autoscaling_start__
@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 5,
        "target_num_ongoing_requests_per_replica": 10,
    }
)
def func(_):
    time.sleep(1)
    return ""


serve.run(
    func.bind()
)  # The func deployment will now autoscale based on requests demand.
# __autoscaling_end__


# __configure_parallism_start__
@serve.deployment
class MyDeployment:
    def __init__(self, parallelism: str):
        os.environ["OMP_NUM_THREADS"] = parallelism
        # Download model weights, initialize model, etc.

    def __call__(self):
        pass


serve.run(MyDeployment.bind("12"))
# __configure_parallism_end__
24 changes: 24 additions & 0 deletions doc/source/serve/doc_code/ml_models_examples.py
@@ -0,0 +1,24 @@
from ray import serve
from typing import List, Dict, Any


# __batch_example_start__
@serve.deployment(route_prefix="/increment")
class BatchingExample:
    def __init__(self):
        self.count = 0

    @serve.batch
    async def handle_batch(self, requests: List[Any]) -> List[Dict]:
        responses = []
        for request in requests:
            # Starlette's request.json() is a coroutine, so await it.
            responses.append(await request.json())

        return responses

    async def __call__(self, request) -> List[Dict]:
        return await self.handle_batch(request)


serve.run(BatchingExample.bind())
# __batch_example_end__
3 changes: 1 addition & 2 deletions doc/source/serve/doc_code/quickstart.py
@@ -14,8 +14,7 @@ def __call__(self, request):


# 2: Deploy the model.
serve.start()
MyModelDeployment.deploy(msg="Hello world!")
serve.run(MyModelDeployment.bind(msg="Hello world!"))

# 3: Query the deployment and print the result.
print(requests.get("https://localhost:8000/").json())
3 changes: 1 addition & 2 deletions doc/source/serve/doc_code/sklearn_quickstart.py
@@ -9,7 +9,6 @@

from ray import serve

serve.start()

# Train model.
iris_dataset = load_iris()
@@ -33,7 +32,7 @@ async def __call__(self, request):


# Deploy model.
BoostingModel.deploy(model)
serve.run(BoostingModel.bind(model))

# Query it!
sample_request_input = {"vector": [1.2, 1.0, 1.1, 0.9]}
4 changes: 2 additions & 2 deletions doc/source/serve/doc_code/transformers_example.py
@@ -14,8 +14,8 @@ def __call__(self, request):


# 2: Deploy the deployment.
serve.start()
SentimentAnalysisDeployment.deploy()

serve.run(SentimentAnalysisDeployment.bind())

# 3: Query the deployment and print the result.
print(