
[Serve][Doc] Update the doc code to use new api (ray-project#27689)
Co-authored-by: Archit Kulkarni <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
2 people authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 8eeafe6 commit 6e2111f
Showing 17 changed files with 258 additions and 215 deletions.
69 changes: 15 additions & 54 deletions doc/source/serve/deploying-serve.md
@@ -15,21 +15,15 @@ This section should help you:

## Lifetime of a Ray Serve Instance

Ray Serve instances run on top of Ray clusters and are started using {mod}`serve.start <ray.serve.start>`.
Once {mod}`serve.start <ray.serve.start>` has been called, further API calls can be used to create and update the deployments that will be used to serve your Python code (including ML models).
Ray Serve instances run on top of Ray clusters and are started using {mod}`serve.run <ray.serve.run>`.
Once {mod}`serve.run <ray.serve.run>` is called, a Serve instance is created automatically.
The Serve instance will be torn down when the script exits.

When running on a long-lived Ray cluster (e.g., one started using `ray start`),
you can also deploy a Ray Serve instance as a long-running
service using `serve.start(detached=True)`. In this case, the Serve instance will continue to
run on the Ray cluster even after the script that calls it exits. To update the Serve instance, you can run another script that connects to the same Ray cluster and makes further API calls (e.g., to create, update, or delete a deployment). Note that there can only be one detached Serve instance on each Ray cluster.

:::{note}
All Serve actors, including the Serve controller, the HTTP proxies, and the deployment replicas, run in the `"serve"` namespace, even if the Ray driver namespace is different.
:::

If `serve.start()` is called again in a process in which there is already a running Serve instance, Serve will re-connect to the existing instance (regardless of whether the original instance was detached or not). To reconnect to a Serve instance that exists in the Ray cluster but not in the current process, connect to the cluster and run `serve.start()`.
If `serve.run()` is called again in a process in which there is already a running Serve instance, Serve will re-connect to the existing instance (regardless of whether the original instance was detached or not). To reconnect to a Serve instance that exists in the Ray cluster but not in the current process, connect to the cluster with `ray.init(address=...)` and run `serve.run()`.
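For illustration, a later script might reconnect and redeploy along these lines (a minimal sketch, assuming a local cluster started with `ray start --head`):

```python
import ray
from ray import serve

# Connect to the existing Ray cluster; Serve picks up the running instance.
ray.init(address="auto", namespace="serve")


@serve.deployment
def my_func(request):
    return "hello"


serve.run(my_func.bind())
```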

## Deploying on a Single Node

@@ -39,24 +33,10 @@ In general, **Option 2 is recommended for most users** because it allows you to

1. Start Ray and deploy with Ray Serve all in a single Python file.

```python
import ray
from ray import serve
import time

# This will start Ray locally and start Serve on top of it.
serve.start()

@serve.deployment
def my_func(request):
    return "hello"

my_func.deploy()

# Serve will be shut down once the script exits, so keep it alive manually.
while True:
    time.sleep(5)
    print(serve.list_deployments())
```
```{literalinclude} ../serve/doc_code/deploying_serve_example.py
:start-after: __deploy_in_single_file_1_start__
:end-before: __deploy_in_single_file_1_end__
:language: python
```

2. First run `ray start --head` on the machine, then connect to the running local Ray cluster using `ray.init(address="auto")` in your Serve script(s). You can run multiple scripts to update your deployments over time; a sketch of such an update follows the example below.
@@ -66,18 +46,10 @@ ray start --head # Start local Ray cluster.
serve start # Start Serve on the local Ray cluster.
```

```python
import ray
from ray import serve

# This will connect to the running Ray cluster.
ray.init(address="auto", namespace="serve")

@serve.deployment
def my_func(request):
    return "hello"

my_func.deploy()
```
```{literalinclude} ../serve/doc_code/deploying_serve_example.py
:start-after: __deploy_in_single_file_2_start__
:end-before: __deploy_in_single_file_2_end__
:language: python
```
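
For illustration, a follow-up script along these lines (a minimal sketch, assuming the deployment above is already running on the local cluster) could update the deployment in place, for example to change its replica count:

```python
import ray
from ray import serve

# Reconnect to the same running Ray cluster.
ray.init(address="auto", namespace="serve")


@serve.deployment
def my_func(request):
    return "hello"


# Re-running with updated options modifies the existing deployment.
serve.run(my_func.options(num_replicas=2).bind())
```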

(deploying-serve-on-kubernetes)=
@@ -168,21 +140,10 @@ $ kubectl -n ray describe service ray-head

With the cluster now running, we can run a simple script to start Ray Serve and deploy a "hello world" deployment:

> ```python
> import ray
> from ray import serve
>
> # Connect to the running Ray cluster.
> ray.init(address="auto")
> # Bind on 0.0.0.0 to expose the HTTP server on external IPs.
> serve.start(detached=True, http_options={"host": "0.0.0.0"})
>
>
> @serve.deployment(route_prefix="/hello")
> def hello(request):
>     return "hello world"
>
> hello.deploy()
> ```
> ```{literalinclude} ../serve/doc_code/deploying_serve_example.py
> :start-after: __deploy_in_k8s_start__
> :end-before: __deploy_in_k8s_end__
> :language: python
> ```
Save this script locally as `deploy.py` and run it on the head node using `ray submit`:
52 changes: 52 additions & 0 deletions doc/source/serve/doc_code/deploying_serve_example.py
@@ -0,0 +1,52 @@
import subprocess

# __deploy_in_single_file_1_start__
import ray
from ray import serve


@serve.deployment
def my_func(request):
    return "hello"


serve.run(my_func.bind())
# __deploy_in_single_file_1_end__

serve.shutdown()
ray.shutdown()
subprocess.check_output(["ray", "stop", "--force"])
subprocess.check_output(["ray", "start", "--head"])

# __deploy_in_single_file_2_start__
# This will connect to the running Ray cluster.
ray.init(address="auto", namespace="serve")


@serve.deployment
def my_func(request):
    return "hello"


serve.run(my_func.bind())
# __deploy_in_single_file_2_end__

serve.shutdown()
ray.shutdown()
subprocess.check_output(["ray", "stop", "--force"])
subprocess.check_output(["ray", "start", "--head"])

# __deploy_in_k8s_start__
# Connect to the running Ray cluster.
ray.init(address="auto")


@serve.deployment(route_prefix="/hello")
def hello(request):
    return "hello world"


serve.run(hello.bind())
# __deploy_in_k8s_end__

subprocess.check_output(["ray", "stop", "--force"])
4 changes: 0 additions & 4 deletions doc/source/serve/doc_code/deployment_graph_dag_http.py
@@ -7,10 +7,6 @@
from ray.dag.input_node import InputNode


ray.init()
serve.start()


class ModelInputData(BaseModel):
    model_input1: int
    model_input2: str
3 changes: 1 addition & 2 deletions doc/source/serve/doc_code/fastapi_example.py
@@ -16,8 +16,7 @@ def say_hello(self, name: str) -> str:


# 2: Deploy the deployment.
serve.start()
FastAPIDeployment.deploy()
serve.run(FastAPIDeployment.bind())

# 3: Query the deployment and print the result.
print(requests.get("https://localhost:8000/hello", params={"name": "Theodore"}).json())
44 changes: 44 additions & 0 deletions doc/source/serve/doc_code/handle_guide.py
@@ -0,0 +1,44 @@
import ray
from ray import serve
import requests


# __basic_example_start__
@serve.deployment
class Deployment:
    def method1(self, arg):
        return f"Method1: {arg}"

    def __call__(self, arg):
        return f"__call__: {arg}"


handle = serve.run(Deployment.bind())

ray.get(handle.remote("hi")) # Defaults to calling the __call__ method.
ray.get(handle.method1.remote("hi")) # Call a different method.
# __basic_example_end__


# __async_handle_start__
@serve.deployment(route_prefix="/api")
class Deployment:
    def say_hello(self, name: str):
        return f"Hello {name}!"

    def __call__(self, request):
        return self.say_hello(request.query_params["name"])


handle = serve.run(Deployment.bind())

# __async_handle_end__


# __async_handle_print_start__
print(requests.get("https://localhost:8000/api?name=Alice"))
# Hello Alice!

print(ray.get(handle.say_hello.remote("Alice")))
# Hello Alice!
# __async_handle_print_end__
79 changes: 79 additions & 0 deletions doc/source/serve/doc_code/managing_deployments.py
@@ -0,0 +1,79 @@
from ray import serve
import time
import os


# __updating_a_deployment_start__
@serve.deployment(name="my_deployment", num_replicas=1)
class SimpleDeployment:
    pass


# Creates one initial replica.
serve.run(SimpleDeployment.bind())


# Re-deploys, creating an additional replica.
# This could be the SAME Python script, modified and re-run.
@serve.deployment(name="my_deployment", num_replicas=2)
class SimpleDeployment:
    pass


serve.run(SimpleDeployment.bind())

# You can also use Deployment.options() to change options without redefining
# the class. This is useful for programmatically updating deployments.
serve.run(SimpleDeployment.options(num_replicas=2).bind())
# __updating_a_deployment_end__


# __scaling_out_start__
# Create with a single replica.
@serve.deployment(num_replicas=1)
def func(*args):
    pass


serve.run(func.bind())

# Scale up to 3 replicas.
serve.run(func.options(num_replicas=3).bind())

# Scale back down to 1 replica.
serve.run(func.options(num_replicas=1).bind())
# __scaling_out_end__


# __autoscaling_start__
@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 5,
        "target_num_ongoing_requests_per_replica": 10,
    }
)
def func(_):
    time.sleep(1)
    return ""


serve.run(
    func.bind()
)  # The func deployment will now autoscale based on requests demand.
# __autoscaling_end__


# __configure_parallism_start__
@serve.deployment
class MyDeployment:
    def __init__(self, parallelism: str):
        os.environ["OMP_NUM_THREADS"] = parallelism
        # Download model weights, initialize model, etc.

    def __call__(self):
        pass


serve.run(MyDeployment.bind("12"))
# __configure_parallism_end__
24 changes: 24 additions & 0 deletions doc/source/serve/doc_code/ml_models_examples.py
@@ -0,0 +1,24 @@
from ray import serve
from typing import List, Dict, Any


# __batch_example_start__
@serve.deployment(route_prefix="/increment")
class BatchingExample:
    def __init__(self):
        self.count = 0

    @serve.batch
    async def handle_batch(self, requests: List[Any]) -> List[Dict]:
        responses = []
        for request in requests:
            # Starlette's request.json() is a coroutine, so await it.
            responses.append(await request.json())

        return responses

    async def __call__(self, request) -> List[Dict]:
        return await self.handle_batch(request)


serve.run(BatchingExample.bind())
# __batch_example_end__
3 changes: 1 addition & 2 deletions doc/source/serve/doc_code/quickstart.py
@@ -14,8 +14,7 @@ def __call__(self, request):


# 2: Deploy the model.
serve.start()
MyModelDeployment.deploy(msg="Hello world!")
serve.run(MyModelDeployment.bind(msg="Hello world!"))

# 3: Query the deployment and print the result.
print(requests.get("https://localhost:8000/").json())
3 changes: 1 addition & 2 deletions doc/source/serve/doc_code/sklearn_quickstart.py
@@ -9,7 +9,6 @@

from ray import serve

serve.start()

# Train model.
iris_dataset = load_iris()
@@ -33,7 +32,7 @@ async def __call__(self, request):


# Deploy model.
BoostingModel.deploy(model)
serve.run(BoostingModel.bind(model))

# Query it!
sample_request_input = {"vector": [1.2, 1.0, 1.1, 0.9]}
4 changes: 2 additions & 2 deletions doc/source/serve/doc_code/transformers_example.py
@@ -14,8 +14,8 @@ def __call__(self, request):


# 2: Deploy the deployment.
serve.start()
SentimentAnalysisDeployment.deploy()

serve.run(SentimentAnalysisDeployment.bind())

# 3: Query the deployment and print the result.
print(