[Serve] [Doc] Update intro page (ray-project#27735)
Signed-off-by: Stefan van der Kleij <[email protected]>
simon-mo authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 8933efe commit 2bd1f46
Showing 6 changed files with 130 additions and 105 deletions.
1 change: 0 additions & 1 deletion doc/source/_toc.yml
@@ -192,7 +192,6 @@ parts:
- file: serve/tutorials/deployment-graph-patterns/linear_pipeline
- file: serve/tutorials/deployment-graph-patterns/branching_input
- file: serve/tutorials/deployment-graph-patterns/conditional
- file: serve/faq
- file: serve/package-ref

- file: rllib/index
1 change: 0 additions & 1 deletion doc/source/ray-references/faq.rst
@@ -8,7 +8,6 @@ FAQ
:caption: Frequently Asked Questions

./../tune/faq.rst
./../serve/faq.rst


Further Questions or Issues?
35 changes: 35 additions & 0 deletions doc/source/serve/doc_code/quickstart_graph.py
@@ -0,0 +1,35 @@
import requests
from ray import serve
from ray.serve.drivers import DAGDriver
from ray.serve.dag import InputNode
from ray.serve.http_adapters import json_request


# 1: Define the models in our composition graph.
@serve.deployment
class Adder:
    def __init__(self, increment: int):
        self.increment = increment

    def predict(self, inp: int):
        return self.increment + inp


@serve.deployment
def combine_average(*input_values) -> dict:
    return {"result": sum(input_values) / len(input_values)}


# 2: Define the model composition graph and call it.
with InputNode() as input_node:
    adder_1 = Adder.bind(increment=1)
    adder_2 = Adder.bind(increment=2)
    dag = combine_average.bind(
        adder_1.predict.bind(input_node), adder_2.predict.bind(input_node)
    )

serve.run(DAGDriver.bind(dag, http_adapter=json_request))

# 3: Query the deployment and print the result.
print(requests.post("http://localhost:8000/", json=100).json())
# {"result": 101.5}
1 change: 0 additions & 1 deletion doc/source/serve/doc_code/transformers_example.py
@@ -14,7 +14,6 @@ def __call__(self, request):


# 2: Deploy the deployment.

serve.run(SentimentAnalysisDeployment.bind())

# 3: Query the deployment and print the result.
77 changes: 0 additions & 77 deletions doc/source/serve/faq.md

This file was deleted.

120 changes: 95 additions & 25 deletions doc/source/serve/index.md
@@ -8,8 +8,6 @@

:::{tip}
[Get in touch with us](https://docs.google.com/forms/d/1l8HT35jXMPtxVUtQPeGoe09VGp5jcvSv0TqPgyz6lGU) if you're using or considering using Ray Serve.

Chat with Ray Serve users and developers on our [forum](https://discuss.ray.io/).
:::

```{image} logo.svg
@@ -27,29 +25,34 @@ Serve is particularly well suited for {ref}`serve-model-composition`, enabling y

Serve is built on top of Ray, so it easily scales to many machines and offers flexible scheduling support such as fractional GPUs so you can share resources and serve many machine learning models at low cost.

:::{tabbed} Installation
## Quickstart

Install Ray Serve and its dependencies:

```bash
pip install "ray[serve]"
```
:::

:::{tabbed} Quickstart

To run this example, install the following: ``pip install ray["serve"]``

In this quick-start example we will define a simple "hello world" deployment, deploy it behind HTTP locally, and query it.

```{literalinclude} doc_code/quickstart.py
:language: python
```
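
`doc_code/quickstart.py` itself isn't shown in this diff; a minimal sketch of such a hello-world deployment (the class name, message, and response format here are illustrative, not the actual file contents):

```python
import requests
from starlette.requests import Request

from ray import serve


# 1: Define a Ray Serve deployment.
@serve.deployment
class HelloDeployment:
    def __init__(self, msg: str):
        # State set up once per replica; in practice this could be model weights.
        self._msg = msg

    def __call__(self, request: Request) -> dict:
        return {"result": self._msg}


# 2: Deploy it behind HTTP locally.
serve.run(HelloDeployment.bind(msg="Hello world!"))

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/").json())
# {'result': 'Hello world!'}
```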

:::{tabbed} More examples
For more examples, select from the tabs.
:::

:::{tabbed} FastAPI integration
:::{tabbed} Model composition

To run this example, install the following: ``pip install ray["serve"]``
In this example, we demonstrate how you can use Serve's model composition API to express a complex computation graph and deploy it as a Serve application.

```{literalinclude} doc_code/quickstart_graph.py
:language: python
```
:::

:::{tabbed} FastAPI integration

In this example we will use Serve's [FastAPI](https://fastapi.tiangolo.com/) integration to make use of more advanced HTTP functionality.

@@ -58,9 +61,9 @@ In this example we will use Serve's [FastAPI](https://fastapi.tiangolo.com/) int
```
:::
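
The referenced `doc_code/fastapi_example.py` isn't shown in this diff either; a minimal sketch of the FastAPI integration (the deployment and route names are illustrative):

```python
import requests
from fastapi import FastAPI

from ray import serve

app = FastAPI()


@serve.deployment
@serve.ingress(app)
class FastAPIDeployment:
    # FastAPI routing, validation, and automatic docs all work as usual.
    @app.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"


serve.run(FastAPIDeployment.bind())

print(requests.get("http://localhost:8000/hello", params={"name": "Serve"}).json())
# 'Hello Serve!'
```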

:::{tabbed} Serving a Hugging Face NLP model
:::{tabbed} Hugging Face model

To run this example, install the following: ``pip install ray["serve"] transformers``
To run this example, install the following: ``pip install transformers``

In this example we will serve a pre-trained [Hugging Face transformers](https://huggingface.co/docs/transformers/index) model using Ray Serve.
The model we'll use is a sentiment analysis model: it will take a text string as input and return whether the text was "POSITIVE" or "NEGATIVE."
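
Only a fragment of `doc_code/transformers_example.py` appears in this diff; a sketch consistent with that fragment, assuming the Hugging Face `pipeline` API and a `text` query parameter (both illustrative):

```python
import requests
from transformers import pipeline

from ray import serve


@serve.deployment
class SentimentAnalysisDeployment:
    def __init__(self):
        # Load the pre-trained sentiment-analysis pipeline once per replica.
        self._model = pipeline("sentiment-analysis")

    def __call__(self, request) -> str:
        # Read the text to classify from the "text" query parameter.
        text = request.query_params["text"]
        return self._model(text)[0]["label"]


# 2: Deploy the deployment.
serve.run(SentimentAnalysisDeployment.bind())

# 3: Query the deployment and print the result.
print(
    requests.get(
        "http://localhost:8000/", params={"text": "Ray Serve makes serving easy!"}
    ).text
)
# "POSITIVE"
```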
@@ -121,9 +124,88 @@ Because it's built on top of Ray, you can run it anywhere Ray can: on your lapto

:::


## How can Serve help me as a...

:::{dropdown} Data scientist
:animate: fade-in-slide-down

Serve makes it easy to go from a laptop to a cluster. You can test your models (and your entire deployment graph) on your local machine before deploying them to production on a cluster. You don't need to know heavyweight Kubernetes concepts or cloud configurations to use Serve.

:::

:::{dropdown} ML engineer
:animate: fade-in-slide-down

Serve helps you scale out your deployments and run them reliably and efficiently to save costs. With Serve's first-class model composition API, you can combine models with business logic and build end-to-end user-facing applications. Additionally, Serve runs natively on Kubernetes with minimal operational overhead.
:::

:::{dropdown} ML platform engineer
:animate: fade-in-slide-down

Serve specializes in scalable and reliable ML model serving. As such, it can be an important plug-and-play component of your ML platform stack.
Serve supports arbitrary Python code and therefore integrates well with the MLOps ecosystem. You can use it with model optimizers (ONNX, TVM), model monitoring systems (Seldon Alibi, Arize), model registries (MLFlow, Weights and Biases), machine learning frameworks (XGBoost, Scikit-learn), data app UIs (Gradio, Streamlit), and Web API frameworks (FastAPI, gRPC).
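
For instance, wrapping a scikit-learn model in a deployment takes only plain Python; a hedged sketch (the model choice and request format are illustrative):

```python
import requests
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from ray import serve


@serve.deployment
class IrisClassifier:
    def __init__(self):
        # Train (or load) the model once when the replica starts.
        X, y = load_iris(return_X_y=True)
        self._model = RandomForestClassifier().fit(X, y)

    async def __call__(self, request) -> dict:
        # Expect a JSON list of four feature values in the request body.
        features = await request.json()
        return {"predicted_class": int(self._model.predict([features])[0])}


serve.run(IrisClassifier.bind())

print(requests.post("http://localhost:8000/", json=[5.1, 3.5, 1.4, 0.2]).json())
# {'predicted_class': 0}
```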

:::


## How does Serve compare to ...

:::{dropdown} TFServing, TorchServe, ONNXRuntime
:animate: fade-in-slide-down

Ray Serve is *framework agnostic*, so you can use it alongside any other Python framework or library.
We believe data scientists should not be bound to a particular machine learning framework.
They should be empowered to use the best tool available for the job.

Compared to these framework-specific solutions, Ray Serve doesn't perform any model-specific optimizations to make your ML model run faster. However, you can still optimize the models yourself
and run them in Ray Serve. For example, you can run a model compiled by
[PyTorch JIT](https://pytorch.org/docs/stable/jit.html) or [ONNXRuntime](https://onnxruntime.ai/).
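
A sketch of what that looks like for a TorchScript model (the model path and request format are placeholders):

```python
import requests
import torch

from ray import serve


@serve.deployment
class TorchScriptModel:
    def __init__(self, model_path: str):
        # Load a model that was already compiled/optimized with PyTorch JIT.
        self._model = torch.jit.load(model_path)
        self._model.eval()

    async def __call__(self, request) -> dict:
        features = await request.json()
        with torch.no_grad():
            output = self._model(torch.tensor([features], dtype=torch.float32))
        return {"prediction": output.tolist()}


# "my_model.pt" is a placeholder path to a saved TorchScript artifact.
serve.run(TorchScriptModel.bind(model_path="my_model.pt"))

print(requests.post("http://localhost:8000/", json=[1.0, 2.0, 3.0]).json())
```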
:::

:::{dropdown} AWS SageMaker, Azure ML, Google Vertex AI
:animate: fade-in-slide-down

As an open-source project, Ray Serve brings the scalability and reliability of these hosted offerings to your own infrastructure.
You can use the Ray [cluster launcher](cluster-index) to deploy Ray Serve to all major public clouds and K8s, as well as to bare-metal, on-premise machines.

Ray Serve is not a full-fledged ML Platform.
Compared to these other offerings, Ray Serve lacks the functionality for
managing the lifecycle of your models, visualizing their performance, etc. Ray
Serve primarily focuses on model serving and providing the primitives for you to
build your own ML platform on top.

If you are looking for an end-to-end ML pipeline framework that can handle everything from data processing to serving, check out [Ray AI Runtime](air).
:::

:::{dropdown} Seldon, KServe, Cortex
:animate: fade-in-slide-down

You can develop Ray Serve on your laptop, deploy it on a dev box, and scale it out
to multiple machines or a Kubernetes cluster, all with minimal or no changes to code. It's a lot
easier to get started with when you don't need to provision and manage a K8s cluster.
When it's time to deploy, you can use our [Kubernetes Operator](kuberay-quickstart)
to transparently deploy your Ray Serve application to K8s.
:::

:::{dropdown} BentoML, Comet.ml, MLflow
:animate: fade-in-slide-down

Many of these tools are focused on serving and scaling models independently.
In contrast, Ray Serve is framework-agnostic and focuses on model composition.
As such, Ray Serve works with any model packaging and registry format.
Ray Serve also provides key features for building production-ready machine learning applications, including best-in-class autoscaling and natural integration with business logic.
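
As a sketch of that autoscaling support, a deployment can declare an `autoscaling_config` (the values below are illustrative, not recommendations):

```python
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 5,
        "target_num_ongoing_requests_per_replica": 10,
    },
)
async def autoscaled_model(request) -> str:
    # Replace with real model inference; Serve scales replicas up and down
    # based on the number of in-flight requests per replica.
    return "ok"


serve.run(autoscaled_model.bind())
```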
:::

We truly believe Serve is unique as it gives you end-to-end control
over your ML application while delivering scalability and high performance. To achieve
Serve's feature offerings with other tools, you would need to glue together multiple
frameworks like TensorFlow Serving and SageMaker, or even roll your own
micro-batching component to improve throughput.

## Learn More

Check out {ref}`getting-started` and {ref}`serve-key-concepts`, look at the {ref}`serve-faq`,
Check out {ref}`getting-started` and {ref}`serve-key-concepts`,
or head over to the {doc}`tutorials/index` to get started building your Ray Serve applications.


@@ -182,18 +264,6 @@ or head over to the {doc}`tutorials/index` to get started building your Ray Serv
:classes: btn-outline-info btn-block
---
**Serve FAQ**
^^^
Find answers to commonly asked questions in our detailed FAQ.
+++
.. link-button:: serve-faq
:type: ref
:text: Ray Serve FAQ
:classes: btn-outline-info btn-block
---
**API Reference**
^^^
