[Serve] [Doc] Update intro page (ray-project#27735)
Signed-off-by: Stefan van der Kleij <[email protected]>
simon-mo authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 8933efe commit 2bd1f46
Showing 6 changed files with 130 additions and 105 deletions.
1 change: 0 additions & 1 deletion doc/source/_toc.yml
@@ -192,7 +192,6 @@ parts:
- file: serve/tutorials/deployment-graph-patterns/linear_pipeline
- file: serve/tutorials/deployment-graph-patterns/branching_input
- file: serve/tutorials/deployment-graph-patterns/conditional
- file: serve/faq
- file: serve/package-ref

- file: rllib/index
1 change: 0 additions & 1 deletion doc/source/ray-references/faq.rst
@@ -8,7 +8,6 @@ FAQ
:caption: Frequently Asked Questions

./../tune/faq.rst
./../serve/faq.rst


Further Questions or Issues?
35 changes: 35 additions & 0 deletions doc/source/serve/doc_code/quickstart_graph.py
@@ -0,0 +1,35 @@
import requests
from ray import serve
from ray.serve.drivers import DAGDriver
from ray.serve.dag import InputNode
from ray.serve.http_adapters import json_request


# 1: Define the models in our composition graph.
@serve.deployment
class Adder:
    def __init__(self, increment: int):
        self.increment = increment

    def predict(self, inp: int):
        return self.increment + inp


@serve.deployment
def combine_average(*input_values) -> dict:
    return {"result": sum(input_values) / len(input_values)}


# 2: Define the model composition graph and call it.
with InputNode() as input_node:
    adder_1 = Adder.bind(increment=1)
    adder_2 = Adder.bind(increment=2)
    dag = combine_average.bind(
        adder_1.predict.bind(input_node), adder_2.predict.bind(input_node)
    )

serve.run(DAGDriver.bind(dag, http_adapter=json_request))

# 3: Query the deployment and print the result.
print(requests.post("http://localhost:8000/", json=100).json())
# {"result": 101.5}
1 change: 0 additions & 1 deletion doc/source/serve/doc_code/transformers_example.py
@@ -14,7 +14,6 @@ def __call__(self, request):


# 2: Deploy the deployment.

serve.run(SentimentAnalysisDeployment.bind())

# 3: Query the deployment and print the result.
77 changes: 0 additions & 77 deletions doc/source/serve/faq.md

This file was deleted.

120 changes: 95 additions & 25 deletions doc/source/serve/index.md
@@ -8,8 +8,6 @@

:::{tip}
[Get in touch with us](https://docs.google.com/forms/d/1l8HT35jXMPtxVUtQPeGoe09VGp5jcvSv0TqPgyz6lGU) if you're using or considering using Ray Serve.

Chat with Ray Serve users and developers on our [forum](https://discuss.ray.io/).
:::

```{image} logo.svg
@@ -27,29 +25,34 @@ Serve is particularly well suited for {ref}`serve-model-composition`, enabling y

Serve is built on top of Ray, so it easily scales to many machines and offers flexible scheduling support such as fractional GPUs so you can share resources and serve many machine learning models at low cost.

:::{tabbed} Installation
## Quickstart

Install Ray Serve and its dependencies:

```bash
pip install "ray[serve]"
```
:::

:::{tabbed} Quickstart

To run this example, install the following: ``pip install ray["serve"]``

In this quick-start example we will define a simple "hello world" deployment, deploy it behind HTTP locally, and query it.

```{literalinclude} doc_code/quickstart.py
:language: python
```
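
`doc_code/quickstart.py` itself isn't shown in this diff; a minimal sketch of such a hello-world deployment (the class name, message, and response format here are illustrative, not the actual file contents):

```python
import requests
from starlette.requests import Request

from ray import serve


# 1: Define a Ray Serve deployment.
@serve.deployment
class HelloDeployment:
    def __init__(self, msg: str):
        # State set up once per replica; in practice this could be model weights.
        self._msg = msg

    def __call__(self, request: Request) -> dict:
        return {"result": self._msg}


# 2: Deploy it behind HTTP locally.
serve.run(HelloDeployment.bind(msg="Hello world!"))

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/").json())
# {'result': 'Hello world!'}
```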

:::{tabbed} More examples
For more examples, select from the tabs.
:::

:::{tabbed} FastAPI integration
:::{tabbed} Model composition

To run this example, install the following: ``pip install ray["serve"]``
In this example, we demonstrate how you can use Serve's model composition API to express a complex computation graph and deploy it as a Serve application.

```{literalinclude} doc_code/quickstart_graph.py
:language: python
```
:::

:::{tabbed} FastAPI integration

In this example we will use Serve's [FastAPI](https://fastapi.tiangolo.com/) integration to make use of more advanced HTTP functionality.

@@ -58,9 +61,9 @@ In this example we will use Serve's [FastAPI](https://fastapi.tiangolo.com/) int
```
:::
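
The referenced `doc_code/fastapi_example.py` isn't shown in this diff either; a minimal sketch of the FastAPI integration (the deployment and route names are illustrative):

```python
import requests
from fastapi import FastAPI

from ray import serve

app = FastAPI()


@serve.deployment
@serve.ingress(app)
class FastAPIDeployment:
    # FastAPI routing, validation, and automatic docs all work as usual.
    @app.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"


serve.run(FastAPIDeployment.bind())

print(requests.get("http://localhost:8000/hello", params={"name": "Serve"}).json())
# 'Hello Serve!'
```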

:::{tabbed} Serving a Hugging Face NLP model
:::{tabbed} Hugging Face model

To run this example, install the following: ``pip install ray["serve"] transformers``
To run this example, install the following: ``pip install transformers``

In this example we will serve a pre-trained [Hugging Face transformers](https://huggingface.co/docs/transformers/index) model using Ray Serve.
The model we'll use is a sentiment analysis model: it will take a text string as input and return whether the text was "POSITIVE" or "NEGATIVE."
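
Only a fragment of `doc_code/transformers_example.py` appears in this diff; a sketch consistent with that fragment, assuming the Hugging Face `pipeline` API and a `text` query parameter (both illustrative):

```python
import requests
from transformers import pipeline

from ray import serve


@serve.deployment
class SentimentAnalysisDeployment:
    def __init__(self):
        # Load the pre-trained sentiment-analysis pipeline once per replica.
        self._model = pipeline("sentiment-analysis")

    def __call__(self, request) -> str:
        # Read the text to classify from the "text" query parameter.
        text = request.query_params["text"]
        return self._model(text)[0]["label"]


# 2: Deploy the deployment.
serve.run(SentimentAnalysisDeployment.bind())

# 3: Query the deployment and print the result.
print(
    requests.get(
        "http://localhost:8000/", params={"text": "Ray Serve makes serving easy!"}
    ).text
)
# "POSITIVE"
```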
@@ -121,9 +124,88 @@ Because it's built on top of Ray, you can run it anywhere Ray can: on your lapto

:::


## How can Serve help me as a...

:::{dropdown} Data scientist
:animate: fade-in-slide-down

Serve makes it easy to go from a laptop to a cluster. You can test your models (and your entire deployment graph) on your local machine before deploying them to production on a cluster. You don't need to know heavyweight Kubernetes concepts or cloud configurations to use Serve.

:::

:::{dropdown} ML engineer
:animate: fade-in-slide-down

Serve helps you scale out your deployments and run them reliably and efficiently to save costs. With Serve's first-class model composition API, you can combine models with business logic and build end-to-end user-facing applications. Additionally, Serve runs natively on Kubernetes with minimal operational overhead.
:::

:::{dropdown} ML platform engineer
:animate: fade-in-slide-down

Serve specializes in scalable and reliable ML model serving. As such, it can be an important plug-and-play component of your ML platform stack.
Serve supports arbitrary Python code and therefore integrates well with the MLOps ecosystem. You can use it with model optimizers (ONNX, TVM), model monitoring systems (Seldon Alibi, Arize), model registries (MLFlow, Weights and Biases), machine learning frameworks (XGBoost, Scikit-learn), data app UIs (Gradio, Streamlit), and Web API frameworks (FastAPI, gRPC).
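
For instance, wrapping a scikit-learn model in a deployment takes only plain Python; a hedged sketch (the model choice and request format are illustrative):

```python
import requests
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from ray import serve


@serve.deployment
class IrisClassifier:
    def __init__(self):
        # Train (or load) the model once when the replica starts.
        X, y = load_iris(return_X_y=True)
        self._model = RandomForestClassifier().fit(X, y)

    async def __call__(self, request) -> dict:
        # Expect a JSON list of four feature values in the request body.
        features = await request.json()
        return {"predicted_class": int(self._model.predict([features])[0])}


serve.run(IrisClassifier.bind())

print(requests.post("http://localhost:8000/", json=[5.1, 3.5, 1.4, 0.2]).json())
# {'predicted_class': 0}
```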

:::


## How does Serve compare to ...

:::{dropdown} TFServing, TorchServe, ONNXRuntime
:animate: fade-in-slide-down

Ray Serve is *framework agnostic*, so you can use it alongside any other Python framework or library.
We believe data scientists should not be bound to a particular machine learning framework.
They should be empowered to use the best tool available for the job.

Compared to these framework-specific solutions, Ray Serve doesn't perform any model-specific optimizations to make your ML model run faster. However, you can still optimize the models yourself
and run them in Ray Serve. For example, you can run a model compiled by
[PyTorch JIT](https://pytorch.org/docs/stable/jit.html) or [ONNXRuntime](https://onnxruntime.ai/).
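
A sketch of what that looks like for a TorchScript model (the model path and request format are placeholders):

```python
import requests
import torch

from ray import serve


@serve.deployment
class TorchScriptModel:
    def __init__(self, model_path: str):
        # Load a model that was already compiled/optimized with PyTorch JIT.
        self._model = torch.jit.load(model_path)
        self._model.eval()

    async def __call__(self, request) -> dict:
        features = await request.json()
        with torch.no_grad():
            output = self._model(torch.tensor([features], dtype=torch.float32))
        return {"prediction": output.tolist()}


# "my_model.pt" is a placeholder path to a saved TorchScript artifact.
serve.run(TorchScriptModel.bind(model_path="my_model.pt"))

print(requests.post("http://localhost:8000/", json=[1.0, 2.0, 3.0]).json())
```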
:::

:::{dropdown} AWS SageMaker, Azure ML, Google Vertex AI
:animate: fade-in-slide-down

As an open-source project, Ray Serve brings the scalability and reliability of these hosted offerings to your own infrastructure.
You can use the Ray [cluster launcher](cluster-index) to deploy Ray Serve to all major public clouds and K8s, as well as to bare-metal, on-premise machines.

Ray Serve is not a full-fledged ML Platform.
Compared to these other offerings, Ray Serve lacks the functionality for
managing the lifecycle of your models, visualizing their performance, etc. Ray
Serve primarily focuses on model serving and providing the primitives for you to
build your own ML platform on top.

If you are looking for an end-to-end ML pipeline framework that can handle everything from data processing to serving, check out [Ray AI Runtime](air).
:::

:::{dropdown} Seldon, KServe, Cortex
:animate: fade-in-slide-down

You can develop Ray Serve on your laptop, deploy it on a dev box, and scale it out
to multiple machines or a Kubernetes cluster, all with minimal or no changes to code. It's a lot
easier to get started with when you don't need to provision and manage a K8s cluster.
When it's time to deploy, you can use our [Kubernetes Operator](kuberay-quickstart)
to transparently deploy your Ray Serve application to K8s.
:::

:::{dropdown} BentoML, Comet.ml, MLflow
:animate: fade-in-slide-down

Many of these tools are focused on serving and scaling models independently.
In contrast, Ray Serve is framework-agnostic and focuses on model composition.
As such, Ray Serve works with any model packaging and registry format.
Ray Serve also provides key features for building production-ready machine learning applications, including best-in-class autoscaling and natural integration with business logic.
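
As a sketch of that autoscaling support, a deployment can declare an `autoscaling_config` (the values below are illustrative, not recommendations):

```python
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 5,
        "target_num_ongoing_requests_per_replica": 10,
    },
)
async def autoscaled_model(request) -> str:
    # Replace with real model inference; Serve scales replicas up and down
    # based on the number of in-flight requests per replica.
    return "ok"


serve.run(autoscaled_model.bind())
```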
:::

We truly believe Serve is unique as it gives you end-to-end control
over your ML application while delivering scalability and high performance. To achieve
Serve's feature offerings with other tools, you would need to glue together multiple
frameworks like TensorFlow Serving and SageMaker, or even roll your own
micro-batching component to improve throughput.

## Learn More

Check out {ref}`getting-started` and {ref}`serve-key-concepts`, look at the {ref}`serve-faq`,
Check out {ref}`getting-started` and {ref}`serve-key-concepts`,
or head over to the {doc}`tutorials/index` to get started building your Ray Serve applications.


@@ -182,18 +264,6 @@ or head over to the {doc}`tutorials/index` to get started building your Ray Serv
:classes: btn-outline-info btn-block
---
**Serve FAQ**
^^^
Find answers to commonly asked questions in our detailed FAQ.
+++
.. link-button:: serve-faq
:type: ref
:text: Ray Serve FAQ
:classes: btn-outline-info btn-block
---
**API Reference**
^^^
