Revert "Revert "[serve] Integrate and Document Bring-Your-Own Gradio …
Browse files Browse the repository at this point in the history
…Applications"" (ray-project#27662)

Signed-off-by: Stefan van der Kleij <[email protected]>
zcin authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 1c66765 commit b25f74e
Showing 13 changed files with 402 additions and 366 deletions.
34 changes: 33 additions & 1 deletion .buildkite/pipeline.yml
@@ -268,9 +268,30 @@
> test_shard.txt
- cat test_shard.txt
- bazel test --config=ci $(./ci/run/bazel_export_options)
--test_tag_filters=-post_wheel_build
--test_tag_filters=-post_wheel_build,-py37
$(cat test_shard.txt)


- label: ":serverless: Serve Tests (Python 3.7)"
conditions:
[
"RAY_CI_SERVE_AFFECTED",
"RAY_CI_PYTHON_AFFECTED",
]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- echo "--- Setting up Python 3.7 environment."
- PYTHON=3.7 TORCH_VERSION=1.6 ./ci/env/install-dependencies.sh
# Specifying PYTHON=3.7 above somehow messes up the Ray install.
# Uninstall and re-install Ray so that we can use Ray Client.
# (Remove thirdparty_files to sidestep an issue with psutil.)
- pip uninstall -y ray && rm -rf /ray/python/ray/thirdparty_files
- ./ci/ci.sh build
- bazel test --config=ci $(./ci/run/bazel_export_options)
--test_tag_filters=team:serve
python/ray/serve/test_gradio


- label: ":python: Minimal install 3.6"
conditions: ["RAY_CI_PYTHON_AFFECTED"]
commands:
@@ -304,6 +325,17 @@
- bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
python/ray/dashboard/test_dashboard

- label: ":python: Ray Serve default install"
conditions: ["RAY_CI_PYTHON_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- ./ci/env/install-serve.sh
- ./ci/env/env_info.sh
- bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
python/ray/serve/test_deployment_graph
- bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
python/ray/serve/test_api

- label: ":python: Release test package unit tests"
conditions: ["ALWAYS"]
commands:
1 change: 1 addition & 0 deletions ci/ci.sh
@@ -173,6 +173,7 @@ test_python() {
-python/ray/serve:test_cross_language # Ray java not built on Windows yet.
-python/ray/serve:test_gcs_failure # Fork not supported in windows
-python/ray/serve:test_standalone2 # Multinode not supported on Windows
-python/ray/serve:test_gradio
-python/ray/tests:test_actor_advanced # crashes in shutdown
-python/ray/tests:test_autoscaler # We don't support Autoscaler on Windows
-python/ray/tests:test_autoscaler_aws
12 changes: 12 additions & 0 deletions ci/env/install-serve.sh
@@ -0,0 +1,12 @@
#!/usr/bin/env bash

# Installs serve dependencies ("ray[serve]") on top of minimal install

# Get script's directory: https://stackoverflow.com/a/246128
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Installs minimal dependencies
"$SCRIPT_DIR"/install-minimal.sh

# Installs serve dependencies
python -m pip install -U "ray[serve]"
123 changes: 123 additions & 0 deletions doc/source/serve/tutorials/gradio-integration.md
@@ -0,0 +1,123 @@
# Scaling your Gradio app with Ray Serve

In this guide, we show you how to scale up your [Gradio](https://gradio.app/) application using Ray Serve. You keep the internal architecture of your Gradio app intact (no code changes); Ray Serve simply wraps the app as a deployment and scales it out to access more resources.

## Dependencies

To follow this tutorial, you will need Ray Serve and Gradio. If you haven't already, install them by running:
```console
$ pip install "ray[serve]"
$ pip install gradio
```
This tutorial uses Gradio apps that run text summarization and text generation models, accessed through [HuggingFace's Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines). **Note that you can substitute any Gradio app of your own for the ones used here!**

First, let's install the transformers module.
```console
$ pip install transformers
```

## Quickstart: Deploy your Gradio app with Ray Serve

This section shows you an easy way to deploy your app onto Ray Serve. First, create a new Python file named `demo.py`. In it, import `GradioServer` from Ray Serve (used later to deploy your Gradio app), along with `gradio` and `transformers.pipeline` for loading the text summarization model.
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```
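
The example file referenced above isn't shown inline in this diff; as a rough sketch, the imports might look like the following (the `ray.serve.gradio_integrations` module path is an assumption based on this integration, not a quote from the example file):

```python
# Sketch of the imports; the GradioServer module path is an assumption.
import gradio as gr
from transformers import pipeline

from ray.serve.gradio_integrations import GradioServer
```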

Then, we construct the Gradio app `io`. This app takes in text and summarizes it using the [T5 Small](https://huggingface.co/t5-small) text summarization model, loaded with [HuggingFace's Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines).

:::{note}
Remember, you can substitute this with your own Gradio app if you want to try scaling it up!
:::
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_gradio_app_begin__
:end-before: __doc_gradio_app_end__
```
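
As a rough illustration of what such an app might look like (the exact interface layout and pipeline arguments below are assumptions, not the example file's code):

```python
# Load the T5 Small summarization model through HuggingFace's Pipelines.
summarizer = pipeline("summarization", model="t5-small")

def summarize(text: str) -> str:
    # Summarization length bounds here are illustrative.
    return summarizer(text, min_length=5, max_length=60)[0]["summary_text"]

# A simple Gradio interface: one textbox in, one textbox out.
io = gr.Interface(fn=summarize, inputs="textbox", outputs="textbox")
```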

### Deploying Gradio Server
To deploy your Gradio app onto Ray Serve, you need to wrap it in a Serve [deployment](serve-key-concepts-deployment). `GradioServer` acts as that wrapper: it serves your Gradio app remotely on Ray Serve so that it can process and respond to HTTP requests.

Replicas in a deployment are copies of your program, each running in its own worker process on the Ray cluster. Adding more replicas scales your deployment so it can serve more client requests. By wrapping your application in `GradioServer`, you can increase the number of replicas of your application or increase the number of CPUs and/or GPUs available to each replica.

:::{note}
`GradioServer` is simply `GradioIngress` but wrapped in a Serve deployment. You can use `GradioServer` for the simple wrap-and-deploy use case, but as you will see in the next section, you can use `GradioIngress` to define your own Gradio Server for more customized use cases.
:::

Take either the example app `io` we created above or an existing Gradio app (of type `Interface`, `Blocks`, `Parallel`, etc.) and wrap it in your Gradio Server.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_app_begin__
:end-before: __doc_app_end__
```
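
For example, the wrapping step might look roughly like this; the replica count and resource settings are illustrative assumptions, not the example file's exact values:

```python
# Wrap the Gradio app in a Serve deployment with, e.g., 2 replicas
# and 2 CPUs per replica.
app = GradioServer.options(
    num_replicas=2, ray_actor_options={"num_cpus": 2}
).bind(io)
```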

Finally, deploy your Gradio Server! Run the following in your terminal:
```console
$ serve run demo:app
```

Now you can access your Gradio app at `http://localhost:8000`! This is what it should look like:
![Gradio Result](https://raw.githubusercontent.com/ray-project/images/master/docs/serve/gradio_result.png)

See [Putting Ray Serve Deployment Graphs in Production](https://docs.ray.io/en/master/serve/production.html#id1) for more information on how to deploy your app in production.


## Parallelizing models with Ray Serve

You can run multiple models in parallel by utilizing Ray Serve's [deployment graph](serve-model-composition-deployment-graph).

### Original Approach
Suppose you want to run the following program.

1. Take two text generation models, [`gpt2`](https://huggingface.co/gpt2) and [`EleutherAI/gpt-neo-125M`](https://huggingface.co/EleutherAI/gpt-neo-125M).
2. Run the two models on the same input text, such that the generated text has a minimum length of 20 and maximum length of 100.
3. Display the outputs of both models using Gradio.

This is how you would do it normally:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-original.py
:start-after: __doc_code_begin__
:end-before: __doc_code_end__
```
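
In spirit, the plain-Gradio version looks something like the sketch below (the exact layout in the example file may differ; note that both models run sequentially in a single process):

```python
import gradio as gr
from transformers import pipeline

generator1 = pipeline("text-generation", model="gpt2")
generator2 = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

def generate_both(text: str):
    # Run both models one after the other on the same input text.
    out1 = generator1(text, min_length=20, max_length=100)[0]["generated_text"]
    out2 = generator2(text, min_length=20, max_length=100)[0]["generated_text"]
    return out1, out2

demo = gr.Interface(
    fn=generate_both, inputs="textbox", outputs=["textbox", "textbox"]
)
demo.launch()
```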

### Parallelize using Ray Serve

With Ray Serve, we can parallelize the two text generation models by wrapping each model in a separate Ray Serve [deployment](serve-key-concepts-deployment). Deployments are defined by decorating a Python class or function with `@serve.deployment`; they typically wrap the models that you want to deploy on Ray Serve to handle incoming requests.

Let's walk through a few steps to achieve parallelism. First, let's import our dependencies. Note that we now import `GradioIngress` instead of `GradioServer`, since we're building a customized `MyGradioServer` that can run models in parallel.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```
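
A sketch of the imports for the parallel version (again, the `gradio_integrations` module path is an assumption):

```python
# Sketch of the imports; the GradioIngress module path is an assumption.
import asyncio

import gradio as gr
import ray
from ray import serve
from ray.serve.gradio_integrations import GradioIngress
from transformers import pipeline
```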

Then, let's wrap our `gpt2` and `EleutherAI/gpt-neo-125M` models in Serve deployments defined by a class named `TextGenerationModel`.
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_models_begin__
:end-before: __doc_models_end__
```
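
Conceptually, the deployment might be defined along these lines; the generation parameters here are illustrative assumptions:

```python
@serve.deployment
class TextGenerationModel:
    def __init__(self, model_name: str):
        # Each replica loads its own copy of the text-generation model.
        self.generator = pipeline("text-generation", model=model_name)

    def __call__(self, text: str) -> str:
        generated = self.generator(text, min_length=20, max_length=100)
        return generated[0]["generated_text"]
```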

Next, instead of simply wrapping our Gradio app in a `GradioServer` deployment, we can build our own `MyGradioServer` that reroutes the Gradio app so that it runs the `TextGenerationModel` deployments:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_gradio_server_begin__
:end-before: __doc_gradio_server_end__
```
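
A rough sketch of such a class is below. The constructor signature of `GradioIngress` and the exact handle-calling semantics in `fanout` depend on the Ray Serve version, so treat this as an assumption rather than the example file's exact code:

```python
@serve.deployment
class MyGradioServer(GradioIngress):
    def __init__(self, downstream_model_1, downstream_model_2):
        # Handles to the two TextGenerationModel deployments.
        self._d1 = downstream_model_1
        self._d2 = downstream_model_2
        # The Gradio interface's callback fans requests out to both models.
        io = gr.Interface(fn=self.fanout, inputs="textbox", outputs="textbox")
        super().__init__(io)

    async def fanout(self, text: str) -> str:
        # Send the same input to both deployments concurrently.
        refs = await asyncio.gather(self._d1.remote(text), self._d2.remote(text))
        result1, result2 = ray.get(refs)
        return f"[GPT-2]:\n{result1}\n\n[GPT-Neo]:\n{result2}"
```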

Lastly, we link everything together:
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_app_begin__
:end-before: __doc_app_end__
```
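
The final wiring might look roughly like this (the deployment names passed to `options` are illustrative):

```python
# Bind the two text generation models, then pass them into MyGradioServer.
app = MyGradioServer.bind(
    TextGenerationModel.options(name="gpt2-generator").bind("gpt2"),
    TextGenerationModel.options(name="gpt-neo-generator").bind(
        "EleutherAI/gpt-neo-125M"
    ),
)
```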

:::{note}
This binds your two text generation models (wrapped in Serve deployments) to `MyGradioServer._d1` and `MyGradioServer._d2`, forming a [deployment graph](serve-model-composition-deployment-graph). The Gradio Interface `io` is built so that it calls `MyGradioServer.fanout()`, which simply sends requests to your two text generation models deployed on Ray Serve.
:::

Now, you can run your scalable app, and the two text generation models will run in parallel on Ray Serve.
Run your Gradio app with the following command:

```console
$ serve run demo:app
```

Access your Gradio app at `http://localhost:8000`, and you should see the following interactive interface:
![Gradio Result](https://raw.githubusercontent.com/ray-project/images/master/docs/serve/gradio_result_parallel.png)

See [Putting Ray Serve Deployment Graphs in Production](https://docs.ray.io/en/master/serve/production.html#id1) for more information on how to deploy your app in production.