Revert "Revert "[serve] Integrate and Document Bring-Your-Own Gradio …
Browse files Browse the repository at this point in the history
…Applications"" (ray-project#27662)

Signed-off-by: Stefan van der Kleij <[email protected]>
zcin authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 1c66765 commit b25f74e
Showing 13 changed files with 402 additions and 366 deletions.
34 changes: 33 additions & 1 deletion .buildkite/pipeline.yml
@@ -268,9 +268,30 @@
> test_shard.txt
- cat test_shard.txt
- bazel test --config=ci $(./ci/run/bazel_export_options)
--test_tag_filters=-post_wheel_build
--test_tag_filters=-post_wheel_build,-py37
$(cat test_shard.txt)


- label: ":serverless: Serve Tests (Python 3.7)"
conditions:
[
"RAY_CI_SERVE_AFFECTED",
"RAY_CI_PYTHON_AFFECTED",
]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- echo "--- Setting up Python 3.7 environment."
- PYTHON=3.7 TORCH_VERSION=1.6 ./ci/env/install-dependencies.sh
# Specifying PYTHON=3.7 above somehow messes up the Ray install.
# Uninstall and re-install Ray so that we can use Ray Client.
# (Remove thirdparty_files to sidestep an issue with psutil.)
- pip uninstall -y ray && rm -rf /ray/python/ray/thirdparty_files
- ./ci/ci.sh build
- bazel test --config=ci $(./ci/run/bazel_export_options)
--test_tag_filters=team:serve
python/ray/serve/test_gradio


- label: ":python: Minimal install 3.6"
conditions: ["RAY_CI_PYTHON_AFFECTED"]
commands:
@@ -304,6 +325,17 @@
- bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
python/ray/dashboard/test_dashboard

- label: ":python: Ray Serve default install"
conditions: ["RAY_CI_PYTHON_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- ./ci/env/install-serve.sh
- ./ci/env/env_info.sh
- bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
python/ray/serve/test_deployment_graph
- bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
python/ray/serve/test_api

- label: ":python: Release test package unit tests"
conditions: ["ALWAYS"]
commands:
1 change: 1 addition & 0 deletions ci/ci.sh
@@ -173,6 +173,7 @@ test_python() {
-python/ray/serve:test_cross_language # Ray java not built on Windows yet.
-python/ray/serve:test_gcs_failure # Fork not supported in windows
-python/ray/serve:test_standalone2 # Multinode not supported on Windows
-python/ray/serve:test_gradio
-python/ray/tests:test_actor_advanced # crashes in shutdown
-python/ray/tests:test_autoscaler # We don't support Autoscaler on Windows
-python/ray/tests:test_autoscaler_aws
12 changes: 12 additions & 0 deletions ci/env/install-serve.sh
@@ -0,0 +1,12 @@
#!/usr/bin/env bash

# Installs serve dependencies ("ray[serve]") on top of minimal install

# Get script's directory: https://stackoverflow.com/a/246128
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Installs minimal dependencies
"$SCRIPT_DIR"/install-minimal.sh

# Installs serve dependencies
python -m pip install -U "ray[serve]"
123 changes: 123 additions & 0 deletions doc/source/serve/tutorials/gradio-integration.md
@@ -0,0 +1,123 @@
# Scaling your Gradio app with Ray Serve

In this guide, we show you how to scale up your [Gradio](https://gradio.app/) application using Ray Serve. You keep the internal architecture of your Gradio app intact (no code changes); Ray Serve simply wraps the app as a deployment and scales it out to access more resources.

## Dependencies

To follow this tutorial, you will need Ray Serve and Gradio. If you haven't already, install them by running:
```console
$ pip install "ray[serve]"
$ pip install gradio
```
This tutorial uses Gradio apps that run text summarization and text generation models, accessed through [HuggingFace's Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines). **Note that you can substitute any Gradio app of your own for the ones used here!**

First, let's install the transformers module.
```console
$ pip install transformers
```

## Quickstart: Deploy your Gradio app with Ray Serve

This section shows you an easy way to deploy your app onto Ray Serve. First, create a new Python file named `demo.py`. In it, import `GradioServer` from Ray Serve (used later to deploy your Gradio app), along with `gradio` and `transformers.pipeline` for loading the text summarization model.
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```
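
The example file referenced above isn't shown inline in this diff; as a rough sketch, the imports might look like the following (the `ray.serve.gradio_integrations` module path is an assumption based on this integration, not a quote from the example file):

```python
# Sketch of the imports; the GradioServer module path is an assumption.
import gradio as gr
from transformers import pipeline

from ray.serve.gradio_integrations import GradioServer
```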

Then, we construct the Gradio app `io`. This app takes in text and summarizes it using the [T5 Small](https://huggingface.co/t5-small) text summarization model, loaded with [HuggingFace's Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines).

:::{note}
Remember, you can substitute this with your own Gradio app if you want to try scaling it up!
:::
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_gradio_app_begin__
:end-before: __doc_gradio_app_end__
```
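
As a rough illustration of what such an app might look like (the exact interface layout and pipeline arguments below are assumptions, not the example file's code):

```python
# Load the T5 Small summarization model through HuggingFace's Pipelines.
summarizer = pipeline("summarization", model="t5-small")

def summarize(text: str) -> str:
    # Summarization length bounds here are illustrative.
    return summarizer(text, min_length=5, max_length=60)[0]["summary_text"]

# A simple Gradio interface: one textbox in, one textbox out.
io = gr.Interface(fn=summarize, inputs="textbox", outputs="textbox")
```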

### Deploying Gradio Server
To deploy your Gradio app onto Ray Serve, you need to wrap it in a Serve [deployment](serve-key-concepts-deployment). `GradioServer` acts as that wrapper: it serves your Gradio app remotely on Ray Serve so that it can process and respond to HTTP requests.

Replicas in a deployment are copies of your program, each running in its own worker process on the Ray cluster. Adding more replicas scales your deployment so it can serve more client requests. By wrapping your application in `GradioServer`, you can increase the number of replicas of your application or increase the number of CPUs and/or GPUs available to each replica.

:::{note}
`GradioServer` is simply `GradioIngress` but wrapped in a Serve deployment. You can use `GradioServer` for the simple wrap-and-deploy use case, but as you will see in the next section, you can use `GradioIngress` to define your own Gradio Server for more customized use cases.
:::

Take either the example app `io` we created above or an existing Gradio app (of type `Interface`, `Blocks`, `Parallel`, etc.) and wrap it in your Gradio Server.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_app_begin__
:end-before: __doc_app_end__
```
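
For example, the wrapping step might look roughly like this; the replica count and resource settings are illustrative assumptions, not the example file's exact values:

```python
# Wrap the Gradio app in a Serve deployment with, e.g., 2 replicas
# and 2 CPUs per replica.
app = GradioServer.options(
    num_replicas=2, ray_actor_options={"num_cpus": 2}
).bind(io)
```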

Finally, deploy your Gradio Server! Run the following in your terminal:
```console
$ serve run demo:app
```

Now you can access your Gradio app at `http://localhost:8000`! This is what it should look like:
![Gradio Result](https://raw.githubusercontent.com/ray-project/images/master/docs/serve/gradio_result.png)

See [Putting Ray Serve Deployment Graphs in Production](https://docs.ray.io/en/master/serve/production.html#id1) for more information on how to deploy your app in production.


## Parallelizing models with Ray Serve

You can run multiple models in parallel by utilizing Ray Serve's [deployment graph](serve-model-composition-deployment-graph).

### Original Approach
Suppose you want to run the following program.

1. Take two text generation models, [`gpt2`](https://huggingface.co/gpt2) and [`EleutherAI/gpt-neo-125M`](https://huggingface.co/EleutherAI/gpt-neo-125M).
2. Run the two models on the same input text, such that the generated text has a minimum length of 20 and maximum length of 100.
3. Display the outputs of both models using Gradio.

This is how you would do it normally:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-original.py
:start-after: __doc_code_begin__
:end-before: __doc_code_end__
```
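
In spirit, the plain-Gradio version looks something like the sketch below (the exact layout in the example file may differ; note that both models run sequentially in a single process):

```python
import gradio as gr
from transformers import pipeline

generator1 = pipeline("text-generation", model="gpt2")
generator2 = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

def generate_both(text: str):
    # Run both models one after the other on the same input text.
    out1 = generator1(text, min_length=20, max_length=100)[0]["generated_text"]
    out2 = generator2(text, min_length=20, max_length=100)[0]["generated_text"]
    return out1, out2

demo = gr.Interface(
    fn=generate_both, inputs="textbox", outputs=["textbox", "textbox"]
)
demo.launch()
```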

### Parallelize using Ray Serve

With Ray Serve, we can parallelize the two text generation models by wrapping each model in a separate Ray Serve [deployment](serve-key-concepts-deployment). Deployments are defined by decorating a Python class or function with `@serve.deployment`; they typically wrap the models that you want to deploy on Ray Serve to handle incoming requests.

Let's walk through a few steps to achieve parallelism. First, let's import our dependencies. Note that we now import `GradioIngress` instead of `GradioServer`, since we're building a customized `MyGradioServer` that can run models in parallel.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```
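
A sketch of the imports for the parallel version (again, the `gradio_integrations` module path is an assumption):

```python
# Sketch of the imports; the GradioIngress module path is an assumption.
import asyncio

import gradio as gr
import ray
from ray import serve
from ray.serve.gradio_integrations import GradioIngress
from transformers import pipeline
```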

Then, let's wrap our `gpt2` and `EleutherAI/gpt-neo-125M` models in Serve deployments defined by a class named `TextGenerationModel`.
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_models_begin__
:end-before: __doc_models_end__
```
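
Conceptually, the deployment might be defined along these lines; the generation parameters here are illustrative assumptions:

```python
@serve.deployment
class TextGenerationModel:
    def __init__(self, model_name: str):
        # Each replica loads its own copy of the text-generation model.
        self.generator = pipeline("text-generation", model=model_name)

    def __call__(self, text: str) -> str:
        generated = self.generator(text, min_length=20, max_length=100)
        return generated[0]["generated_text"]
```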

Next, instead of simply wrapping our Gradio app in a `GradioServer` deployment, we can build our own `MyGradioServer` that reroutes the Gradio app so that it runs the `TextGenerationModel` deployments:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_gradio_server_begin__
:end-before: __doc_gradio_server_end__
```
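
A rough sketch of such a class is below. The constructor signature of `GradioIngress` and the exact handle-calling semantics in `fanout` depend on the Ray Serve version, so treat this as an assumption rather than the example file's exact code:

```python
@serve.deployment
class MyGradioServer(GradioIngress):
    def __init__(self, downstream_model_1, downstream_model_2):
        # Handles to the two TextGenerationModel deployments.
        self._d1 = downstream_model_1
        self._d2 = downstream_model_2
        # The Gradio interface's callback fans requests out to both models.
        io = gr.Interface(fn=self.fanout, inputs="textbox", outputs="textbox")
        super().__init__(io)

    async def fanout(self, text: str) -> str:
        # Send the same input to both deployments concurrently.
        refs = await asyncio.gather(self._d1.remote(text), self._d2.remote(text))
        result1, result2 = ray.get(refs)
        return f"[GPT-2]:\n{result1}\n\n[GPT-Neo]:\n{result2}"
```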

Lastly, we link everything together:
```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_app_begin__
:end-before: __doc_app_end__
```
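
The final wiring might look roughly like this (the deployment names passed to `options` are illustrative):

```python
# Bind the two text generation models, then pass them into MyGradioServer.
app = MyGradioServer.bind(
    TextGenerationModel.options(name="gpt2-generator").bind("gpt2"),
    TextGenerationModel.options(name="gpt-neo-generator").bind(
        "EleutherAI/gpt-neo-125M"
    ),
)
```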

:::{note}
This binds your two text generation models (wrapped in Serve deployments) to `MyGradioServer._d1` and `MyGradioServer._d2`, forming a [deployment graph](serve-model-composition-deployment-graph). The Gradio Interface `io` is built so that it calls `MyGradioServer.fanout()`, which simply sends requests to your two text generation models deployed on Ray Serve.
:::

Now, you can run your scalable app, and the two text generation models will run in parallel on Ray Serve.
Run your Gradio app with the following command:

```console
$ serve run demo:app
```

Access your Gradio app at `http://localhost:8000`, and you should see the following interactive interface:
![Gradio Result](https://raw.githubusercontent.com/ray-project/images/master/docs/serve/gradio_result_parallel.png)

See [Putting Ray Serve Deployment Graphs in Production](https://docs.ray.io/en/master/serve/production.html#id1) for more information on how to deploy your app in production.