[Serve] [Docs] Update the "Monitoring Ray Serve" Page #27777

Merged: 18 commits merged on Aug 12, 2022

Conversation

@shrekris-anyscale shrekris-anyscale commented Aug 11, 2022

Why are these changes needed?

The "Monitoring Ray Serve" page explains how to inspect your Ray Serve applications. This change updates the page to remove outdated metrics that Serve no longer exposes and to upgrade code samples to use 2.0 APIs. It also improves the content's readability and organization.

Link to updated "Monitoring Ray Serve" page: https://ray--27777.org.readthedocs.build/en/27777/serve/monitoring.html

Related issue number

Closes #27720.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
      • This change adds assertions to the documentation code, so they're tested in CI.

Signed-off-by: Shreyas Krishnaswamy <[email protected]>

edoakes commented Aug 11, 2022

@shrekris-anyscale FYI you have a merge conflict here

@edoakes edoakes left a comment

Thanks for adding the serve.run changes, those will be very helpful.

This includes details such as:
* the number of deployment replicas currently running
* logs for your Serve controller, deployment replicas, and HTTP proxies
* the Ray nodes (i.e. machines) running in your Ray cluster

period to end the last bullet point

Comment on lines 20 to 24
For example, if you're running Ray Serve on a local Ray cluster, you can access the dashboard by going to this address in your browser:

```
https://localhost:8265
```

Suggested change
For example, if you're running Ray Serve on a local Ray cluster, you can access the dashboard by going to this address in your browser:
```
https://localhost:8265
```
For example, if you're running Ray Serve locally, you can access the dashboard by going to `https://localhost:8265` in your browser.

save some page real estate

Comment on lines 26 to 27
We can view important information about our application here.
For example, we can inspect our deployment replicas by navigating to the Ray dashboard's "Actors" tab while our Serve application is running:

The previous paragraph used "you" and now you're using "we"; stick with one.

@edoakes added the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) Aug 11, 2022
Signed-off-by: Shreyas Krishnaswamy <[email protected]>
@shrekris-anyscale

@edoakes Thanks for the review! I addressed your comments.

@shrekris-anyscale removed the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) Aug 11, 2022

@dmatrix dmatrix left a comment

With those changes, it should be good to go.

Just a general comment: if possible for 2.1, we should consider providing dashboards for this, since users currently have to write too much boilerplate code. At least we are collecting the stats; it is a matter of making them easy to access and present.

This section should help you understand how to debug and monitor your Serve application.
There are three main ways to do this:
Using the Ray Dashboard, using Ray logging, and using built-in Ray Serve metrics.
This section should help you debug and monitor your Serve applications by:

This section helps you debug and monitor your Serve applications by:

## Ray logging

You can use Ray logging to understand system-level behavior and to surface application-level details during runtime.

We repeatedly begin sentences with "You can use XXX...". Perhaps vary the pattern, maybe with an infinitive phrase or a participial phrase:

To comprehend system-level behavior and surface application-level details during runtime, use Ray logging.
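
As a rough illustration of the logging behavior discussed in this thread (a sketch, not code from the PR), the snippet below uses only the Python standard library. The logger name `"ray.serve"` is an assumption about the name Serve uses for its logger, and the in-memory stream and format string are hypothetical choices made so the result is easy to inspect. It emits an application-level message at INFO, then raises the level to ERROR, as in the `logger.setLevel(logging.ERROR)` line quoted from the diff further down, so that later INFO messages are dropped.

```python
import io
import logging

# Assumption: "ray.serve" is the logger name used by Ray Serve. Here it is
# only a name passed to the standard library, so Ray does not need to be installed.
logger = logging.getLogger("ray.serve")
logger.propagate = False  # keep the demo output away from the root logger

# Capture log output in memory so the effect of the level change is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.setLevel(logging.INFO)
logger.info("Hello world!")       # emitted: INFO meets the INFO threshold

logger.setLevel(logging.ERROR)
logger.info("This is silenced")   # dropped: INFO is below ERROR
logger.error("Something failed")  # emitted: ERROR meets the ERROR threshold

captured = stream.getvalue().splitlines()
print(captured)
```

Only the first and last messages reach the handler. Swapping the `StringIO` stream for `sys.stdout` or a `logging.FileHandler` would send the same records to STDOUT or to a file on disk.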

```

Running this code block, we first get some log messages from the controller saying that a new replica of the deployment is being created:
We can run this deployment using the `serve run` CLI command:

Run this deployment using .....

```

Then when we query the deployment, we get both a default access log as well as our custom `"Hello world!"` message.
`serve run` prints out a few log messages immediately. Note that a few of these messages start with identifiers such as

"out" may be redundant. This without "out" reads equally well.

prints a few log messages ....

@@ -82,18 +111,21 @@ class Silenced:
logger.setLevel(logging.ERROR)
```

This will prevent the replica INFO-level logs from being written to STDOUT or to files on disk.
This controls which logs are written to STDOUT or files on disk.
You can also use your own custom logger, in which case you'll need to configure the behavior to write to STDOUT/STDERR, files on disk, or both.

Apart from the standard Python logger, Serve supports custom logging. Configuring a custom logger lets you control which messages are written to STDOUT/STDERR, files on disk, or both.

First, install Loki and Promtail using the instructions on <https://grafana.com>.
It will be convenient to save the Loki and Promtail executables in the same directory, and to navigate to this directory in your terminal before beginning this walkthrough.
You can explore and filter your logs using [Loki](https://grafana.com/oss/loki/).
Setup and configuration is straightforward on Kubernetes, but as a tutorial, let's set up Loki manually.

Is "setup and configuration" considered a singular compound subject in this context (like "ham and cheese is great for breakfast")? If not, perhaps change to:

"Setup and configuration are straightforward on Kubernetes."

@@ -163,11 +198,14 @@ You should see something similar to the following:
:align: center

password "admin."

```

The requests will loop and can be canceled with `Ctrl-C`.
The requests will loop and can be canceled with `ctrl-c`.

The requests will loop until canceled with ctrl-c.

Signed-off-by: Shreyas Krishnaswamy <[email protected]>
@shrekris-anyscale

Thanks for the review @dmatrix! I've addressed all your comments.

Just a general comment: if possible for 2.1, we should consider providing dashboards for this, since users currently have to write too much boilerplate code. At least we are collecting the stats; it is a matter of making them easy to access and present.

Yeah that's a good suggestion. I think that would improve Ray Serve's observability.

@edoakes edoakes merged commit e15960e into ray-project:master Aug 12, 2022
gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
Signed-off-by: Stefan van der Kleij <[email protected]>
JiahaoYao pushed a commit to JiahaoYao/ray that referenced this pull request Aug 21, 2022
JiahaoYao pushed a commit to JiahaoYao/ray that referenced this pull request Aug 22, 2022
JiahaoYao pushed a commit to JiahaoYao/ray that referenced this pull request Aug 22, 2022
ArturNiederfahrenhorst pushed a commit to ArturNiederfahrenhorst/ray that referenced this pull request Sep 1, 2022
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Serve] [Docs] Update metrics that Ray Serve offers in documentation
5 participants