diff --git a/doc/source/serve/handle-guide.md b/doc/source/serve/handle-guide.md
index 36f9d056c6d99..d5b3d17f73618 100644
--- a/doc/source/serve/handle-guide.md
+++ b/doc/source/serve/handle-guide.md
@@ -58,4 +58,4 @@ In both types of ServeHandle, you can call a specific method by using the `.meth
 :start-after: __begin_handle_method__
 :end-before: __end_handle_method__
 :language: python
-```
\ No newline at end of file
+```
diff --git a/doc/source/serve/monitoring.md b/doc/source/serve/monitoring.md
index 205cc8898013e..a650e53d22475 100644
--- a/doc/source/serve/monitoring.md
+++ b/doc/source/serve/monitoring.md
@@ -211,6 +211,13 @@ You can leverage built-in Ray Serve metrics to get a closer look at your applica
 Ray Serve exposes important system metrics like the number of successful and failed requests through the
 [Ray metrics monitoring infrastructure](ray-metrics). By default, the metrics are exposed in Prometheus format on each node.
 
+:::{note}
+Different metrics are collected when Deployments are called
+via Python `ServeHandle` and when they are called via HTTP.
+
+See the list of metrics below marked for each.
+:::
+
 The following metrics are exposed by Ray Serve:
 
 ```{eval-rst}
@@ -219,29 +226,31 @@ The following metrics are exposed by Ray Serve:
 
    * - Name
      - Description
-   * - ``serve_deployment_request_counter``
+   * - ``serve_deployment_request_counter`` [**]
      - The number of queries that have been processed in this replica.
-   * - ``serve_deployment_error_counter``
+   * - ``serve_deployment_error_counter`` [**]
      - The number of exceptions that have occurred in the deployment.
-   * - ``serve_deployment_replica_starts``
+   * - ``serve_deployment_replica_starts`` [**]
      - The number of times this replica has been restarted due to failure.
-   * - ``serve_deployment_processing_latency_ms``
+   * - ``serve_deployment_processing_latency_ms`` [**]
      - The latency for queries to be processed.
-   * - ``serve_replica_processing_queries``
+   * - ``serve_replica_processing_queries`` [**]
      - The current number of queries being processed.
-   * - ``serve_num_http_requests``
+   * - ``serve_num_http_requests`` [*]
      - The number of HTTP requests processed.
-   * - ``serve_num_http_error_requests``
+   * - ``serve_num_http_error_requests`` [*]
      - The number of non-200 HTTP responses.
-   * - ``serve_num_router_requests``
+   * - ``serve_num_router_requests`` [*]
      - The number of requests processed by the router.
-   * - ``serve_handle_request_counter``
+   * - ``serve_handle_request_counter`` [**]
      - The number of requests processed by this ServeHandle.
-   * - ``serve_deployment_queued_queries``
+   * - ``serve_deployment_queued_queries`` [*]
      - The number of queries for this deployment waiting to be assigned to a replica.
-   * - ``serve_num_deployment_http_error_requests``
+   * - ``serve_num_deployment_http_error_requests`` [*]
      - The number of non-200 HTTP responses returned by each deployment.
 ```
+[*] - only available when using HTTP calls
+[**] - only available when using Python `ServeHandle` calls
 
 To see this in action, first run the following command to start Ray and set up the metrics export port:
 
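The [*]/[**] split this patch introduces can be sketched in code. The following is a minimal, hypothetical helper (not part of Ray Serve's API) that takes Prometheus-format exposition text scraped from a node and buckets the Serve metric names it finds according to the markers in the table above; the `partition_metrics` name and the sample scrape are illustrative assumptions.

```python
# Metric names and their [*]/[**] grouping are taken from the table in this patch.
HTTP_ONLY = {  # [*] - only available when using HTTP calls
    "serve_num_http_requests",
    "serve_num_http_error_requests",
    "serve_num_router_requests",
    "serve_deployment_queued_queries",
    "serve_num_deployment_http_error_requests",
}

HANDLE_ONLY = {  # [**] - only available when using Python ServeHandle calls
    "serve_deployment_request_counter",
    "serve_deployment_error_counter",
    "serve_deployment_replica_starts",
    "serve_deployment_processing_latency_ms",
    "serve_replica_processing_queries",
    "serve_handle_request_counter",
}


def partition_metrics(prometheus_text: str):
    """Split metric names found in Prometheus exposition text into
    (http_only, handle_only, other) sets. Illustrative helper only."""
    names = set()
    for line in prometheus_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and # HELP / # TYPE comment lines
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return (names & HTTP_ONLY, names & HANDLE_ONLY,
            names - HTTP_ONLY - HANDLE_ONLY)


# Hypothetical scrape output, abbreviated to two metrics:
sample = """\
# HELP serve_num_http_requests The number of HTTP requests processed.
serve_num_http_requests{route="/"} 5.0
serve_handle_request_counter{deployment="f"} 2.0
"""
http_seen, handle_seen, other = partition_metrics(sample)
```

With the sample scrape above, `http_seen` contains only `serve_num_http_requests` and `handle_seen` only `serve_handle_request_counter`, reflecting that the two call paths emit disjoint metric sets.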