[Serve][Docs] Mark metrics served for HTTP vs Python calls (ray-project#27858)

Different metrics are collected in Ray Serve depending on whether deployments are called via HTTP or via a Python `ServeHandle`. This needs to be mentioned in the documentation, and each metric marked accordingly.

Signed-off-by: Stefan van der Kleij <[email protected]>
zoltan-fedor authored and Stefan van der Kleij committed Aug 18, 2022
1 parent 933407f commit e62c8f5
Showing 2 changed files with 21 additions and 12 deletions.
2 changes: 1 addition & 1 deletion doc/source/serve/handle-guide.md
@@ -58,4 +58,4 @@ In both types of ServeHandle, you can call a specific method by using the `.meth
 :start-after: __begin_handle_method__
 :end-before: __end_handle_method__
 :language: python
-```
+```
31 changes: 20 additions & 11 deletions doc/source/serve/monitoring.md
@@ -211,6 +211,13 @@ You can leverage built-in Ray Serve metrics to get a closer look at your applica
 Ray Serve exposes important system metrics like the number of successful and
 failed requests through the [Ray metrics monitoring infrastructure](ray-metrics). By default, the metrics are exposed in Prometheus format on each node.
 
+:::{note}
+Different metrics are collected when Deployments are called
+via Python `ServeHandle` and when they are called via HTTP.
+
+See the list of metrics below marked for each.
+:::
+
 The following metrics are exposed by Ray Serve:
 
 ```{eval-rst}
@@ -219,29 +226,31 @@ The following metrics are exposed by Ray Serve:
 * - Name
   - Description
-* - ``serve_deployment_request_counter``
+* - ``serve_deployment_request_counter`` [**]
   - The number of queries that have been processed in this replica.
-* - ``serve_deployment_error_counter``
+* - ``serve_deployment_error_counter`` [**]
   - The number of exceptions that have occurred in the deployment.
-* - ``serve_deployment_replica_starts``
+* - ``serve_deployment_replica_starts`` [**]
   - The number of times this replica has been restarted due to failure.
-* - ``serve_deployment_processing_latency_ms``
+* - ``serve_deployment_processing_latency_ms`` [**]
   - The latency for queries to be processed.
-* - ``serve_replica_processing_queries``
+* - ``serve_replica_processing_queries`` [**]
   - The current number of queries being processed.
-* - ``serve_num_http_requests``
+* - ``serve_num_http_requests`` [*]
   - The number of HTTP requests processed.
-* - ``serve_num_http_error_requests``
+* - ``serve_num_http_error_requests`` [*]
   - The number of non-200 HTTP responses.
-* - ``serve_num_router_requests``
+* - ``serve_num_router_requests`` [*]
   - The number of requests processed by the router.
-* - ``serve_handle_request_counter``
+* - ``serve_handle_request_counter`` [**]
   - The number of requests processed by this ServeHandle.
-* - ``serve_deployment_queued_queries``
+* - ``serve_deployment_queued_queries`` [*]
   - The number of queries for this deployment waiting to be assigned to a replica.
-* - ``serve_num_deployment_http_error_requests``
+* - ``serve_num_deployment_http_error_requests`` [*]
   - The number of non-200 HTTP responses returned by each deployment.
 ```
+[*] - only available when using HTTP calls
+[**] - only available when using Python `ServeHandle` calls
 
 To see this in action, first run the following command to start Ray and set up the metrics export port:
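The `[*]`/`[**]` markers partition the Serve metrics by call path, and since the metrics are exposed in Prometheus format, the split can be checked mechanically. A hedged, self-contained sketch (the grouping sets mirror the table in this diff; the sample scrape text is made up for illustration, not real Serve output):

```python
# Metrics only recorded for HTTP calls ([*] in the table).
HTTP_ONLY = {
    "serve_num_http_requests",
    "serve_num_http_error_requests",
    "serve_num_router_requests",
    "serve_deployment_queued_queries",
    "serve_num_deployment_http_error_requests",
}

# Metrics only recorded for Python ServeHandle calls ([**] in the table).
HANDLE_ONLY = {
    "serve_deployment_request_counter",
    "serve_deployment_error_counter",
    "serve_deployment_replica_starts",
    "serve_deployment_processing_latency_ms",
    "serve_replica_processing_queries",
    "serve_handle_request_counter",
}


def classify(exposition: str) -> dict:
    """Bucket each sample line of Prometheus exposition text by call path."""
    buckets = {"http": [], "handle": [], "other": []}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Metric name is everything before the label braces or the value.
        name = line.split("{")[0].split(" ")[0]
        if name in HTTP_ONLY:
            buckets["http"].append(name)
        elif name in HANDLE_ONLY:
            buckets["handle"].append(name)
        else:
            buckets["other"].append(name)
    return buckets


# Illustrative scrape fragment (labels and values are invented).
sample = """\
# HELP serve_num_http_requests The number of HTTP requests processed.
serve_num_http_requests{route="/"} 5
serve_handle_request_counter{deployment="Echo"} 3
"""
print(classify(sample))
```

Running `classify` on a real scrape of the node's metrics endpoint would show which group of counters a given workload is actually exercising.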
