Figure out success rate classification in telemetry #634
Comments
I think you need to do something like:
because all gRPC messages have a 200 response code
The plan for this more generally is that routes will have some configurable classification logic, so this should be pushed down into the proxy.
Bear in mind also that responses with no
I think the proxy should add a
Question regarding gRPC statuses: it seems like some gRPC status codes (e.g. "Not Found", "Already Exists", "Permission Denied", etc.) indicate errors on the client side rather than the server side. If we count the analogous HTTP 4xx error codes as "successes", shouldn't some non-zero gRPC status codes be counted as successes as well?
This PR adds a `classification` label to proxy response metrics, as @olix0r described in #634 (comment). The label is either "success" or "failure", depending on the following rules:

+ **if** the response had a gRPC status code, *then*
  - gRPC status code 0 is considered a success
  - all others are considered failures
+ **else if** the response had an HTTP status code, *then*
  - status codes < 500 are considered successes
  - status codes >= 500 are considered failures
+ **else if** the response stream failed, *then*
  - the response is a failure.

I've also added end-to-end tests for the classification of HTTP responses (with some work towards classifying gRPC responses as well). Additionally, I've updated `doc/proxy_metrics.md` to reflect the added `classification` label.

Signed-off-by: Eliza Weisman <[email protected]>
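
The rules in this commit message can be sketched as a small decision function. This is an illustrative Python sketch only, not the proxy's actual code: the function name `classify`, its parameters, and the `ValueError` for a response that matches none of the rules are assumptions made for the example, since the commit message does not say what happens in that case.

```python
from typing import Optional


def classify(
    grpc_status: Optional[int],
    http_status: Optional[int],
    stream_failed: bool = False,
) -> str:
    """Return "success" or "failure" following the rules listed above.

    Hypothetical helper: the names and signature are not taken from the proxy.
    """
    # Rule 1: a gRPC status code takes precedence; only 0 is a success.
    if grpc_status is not None:
        return "success" if grpc_status == 0 else "failure"
    # Rule 2: otherwise fall back to the HTTP status; anything below 500 succeeds.
    if http_status is not None:
        return "success" if http_status < 500 else "failure"
    # Rule 3: no status at all, but the response stream failed.
    if stream_failed:
        return "failure"
    # Assumption: the rules above do not cover this case, so the sketch rejects it.
    raise ValueError("response has no status and the stream did not fail")
```

Note that under these rules a gRPC response carried over HTTP 200 with a non-zero gRPC status is a failure, while an HTTP 404 without a gRPC status is a success.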
In #536 we introduced `grpc_status_code` and `status_code` to enable success rate calculations.

A naive success rate calculation would look something like:

Note that `status_code="200"` doesn't count non-200 successes. Assuming we really want to count `status_code<400`, a couple options come to mind:

- a `success=true|false` label on the proxy's `response_total` metric
- matching every `status_code` < 400 (this may require enumerating every integer in [0...400))

Related to #627
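
On the concern about enumerating every integer below 400: label values are compared as strings, so one alternative to a full enumeration is a pattern match. The Python sketch below computes a success rate from per-status response counts that way. It is only an illustration under stated assumptions: the names `SUCCESS_STATUS` and `success_rate` and the sample counts are hypothetical, and the pattern assumes three-digit HTTP status codes, so "< 400" is read as 100 to 399.

```python
import re
from typing import Mapping

# Assumption: status codes are three-digit strings, so "< 400" means 100-399.
SUCCESS_STATUS = re.compile(r"[1-3][0-9]{2}")


def success_rate(response_total: Mapping[str, int]) -> float:
    """Fraction of responses whose status_code label is below 400.

    `response_total` maps a status_code label value to its response count.
    """
    total = sum(response_total.values())
    if total == 0:
        raise ValueError("no responses recorded")
    # fullmatch anchors the pattern at both ends, so "4000" or "20" do not match.
    successes = sum(
        count
        for code, count in response_total.items()
        if SUCCESS_STATUS.fullmatch(code)
    )
    return successes / total


# Made-up sample data: 95 of 100 responses have a status below 400.
counts = {"200": 90, "204": 5, "404": 3, "503": 2}
rate = success_rate(counts)
```

A dedicated success label on the metric avoids this string matching entirely, which is the trade-off between the two options listed above.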