Add request metric to RestController to track success/failure (by status code) #109957

ldematte · 2024-06-20T09:08:39Z

We would like to track the status code of each request through a new metric. We would like it to be internal to ES, so we have full ownership of it, and we can make changes and build the needed dashboards and alerts.

The main question we want to answer is: "What is the percentage of failed requests compared to successful ones for a certain project, per API or group of APIs?"

This PR addresses this by adding a synchronous counter, es.rest.requests.total, with three attributes: es_rest_handler_name, es_rest_status_code, and es_rest_request_method. This should make it possible to create custom dashboards and alarms based on API, group of APIs (filtering by e.g. labels.es_rest_handler_name: cat_*), or success/failure (filtering by e.g. numeric_labels.es_rest_status_code).
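To make the intended use of the three attributes concrete, here is a minimal, self-contained sketch. It is not the code in this PR and does not use the Elasticsearch telemetry classes: the class `RestRequestCounterSketch`, its in-memory map, the handler names, and the "status code >= 400 counts as a failure" rule are all assumptions made for illustration. Only the attribute names come from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics backend: keeps one running total
// per distinct attribute set, the way a counter with attributes is aggregated.
final class RestRequestCounterSketch {
    static final String HANDLER = "es_rest_handler_name";
    static final String STATUS = "es_rest_status_code";
    static final String METHOD = "es_rest_request_method";

    private final Map<Map<String, Object>, LongAdder> totals = new ConcurrentHashMap<>();

    // Increments with identical attributes land in the same series.
    public void incrementBy(long inc, Map<String, Object> attributes) {
        totals.computeIfAbsent(Map.copyOf(attributes), k -> new LongAdder()).add(inc);
    }

    // One increment per handled REST request.
    public void recordRequest(String handlerName, String method, int statusCode) {
        incrementBy(1, Map.of(HANDLER, handlerName, METHOD, method, STATUS, statusCode));
    }

    public int distinctSeries() {
        return totals.size();
    }

    // Share of failed requests among handlers whose name starts with the prefix.
    // Assumption: any status code >= 400 is treated as a failure.
    public double failurePercentage(String handlerPrefix) {
        long failed = 0;
        long total = 0;
        for (Map.Entry<Map<String, Object>, LongAdder> e : totals.entrySet()) {
            String handler = (String) e.getKey().get(HANDLER);
            if (!handler.startsWith(handlerPrefix)) {
                continue;
            }
            long count = e.getValue().sum();
            total += count;
            if (((Integer) e.getKey().get(STATUS)) >= 400) {
                failed += count;
            }
        }
        return total == 0 ? 0.0 : 100.0 * failed / total;
    }

    public static void main(String[] args) {
        RestRequestCounterSketch counter = new RestRequestCounterSketch();
        // Handler names below are illustrative only.
        counter.recordRequest("cat_indices_action", "GET", 200);
        counter.recordRequest("cat_indices_action", "GET", 200);
        counter.recordRequest("cat_shards_action", "GET", 500);
        counter.recordRequest("search_action", "POST", 400);
        System.out.println("cat_* failure %: " + counter.failurePercentage("cat_"));
        System.out.println("distinct series: " + counter.distinctSeries());
    }
}
```

The number of distinct series grows with the product of handler names, methods, and status codes actually seen, which is why the size of the attribute space matters for the performance question.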

This is currently a WIP draft: several unit tests still need to be added, but the existing unit tests and integration tests pass, and I have done some Rally runs to validate performance, with no issues.

Which brings me to the open questions:

I used Rally on my laptop to check whether adding an incrementBy call on every REST request could cause performance problems. It does not seem to: all runs were close to the baseline (current main), within variance (between +3% and -2%). The Java agent seems to handle aggregation of increments with the same attributes quite well. If we are still worried this might be a problem, we can do the aggregation ourselves and be a bit more aggressive. Note that I don't think we can use an async counter here: they are designed to produce a single metric per call, and if we use attributes (and I think we need to, especially for the "API" part), we cannot do that.
After discussion, given the Rally results, having double-checked that the APM Java agent aggregates increments with the same attributes (it does), and having reduced the attribute space by using RestHandler names, we are happy with this for now. We can revisit this if we run into problems or limitations in the future.
I am saving the request path as a way to identify an API or group of APIs, but I'm wondering whether I should use the RestHandler routes instead, or some sort of "RestHandler name/id". (I feel this might be related to the capabilities work somehow.)
After discussion with the team, this has been replaced by RestHandler#getName ("pulling up" the existing BaseRestHandler#getName, defaulting to the class name).
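The "pull up with a default" idea can be sketched as follows. This is an illustration only, not the actual Elasticsearch interface: the nested `Handler` interface and both implementing classes are hypothetical, and using the simple class name as the default is an assumption about what "defaulting to classname" means.

```java
// Hypothetical illustration of a name accessor pulled up into an interface,
// with a default implementation that falls back to the class name.
final class HandlerNameSketch {
    interface Handler {
        // Assumption: the fallback is the simple class name.
        default String getName() {
            return getClass().getSimpleName();
        }
    }

    // A handler that provides an explicit, stable name (illustrative value).
    static final class NamedHandler implements Handler {
        @Override
        public String getName() {
            return "cat_indices_action";
        }
    }

    // A handler that relies on the default.
    static final class UnnamedHandler implements Handler {}

    public static void main(String[] args) {
        System.out.println(new NamedHandler().getName());
        System.out.println(new UnnamedHandler().getName());
    }
}
```

With a default in place, every handler yields some value for the handler-name attribute, and handlers that already define an explicit name keep it.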

Related JIRA: ES-7655

elasticsearchmachine · 2024-06-21T06:49:53Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticsearchmachine · 2024-06-21T06:49:54Z

Hi @ldematte, I've created a changelog YAML for you.

pgomulka

LGTM, I left minor suggestions.
Do you think we can add an integration test using TestTelemetryPlugin?

server/src/main/java/org/elasticsearch/rest/RestController.java

ldematte · 2024-06-26T13:43:32Z

> Do you think we can add an integration test using TestTelemetryPlugin?

Sure; the unit tests should already cover the invocations with mocks, but one more test can't hurt :)

ldematte added 2 commits June 18, 2024 12:19

Propagate TelemetryProvider in place of Tracer

3ef5e20

Add counter for rest requests, with API path and response status-code attributes

d62ddc0

elasticsearchmachine added the v8.15.0 label Jun 20, 2024

ldematte added 2 commits June 20, 2024 15:39

Replace request path with RestHandler name for API identification

1db64b3

Tests

46e2f3d

ldematte changed the title ~~[WIP] Add request metric to RestController~~ Add request metric to RestController to track success/failure (by status code) Jun 21, 2024

ldematte added >enhancement :Core/Infra/Metrics Metrics and metering infrastructure labels Jun 21, 2024

ldematte marked this pull request as ready for review June 21, 2024 06:49

ldematte requested a review from a team as a code owner June 21, 2024 06:49

elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jun 21, 2024

ldematte added 4 commits June 21, 2024 08:49

Update docs/changelog/109957.yaml

385704d

Merge remote-tracking branch 'upstream/main' into rest-controller-tracing

5771d36

Merge branch 'rest-controller-tracing' of github.com:ldematte/elasticsearch into rest-controller-tracing

c378311

Fix tests

08d3c51

pgomulka approved these changes Jun 26, 2024

ldematte added 4 commits June 26, 2024 16:05

Merge remote-tracking branch 'upstream/main' into rest-controller-tracing

b393675

Refactoring to reduce duplication + IT tests

b175dc7

spotless

5c503e1

More robust measurement check

e26d476

ldematte merged commit abcc383 into elastic:main Jun 27, 2024
15 checks passed

ldematte deleted the rest-controller-tracing branch June 27, 2024 10:18
