Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request metric to RestController to track success/failure (by status code) #109957

Merged
merged 12 commits into from
Jun 27, 2024

Conversation

ldematte
Copy link
Contributor

@ldematte ldematte commented Jun 20, 2024

We would like to track the status code of each request through a new metric. We would like it to be internal to ES, so we have full ownership of it, and we can make changes and build the needed dashboards and alerts.

The main question we want to answer is: "What is the percentage of failed requests compared to successful one for a certain project, per API or group of API?"

This PR address this issue by adding a synchronous counter, es.rest.requests.total, with 3 attributes, es_rest_handler_name, es_rest_status_code and es_rest_request_method. This way it should be possible to create custom dashboards and alarms based on API, group of APIs (filtering by e.g. labels.es_rest_handler_name: cat_*), success/failure (filtering by e.g. numeric_labels.es_rest_status_code)

This is currently a WIP draft, several unit tests need to be added, but existing unit tests and IT tests pass, and I have done some rally runs to validate performances and there was no issue.

Which brings me to the open questions:

  1. I used Rally on my laptop to check if the addition of a incrementBy call on every rest request could cause performance problems; it seems there is no problem, all runs have been close to the starting case (current main), within variance (between +3% and -2%). It seems the Java agent handles aggregation with the same attributes quite well. BUT if we are worried this might be a problem we can do aggregation ourselves, and be a bit more aggressive. Note that I don't think we can use a async counter here: they are designed to produce one single metric per call, and if we use attributes (and I think we need to, especially for the "API" part), we cannot do that.
    After discussion, given the Rally result, and double checking that the APM Java agent aggregates increments with the same attributes (it does), and reducing the attribute space by using RestControllers names, we are happy with this for now. We can revisit this if we run into problems or limitations in the future.
  2. I am saving the request path as a way to identify an API or groups of them, but I'm wonder if instead I should use the RestHandler routes? Or some sort of "RestHandler name/id"? (I feel this might be related to the capabilities work somehow)
    After discussion with the team, this has been replaced by RestController#getName ("pulling up" the existing BaseRestHandler#getName, defaulting to classname)

Related JIRA: ES-7655

@ldematte ldematte changed the title [WIP] Add request metric to RestController Add request metric to RestController to track success/failure (by status code) Jun 21, 2024
@ldematte ldematte added >enhancement :Core/Infra/Metrics Metrics and metering infrastructure labels Jun 21, 2024
@ldematte ldematte marked this pull request as ready for review June 21, 2024 06:49
@ldematte ldematte requested a review from a team as a code owner June 21, 2024 06:49
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jun 21, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @ldematte, I've created a changelog YAML for you.

Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left minor suggestions
do you think we can add integration test using TestTelemetryPlugin?

@ldematte
Copy link
Contributor Author

do you think we can add integration test using TestTelemetryPlugin?

Sure; the unit tests should already cover the invocations with mocks, but better one test more :)

@ldematte ldematte merged commit abcc383 into elastic:main Jun 27, 2024
15 checks passed
@ldematte ldematte deleted the rest-controller-tracing branch June 27, 2024 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Metrics Metrics and metering infrastructure >enhancement Team:Core/Infra Meta label for core/infra team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants