Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metric to count search responses #110070

Merged
merged 7 commits into from
Jun 25, 2024
Merged

Conversation

jdconrad
Copy link
Contributor

@jdconrad jdconrad commented Jun 21, 2024

This adds a metric to count the total number of search responses generated by the search action and the scroll search action. This uses the key es.search_response.response_count.total. Each response also has an attribute to specify whether the response was a success, a partial failure, or a failure.

@jdconrad jdconrad added >non-issue :Search/Search Search-related issues that do not fall into other categories v8.15.0 labels Jun 21, 2024
@jdconrad jdconrad requested review from quux00 and JVerwolf June 21, 2024 22:51
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jun 21, 2024
meterRegistry.registerLongCounter(
RESPONSE_COUNT_TOTAL_COUNTER_NAME,
"The cumulative total of search responses with an attribute to describe"
+ "success, partial failure, or failure, expressed as a counter and an attribute",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you need a space after "describe" or before "success"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking to learn: what does "expressed as a counter and an attribute" mean? How is it both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes can be attached to counters in this case which afaik will add additional fields for each document with the count of each attribute as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the description a bit based on this feedback.

Comment on lines 59 to 61
public void incrementResponseCount(ResponseCountTotalStatus responseCountTotalStatus) {
responseCountTotalCounter.incrementBy(1L, Map.of(RESPONSE_COUNT_TOTAL_STATUS_ATTRIBUTE_NAME, responseCountTotalStatus));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what's causing the following build failure:

https://buildkite.com/elastic/elasticsearch-pull-request/builds/23506#01903d0a-f3e5-4f38-8025-68037928d91b

Caused by: java.lang.IllegalArgumentException: attributes do not support value type of [org.elasticsearch.rest.action.search.SearchResponseMetrics.ResponseCountTotalStatus]
»       at [email protected]/org.elasticsearch.telemetry.apm.internal.metrics.OtelHelper.lambda$fromMap$0(OtelHelper.java:50)

You could change this to:

responseCountTotalCounter.incrementBy(1L, Map.of(RESPONSE_COUNT_TOTAL_STATUS_ATTRIBUTE_NAME, responseCountTotalStatus.name()));
}

Or,


    public enum ResponseCountTotalStatus {
        SUCCESS("success"),
        PARTIAL_FAILURE("partial_failure"),
        FAILURE("failure");
        private final String value;

        ResponseCountTotalStatus(String value) {
            this.value = value;
        }

        public String value() {
            return value
        }
    }
    ...
    
        responseCountTotalCounter.incrementBy(1L, Map.of(RESPONSE_COUNT_TOTAL_STATUS_ATTRIBUTE_NAME, responseCountTotalStatus.value()));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting with @original-brownbear my concerns around the assert triggered by this error have been alleviated. My understanding is the assert is related to ensuring we don't except any exceptions to be triggered as part of onResponse except for unusual cases. I will fix this and then expect the tests will pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to your suggestion of value but used displayName instead to not overload the term value.

@jdconrad
Copy link
Contributor Author

@quux00 @JVerwolf Thanks for the initial review. I have responded to your feedback, and this is ready for another round.

Copy link
Contributor

@JVerwolf JVerwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
if ((f.status().getStatus() >= 500 || causeHas500Status)
&& ExceptionsHelper.isNodeOrShardUnavailableTypeException(f.getCause()) == false) {
logger.warn("TransportSearchAction shard failure (partial results response)", f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quux00 Do you know why we only log partial failures when the response code is a 5xx? I see that the metric that @jdconrad added follows the same example, so I was curious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done in the context of serverless, where we don't log or alert on "user errors" (4xx) (other than 429s, but that doesn't apply here). The goal of the logging epic was to get a sense of how often 5xx "system" type errors are occurring when users get partial results.

@jdconrad jdconrad merged commit 9685a5d into elastic:main Jun 25, 2024
15 checks passed
@quux00
Copy link
Contributor

quux00 commented Jun 26, 2024

LGTM!

Non-blocking: @quux00 Does https://github.com/jdconrad/elasticsearch/blob/c252b40ea63aa764c4a88e12d37f1e51af9af768/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/termsenum/action/TransportTermsEnumAction.java#L158 need to be instrumented as well?

Yeah, probably. We could bring that to an es search foundations team meeting for discussion and follow on ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants