-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/apache]: add support for more metrics #14102
[receiver/apache]: add support for more metrics #14102
Conversation
4253bb8
to
b23ba1d
Compare
monotonic: false | ||
aggregation: delta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, I don't think this metric is actually a sum. For gauges, monotonic
and aggregation
are not applicable.
That said, it may be useful to explain these further, as I think I'll have related feedback below.
I find it easiest to explain with two examples:
- A metric that reports the number of requests made to an endpoint.
- A metric that reports the current number of users logged into a service.
In the "requests" case, each request causes the value to go up. This is monotonic.
In the "current users" case, each time a user logs in the value goes up, but each time a user logs out it goes down. This is nonmonotonic.
Note that the measurements are inherently monotonic or not. On the other hand, either metric can be presented as cumulative or delta aggregation.
scenario | monotonic | aggregation | example | representation |
---|---|---|---|---|
requests | true | cumulative | 5 requests per minute | [ 5, 10, 15, 20, 25, ... ] |
requests | true | delta | 5 requests per minute | [ 5, 5, 5, 5, 5, ... ] |
current users | false | cumulative | 5 log in, then 1 logs out, repeated | [ 5, 4, 9, 8, 13, 12, ... ] |
current users | false | delta | 5 log in, then 1 logs out, repeated | [ 5, -1, 5, -1, 5, -1, ... ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this explanation! I'm relatively new to OpenTelemetry and I'm not used to the terminology yet.
Thank you for a quick and detailed review. I'm sorry for so many mistakes - the documentation on these metrics is not as complete as I expected so I eventually had to lookup some things in the source code. |
1d7229e
to
584670d
Compare
Thanks for the quick turnaround on the feedback. As noted in this thread, I think we should not add the derived metrics, at least in this PR. We can consider adding them later if users believe them to be critical as is, but my preference would be that we leave the rate calculation to processors or backends. |
I removed the following metrics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Just a couple more details.
monotonic: true | ||
aggregation: cumulative | ||
attributes: [ server_name, cpu_level, cpu_mode ] | ||
apache.cpu_load: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apache.cpu_load: | |
apache.cpu.load: |
Since we've established apache.cpu
as a "namespece", let's reuse it here.
enabled: true | ||
description: Current load of the CPU. | ||
unit: "{load}" | ||
sum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a gauge? (Is it meaningful to aggregate instances of this metric across multiple systems, for example?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. In the otel docs, gauge is defined as "a sampled value at a given time", which suits the load well.
In addition, I tried to find how similar metrics are treated in other receivers. The only result I found was the flinkmetricsreceiver:
opentelemetry-collector-contrib/receiver/flinkmetricsreceiver/metadata.yaml
Lines 45 to 51 in d444f47
flink.jvm.cpu.load: | |
enabled: true | |
description: The CPU usage of the JVM for a jobmanager or taskmanager. | |
unit: "%" | |
gauge: | |
value_type: double | |
input_type: string |
As you can see, it's a gauge there and the unit is %
. It's seems rational to me and I like the idea of keeping consistency between different components, so I'll change the load here to gauge measured in percents.
@@ -0,0 +1,5 @@ | |||
change_type: enhancement | |||
component: apachereceiver | |||
note: The receiver now supports 13 more metrics, more information in the linked issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the count here
The receiver now scrapes 13 more metrics.
9abd9f5
to
bfbf377
Compare
bfbf377
to
7b2234c
Compare
I have applied the changes. It seems that |
Description:
The receiver now scrapes 6 more metrics, described in more detail in the issue.
Link to tracking Issue: #14095
Testing:
Unit and integration tests for the scraper have been extended with data for new metrics.
Documentation:
Documentation generated by
mdatagen
has been added.