
[receiver/apache]: add support for more metrics #14102

Merged

Conversation

@aboguszewski-sumo (Member) commented Sep 13, 2022

Description:
The receiver now scrapes 6 more metrics, described in more detail in the issue.

Link to tracking Issue: #14095

Testing:
Unit and integration tests for the scraper have been extended with data for new metrics.

Documentation:
Documentation generated by mdatagen has been added.

Comment on lines 86 to 87
monotonic: false
aggregation: delta
@djaglowski (Member) commented Sep 13, 2022

As noted above, I don't think this metric is actually a sum. For gauges, monotonic and aggregation are not applicable.

That said, it may be useful to explain these further, as I think I'll have related feedback below.

I find it easiest to explain with two examples:

  1. A metric that reports the number of requests made to an endpoint.
  2. A metric that reports the current number of users logged into a service.

In the "requests" case, each request causes the value to go up. This is monotonic.
In the "current users" case, each time a user logs in the value goes up, but each time a user logs out it goes down. This is nonmonotonic.

Note that the measurements are inherently monotonic or not. On the other hand, either metric can be presented as cumulative or delta aggregation.

| scenario | monotonic | aggregation | example | representation |
| --- | --- | --- | --- | --- |
| requests | true | cumulative | 5 requests per minute | [ 5, 10, 15, 20, 25, ... ] |
| requests | true | delta | 5 requests per minute | [ 5, 5, 5, 5, 5, ... ] |
| current users | false | cumulative | 5 log in, then 1 logs out, repeated | [ 5, 4, 9, 8, 13, 12, ... ] |
| current users | false | delta | 5 log in, then 1 logs out, repeated | [ 5, -1, 5, -1, 5, -1, ... ] |
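
To make this concrete in mdatagen terms, here is a minimal sketch of how the two examples above could be declared in a metadata.yaml. The metric names are hypothetical and not part of this receiver: the request counter is a monotonic sum with cumulative aggregation, while the current-user count is a gauge, where monotonic and aggregation do not apply.

example.requests:
  enabled: true
  description: Number of requests made to an endpoint.
  unit: "{requests}"
  sum:
    value_type: int
    monotonic: true
    aggregation: cumulative
example.current_users:
  enabled: true
  description: Current number of users logged into the service.
  unit: "{users}"
  gauge:
    value_type: int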

@aboguszewski-sumo (Member Author) replied:

Thank you for this explanation! I'm relatively new to OpenTelemetry and I'm not used to the terminology yet.

@aboguszewski-sumo (Member Author) commented:

Thank you for the quick and detailed review. I'm sorry for so many mistakes - the documentation on these metrics is not as complete as I expected, so I eventually had to look up some things in the source code.
I've applied your suggestions. For clarity, I decided to leave them as separate commits for now; I'll rebase once everything looks fine.

@djaglowski (Member) commented:

Thanks for the quick turnaround on the feedback.

As noted in this thread, I think we should not add the derived metrics, at least not in this PR. We can consider adding them later if users believe they are critical, but my preference would be that we leave the rate calculation to processors or backends.
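
For context, one way users can derive delta or rate values downstream of the receiver is in the collector pipeline itself, for example with the contrib cumulativetodelta processor. A minimal sketch, not part of this PR; the empty configs rely on component defaults and the logging exporter is just a placeholder:

receivers:
  apache:
processors:
  cumulativetodelta:
exporters:
  logging:
service:
  pipelines:
    metrics:
      receivers: [apache]
      processors: [cumulativetodelta]
      exporters: [logging]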

@aboguszewski-sumo (Member Author) commented:

I removed the following metrics: apache.bytes_per_second, apache.request.time, apache.request.rate, apache.request.size.
I also seem to have fixed the integration tests (it looks like they ignore the exact values anyway), so I think this PR is ready. I've seen on other pull requests that you squash the commits anyway, so there's probably no need for me to do that, but I can squash manually later if needed.

@djaglowski (Member) left a comment:

Looking good. Just a couple more details.

monotonic: true
aggregation: cumulative
attributes: [ server_name, cpu_level, cpu_mode ]
apache.cpu_load:
@djaglowski (Member):

Suggested change:
- apache.cpu_load:
+ apache.cpu.load:

Since we've established apache.cpu as a "namespace", let's reuse it here.

enabled: true
description: Current load of the CPU.
unit: "{load}"
sum:
@djaglowski (Member):

Should this be a gauge? (Is it meaningful to aggregate instances of this metric across multiple systems, for example?)

@aboguszewski-sumo (Member Author) replied:

That's a good question. In the otel docs, gauge is defined as "a sampled value at a given time", which suits the load well.
In addition, I tried to find how similar metrics are treated in other receivers. The only result I found was the flinkmetricsreceiver:

flink.jvm.cpu.load:
  enabled: true
  description: The CPU usage of the JVM for a jobmanager or taskmanager.
  unit: "%"
  gauge:
    value_type: double
    input_type: string

As you can see, it's a gauge there and the unit is %. That seems reasonable to me, and I like the idea of keeping consistency between different components, so I'll change the load here to a gauge measured in percent.
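
For reference, a minimal sketch of how the entry might then look in metadata.yaml, modeled on the flink example above; the string input type is an assumption about how the Apache status page reports the value, not a confirmed detail of this PR:

apache.cpu.load:
  enabled: true
  description: Current load of the CPU.
  unit: "%"
  gauge:
    value_type: double
    input_type: string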

@@ -0,0 +1,5 @@
change_type: enhancement
component: apachereceiver
note: The receiver now supports 13 more metrics, more information in the linked issue.
@djaglowski (Member):

We should update the count here

@aboguszewski-sumo (Member Author) commented:

I have applied the changes. It seems that mdatagen is undergoing some changes, so I had to rebase in order to make the workflow pass.

@djaglowski merged commit 296afa1 into open-telemetry:main on Sep 15, 2022