
Add new JVM runtime environment metrics #44

Closed

Conversation

roberttoyonaga (Contributor):

This is the same PR as the one in the old semantic conventions location: open-telemetry/opentelemetry-specification#3352 (review)

This PR adds `process.runtime.jvm.cpu.monitor.time`, `process.runtime.jvm.network.io`, `process.runtime.jvm.network.time`, and `process.runtime.jvm.cpu.context_switch` metrics to the runtime environment metrics.

Metric-gathering implementations for these new metrics already exist in a basic form in https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/runtime-telemetry-jfr/library
Once the details around these new metrics are decided, the implementations can be updated.

JFR streaming would be used to gather these metrics. This feature has only been available since JDK 14, so these metrics would only be supported on JDK 17+.
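
For context, JFR event streaming (JDK 14+) lets an in-process library subscribe to these events as they are emitted. Below is a minimal sketch of how the monitor-wait events could be consumed; it is not taken from the linked instrumentation, and the `monitorClass` field name is assumed from the JFR event documentation.

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public final class MonitorWaitStreaming {
    public static void main(String[] args) {
        // RecordingStream is the JFR event-streaming entry point (JDK 14+).
        try (RecordingStream rs = new RecordingStream()) {
            // Only report waits longer than 10 ms to keep overhead low.
            rs.enable("jdk.JavaMonitorWait").withThreshold(Duration.ofMillis(10));
            rs.onEvent("jdk.JavaMonitorWait", event -> {
                Duration waited = event.getDuration();
                String monitorClass = event.getClass("monitorClass").getName();
                // A real implementation would record `waited` into the proposed
                // monitor-duration histogram, tagged with class and state.
                System.out.printf("%s waited %d ms%n", monitorClass, waited.toMillis());
            });
            rs.start(); // blocks; instrumentation would use startAsync()
        }
    }
}
```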

Please see the original discussion in open-telemetry/opentelemetry-java-instrumentation#7886 (comment) and at the Java + Instrumentation SIG.

Related issues: #1222

This metric is obtained from [`jdk.JavaMonitorWait`](https://sap.github.io/SapMachine/jfrevents/21.html#javamonitorwait) and [`jdk.JavaMonitorEnter`](https://sap.github.io/SapMachine/jfrevents/21.html#javamonitorenter) JFR events.

This metric SHOULD be specified with
[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#instrument-advice)
of `[]` (single bucket histogram capturing count, sum, min, max).
Contributor:

Why are we recommending empty buckets? Do we not care about percentiles here?

roberttoyonaga (Contributor, Author):

I agree it would be better to have a set of buckets, but I'm not sure what would be best to recommend. A thread could be blocked/waiting for any length of time. There is also a single-bucket precedent for GC pause time. Maybe exponential buckets would make sense.

@jack-berg what was your reasoning for suggesting a single bucket:

> I recommend we downgrade this to a summary by default by specifying an empty set of explicit bucket bounds using advice.

Member:

Reasoning is as follows:

  • We need a recommended set of default buckets since the SDK defaults won't be useful.
  • Coming up with sensible defaults is hard, requiring real world data for some distribution of apps.
  • A single bucket histogram, or effectively a summary, probably gives most people (> 50%) the data they want. E.g. you can look at the max time a monitor is held, or see how much time monitors spend waiting as a rate. The distribution of monitor wait times is a nice-to-have and more niche. Let the people who need it opt in with views, but don't burden everybody by default with the extra data.
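
For illustration, here is roughly what that recommendation looks like at the instrumentation level, assuming the OpenTelemetry Java API's `setExplicitBucketBoundariesAdvice` (a later addition to the API, not part of this PR); the metric name follows the `process.runtime.jvm.cpu.monitor.duration` rename further down, and the unit is an assumption:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.Meter;
import java.util.Collections;

public final class MonitorDurationHistogram {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.getMeter("jfr-runtime-metrics");

        // Empty bucket-boundary advice -> a single-bucket histogram that still
        // exposes count, sum, min and max, i.e. effectively a summary.
        DoubleHistogram monitorDuration =
            meter.histogramBuilder("process.runtime.jvm.cpu.monitor.duration")
                .setUnit("s") // assumed unit
                .setExplicitBucketBoundariesAdvice(Collections.emptyList())
                .build();

        monitorDuration.record(0.015); // e.g. a 15 ms monitor wait
    }
}
```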

Contributor:

I don't buy the reasonable-defaults argument in all cases. I agree a summary is useful, but I still think there are likely SOME baseline buckets that could be provided.


This metric SHOULD be specified with
[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#instrument-advice)
of `[]` (single bucket histogram capturing count, sum, min, max).
Contributor:

Couldn't we define some buckets to better understand whether our bytes are falling within standard packet sizes?

I'm not sure I understand this metric so well. What would count be here? Is count the number of packets, or the number of times JFR reports data (i.e. relatively useless to a user)?

If the latter, why isn't this just a Counter of some sort?

roberttoyonaga (Contributor, Author):

Similar to the comment above, I'm not sure how to suggest the best bucket bounds. count would be the number of times JFR network I/O events are emitted (every time network I/O occurs), so I think count would still be informative: it lets the user know roughly how busy network I/O is.

Contributor:

I'd encourage thinking about setting baseline buckets to common network packet sizes here. I think even 1-2 buckets would dramatically increase the overall value of this metric.

Hell, even understanding if the bytes written tend to be "larger than fits on a packet" or "smaller than fits on a packet" is a useful count to track.
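
To make that suggestion concrete, here is a sketch of packet-size-aligned advice for the network I/O histogram. The boundary values (512 B, a 1500 B Ethernet MTU, a 9000 B jumbo frame, 64 KiB) are purely illustrative and are not values proposed in this PR:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.metrics.Meter;
import java.util.List;

public final class NetworkIoHistogram {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.getMeter("jfr-runtime-metrics");

        // Boundaries loosely aligned with common packet/frame sizes so the
        // histogram can answer "does this I/O fit in a single packet?".
        LongHistogram networkIo =
            meter.histogramBuilder("process.runtime.jvm.network.io")
                .ofLongs()
                .setUnit("By")
                .setExplicitBucketBoundariesAdvice(List.of(512L, 1500L, 9000L, 65536L))
                .build();

        networkIo.record(1200); // e.g. bytes read reported by a JFR socket-read event
    }
}
```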

@@ -36,6 +36,10 @@ semantic conventions when instrumenting runtime environments.
* [Metric: `process.runtime.jvm.buffer.usage`](#metric-processruntimejvmbufferusage)
* [Metric: `process.runtime.jvm.buffer.limit`](#metric-processruntimejvmbufferlimit)
* [Metric: `process.runtime.jvm.buffer.count`](#metric-processruntimejvmbuffercount)
* [Metric: `process.runtime.jvm.cpu.monitor.duration`](#metric-processruntimejvmcpumonitorduration)
Member:

You'll want to rebase, and we should decide whether any of these should be included in the effort for stable semantic conventions, or remain experimental for now with a follow-up to stabilize them later.

roberttoyonaga (Contributor, Author) commented Jun 16, 2023:

I brought it up in open-telemetry/opentelemetry-specification#3419, but it seemed like the plan was to exclude them from initial stability. I think these metrics would be quite useful to have, but I guess they are not worth delaying stabilization for since they only apply to Java 14+. I've moved things to process-runtime-jvm-metrics-experimental.yaml.

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `class` | string | Class of the monitor. | `java.lang.Object` | Opt-In |
| `state` | string | Action taken at monitor. | `blocked`; `wait` | Recommended |
Member:

Note that we may need to add a namespace to this, depending on the result of #51.

attributes:
  - ref: thread.id
    requirement_level: opt_in
  - id: network.direction
Member:

Should we extend the general network conventions to include the direction?

- id: network-connection-and-carrier

roberttoyonaga (Contributor, Author):

Hmm, I think that makes sense. I've added a new attribute in the general network conventions (connection.direction) and referenced it from this metric.
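
For reference, a sketch of how the instrumentation might attach that direction attribute when recording the histogram. The attribute key and the `read`/`write` values are placeholders, since the final name and values were still under discussion in this thread:

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongHistogram;

public final class NetworkIoRecorder {
    // Placeholder key; the thread discusses both network.direction and connection.direction.
    private static final AttributeKey<String> DIRECTION =
        AttributeKey.stringKey("connection.direction");

    private final LongHistogram networkIo;

    NetworkIoRecorder(LongHistogram networkIo) {
        this.networkIo = networkIo;
    }

    void onSocketRead(long bytesRead) {        // e.g. driven by a jdk.SocketRead JFR event
        networkIo.record(bytesRead, Attributes.of(DIRECTION, "read"));
    }

    void onSocketWrite(long bytesWritten) {    // e.g. driven by a jdk.SocketWrite JFR event
        networkIo.record(bytesWritten, Attributes.of(DIRECTION, "write"));
    }
}
```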

roberttoyonaga force-pushed the add-jfr-metrics branch 2 times, most recently from 5f033a9 to 70edf6e on June 16, 2023, 16:42
trask (Member) commented Aug 14, 2023:

@roberttoyonaga if you have a chance, it would be great to get the merge conflicts resolved here

roberttoyonaga (Contributor, Author):

> @roberttoyonaga if you have a chance, it would be great to get the merge conflicts resolved here

@trask done!

github-actions bot commented Feb 3, 2024:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions bot added the Stale label on Feb 3, 2024
joaopgrassi (Member):

Hi @roberttoyonaga!

We changed how the CHANGELOG.md is managed. Please take a look at https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md#adding-a-changelog-entry to see what needs to be done. Sorry for the disruption.

github-actions bot removed the Stale label on Feb 19, 2024
github-actions bot commented Mar 5, 2024:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions bot added the Stale label on Mar 5, 2024
github-actions bot:

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions bot closed this on Mar 13, 2024