Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for "Gauge histogram" data type, instrumentation #2714

Open
jmacd opened this issue Aug 5, 2022 · 4 comments
Open

Support for "Gauge histogram" data type, instrumentation #2714

jmacd opened this issue Aug 5, 2022 · 4 comments
Assignees
Labels
spec:metrics Related to the specification/metrics directory triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal

Comments

@jmacd
Copy link
Contributor

jmacd commented Aug 5, 2022

What are you trying to achieve?

OpenTelemetry has not incorporated the Gauge histogram instrument. Several issues have been filed at various times in the otel-proto repository about this, but nothing in the specification repo.

For example:

What did you expect to see?

To add this to the OTel API/SDK and data model, some equivalencies will have to be drawn. It should be possible to replace async Gauge instrumentation, where every observation has a unique attribute set, with a Gauge Histogram aggregator that counts the number of appearances of each value (after erasing some attributes). In this sense, it is probably not necessary to implement new instruments for Gauge histogram, we. can just provide new ways of aggregating Gauges to obtain the Gauge histogram.

Additional context.

https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#gaugehistogram-1

@jack-berg
Copy link
Member

Do you have a concrete example of a type of instrumentation that would benefit from this type of aggregation?

where every observation has a unique attribute set, with a Gauge Histogram aggregator that counts the number of appearances of each value (after erasing some attributes).

Are you imagining aggregating gauge measurements to the existing explicit / exponential histogram data point (with potential modification), or introducing a new data point type?

@jmacd
Copy link
Contributor Author

jmacd commented Aug 5, 2022

Do you have a concrete example [...]?

Sure. I filed this in connection with open-telemetry/opentelemetry-go-contrib#2624, where one appears. The Go runtime/metrics package produces one Gauge Histogram,

/sched/latencies:seconds
	Distribution of the time goroutines have spent in the scheduler
	in a runnable state before actually running.

It's not immediately clear from the documentation that this is a gauge histogram, but it reports as Float64Histogram (its value type) and Cumulative==False (i.e., not a counter).

The way you compute this, in pseudocode,

schedLatency := meter.newGaugeHistogramInstrument()

for _, runnableGoroutine := range AllRunnableGoroutines {
  schedLatency.Observe(runnnableGoroutine, runnableGoroutine.elapsedWaitTimeSeconds())
}

Logically speaking, I believe you could instrument the same using a Gauge with Histogram aggregation, however, ...

Are you imagining aggregating gauge measurements to the existing explicit / exponential histogram data point (with potential modification), or introducing a new data point type?

This is the question. The data structures for explicit / exponential histogram work, but the temporality setting does not (related: open-telemetry/opentelemetry-proto#274). If we replace the current temporality field in a backwards-compatible way, then we can express non-temporal histogram points (i.e., gauge histograms). I don't want to introduce new types.

I can imagine a new enum that applies only to histograms, that is to express both temporality or the lack thereof.

enum HistogramTemporality {
  Cumulative // i.e., a "Counter" histogram
  Delta // i.e., a "Counter" histogram
  Gauge // i.e., a "Gauge" histogram
}

This can be done in a wire-compatible way for protobuf encodings. It would break JSON, likely.

To see how this could equivalently be instrumented using Gauge instruments, consider a kind of "ephemeral" attribute that we'll call nonce. I will use a float64 NaN value as the value of this attribute, so by IEEE logic there can never be two attributes that are equal. Thus, every Gauge observation is unique. Now, configure a view to use Histogram aggregation and erase the nonce attribute. The result should be a histogram with one count per gauge observation and HistogramTemporality (as defined above) equal to "Gauge".

@pirgeo
Copy link
Member

pirgeo commented Aug 8, 2022

I have a few clarifying questions/assumptions:

  • Does the GaugeHistogram look almost the same as a Histogram with Cumulative temporality?
    • The count for each bucket represents the total number of elements in this bucket at this time
    • The difference is the same as between cumulative Counters and Gauges outside of the histogram context (Counters are additive, Gauges are not)
  • The goroutine example aims at measuring "how many goroutines are waiting to be run, separated into buckets by how long they have been waiting already", is that right? And the runtime already returns a histogram-like object, so this is also related to Support for asynchronous Histogram instrument #2713?
  • I think you pointed it out in the last paragraph, but this would also be representable as a set of gauges, each of them with an attribute like le=1 that has the number of (in this example) threads that have been waiting for less than 1s. The additional value of the GaugeHistogram is that it condenses this data and packs it into one metric, did I get that right?

@rbailey7210 rbailey7210 added the triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal label Aug 12, 2022
@rbailey7210 rbailey7210 assigned reyang and unassigned tigrannajaryan Aug 12, 2022
@reyang
Copy link
Member

reyang commented Aug 12, 2022

I have a few clarifying questions/assumptions:

  • Does the GaugeHistogram look almost the same as a Histogram with Cumulative temporality?

    • The count for each bucket represents the total number of elements in this bucket at this time
    • The difference is the same as between cumulative Counters and Gauges outside of the histogram context (Counters are additive, Gauges are not)
  • The goroutine example aims at measuring "how many goroutines are waiting to be run, separated into buckets by how long they have been waiting already", is that right? And the runtime already returns a histogram-like object, so this is also related to Support for asynchronous Histogram instrument #2713?

  • I think you pointed it out in the last paragraph, but this would also be representable as a set of gauges, each of them with an attribute like le=1 that has the number of (in this example) threads that have been waiting for less than 1s. The additional value of the GaugeHistogram is that it condenses this data and packs it into one metric, did I get that right?

And in addition, gauge histogram describes the distribution of "what's currently running/exist/available/etc.", so it makes less sense to sum up the values from different time. The default aggregation seems to be "Last Value".

I can imagine a new enum that applies only to histograms, that is to express both temporality or the lack thereof.

enum HistogramTemporality {
  Cumulative // i.e., a "Counter" histogram
  Delta // i.e., a "Counter" histogram
  Gauge // i.e., a "Gauge" histogram
}

I like this idea 👍

I think our existing 6 instruments are already requiring many users to shift their minds, and I've seen folks making mistakes while choosing the correct instrument. That's why we have https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#instrument-selection and it's not that simple. I feel that we should really keep a high bar and try not to add another one, unless the value added is much bigger than the newly introduced learning overhead.

jpkrohling pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this issue Apr 9, 2024
**Description:** 

Implement native histogram append MVP.
Very similar to appending a float sample.

Limitations:
- Only support integer counter histograms fully.
- In case a histogram has both classic and native buckets, we only store
one of them. Governed by scrape_classic_histograms scrape option. The
reason is that in the OTEL model the metric family is identified by the
normalized name (without _count, _sum, _bucket suffixes for the classic
histograms), meaning that the classic and native histograms would map to
the same metric family in OTEL model , but that cannot have both
Histogram and ExponentialHistogram types at the same time.
- Gauge histograms are dropped with warning as that temporality is
unsupported, see
open-telemetry/opentelemetry-specification#2714
- NoRecordedValue attribute might be unreliable. Prometheus scrape marks
all series with float NaN values when stale, but transactions in
prometheusreceiver are stateless, meaning that we have to use heuristics
to figure out if we need to add a NoRecordedValue data point to an
Exponential Histogram metric. (Need work in Prometheus.)



Additionally: 
- Created timestamp supported.
- Float counter histograms not fully tested and lose precision, but we
don't expect instrumentation to expose these anyway.

**Link to tracking Issue:**

Fixes: #26555 

**Testing:** 

Added unit tests and e2e tests.

**Documentation:**

TBD: will have to call out protobuf negotiation while no text format.
#27030

---------

Signed-off-by: György Krajcsovits <[email protected]>
Co-authored-by: David Ashpole <[email protected]>
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
…-telemetry#28663)

**Description:** 

Implement native histogram append MVP.
Very similar to appending a float sample.

Limitations:
- Only support integer counter histograms fully.
- In case a histogram has both classic and native buckets, we only store
one of them. Governed by scrape_classic_histograms scrape option. The
reason is that in the OTEL model the metric family is identified by the
normalized name (without _count, _sum, _bucket suffixes for the classic
histograms), meaning that the classic and native histograms would map to
the same metric family in OTEL model , but that cannot have both
Histogram and ExponentialHistogram types at the same time.
- Gauge histograms are dropped with warning as that temporality is
unsupported, see
open-telemetry/opentelemetry-specification#2714
- NoRecordedValue attribute might be unreliable. Prometheus scrape marks
all series with float NaN values when stale, but transactions in
prometheusreceiver are stateless, meaning that we have to use heuristics
to figure out if we need to add a NoRecordedValue data point to an
Exponential Histogram metric. (Need work in Prometheus.)



Additionally: 
- Created timestamp supported.
- Float counter histograms not fully tested and lose precision, but we
don't expect instrumentation to expose these anyway.

**Link to tracking Issue:**

Fixes: open-telemetry#26555 

**Testing:** 

Added unit tests and e2e tests.

**Documentation:**

TBD: will have to call out protobuf negotiation while no text format.
open-telemetry#27030

---------

Signed-off-by: György Krajcsovits <[email protected]>
Co-authored-by: David Ashpole <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal
Projects
None yet
Development

No branches or pull requests

6 participants