Prometheus exporter does not convert time units to seconds #18903

Open
jonatan-ivanov opened this issue Feb 24, 2023 · 13 comments
Labels
bug Something isn't working exporter/prometheus never stale Issues marked with this label will be never staled and automatically removed

Comments

@jonatan-ivanov
Member

Component(s)

exporter/prometheus

What happened?

Description

Prometheus uses seconds as the time unit by default. If I send an OTLP histogram with a different time unit, the value is not converted to seconds (as it should be) but is used as-is.

Steps to Reproduce

Send a histogram with unit: "milliseconds" to the OTel collector where the receiver is otlp/http/protobuf (but I think any otlp receiver should produce the same result) and the exporter is prometheus. Then check the Prometheus /metrics endpoint.
E.g.:

metrics {
  name: "test.timer"
  unit: "milliseconds"
  histogram {
    data_points {
      start_time_unix_nano: 1677210838494000000
      time_unix_nano: 1677210839021000000
      count: 1
      sum: 123.0
    }
    aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE
  }
}

Expected Result

test_timer_sum{...} 0.123

Actual Result

test_timer_sum{...} 123
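
For illustration, the conversion this report expects can be sketched in a few lines of Python. This is only a sketch of the arithmetic, not the collector's code: `UNITS_PER_SECOND` and `to_seconds` are hypothetical names, and the table covers only a handful of unit names chosen for the example.

```python
# Hypothetical lookup table: how many of each unit make up one second.
UNITS_PER_SECOND = {
    "nanoseconds": 1_000_000_000,
    "microseconds": 1_000_000,
    "milliseconds": 1_000,
    "seconds": 1,
}


def to_seconds(value: float, unit: str) -> float:
    """Convert a duration expressed in `unit` to seconds."""
    if unit not in UNITS_PER_SECOND:
        raise ValueError(f"not a known time unit: {unit!r}")
    return value / UNITS_PER_SECOND[unit]


# The histogram above reports sum: 123.0 with unit: "milliseconds".
print(to_seconds(123.0, "milliseconds"))
```

Applied to the data point above, this turns the reported sum of 123.0 milliseconds into the 0.123 shown under Expected Result.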

Collector version

otel/opentelemetry-collector-contrib:cdf47846a7ff

Environment information

Environment

OS: MacOS 13.2.1

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  prometheus:
    endpoint: '0.0.0.0:9090'
    metric_expiration: 1m
    enable_open_metrics: true
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]

Log output

No response

Additional context

No response

@jonatan-ivanov jonatan-ivanov added bug Something isn't working needs triage New item requiring triage labels Feb 24, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@Aneurysm9
Member

The specification requires that the unit be handled as follows:

The Unit of an OTLP metric point SHOULD be converted to the equivalent unit in Prometheus when possible. This includes:

  • Converting from abbreviations to full words (e.g. "ms" to "milliseconds").
  • Dropping the portions of the Unit within brackets (e.g. {packets}). Brackets MUST NOT be included in the resulting unit. A "count of foo" is considered unitless in Prometheus.
  • Special case: Converting "1" to "ratio".
  • Converting "foo/bar" to "foo_per_bar".

The resulting unit SHOULD be added to the metric as OpenMetrics UNIT metadata and as a suffix to the metric name unless the metric name already contains the unit, or the unit MUST be omitted. The unit suffix comes before any type-specific suffixes.

That does not include changing the unit to a different unit or modifying the value in any way.
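
A rough Python sketch of the naming rules quoted above may make the distinction concrete. It is an illustration under assumptions, not the exporter's implementation: `ABBREVIATIONS`, `prometheus_unit`, and `metric_name` are hypothetical names, and the abbreviation table is deliberately tiny. Only the unit name and the metric name are rewritten; no value is touched.

```python
import re

# Hypothetical, deliberately small abbreviation table for the example.
ABBREVIATIONS = {"ms": "milliseconds", "s": "seconds", "By": "bytes"}


def prometheus_unit(otlp_unit: str) -> str:
    """Map an OTLP unit string to a Prometheus-style unit word."""
    # Drop bracketed annotations such as "{packets}".
    unit = re.sub(r"\{[^}]*\}", "", otlp_unit).strip()
    if unit == "1":
        return "ratio"  # special case from the quoted rules
    # Expand abbreviations and turn "foo/bar" into "foo_per_bar".
    words = [ABBREVIATIONS.get(part, part) for part in unit.split("/")]
    return "_per_".join(words)


def metric_name(name: str, otlp_unit: str) -> str:
    """Append the unit as a suffix unless the name already contains it."""
    base = name.replace(".", "_")
    unit = prometheus_unit(otlp_unit)
    if unit and unit not in base:
        base = f"{base}_{unit}"
    return base


# Type-specific suffixes such as "_sum" come after the unit suffix.
print(metric_name("test.timer", "milliseconds") + "_sum")
```

Under these rules the series from the report would be named test_timer_milliseconds_sum and would still carry the value 123.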

@jonatan-ivanov
Member Author

Since the Prometheus exporter is the concern of the collector, I think the client should never know that the data that it published in OTLP format will be converted to Prometheus format. Because of this, I think any unit that is supported by OTLP should work and the client should not care. Maybe the Prometheus exporter is not configured right now but it will be starting from tomorrow. I think making a change on the exporters should not involve changing all the clients.

Can this behavior lead to impossible scenarios?

  • Can it happen that there are two receivers that mandate different units?
  • Can it happen that there are two exporters that mandate different units?

@shakuzen

The resulting unit SHOULD be added to the metric as OpenMetrics UNIT metadata and as a suffix to the metric name unless the metric name already contains the unit

The bold part (adding the unit as a suffix to the metric name) is not happening. The actual result included above shows that the metric name from the Prometheus exporter is test_timer_sum, with no unit in the name. A consumer of the exporter in Prometheus format has no way to know what the unit is. I can't speak to whether the former part (the UNIT metadata) is happening, because I was never able to get the Prometheus exporter to return OpenMetrics format, even when setting enable_open_metrics: true. Regardless, the UNIT metadata is not part of the Prometheus format, so it wouldn't help consumers that scrape the Prometheus format rather than the OpenMetrics format.

@atoulme atoulme removed the needs triage New item requiring triage label Mar 7, 2023
jonatan-ivanov added a commit to micrometer-metrics/micrometer that referenced this issue May 1, 2023
The OTel collector had/has multiple bugs around this:
- The time unit is not converted to `seconds` (Prometheus' default)
- The time unit was not even visible in the name of the time series,
this violated the OTel specification

This means that if you sent 123ms to the OTLP collector,
on Prometheus side this turned into 123s
(the value is the same and not having the unit means `seconds`).
See: open-telemetry/opentelemetry-collector-contrib#18903
It seems units are still not converted but at least the unit is in the
name now (breaking change),
see: open-telemetry/opentelemetry-collector-contrib#20519

Because of this breaking change, our tests are also broken and we need to add the unit to our assertions.

Closes gh-3796
@github-actions
Contributor

github-actions bot commented May 8, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 8, 2023
@jonatan-ivanov
Member Author

@Aneurysm9 Could you please check the last two comments and mark this issue so that it won't be auto-closed?

@github-actions github-actions bot removed the Stale label May 26, 2023
@gouthamve
Member

The bold part is not happening. The included actual result above shows the metric name from the Prometheus exporter is test_timer_sum - no unit in the name.

This is now happening in the latest releases (since #20519).

Regarding converting milliseconds to seconds, while this is possible in fixed-bucket histograms, it is not possible to do in exponential histograms. This was one of the main motivations to adopt seconds as the default unit for HTTP (and hopefully other) duration measurements in OTel Semantic Conventions. Ideally the producer will send seconds (as defined in the semantic conventions).

I don't think converting milliseconds to seconds is appropriate in fixed-bucket histograms while not converting in exponential histograms.
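
One possible reading of why the two histogram types differ, sketched in Python (the comment above does not spell out the reasoning, so treat this as an assumption). A fixed-bucket histogram stores its boundaries as plain numbers, so they can simply be divided. An exponential histogram derives its boundaries from a formula: with base = 2 ** (2 ** -scale), the bucket at index i has the upper bound base ** (i + 1). Dividing every value by a factor therefore moves each boundary by log_base(factor) index positions, which lines up with the existing buckets only when that shift is a whole number. The function names below are hypothetical.

```python
import math


def rescale_explicit_bounds(bounds, factor):
    """Fixed-bucket histogram: each boundary is a stored number, so divide it."""
    return [bound / factor for bound in bounds]


def exponential_index_shift(scale, factor):
    """Exponential histogram: index positions each boundary moves by when
    all values are divided by `factor`, i.e. log_base(factor) with
    base = 2 ** (2 ** -scale)."""
    return math.log2(factor) * 2 ** scale


# Millisecond boundaries become second boundaries without loss of structure.
print(rescale_explicit_bounds([5, 10, 25], 1000))

# For a factor of 1000 the shift is never a whole number of buckets;
# for a power of two such as 1024 it would be.
for scale in range(4):
    print(scale, exponential_index_shift(scale, 1000), exponential_index_shift(scale, 1024))
```

Because log2(1000) is irrational, no scale makes the shift a whole number, so the rescaled boundaries fall inside the existing buckets and the counts cannot be reassigned exactly.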

@shakuzen

This is now happening in the latest releases (since #20519).

Yes, we noticed in Micrometer when it broke our integration tests: micrometer-metrics/micrometer#3796.

This was one of the main motivations to adopt seconds as the default unit for HTTP (and hopefully other) duration measurements in OTel Semantic Conventions. Ideally the producer will send seconds (as defined in the semantic conventions).

I think this is tying things together that shouldn't be tied together. OTLP is a format for telemetry data; it defines the data model but not the semantic naming. Someone should not have to use the OTel semantic conventions to successfully use OTLP or the OTel Collector. I understand all of these things are branded OpenTelemetry, but it would help adoption, and be more useful to users, if they could be used separately. And it was my understanding that they were intended to be usable without using everything.

It hurts the Collector's general usefulness if the Prometheus exporter expects the input is already in seconds so it matches data produced specifically for Prometheus/OpenMetrics. If the producer is a Prometheus client, it's clear what conventions it should follow as far as unit, but not all producers know where data they are producing will be stored, especially if it is in OTLP format (and sent to the Collector) that is supported by different backends.

Regarding converting milliseconds to seconds, while this is possible in fixed-bucket histograms, it is not possible to do in exponential histograms.

That's unfortunate and I don't have any solution. It feels like it leaves us in this bad state where the Collector can't deliver its full potential of being a universal adapter. Users are going to have to make more breaking changes to align with its limitations.

@jonatan-ivanov
Member Author

jonatan-ivanov commented Jun 29, 2023

FYI: it seems that starting from 0.80.0 the unit was removed (which breaks our integration tests again): #23229

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 29, 2023
@jonatan-ivanov
Member Author

@Aneurysm9 Could you please add the never stale label on the issue so that I don't need to play ping-pong with the bot?

@github-actions github-actions bot removed the Stale label Aug 31, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 31, 2023
@jonatan-ivanov
Member Author

@Aneurysm9 or someone else: Could you please add the never stale label on the issue so that I don't need to play ping-pong with the bot?

@github-actions github-actions bot removed the Stale label Oct 31, 2023
@gouthamve gouthamve added the never stale Issues marked with this label will be never staled and automatically removed label Oct 31, 2023

5 participants