
Should prometheus metric name normalization be exposed as a configuration option? #21743

Closed
bryan-aguilar opened this issue May 10, 2023 · 16 comments · Fixed by #24260

@bryan-aguilar
Contributor

Component(s)

exporter/prometheus, exporter/prometheusremotewrite, receiver/prometheus

Describe the issue you're reporting

In #20519 the pkg.translator.prometheus.NormalizeName feature gate was enabled by default, causing full Prometheus metric name normalization to be used instead of simple normalization. The intent of feature gates is that they will eventually be removed, and thus the option to revert to simple name normalization will be lost.

I believe the choice between simple and full name normalization should belong to the operator of the collector. It is not clear to me that full name normalization makes sense in all use cases.

Specifically, I am wondering whether permanently enabling this flag moves the OpenTelemetry Prometheus receiver further away from being a drop-in replacement for the Prometheus server. Does the Prometheus server perform this same type of full metric name normalization by default?
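For illustration, assume an OTel monotonic sum named system.network.io with unit By being exported to Prometheus; the two modes would produce roughly:

```
simple normalization: system_network_io
full normalization:   system_network_io_bytes_total   (unit and _total suffixes appended)
```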

@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@hughesjj
Contributor

We have a working group meeting in two weeks if you would like to discuss it there. If not, I'll do so.

@bryan-aguilar
Contributor Author

Which working group meeting are you referring to specifically?

@bryan-aguilar
Contributor Author

bryan-aguilar commented May 10, 2023

Collector SIG discussion on 10 May 2023
Discussion Participants: @hughesjj @bryan-aguilar @jpkrohling

  • Take this to the Prometheus interoperability working group and have the discussion there.
  • We should also have this discussion with the language SDKs. OTel SDKs with Prometheus exporters should have the same capabilities as the collector exporters. If we need to codify the export behavior between the SDKs and the Collector, then we should consider taking this to the spec level.
  • We are also open to leaving this feature gate in place for a longer time until this discussion is resolved.

Please add anything that I may have missed.

@ekarlso

ekarlso commented May 16, 2023

This change effectively breaks any existing Grafana dashboard you get from various projects and vendors, ultimately giving you two options:

  • Create and maintain specific dashboards yourself.
  • Have the project or community adapt and maintain dashboards specifically for OTel-based setups.

The same will probably apply to any alerting rules you get from upstream.

@swiatekm
Contributor

From the perspective of a user wanting to use the prometheus receiver as a drop-in replacement for Prometheus, this is a very disruptive change.

The Prometheus -> Otel normalization is very simple, as it only trims unit and _total suffixes from metrics. Unfortunately, there are a lot of very widely used metrics affected by this. For example, Kubernetes has the container_cpu_usage_seconds_total cadvisor metric, which is probably the single most used container metric in the ecosystem.

I support making this a configuration option. And if we actually want users to migrate, we should consider providing a more gradual migration path as well.

@damemi
Contributor

damemi commented May 24, 2023

This also affects us in the googlemanagedprometheus exporter, both on the read side from the receiver and on export because we rely on the normalization functions in the Collector.

@bryan-aguilar
Contributor Author

bryan-aguilar commented May 24, 2023

Summary from prometheus wg this morning:

  • Normalization is hard because the desired behavior can differ depending on the use case. For example, a metrics pipeline with an OTLP receiver and Prometheus remote write exporter may tolerate/desire a different normalization mode than a pipeline with a prometheus receiver and prometheus remote write exporter.
  • We should move to make metric name normalization a configuration option (see the sketch at the end of this comment).
    • The configuration should allow us to achieve the same behavior as before the breaking change was made.
    • It is not clear what the default behavior should be (full vs. simple normalization). This will be part of the proposal and can be discussed there.
    • The proposal should investigate the ability for a prometheus receiver to add an attribute to the instrumentation scope of the metric. That attribute will signal what type of normalization was used in the receiver. A compatible exporter will then first look for that attribute rather than configuration to determine the normalization it should use.
  • There will most likely be some spec changes required, especially if an attribute relating to metric name normalization is introduced. Metric name normalization should be clearly defined in the spec so that language SDKs and Collector components can be compatible. See the relevant spec doc here.

I will break down this summary into a list of action items. After doing that we can determine action item owners and start moving forward.
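As a purely hypothetical sketch of such a per-component option (the option name and values below are invented for illustration and do not exist in any component):

```yaml
# Hypothetical example only: neither the option name nor its values are real.
receivers:
  prometheus:
    metric_name_normalization: simple   # keep scraped names as-is
exporters:
  prometheusremotewrite:
    metric_name_normalization: full     # append unit and _total suffixes on export
```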

@bryan-aguilar
Contributor Author

The Prometheus -> Otel normalization is very simple, as it only trims unit and _total suffixes from metrics. Unfortunately, there are a lot of very widely used metrics affected by this. For example, Kubernetes has the container_cpu_usage_seconds_total cadvisor metric, which is probably the single most used container metric in the ecosystem.

@swiatekm-sumo are you saying that Prometheus -> OTel translation converts the container_cpu_usage_seconds_total metric name to container_cpu_usage_seconds? Is that behavior documented somewhere? I can't seem to find the relevant part in the normalization docs here.

@swiatekm
Contributor

The Prometheus -> Otel normalization is very simple, as it only trims unit and _total suffixes from metrics. Unfortunately, there are a lot of very widely used metrics affected by this. For example, Kubernetes has the container_cpu_usage_seconds_total cadvisor metric, which is probably the single most used container metric in the ecosystem.

@swiatekm-sumo are you saying that Prometheus -> OTel translation converts the container_cpu_usage_seconds_total metric name to container_cpu_usage_seconds? Is that behavior documented somewhere? I can't seem to find the relevant part in the normalization docs here.

To be exact, the Prometheus receiver calls TrimPromSuffixes here. The implementation is here, and it does what I mentioned. This isn't explicitly documented anywhere, though, so maybe it's actually not intended?

@bryan-aguilar
Contributor Author

To be exact, the Prometheus receiver calls TrimPromSuffixes here. The implementation is here, and it does what I mentioned. This isn't explicitly documented anywhere, though, so maybe it's actually not intended?

Thanks for the detailed info. This may be worthy of an issue in itself. I'll take a deeper look; maybe we are missing some context. I'll add it to the list of action items that I will build around metric name normalization.

@dashpole
Contributor

Sorry I haven't been part of relevant discussions, as I'm still on leave. The intent of the feature gate was to be transparent to users when using a prometheus receiver and exporter. Translation from prometheus naming to otel naming is done in the receiver, and the reverse is done in prometheus exporters. It should not be breaking for anyone using a prometheus receiver and exporter as a drop-in replacement for the prometheus server.

The transition is intended to be gradual, and the feature gate was in alpha (disabled by default) for 11 months. It can still be disabled with --feature-gates=-pkg.translator.prometheus.NormalizeName, and I think we should leave plenty of time for people to transition before we remove it.

@bryan-aguilar
Contributor Author

Translation from prometheus naming to otel naming is done in the receiver, and the reverse is done in prometheus exporters. It should not be breaking for anyone using a prometheus receiver and exporter as a drop-in replacement for the prometheus server.

Does this mean there is a guarantee that metric pipelines with prom receiver -> promrw exporter and prom receiver -> otlp exporter will produce different metric names because of normalization? Is that the intent? I don't know if the component documentation clearly states that.

The prometheus receiver readme says:

The Prometheus receiver is meant to minimally be a drop-in replacement for Prometheus

What does minimally a drop-in replacement mean here? Does performing normalization by default in the receiver push it further from that goal? I believe @swiatekm-sumo brought up a good use case where this would not be desired. Dropping the _total suffix from container_cpu_usage_seconds_total would be surprising to a user whose metrics pipeline does not have an exporter that re-normalizes, such as the PRW exporter.
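To make the comparison concrete, here is a minimal sketch of the two pipelines in question (component settings and endpoints are illustrative):

```yaml
# Illustrative only: with receiver-side normalization enabled, the OTLP path
# exports the trimmed names as-is, while the PRW path re-appends suffixes on export.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example                      # illustrative job
          static_configs:
            - targets: ["localhost:8888"]        # illustrative target
exporters:
  prometheusremotewrite:
    endpoint: "https://example.com/api/v1/write"
  otlp:
    endpoint: "collector-gateway:4317"
service:
  pipelines:
    metrics/prw:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
    metrics/otlp:
      receivers: [prometheus]
      exporters: [otlp]
```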

@dmitryax
Member

dmitryax commented Jun 7, 2023

I agree that this should be a configuration option. There are many scenarios (not prom->otel->prom) when the full normalization can be confusing.

Another concern is that some standard Prometheus metrics can potentially be mapped 1:1 to standard OTel metrics defined in the spec. Would covering those cases be complete normalization? I'm a bit skeptical about going further down that route and would vote for making simple normalization the default option.

To be exact, the Prometheus receiver calls TrimPromSuffixes here. The implementation is here, and it does what I mentioned. This isn't explicitly documented anywhere, though, so maybe it's actually not intended?

Thanks for the detailed info. This may be worthy of an issue in itself. I'll take a deeper look; maybe we are missing some context. I'll add it to the list of action items that I will build around metric name normalization.

I submitted a PR to document the Prometheus -> OTel transition: #23209. I also filed an issue about the limited translations on the receiver side: #23208.

The proposal should investigate the ability for a prometheus receiver to add an attribute to the instrumentation scope of the metric. That attribute will signal what type of normalization was used in the receiver. A compatible exporter will then first look for that attribute rather than configuration to determine the normalization it should use.

I don't think we should complicate it like this as long as the Prometheus exporter produces the same result for metrics whether or not they were normalized on the receiver side, which is the case AFAIK; e.g., it doesn't add the _total suffix if it's already there.

@gouthamve
Member

Hi, I have a lot more experience with the exporter side of things than the receiver side. And in the exporter, not having full normalization is quite bad IMO.

For example, take the case of a metric being exported via OTLP Recv --> PRW today (full normalisation OFF): http_server_duration. Once it reaches Prometheus, there is no way to know the unit on this metric. And in OTel, both millisecond and second are likely options. Prometheus handles units by putting them in metric names, and handling metrics without units is a bad experience for Prometheus users.

For this reason, I think we should enable full normalisation by default on the exporter as soon as possible. It is an inevitable change, and one that becomes more painful the longer it takes, as it will break more users.


Coming to the normalisation on the receiver end, it is a tricky question. Let us consider the following use cases:

  • Prom Recv --> Prom Exporter

Whether the receiver does full normalisation or not doesn't matter here, as long as the exporter does full normalisation.

  • Prom Recv --> OTLP Exporter

With normalisation it becomes:

http_server_duration_seconds becomes http_server_duration WITH UNIT = s
alertmanager_notifications_total becomes alertmanager_notifications

Some names are unfortunate though:

prometheus_request_message_bytes_total becomes prometheus_request_message WITH UNIT = Byte
alertmanager_build_info becomes alertmanager_build WITH UNIT = ???
process_cpu_seconds_total becomes process_cpu WITH UNIT = s


So I am kinda split on the receiver normalisation. It makes sense for a lot of the metrics, but there is also a set of metrics where it's a bit unfortunate.

@gouthamve
Member

Just found this: prometheus/prometheus#12152

If this is merged, people could see warnings for OTel metrics (PossibleNonCounterWarning) until the _total is appended.

dmitryax pushed a commit that referenced this issue Jul 17, 2023
…4256)

**Description:**

Adds a new configuration option, `trim_metric_suffixes`, to the prometheus receiver. When set to true, suffixes will be trimmed from metrics. This replaces use of the `pkg.translator.prometheus.NormalizeName` feature gate in the prometheus receiver, but leaves it in place for exporters.

The first commit simplifies the usage of the feature-gate. The tests and
implementation were passing around an entire registry when it wasn't
necessary.

**Link to tracking Issue:**

Part of #21743
Part of #8950
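For reference, a minimal sketch of how this receiver option can be used (the scrape job and target are illustrative):

```yaml
receivers:
  prometheus:
    trim_metric_suffixes: true   # strip unit and _total suffixes on ingest
    config:
      scrape_configs:
        - job_name: example                      # illustrative job
          static_configs:
            - targets: ["localhost:8888"]        # illustrative target
```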
dmitryax added a commit that referenced this issue Jul 18, 2023
…24260)

**Description:**
Adds the `add_metric_suffixes` configuration option, which can disable the addition of type and unit suffixes. This is backwards-compatible, since the default is true.

This is recommended by the specification for sum suffixes in
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#sums
and allowed in metadata

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#metric-metadata-1:

`Exporters SHOULD provide a configuration option to disable the addition
of _total suffixes.`

`The resulting unit SHOULD be added to the metric as OpenMetrics UNIT
metadata and as a suffix to the metric name unless the metric name
already contains the unit, or the unit MUST be omitted`

This deprecates the BuildPromCompliantName function in favor of BuildCompliantName, which takes an additional argument for configuring suffixes.

**Link to tracking Issue:**

Fixes #21743
Part of #8950

---------

Co-authored-by: Dmitrii Anoshin <[email protected]>
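For reference, a minimal sketch of how the exporter-side option can be used (endpoints are illustrative; the default of true preserves current behavior):

```yaml
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"                       # illustrative endpoint
    add_metric_suffixes: false                     # do not append type/unit suffixes
  prometheusremotewrite:
    endpoint: "https://example.com/api/v1/write"   # illustrative endpoint
    add_metric_suffixes: false
```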