
prometheusremotewrite exporter with histogram is causing metrics export failure due to high memory (90%) #30675

Open
bhupeshpadiyar opened this issue Jan 19, 2024 · 5 comments
Labels
bug · exporter/prometheusremotewrite · needs triage · Stale

Comments

@bhupeshpadiyar

Component(s)

exporter/prometheusremotewrite

What happened?

Description

Collector memory and CPU usage spike while exporting histogram metrics with the prometheusremotewrite exporter, causing metrics export to fail with the following error logs.

2024-01-17T10:33:50.902Z info [email protected]/memorylimiter.go:287 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 16085}

2024-01-17T10:50:08.919Z error scrape/scrape.go:1351 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "otel-collector", "target": "http://0.0.0.0:8888/metrics", "error": "data refused due to high memory usage"}

2024-01-17T15:07:12.464Z warn [email protected]/memorylimiter.go:294 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 16081}

Note: This issue occurs only when exporting histogram metrics. The exporter works fine with counter and gauge type metrics.

Steps to Reproduce

  • Instrument histogram metrics using the OTEL-Java SDK (see the sketch after this list)
  • Monitor the collector health and other metrics
  • Initially, all histogram metrics are exported seamlessly
  • After a few minutes, collector memory, CPU usage, and exporter queue size spike, and the memory usage errors shown in the sample logs above start appearing
  • Metrics export then fails
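
For reference, a minimal sketch of the instrumentation used in the first step, assuming the standard OpenTelemetry Java API (the instrument and attribute names below are illustrative, not taken from our application):

// Record values into an explicit-bucket histogram via the OpenTelemetry Java API.
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.Meter;

public class HistogramExample {
    public static void main(String[] args) {
        // The Meter comes from the globally configured SDK, which exports OTLP to the collector.
        Meter meter = GlobalOpenTelemetry.getMeter("example-app");

        // Each recorded value contributes to the histogram's bucket counts, sum, and count.
        DoubleHistogram requestDuration = meter
            .histogramBuilder("http.server.request.duration")
            .setUnit("ms")
            .setDescription("Server-side request latency")
            .build();

        // Every distinct attribute set produces its own histogram series, so high-cardinality
        // attributes multiply the number of data points the exporter has to convert.
        requestDuration.record(123.4,
            Attributes.of(AttributeKey.stringKey("http.route"), "/orders"));
    }
}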

Expected Result

All metric types (counter, gauge, histogram) should be exported seamlessly, without errors.

Actual Result

Exporting histogram metrics causes high memory usage errors, and the export fails.

(Same memory_limiter and scrape error logs as shown in the description above.)
[Attached screenshots: collector memory/CPU, exporter queue size, batch metrics, metric point rate]

Collector version

0.92.0 (Confirmed with older collector versions as well)

Environment information

# Environment
OS: Linux/ARM64 - AWS ECS (Fargate) Cluster

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4318

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: [ '0.0.0.0:8888' ]
exporters:
  logging:
    verbosity: "basic"
    sampling_initial: 5
  prometheusremotewrite:
    endpoint: "http://<victoria-metrics-instance>:8428/prometheus/api/v1/write"
    tls:
      insecure: true
processors:
  batch:
    timeout:
    send_batch_size:
    send_batch_max_size:
  memory_limiter:
    check_interval: 200ms
    limit_mib: 20000
    spike_limit_mib: 4000
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1888
    block_profile_fraction: 3
    mutex_profile_fraction: 5
  zpages:
    endpoint: 0.0.0.0:55679
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [logging, prometheusremotewrite]

Log output

See the memory_limiter and scrape error logs in the description above.


Additional context

This issue occurs only when exporting histogram metrics; the exporter works fine with counter and gauge type metrics.
@bhupeshpadiyar added the bug and needs triage labels on Jan 19, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

Note: This is possibly a duplicate of #24405

@bhupeshpadiyar
Author

Hi @crobert-1,

Just to clarify: we are seeing this issue with plain Histogram (explicit-bucket) metrics, whereas the linked issue appears to be about Exponential Histogram metrics.
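
For reference on the distinction, a minimal sketch assuming the standard opentelemetry-sdk-metrics View API (class names here are only illustrative): which of the two data types is produced depends on the aggregation configured for histogram instruments.

// Pin histogram instruments to classic explicit-bucket aggregation (Histogram data points),
// rather than base-2 exponential aggregation (ExponentialHistogram data points, as in #24405).
import io.opentelemetry.sdk.metrics.Aggregation;
import io.opentelemetry.sdk.metrics.InstrumentSelector;
import io.opentelemetry.sdk.metrics.InstrumentType;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.View;

public class HistogramAggregationExample {
    public static void main(String[] args) {
        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
            .registerView(
                InstrumentSelector.builder().setType(InstrumentType.HISTOGRAM).build(),
                View.builder()
                    // Classic Histogram data points (the case reported in this issue).
                    .setAggregation(Aggregation.explicitBucketHistogram())
                    // ExponentialHistogram data points (the linked issue) would instead use:
                    // .setAggregation(Aggregation.base2ExponentialBucketHistogram())
                    .build())
            .build();

        // In a real setup the builder would also register an OTLP metric reader/exporter.
    }
}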

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.
