
[Exporter/LoadBalancer] Increased Memory Utilization after bumping from 0.94.0 to 0.99.0 #33435

Open
NickAnge opened this issue Jun 7, 2024 · 6 comments
Labels
bug Something isn't working exporter/loadbalancing needs triage New item requiring triage

Comments

@NickAnge

NickAnge commented Jun 7, 2024

Component(s)

exporter/loadbalancing

What happened?

Description

Hello team.

We recently upgraded our internal collectors from version 0.94.0 to 0.99.0, and we observed a rise in memory usage at the load balancer deployment collectors, as depicted in the image below. This persisted even after updating to the latest version, 0.101.0.

(screenshot: memory usage graph across the load balancer pods)

We enabled profiling on our collectors via the pprof extension and captured the in-use memory (inuse_space) and inuse_objects profiles. I split the investigation across three pods with low, medium, and high memory usage.

Inuse Memory - Top

Low Memory Usage Pod

(screenshot: pprof inuse_space top output)

Medium Memory Usage Pod

(screenshot: pprof inuse_space top output)

High Memory Usage Pod

(screenshot: pprof inuse_space top output)

Inuse Objects - Top

Low Memory Usage Pod

(screenshot: pprof inuse_objects top output)

Medium Memory Usage Pod

(screenshot: pprof inuse_objects top output)

High Memory Usage Pod

(screenshot: pprof inuse_objects top output)
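For anyone wanting to compare against their own deployment, the profiles above were captured with commands along these lines (assuming the pprof extension is reachable on localhost:1777, as in the config further down; adjust the host/port for your setup):

```shell
# Top allocations by in-use memory (bytes currently held on the heap)
go tool pprof -sample_index=inuse_space -top http://localhost:1777/debug/pprof/heap

# Top allocations by in-use object count
go tool pprof -sample_index=inuse_objects -top http://localhost:1777/debug/pprof/heap
```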

Steps to Reproduce

  1. Deploy the collector in load balancer mode with version 0.94.0
  2. Bump the version to 0.101.0

Expected Result

Memory usage was expected to remain roughly the same over time after the version bump.

Actual Result

Memory usage increased significantly after bumping the version.

Collector version

0.101.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 20

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 95
    spike_limit_percentage: 15
  k8sattributes:
    passthrough: true

exporters:
  loadbalancing/spans:
    protocol:
      otlp:
        sending_queue:
          enabled: true
          num_consumers: 100
          queue_size: 500
        retry_on_failure:
          enabled: true
          initial_interval: 2s
          max_interval: 2s
          max_elapsed_time: 10s
        tls:
          insecure: true
        timeout: 1
    resolver:
      k8s:
        service: service
  loadbalancing/metrics:
    routing_key: metric
    protocol:
      otlp:
        sending_queue:
          enabled: true
          num_consumers: 50
          queue_size: 500
        retry_on_failure:
          enabled: true
          initial_interval: 2s
          max_interval: 2s
          max_elapsed_time: 10s
        tls:
          insecure: true
        timeout: 1
    resolver:
      k8s:
        service: service

extensions:
  health_check:
  pprof:
    endpoint: :1777

service:
  extensions: [ health_check , pprof]
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ memory_limiter ]
      exporters: [ loadbalancing/spans ]
    logs:
      receivers: [ otlp ]
      processors: [ memory_limiter ]
      exporters: [ loadbalancing/spans ]
    metrics:
      receivers: [ otlp ]
      processors: [ memory_limiter, k8sattributes ]
      exporters: [ loadbalancing/metrics ]

Log output

No response

Additional context

No response

@NickAnge NickAnge added bug Something isn't working needs triage New item requiring triage labels Jun 7, 2024
Contributor

github-actions bot commented Jun 7, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Member

Thank you for the detailed report, I'll take a look and try to reproduce it. In the meantime, can you try switching to the DNS resolver instead of the k8s resolver? I'm not 100% sure yet it would show a difference, but the DNS resolver is known to consume fewer resources in other situations.

    resolver:
      k8s:
        service: service
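For reference, a DNS resolver configuration for the loadbalancing exporter would look roughly like this (the hostname below is a placeholder for a headless service pointing at the backing collectors; port and interval are illustrative):

```yaml
    resolver:
      dns:
        # placeholder: a headless Service resolving to the backend collector pods
        hostname: otelcol-backends.mynamespace.svc.cluster.local
        port: "4317"
        # how often to re-resolve the hostname
        interval: 5s
```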

@NickAnge
Author

Thanks @jpkrohling .
We discussed internally replacing the k8s resolver with the DNS resolver. The conclusion was to stay with the k8s resolver, as it is faster at resolving the endpoints of the backing collectors during a rollout or outage.

Let me know if you need me to provide some more information about the issue, and thanks a lot for taking a look

@jpkrohling
Member

Can you temporarily replace it, and see if the memory profile is different? If we can isolate this behavior to this resolver specifically, it's easier to find a solution.

@NickAnge
Author

This memory issue happened only in our production environments (probably because of the higher traffic), so I am not sure we can change the resolver there, even temporarily :/. Did you manage to reproduce it in your setup?

@jpkrohling
Member

I wasn't able to try it out. I might be able to find some time later this week, but next week I'm AFK again. If anyone is interested in this issue, it would help me a lot if I can have a confirmation that this is isolated to the k8s resolver.
