Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/dockerstats] not generating per container metrics #33303

Open
schewara opened this issue May 29, 2024 · 3 comments
Open

[receiver/dockerstats] not generating per container metrics #33303

schewara opened this issue May 29, 2024 · 3 comments
Labels
bug Something isn't working exporter/prometheus needs triage New item requiring triage receiver/dockerstats receiver/prometheus Prometheus receiver

Comments

@schewara
Copy link

Component(s)

exporter/prometheus, receiver/dockerstats, receiver/prometheus

What happened?

Description

We have a collector running (in docker), which is supposed to collect

  • docker container stats through receiver/dockerstats
  • app metrics, through receiver/prometheus and docker_sd_config
  • export the collected metrics trough exporter/prometheusexporter

A similar issue was already reported but was closed without any real solution -> #21247

Steps to Reproduce

  1. have a container running which exposes metrics, like prometheus/node-exporter
  2. start a otel/opentelemetry-collector-contrib container
  3. observe the /metrics endpoint of the prometheus exporter

Expected Result

Individual metrics for each container running on the same host.

Actual Result

Only metrics which have Data point attributes are shown like the following, plus the metrics coming from the prometheus receiver.

container_network_io_usage_rx_errors_total{interface="eth1"} 0

Test scenarios and observations

exporter/prometheus - resource_to_telemetry_conversion - enabled

When enabling the config options, the following was observed

  • receiver/dockerstats metrics are available as expected
  • receiver/prometheus metrics are gone

I don't really know how the prometheus receiver converts the scraped metrics into an otel object, but it looks like that it creates individual metrics + a target_info metric only containing Data point attributes but no Resource attributes.

This would explain, that the metrics disappear, as from what it seems, all existing metric labels are wiped and replaced with nothing.

manually setting attribute labels

Trying to set manual static attributes through the attributes processor only added a new label, to the single metrics, but did not produce individual container metrics

After going through all the logs and searching through all the documentation I discovered the
Setting resource attributes as metric labels section from the prometheus exporter, when implemented (see the commented out sections of the config), metrics from the dockerstats receiver showed up on the exporters /metrics endpoint, but are still missing some crucial labels, which might need to be added manually as well.

Findings

Based on all the observations during testing and trying things out, these are my takeaways for the current shortcomings of the 3 selected components and how they are not very good integrated with each other.

receiver/dockerstats

  • The received data from docker should be properly set up as either resource or datapoint attribute
  • The config settings or maybe just the documentation for the container_labels_to_metric_labels and env_vars_to_metric_labels settings is incorrect, as they are not added as a datapoint attribute and therefore never show up in any metric labels
  • For the metrics to work with prometheus, they should include a job and an instance label,
    by using the service.namespace,service.name,service.instance.id resource attributes, which then hopefully get picked up correctly by the exporter to convert it into the right label.

receiver/prometheus

  • I was under the impression, that the labels from the docker_sd_configs are added as resource attributes to the scraped metrics.
    But as I can't find the link to the source right now I am either mistaken or it just is not the case, looking at the log outputs and the target_info metrics.

exporter/prometheusexporter

  • Looking at the documentation and the target_info metric, I am missing the resource attributes from the dockerstats metrics. Maybe this is due to the missing service attributes or some other reason, but I was unable to see any errors or warnings in the standard log
  • The resource_to_telemetry_conversion functionality left me a bit speechless, that it wipes all datapoint attributes, especially when there are no resource attributes available.
    Also activating it would mean, that I would loose (as an example) the interface information from the container.network.io.usage.rx_bytes metric, without any idea from where the actual value is taken or calculated from.
    A warning in the documentation would be really helpful, or a flag to adjust the behavior based on individual needs.

Right now I am torn between manually transform all the labels of the dockerstats receiver, or
create duplicate pipelines with a duplicated exporter, but either way there is some room for improvement to have everything working together smoothly.

Collector version

otel/opentelemetry-collector-contrib:0.101.0

Environment information

Environment

Docker

OpenTelemetry Collector configuration

receivers:
  docker_stats:
    api_version: '1.45'
    collection_interval: 10s
    container_labels_to_metric_labels:
      com.docker.compose.project: compose.project
      com.docker.compose.service: compose.service
    endpoint: "unix:https:///var/run/docker.sock"
    initial_delay: 1s
    metrics:
      container.restarts:
        enabled: true
      container.uptime:
        enabled: true
    timeout: 5s
  otlp:
    protocols:
      grpc: null
      http: null
  prometheus:
    config:
      global:
        scrape_interval: 30s
      scrape_configs:
      - job_name: otel-collector
        relabel_configs:
        - replacement: static.instance.name
          source_labels:
          - __address__
          target_label: instance
        scrape_interval: 30s
        static_configs:
        - targets:
          - localhost:8888
      - docker_sd_configs:
        - filters:
          - name: label
            values:
            - prometheus.scrape=true
          host: unix:https:///var/run/docker.sock
        job_name: docker-containers
        relabel_configs:
        - action: replace
          source_labels:
          - __meta_docker_container_label_prometheus_path
          target_label: __metrics_path__
        - action: replace
          regex: /(.*)
          source_labels:
          - __meta_docker_container_name
          target_label: container_name
        - action: replace
          separator: ':'
          source_labels:
          - container_name
          - __meta_docker_container_label_prometheus_port
          target_label: __address__
        - replacement: static.instance.name
          source_labels:
          - __address__
          target_label: instance
        - action: replace
          source_labels:
          - __meta_docker_container_id
          target_label: container_id
        - action: replace
          source_labels:
          - __meta_docker_container_id
          target_label: service_instance_id
        - action: replace
          source_labels:
          - __meta_docker_container_label_service_namespace
          target_label: service_namespace
        - action: replace
          source_labels:
          - container_name
          target_label: service_name
        - action: replace
          source_labels:
          - __meta_docker_container_label_deployment_environment
          target_label: deployment_environment
        - action: replace
          regex: (.+/)?/?(.+)
          replacement: $${1}$${2}
          separator: /
          source_labels:
          - service_namespace
          - service_name
          target_label: job
        scrape_interval: 30s

processors:
  batch: null
  resourcedetection/docker:
    detectors:
    - env
    - docker
    override: true
    timeout: 2s   
#  transform/dockerstats:
#    metric_statements:
#      - context: datapoint
#        statements:
#          - set(attributes["container.id"], resource.attributes["container.id"])
#          - set(attributes["container.name"], resource.attributes["container.name"])
#          - set(attributes["container.hostname"], resource.attributes["container.hostname"])
#          - set(attributes["host.name"], resource.attributes["host.name"])
#          - set(attributes["compose.project"], resource.attributes["compose.project"])
#          - set(attributes["compose.service"], resource.attributes["compose.service"])
#          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
#          - set(attributes["service.namespace"], resource.attributes["service.namespace"])

service:
  pipelines:
    metrics:
      exporters:
      - prometheus
      - logging
      processors:
      # - transform/dockerstats
      - resourcedetection/docker
      - batch
      receivers:
      - otlp
      - docker_stats
      - prometheus

Log output

some snippets from individual metric log entries 

`nodexporter` metric through `receiver/prometheus` which contains Data point attributes but no Resource attributes


Metric #9
Descriptor:
     -> Name: node_disk_flush_requests_total
     -> Description: The total number of flush requests completed successfully
     -> Unit: 
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> container_id: Str(47626dfb6da051ba7858bfa763297be5a20283b510a3faab46dd8a1f2f25210d)
     -> container_name: Str(nodeexporter)
     -> device: Str(sr0)
     -> service_instance_id: Str(47626dfb6da051ba7858bfa763297be5a20283b510a3faab46dd8a1f2f25210d)
     -> service_name: Str(nodeexporter)
StartTimestamp: 2024-05-29 19:03:04.033 +0000 UTC
Timestamp: 2024-05-29 19:03:04.033 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
     -> container_id: Str(47626dfb6da051ba7858bfa763297be5a20283b510a3faab46dd8a1f2f25210d)
     -> container_name: Str(nodeexporter)
     -> device: Str(vda)
     -> service_instance_id: Str(47626dfb6da051ba7858bfa763297be5a20283b510a3faab46dd8a1f2f25210d)
     -> service_name: Str(nodeexporter)
StartTimestamp: 2024-05-29 19:03:04.033 +0000 UTC
Timestamp: 2024-05-29 19:03:04.033 +0000 UTC
Value: 6907671.000000

receiver/dockerstats metric with a Data point attribute, but no Resource attribute

Descriptor:
     -> Name: container.network.io.usage.rx_bytes
     -> Description: Bytes received by the container.
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> interface: Str(eth0)
StartTimestamp: 2024-05-29 19:02:58.560036953 +0000 UTC
Timestamp: 2024-05-29 19:03:01.664890776 +0000 UTC
Value: 425806
NumberDataPoints #1
Data point attributes:
     -> interface: Str(eth1)
StartTimestamp: 2024-05-29 19:02:58.560036953 +0000 UTC
Timestamp: 2024-05-29 19:03:01.664890776 +0000 UTC
Value: 1631176

receiver/dockerstats metric with no Data point attribute, but Resource attributes

Metric #18
Descriptor:
     -> Name: container.uptime
     -> Description: Time elapsed since container start time.
     -> Unit: s
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 2024-05-29 19:02:58.560036953 +0000 UTC
Timestamp: 2024-05-29 19:03:01.664890776 +0000 UTC
Value: 5070.807996
ResourceMetrics #5
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
     -> container.runtime: Str(docker)
     -> container.hostname: Str(my-hostname)
     -> container.id: Str(cacdf88cadd7d8691efefbd0f0c49d256718830b89a3e47f6b65e8e7378534e6f)
     -> container.image.name: Str(my/test-container:0.0.9)
     -> container.name: Str(my-test-container)
     -> host.name: Str(my-hostname)
     -> os.type: Str(linux)
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/dockerstatsreceiver 0.101.0


### Additional context

_No response_
@schewara schewara added bug Something isn't working needs triage New item requiring triage labels May 29, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jamesmoessis
Copy link
Contributor

I'm not sure I fully understand your issue, but it's seemingly not to do with the docker stats receiver. It just seems like the prometheus exporter isn't exporting what you expect, or there is some misunderstanding about what it makes available.

I think this would be more helpful if you identified one component that wasn't operating as expected. The docker stats receiver and the prometheus exporter have nothing to do with each other.

If your problem is that the docker stats receiver isn't reporting a metric that it should be, then it's a problem with the docker stats. If the prom exporter isn't doing what you think it should, that's a problem with the prom exporter (or a misconfiguration).

From what I can see it seems that the docker stats receiver is producing all of the information it should, and then the prom exporter is stripping some of the information that you expect. You can verify this by replacing the prom exporter with the debugexporter and see the output straight into stdout. If it's what you expect then you can narrow down the issue to the prom exporter.

@Mendes11
Copy link

Mendes11 commented Jun 20, 2024

I'm experiencing the same issue, but my exporter is awsemf.

Using the debugger, I can see the labels in Resource Attributes, but it seems awsemf is just using the Data Point Attributes when sending the metrics, and it ignores what's in the Resource attributes?

 ResourceMetrics #4
 Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
 Resource attributes:
      -> container.runtime: Str(docker)
      -> container.hostname: Str(c3e61d730cb6)
      -> container.id: Str(c3e61d730cb6c5936b5862844d6e4acf60a880821610a7af9f9a689cffb966db)
      -> container.image.name: Str(couchdb:2.3.1@sha256:5c83dab4f1994ee4bb9529e9b1d282406054a1f4ad957d80df9e1624bdfb35d7)
      -> container.name: Str(swarmpit_db.1.usj3zlnoxmwjhjc27tc3g5he0)
      -> swarm_service: Str(swarmpit_db)
      -> swarm_container_id: Str(usj3zlnoxmwjhjc27tc3g5he0)
      -> swarm_namespace: Str(swarmpit)
 ScopeMetrics #0
 ScopeMetrics SchemaURL:
 InstrumentationScope otelcol/dockerstatsreceiver 1.0.0
 Metric #0
 Descriptor:
      -> Name: container.blockio.io_service_bytes_recursive
      -> Description: Number of bytes transferred to/from the disk by the group and descendant groups.
      -> Unit: By
      -> DataType: Sum
      -> IsMonotonic: true
      -> AggregationTemporality: Cumulative
 NumberDataPoints #0
 Data point attributes:
      -> device_major: Str(259)
      -> device_minor: Str(0)
      -> operation: Str(read)
 StartTimestamp: 2024-06-20 19:10:25.725911895 +0000 UTC
 Timestamp: 2024-06-20 19:19:28.761889055 +0000 UTC
 Value: 4366336
 NumberDataPoints #1
 Data point attributes:
      -> device_major: Str(259)
      -> device_minor: Str(0)
      -> operation: Str(write)
 StartTimestamp: 2024-06-20 19:10:25.725911895 +0000 UTC
 Timestamp: 2024-06-20 19:19:28.761889055 +0000 UTC
 Value: 4096

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exporter/prometheus needs triage New item requiring triage receiver/dockerstats receiver/prometheus Prometheus receiver
Projects
None yet
Development

No branches or pull requests

3 participants