[connector/spanmetrics] - unable to export the metrics using prometheusremotewrite when batch processor enabled #32042

Open
ramanjaneyagupta opened this issue Mar 29, 2024 · 4 comments
Labels
bug (Something isn't working), connector/spanmetrics

Comments

@ramanjaneyagupta

Component(s)

connector/spanmetrics

What happened?

Description

Unable to export span metrics using prometheusremotewrite when the batch processor is enabled.
Topology: Agents -> Gateway (OTel Collectors) -> Storage.
The gateway consists of multiple collector servers that calculate the span metrics and write them to Prometheus using PRW.

Steps to Reproduce

Enable the spanmetrics connector with the prometheusremotewrite exporter and the batch processor.

Expected Result

Prometheus Remote Write should be able to export the metrics when the batch processor is enabled.

Actual Result

The exporter reports a permanent error: "Permanent error: remote write returned status code of 400 bad request" with "duplicate sample for timestamp".

Collector version

v0.96.0

Environment information

Environment

OS: Linux (RHEL)

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  prometheusremotewrite:
    endpoint: https://<endpoint>
    target_info:
      enabled: true
    resource_to_telemetry_conversion:
      enabled: true

connectors:
  spanmetrics:
    dimensions:
      - name: http.method
      - name: http.status_code
      - name: k8s.namespace.name
    exemplars:
      enabled: true
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    metrics_flush_interval: 15s
    resource_metrics_key_attributes:
      - service.name
      - telemetry.sdk.language
      - telemetry.sdk.name
processors:
  batch:
  resourcedetection/system:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection/system, batch]
      exporters: [spanmetrics, tracebackend]
    metrics:
      receivers: [spanmetrics]
      processors: [resourcedetection/system, batch]
      exporters: [prometheusremotewrite]

Log output

" Permanent error remote write returned status code of 400 bad request" "err"=nil duplicate sample for timestamp"

Additional context

No response

ramanjaneyagupta added the bug (Something isn't working) and needs triage (New item requiring triage) labels Mar 29, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ankitpatel96

ankitpatel96 commented Apr 17, 2024

I would guess that the problem here is that your multiple collectors running in gateway mode are submitting the same samples to prometheus.
Between the default dimension list of

service.name
span.name
span.kind
status.code

and your dimensions list of

http.method
http.status_code
k8s.namespace.name

I would guess that these do not uniquely identify series. Is each collector receiving traces from the same machines? If not the exact same machines, would each collector be receiving traces from containers that run the same service in the same k8s namespace?

I would guess that this is also the problem in #32043.
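
For illustration, a minimal sketch of one way to give each gateway collector a distinct identity, assuming each instance can read a unique value (for example its hostname) from an environment variable; the processor name resource/collector_id and the attribute name collector.id are placeholders, not part of the configuration above. Because resource_to_telemetry_conversion is already enabled, the resource attribute would surface as a metric label:

processors:
  resource/collector_id:
    attributes:
      - key: collector.id          # placeholder attribute name
        value: ${env:HOSTNAME}     # any value unique per gateway instance
        action: upsert

service:
  pipelines:
    metrics:
      receivers: [spanmetrics]
      processors: [resource/collector_id, resourcedetection/system, batch]
      exporters: [prometheusremotewrite]

Each collector would then write its own series instead of colliding with the other gateway instances on the same label set.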


github-actions bot commented Jul 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Jul 8, 2024
@Frapschen

Hi @ramanjaneyagupta, does the problem still exist? Have you tried adding a unique identity to your metrics?
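
A related sketch, using the prometheusremotewrite exporter's external_labels setting instead of a resource attribute; the label name collector_id and the value gateway-1 are placeholders that would have to differ on every gateway instance:

exporters:
  prometheusremotewrite:
    endpoint: https://<endpoint>
    external_labels:
      collector_id: gateway-1   # must be unique per collector instance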
