
add exemplars support to prometheusremotewriteexporter #5578

Conversation

@anneelisabethlelievre (Contributor) commented Oct 4, 2021

Signed-off-by: Anne-Elisabeth Lelievre [email protected]

Hello, this is the implementation of #5192 to support exemplars for the prometheus remote write exporter.

Thanks

@anneelisabethlelievre anneelisabethlelievre marked this pull request as ready for review October 4, 2021 15:48
@anneelisabethlelievre anneelisabethlelievre requested a review from a team as a code owner October 4, 2021 15:48
Signed-off-by: Anne-Elisabeth Lelievre <[email protected]>
@github-actions (bot)

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 12, 2021
@secat commented Oct 12, 2021

@codeboten I think this PR is ready for review

@codeboten codeboten removed the Stale label Oct 12, 2021
@codeboten (Contributor) left a comment


Just a few questions. It would be great to get a second review from one of the awsprometheusremotewriteexporter codeowners.

Ping @alolita @anuraaga @rakyll

(Resolved review threads, now outdated, on exporter/prometheusremotewriteexporter/helper.go and exporter/prometheusremotewriteexporter/helper_test.go.)
@@ -327,7 +361,34 @@ func addSingleHistogramDataPoint(pt pdata.HistogramDataPoint, resource pdata.Res
Timestamp: time,
}
infLabels := createAttributes(resource, pt.Attributes(), externalLabels, nameStr, baseName+bucketStr, leStr, pInfStr)
addSample(tsMap, infBucket, infLabels, metric)

promExemplar := getPromExemplar(pt, len(pt.BucketCounts())-1)
@anuraaga (Contributor):

Can you clarify where len(pt.BucketCounts())-1 comes from? I could be misunderstanding it, but my reading of the proto doesn't indicate any relationship between bucket counts and exemplars

https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto#L473

Shouldn't we just go through all the exemplars, convert them and add them all?

@anneelisabethlelievre (Contributor, Author):

@anuraaga according to my understanding, len(pt.BucketCounts())-1 is the index of the +Inf bucket. From what I read, there does seem to be a relationship between the bucket index and exemplars in the Prometheus support:

prometheus/prometheus#9414
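
For readers following this thread, here is a minimal illustration of the bucket/bound relationship being discussed. It is only a sketch, assuming this collector version's pdata histogram accessors (ExplicitBounds, BucketCounts) follow the OTLP proto layout; it is not the PR's code.

// Illustration only: for an OTLP explicit-bucket histogram, BucketCounts()
// has one more entry than ExplicitBounds(), and the final count covers
// observations above the largest bound, i.e. the implicit +Inf bucket.
// That is why len(pt.BucketCounts())-1 is used above as the +Inf bucket index.
bounds := pt.ExplicitBounds() // e.g. []float64{0.1, 1, 10}
counts := pt.BucketCounts()   // len(counts) == len(bounds) + 1
infBucketIndex := len(counts) - 1
_, _ = bounds, infBucketIndex // avoid unused-variable errors if pasted into a function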

@anuraaga (Contributor):

Thanks for clarifying! Can you add a comment referencing that we need to use the +inf bucket?

@anneelisabethlelievre (Contributor, Author):

yes of course! done

Signed-off-by: Anne-Elisabeth Lelièvre <[email protected]>
@linux-foundation-easycla (bot) commented Oct 18, 2021

CLA Signed — the committers are authorized under a signed CLA.

Signed-off-by: Anne-Elisabeth Lelievre <[email protected]>
Signed-off-by: Anne-Elisabeth Lelièvre <[email protected]>
@codeboten (Contributor) left a comment

Thanks for addressing my comments, @anuraaga please review and approve if your comments have been addressed.

@gouthamve (Member) left a comment

I'm super new to the code base, but I think it is important to make sure the traceid label is present in as many exemplars as possible. Otherwise the exported exemplars are not very useful in Prometheus.

Other than that, good job! I can't wait to see this merged!

Comment on lines 381 to 389
labels := exemplar.FilteredAttributes().AsRaw()
for key, value := range labels {
promLabel := prompb.Label{
Name: key,
Value: value.(string),
}

promExemplar.Labels = append(promExemplar.Labels, promLabel)
}
@gouthamve (Member):

I think we are doing this because of

exemplar.FilteredAttributes().Insert(traceIDKey, pdata.NewAttributeValueString(traceID.HexString()))

But this means the trace_id key will only be set on exemplars from the spanmetricsprocessor. I think we should set the trace_id on every exemplar. Instead of using FilteredAttributes, why not just do:

			promLabel := prompb.Label{
				Name:  traceIDKey,
				Value: exemplar.TraceID().HexString(),
			}

@anneelisabethlelievre (Contributor, Author):

Hi @gouthamve, initially I wanted to do it like this, but I think it is better to make it as generic as possible by going through all the labels rather than handling only the trace_id. From what I understood, the trace_id could be set in any processor, and the exporter should just copy whatever it received from the processor without checking whether the trace_id is actually there. But to be honest, I'm not 100% sure; I understand your point and maybe I'm wrong. WDYT?

@gouthamve (Member):

Hrm, so the current solution only works for exemplars emitted by the spanmetricsprocessor. If a service sends an exemplar, it would still be forwarded to Prometheus, but it would be mostly useless because the main consumer of exemplars (Grafana) needs a trace ID.

I think it's fair to add all the labels, but we should also make sure the trace ID is one of the labels and, if it's not, add it.
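
To make the suggestion concrete, here is a rough sketch (not the PR's final code) of keeping all filtered attributes while guaranteeing a trace_id label. traceIDKey is the constant quoted earlier in this thread, the Range/AsString iteration mirrors the approach shown later in the thread, and TraceID().IsEmpty() is assumed to exist on the pdata exemplar.

hasTraceID := false
exemplar.FilteredAttributes().Range(func(key string, value pdata.AttributeValue) bool {
	// Copy every filtered attribute onto the Prometheus exemplar as a label.
	if key == traceIDKey {
		hasTraceID = true
	}
	promExemplar.Labels = append(promExemplar.Labels, prompb.Label{
		Name:  key,
		Value: value.AsString(),
	})
	return true
})
// If no trace_id attribute was present (e.g. the exemplar did not come from
// the spanmetricsprocessor), fall back to the exemplar's own trace ID so the
// exported exemplar remains usable in Grafana.
if !hasTraceID && !exemplar.TraceID().IsEmpty() {
	promExemplar.Labels = append(promExemplar.Labels, prompb.Label{
		Name:  traceIDKey,
		Value: exemplar.TraceID().HexString(),
	})
}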

@secat:

IMHO, there are two use cases:

Use case (1):

+-----------------+         +------------------------+         +--------------------------+
|  OTLP Receiver  | ------> | Span Metrics Processor | ------> |  OTLP Exporter           |
+-----------------+         +------------------------+    |    +--------------------------+
                                                          |
                                                          |    +--------------------------+
                                                          +--> | Prometheus Remote Write  |
                                                               +--------------------------+

Use case (2):

+-----------------------+         +--------------------------+         
|  Prometheus Receiver  | ------> | Prometheus Remote Write  |
+-----------------------+         +--------------------------+

In use case (1), metrics are generated from trace spans and the trace_id is injected into the exemplars by the spanmetricsprocessor.

However, in use case (2), metrics come from scraped instances (i.e. no trace IDs are associated with those metrics in pdata). IMHO we should honor any exemplars attached to those metrics and forward them.

WDYT?

@secat commented Oct 28, 2021

@Anne-Eli we are currently having the following errors when sending metrics to Cortex:

{"level":"error","ts":1635446592.7831264,"caller":"exporterhelper/queued_retry.go:199","msg":"Exporting failed. The error is not retryable. Dropping data.","kind":"exporter","name":"prometheusremotewrite/0/metrics/2","error":"Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 0 series: {__name__=\"harbour_tracing_latency_bucket\", le=\"2000\", operation=\"opentelemetrycollector_controller\", service_name=\"fleet-tracing-operator.fleet-system\", span_kind=\"SPAN_KIND_UNSPECIFIED\", status_code=\"STATUS_C","dropped_items":4,"stacktrace":"go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\t/build/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/queued_retry.go:199\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send\n\t/build/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/metrics.go:132\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1\n\t/build/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/queued_retry_inmemory.go:105\ngo.opentelemetry.io/collector/exporter/exporterhelper/internal.ConsumerFunc.Consume\n\t/build/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/internal/bounded_queue.go:103\ngo.opentelemetry.io/collector/exporter/exporterhelper/internal.(*BoundedQueue).StartConsumersWithFactory.func1\n\t/build/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/internal/bounded_queue.go:82"}

maybe related to: #5263

@anneelisabethlelievre (Contributor, Author) commented Oct 29, 2021

> @Anne-Eli we are currently having the following errors when sending metrics to Cortex: […]
> maybe related to: #5263

@secat I'm on it

@anneelisabethlelievre (Contributor, Author) commented Nov 4, 2021

> @Anne-Eli we are currently having the following errors when sending metrics to Cortex: […]
> maybe related to: #5263
>
> @secat I'm on it

I created PR 6140 to fix this in the span metrics processor; with the current changes of this PR, everything now seems to be OK when sending metrics to Cortex.

@anneelisabethlelievre (Contributor, Author) commented Nov 8, 2021

Please do not merge this PR yet; I just found two small issues and I'll commit fixes ASAP, thanks.

@codeboten (Contributor) left a comment

Blocking as requested #5578 (comment)

@jpkrohling jpkrohling marked this pull request as draft November 9, 2021 09:15
@anneelisabethlelievre (Contributor, Author) commented Nov 12, 2021

> Blocking as requested #5578 (comment)

I just pushed some new changes to this PR; here are the two changes I added:

  1. When looping through the exemplar attributes in the getPromExemplars method, I now use Range and AsString() to convert an OTLP AttributeValue of any type to its equivalent string.

before:

labels := exemplar.FilteredAttributes().AsRaw()
for key, value := range labels {
  promLabel := prompb.Label{
    Name:  key,
    Value: value.(string),
  }
  
  promExemplar.Labels = append(promExemplar.Labels, promLabel)
}

now:

exemplar.FilteredAttributes().Range(func(key string, value pdata.AttributeValue) bool {
	promLabel := prompb.Label{
		Name:  key,
		Value: value.AsString(),
	}

	promExemplar.Labels = append(promExemplar.Labels, promLabel)

	return true
})
  2. When I tested with this fix (6140), the span metrics processor now appends the exemplar data rather than attaching an exemplar to a specific bucket index, so this exporter now simply loops through all the exemplars it receives and no longer uses the bucket index as a reference. I also realised that the le value for an exemplar wasn't correct in the addExemplars func; to fix that, I scan the bucket bounds from smallest to largest and attach the exemplar to the first bucket whose bound it does not exceed (a short sketch of this bound-matching logic is shown below).
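
For clarity, a short sketch of that bound-matching logic, assuming an illustrative helper name rather than the PR's actual function (uses strconv from the standard library):

// Given a histogram's ascending explicit bounds, return the le label value of
// the bucket an exemplar belongs to: the first bound the exemplar value does
// not exceed, or +Inf if it is larger than every explicit bound.
func exemplarLeValue(bounds []float64, exemplarValue float64) string {
	for _, bound := range bounds {
		if exemplarValue <= bound {
			return strconv.FormatFloat(bound, 'f', -1, 64)
		}
	}
	return "+Inf"
}

For example, with bounds [0.1, 1, 10], an exemplar value of 2.5 would be attached to the le="10" bucket.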

Finally, I re-tested sending to Prometheus and Cortex and everything looks OK now; in Prometheus, the le values are now correct.

[screenshot: exemplars in Prometheus showing the expected le values]

I know some unit tests are failing, but I don't think this is related to my new changes (the failure seems to come from https://github.com/anneelisabethlelievre/opentelemetry-collector-contrib/blob/main/extension/awsproxy/factory_test.go#L51, and I don't see how it could be connected to my changes).

cc: @codeboten

@anneelisabethlelievre anneelisabethlelievre marked this pull request as ready for review November 12, 2021 19:27
@codeboten (Contributor) left a comment

Thanks for the fixes, @anneelisabethlelievre! You're right, the failures are unrelated.

@codeboten (Contributor) commented

/easycla
