Prometheus Receiver - Some counter metrics dropped for unknown reason #4974
Comments
All these dropped metrics also have negative effects at the exporter level. Each of them produces this exception on the prometheus exporter. The metric seems "partially" dropped; a reference must remain somewhere, but without a valid name and type...
Most likely they are the new types "gauge histogram, enum or state" which we don't support
We are seeing the same. Even a counter as simple as this, which is present on the endpoint that the Prom receiver is scraping, does not appear on the exporter side.
I don't understand. I compared some metrics with exactly the same name on a node-exporter and on the Grafana internal exporter: the metric is scraped correctly on node-exporter but not on Grafana... It makes no sense. I suspected an invalid byte somewhere on Grafana, gzip compression, transfer encoding, but I have found nothing so far...
I wonder if this is related to the adjustment logic being removed in open-telemetry/opentelemetry-collector#3047 that was dropping some samples. Can you try with a build from that branch, or try once it lands on the trunk?
Don't you think there is a link with https://github.com/open-telemetry/opentelemetry-collector/issues/2852 ? It seems like a "regression" or something like that if it was working before March. @Aneurysm9 I didn't have the time to try the branch from your PR, I will try it asap.
@Aneurysm9 since this PR was merged I tried directly on master, but it's the same thing. 😞 As an example, on node-exporter the otel-collector succeeds in recording the metric.
On the Grafana job this metric is dropped. The original value is:
Even if I only keep the Grafana receiver job, the result is the same.
Sadly, I was very happy at first, but this does not fix the issue.
@bogdandrutu can you reopen the issue please?
@gillg I am facing a similar issue with the prometheus receiver, did you find a workaround that worked for you?
Hello, unfortunately nothing for now. It's definitely not systematic, and not a majority of metrics, but it shows up a lot in some contexts like Grafana metrics.
Thanks for the response! I ended up changing my metrics to add the
@bruuuuuuuce what is wrong here?
It's an example of a metric considered "untyped" even though the TYPE counter is defined.
@gillg I am not sure; from my understanding of Prometheus metrics, that formatting looks right to me. Why do you say that it is untyped?
Because this produces the traces shown in the first note.
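A quick way to double-check what type the scrape target really declares for each family is to run its output through the official Prometheus text parser. A minimal sketch, assuming Go and the prometheus/common library; the Grafana URL is just a placeholder for whatever target is under suspicion:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

func main() {
	// Placeholder target; point this at the endpoint whose metrics are dropped.
	resp, err := http.Get("http://grafana:3000/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Parse the exposition text the same way a Prometheus-compatible scraper would.
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Print the type attached to each family; a family showing up as UNTYPED
	// here means the # TYPE line was not associated with that sample name.
	for name, mf := range families {
		fmt.Printf("%s -> %s\n", name, mf.GetType())
	}
}
```

If a metric prints as COUNTER here but still vanishes between the receiver and the exporter, the problem is more likely in the collector pipeline than in the exposition format itself.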
Can anyone reopen this issue, please?
Someone to re-open? 🙏 I discovered a lot of new use cases where this happens. As a new example, a set of metrics:
Hi,
I found out that the ... To check if the metric has been dropped by ... Otherwise we could look at the prometheusreceiver next.
👋, just to +1, we have some end users reporting issues with either this issue or #4907. Noticed you re-opened this issue @Aneurysm9, are you currently investigating this? cc @mx-psi
Hello, I also might have the same problem, running 0.37.1. @gillg have you been able to find a solution? See the metric points missing: In this case I'm trying to scrape ... Might be unrelated, but I'm running the collector in standalone mode (single replica) and I noticed that metrics are always consistent for targets on the same node as the collector. For targets located on another node it's erratic. Will update.
Hello @etiennejournet, for now I don't have any solution for dropped counters. But this scenario doesn't seem to match your graph: in your case you have dropped points, not an entire dropped time series.
Thanks for your answer ;) I don't think it's the memory leak problem. I had a careful look at your other thread about that, and my OpenTelemetry setup doesn't show any sign of refused/dropped metric points in the receiver or exporters; I don't even see the memory leak in my own monitoring. I'm going to dig further, thanks for your time ;)
This problem seems to have disappeared with the latest otel collector contrib (v0.42.0 as of today). Can be closed for now :)
Same problem, running v0.78.0!
Same here, although not all with
Describe the bug
While analyzing an OpenTelemetry Collector setup and trying to create dashboards with the collected metrics, I discovered some dropped metrics for an unknown reason.
I have a bunch of "dropped" metrics without any logs or traces.
Eventually, I have some logs:
info internal/metrics_adjuster.go:357 Adjust - skipping unexpected point {"kind": "receiver", "name": "prometheus", "type": "UNSPECIFIED"}
So they seem to be dropped due to an unspecified type. I added some logs in metricFamily.go to visualize their metadata, and it is empty.
As an example, this list completely disappears between the receiver and the exporter:
I had to add custom logs to understand: they seem to be dropped because the Type is unspecified and they have no metadata from metricFamily.go.
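For illustration only, here is a simplified sketch of the kind of guard that can lead to this behaviour. It is not the collector's actual code; MetricType, DataPoint, and adjustPoints are hypothetical names introduced just for the example:

```go
package main

import "log"

// MetricType is a hypothetical, simplified stand-in for the metric type
// resolved from scraped metadata.
type MetricType int

const (
	TypeUnspecified MetricType = iota
	TypeCounter
	TypeGauge
)

// DataPoint is a hypothetical, simplified data point.
type DataPoint struct {
	Name  string
	Type  MetricType
	Value float64
}

// adjustPoints keeps only points whose type could be resolved from the
// scraped metadata; anything left unspecified is skipped, which matches
// the "Adjust - skipping unexpected point" symptom described above.
func adjustPoints(points []DataPoint) []DataPoint {
	kept := make([]DataPoint, 0, len(points))
	for _, p := range points {
		if p.Type == TypeUnspecified {
			log.Printf("skipping unexpected point: %s (type UNSPECIFIED)", p.Name)
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	pts := []DataPoint{
		{Name: "example_metric_without_metadata", Type: TypeUnspecified, Value: 5},
		{Name: "example_requests_total", Type: TypeCounter, Value: 12},
	}
	log.Printf("kept %d of %d points", len(adjustPoints(pts)), len(pts))
}
```

The point of the sketch is only that a family whose metadata was never resolved ends up typed as unspecified and is then dropped downstream, which matches what the custom logs above show.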
For an unknown reason some other metrics ending with _total are working, like:
Steps to reproduce
I don't know exactly... Try to scrape the Grafana API at https://grafana:3000/metrics
What did you expect to see?
Metrics should be kept internally, then be visible on the exporter side.
What did you see instead?
No metrics on the exporter side; they are probably dropped in metrics_adjuster.go (a log with the metric name is definitely missing here).
What version did you use?
0.25