Exporter v1.4.0 up to v1.7.0 is constantly failing to retrieve data in time. #244

gtherond · 2022-07-20T15:01:20Z

Hi everyone,

I'm using the exporter on a kolla-ansible-based platform, but I've been facing a weird issue with it for a while.

2022/07/20 16:47:21 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.httpError (http.go:344)

I can query the exporter directly with cURL, but when I go through the Prometheus server (from Grafana, or even when calling /metrics on the Prometheus server with cURL), I don't get any metrics.

What's really weird is that as soon as this message appears, the process restarts (not the container).

I'm using the latest release, 1.6.0, but I've had this behavior since 1.4.0.

I've seen a lot of projects with this issue, and even one here: #130

My Prometheus server has plenty of resources available: lots of disk IOPS, CPU, and RAM.
Here is the Prometheus configuration (command line):

/opt/prometheus/prometheus -config.file /etc/prometheus/prometheus.yml -web.listen-address 192.168.22.12:9091 -web.external-url=http:https://172.16.22.253:9091 -log.format logger:stdout -storage.local.path /var/lib/prometheus -storage.local.retention=744h -storage.local.target-heap-size=24000000000

The Prometheus server is version 1.8.2.
All our images are based on CentOS 8 Stream.
All our hosts are CentOS 8 Stream too.

Let me know if you need more information.

gtherond · 2022-07-22T09:18:47Z

OK, I've increased the scrape interval and the scrape timeout, and that seems to solve the issue (kind of; I'm still waiting to see whether we get holes in the metrics).
So it seems the exporter is simply too slow to answer within our default values:

Default values:
Scrape interval: 60s
Scrape timeout: 10s

New values:
Scrape interval: 200s
Scrape timeout: 60s

gtherond · 2024-04-18T15:04:19Z

Hi guys, the infamous bug is back!

We increased the thresholds further, to:
Scrape interval: 900s
Scrape timeout: 60s

However, even with those thresholds, metrics retrieval still doesn't work correctly: we now have periods where we can't graph the latest 5 minutes of data because the retrieval failed due to the following error:

https://paste.opendev.org/show/byl7Eyuu5f4xfNwgc04q/

We're using the 1.7.0 release, and this retrieval-time issue has been present from 1.4.0 up to the latest version.

Nils98Ar · 2024-06-04T10:51:55Z

We are facing similar issues. It seems to me that memory usage and response time increase continuously…

@gtherond Do you see high memory usage (>30GB) as well?

gtherond · 2024-06-04T14:11:42Z

Nope, it barely reaches around 6 GB of RAM, and even in debug mode it doesn't show anything special.

What's really weird is that the HTTP error is not caught by any layer of the stack in between.

Here are my latest debug logs:
https://paste.opendev.org/show/b524Ksxvi5tc1pHOmSeM/

If anything else would help, let me know!

Hybrid512 · 2024-06-20T07:37:20Z

Are you using Heat intensively?
Heat creates plenty of empty projects in a dedicated domain of its own, and it takes ages to scrape all of them for nothing.
We had that kind of issue and requested this feature to fix it: #110
In our case it really helped, reducing our scrape time from more than a minute to a few seconds.

gtherond · 2024-06-21T08:21:18Z

Hi!

Not at all in fact ^^
I narrowed the issue down to the exporter's use of the NewPedanticRegistry function, which seems to be suboptimal for our OpenStack use case, where the "API" can take a long time to answer with a large set of data.

Another issue with OpenStack is that it answers slowly, whereas Prometheus advises keeping scrape and collection durations under 2 minutes where possible and 5 minutes at most.

gtherond changed the title ~~Exporter constantly rebooting~~ 1.6.0 - Exporter constantly rebooting Jul 20, 2022

gtherond changed the title ~~1.6.0 - Exporter constantly rebooting~~ Exporter v1.6.0 is constantly rebooting Jul 20, 2022

gtherond changed the title ~~Exporter v1.6.0 is constantly rebooting~~ Exporter v1.6.0/v1.7.0 is constantly failing to retrieve data in time. Apr 18, 2024

gtherond changed the title ~~Exporter v1.6.0/v1.7.0 is constantly failing to retrieve data in time.~~ Exporter v1.4.0 up to v1.7.0 is constantly failing to retrieve data in time. Apr 19, 2024
