Exporter v1.4.0 up to v1.7.0 is constantly failing to retrieve data in time. #244

Open
gtherond opened this issue Jul 20, 2022 · 6 comments

Comments

@gtherond

Hi everyone,

I'm using the exporter on a kolla-ansible-based platform, but I've been facing a weird issue with it for a while:

2022/07/20 16:47:21 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.httpError (http.go:344)

I can query the exporter directly with cURL, but through the Prometheus server, whether from Grafana or by calling /metrics on the Prometheus server with cURL, I don't get any metrics.

What's really weird is that as soon as this message appears, the process restarts (not the container).

I'm using the latest release, 1.6.0, but I've seen this behavior since 1.4.0.

I've seen a lot of projects with this issue, including one here: #130

My Prometheus server has plenty of resources available, with a lot of disk IOPS, CPU, and RAM.
Here is the Prometheus command line:

/opt/prometheus/prometheus -config.file /etc/prometheus/prometheus.yml -web.listen-address 192.168.22.12:9091 -web.external-url=http:https://172.16.22.253:9091 -log.format logger:stdout -storage.local.path /var/lib/prometheus -storage.local.retention=744h -storage.local.target-heap-size=24000000000

The Prometheus server is version 1.8.2.
All our images are based on CentOS 8 Stream.
All our hosts are based on CentOS 8 Stream too.

Let me know if you need more information.

@gtherond gtherond changed the title Exporter constantly rebooting 1.6.0 - Exporter constantly rebooting Jul 20, 2022
@gtherond gtherond changed the title 1.6.0 - Exporter constantly rebooting Exporter v1.6.0 is constantly rebooting Jul 20, 2022
@gtherond
Author

OK, I've increased the scrape interval and the scrape timeout, and that seems to solve the issue (kind of; I'm still waiting to see whether we get holes in the metrics).
So it seems the exporter is simply too slow to answer within our default values:

Default values:
Scrape interval: 60s
Scrape timeout: 10s

New values:
Scrape interval: 200s
Scrape timeout: 60s

@gtherond
Author

Hi guys, the infamous bug is back!

We increased the thresholds to:
Scrape interval: 900s
Scrape timeout: 60s

However, even with those thresholds, metrics retrieval doesn't work reliably: there are now periods where we can't graph the latest 5 minutes of data because retrieval failed with the following error:

https://paste.opendev.org/show/byl7Eyuu5f4xfNwgc04q/

We're using the 1.7.0 release, and this retrieval-time issue has been present from 1.4.0 up to the latest release.

@gtherond gtherond changed the title Exporter v1.6.0 is constantly rebooting Exporter v1.6.0/v1.7.0 is constantly failing to retrieve data in time. Apr 18, 2024
@gtherond gtherond changed the title Exporter v1.6.0/v1.7.0 is constantly failing to retrieve data in time. Exporter v1.4.0 up to v1.7.0 is constantly failing to retrieve data in time. Apr 19, 2024
@Nils98Ar

Nils98Ar commented Jun 4, 2024

We are facing similar issues. It seems to me that memory usage and response time increase continuously…

@gtherond Do you see high memory usage (>30GB) as well?

@gtherond
Author

gtherond commented Jun 4, 2024

Nope, it barely reaches around 6 GB of RAM, and even in debug mode it doesn't show anything special.

What's really weird is that the HTTP error isn't caught by any layer of the stack in between.

Here are my latest debug logs:
https://paste.opendev.org/show/b524Ksxvi5tc1pHOmSeM/

Let me know if anything else would help!

@Hybrid512
Contributor

Are you using Heat heavily?
Heat creates plenty of empty projects in its own dedicated domain, and scraping all of them takes ages for nothing.
We had that kind of issue and asked for this feature to fix it: #110
It really helped in our case, reducing our scrape time from more than a minute to a few seconds.
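The idea behind the feature requested in #110 (skip projects that live in a domain you don't care about) can be sketched as follows. This is not the exporter's implementation; the Project type, filterProjects, and the "heat" domain ID are all hypothetical, chosen only to show why excluding a domain up front shortens every scrape.

```go
package main

import "fmt"

// Project is a hypothetical, trimmed-down view of a Keystone project.
type Project struct {
	Name     string
	DomainID string
}

// filterProjects drops every project whose domain is excluded, so a
// collector iterating over the result never queries those projects at all.
func filterProjects(projects []Project, excluded map[string]bool) []Project {
	kept := make([]Project, 0, len(projects))
	for _, p := range projects {
		if excluded[p.DomainID] {
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	projects := []Project{
		{Name: "prod", DomainID: "default"},
		{Name: "stack-a1b2", DomainID: "heat"},
		{Name: "stack-c3d4", DomainID: "heat"},
	}
	// "heat" is an assumed domain ID; substitute the domain your Heat
	// deployment actually uses.
	kept := filterProjects(projects, map[string]bool{"heat": true})
	fmt.Println(len(kept), kept[0].Name) // 1 prod
}
```

The saving grows with the number of excluded projects, since each skipped project avoids one or more API round trips per scrape.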

@gtherond
Author

Hi!

Not at all, in fact ^^
I narrowed the issue down to the exporters' use of the NewPedanticRegistry function, which seems suboptimal for our OpenStack use case, where the API can take a long time to answer with a large set of data.

Another issue is that OpenStack answers slowly, whereas Prometheus advises keeping scrape and collect durations under 2 minutes where possible, and 5 minutes at most.


3 participants