
Openstack exporter hits the default Prometheus scrape timeout #214

Open
peppepetra opened this issue Dec 13, 2021 · 4 comments


@peppepetra

peppepetra commented Dec 13, 2021

In a big environment with:

  • ~40 hosts
  • ~1000 running VMs
  • ~1800 ports

openstack-exporter takes around 30 seconds to collect metrics, hitting the 15-second default Prometheus scrape timeout.

I have tried disabling slow and deprecated metrics with:

--disable-slow-metrics --disable-deprecated-metrics 

but I am only seeing 1-2 seconds of improvement.

It would be nice to have a configurable caching mechanism as described here

The most expensive metrics appear to be:

  • neutron-security_groups
  • nova-total_vms

With those two metrics disabled, the exporter returns in ~15 seconds, which still hits the scrape timeout.
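As a stopgap, the scrape timeout can be raised for this job on the Prometheus side; something along these lines (job name, target, and values are only illustrative, and it merely masks the slow collection rather than fixing it):

```yaml
scrape_configs:
  - job_name: openstack-exporter
    scrape_interval: 60s
    scrape_timeout: 55s               # must stay below scrape_interval
    static_configs:
      - targets: ['openstack-exporter-host:9180']
```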

@Hybrid512
Contributor

Probably related to #110 or #120

@afreiberger

I'd like to echo @peppepetra's suggestion for a caching mechanism that returns a cached report immediately, with tunable metrics-collection cycles happening in the background. In many use cases, I suspect a short, predictable delay between collection and reporting would not affect the decisions made from OpenStack capacity metrics.
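Roughly, I am picturing something like the sketch below (not actual openstack-exporter code; collectOpenStackMetrics, the interval, and the port are placeholders): the /metrics handler always answers from the last collected payload, and the expensive collection runs on its own schedule.

```go
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

// metricsCache holds the most recently collected exposition payload.
type metricsCache struct {
	mu   sync.RWMutex
	body []byte
}

// refreshLoop runs the expensive OpenStack collection on a fixed interval.
func (c *metricsCache) refreshLoop(interval time.Duration) {
	for {
		payload := collectOpenStackMetrics() // placeholder for the real, slow collection
		c.mu.Lock()
		c.body = payload
		c.mu.Unlock()
		time.Sleep(interval)
	}
}

// ServeHTTP returns the cached payload immediately, so a Prometheus scrape
// never waits on the OpenStack APIs; the data is at most one interval old.
func (c *metricsCache) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	w.Write(c.body)
}

// collectOpenStackMetrics stands in for the real collectors.
func collectOpenStackMetrics() []byte {
	return []byte("# placeholder exposition output\n")
}

func main() {
	cache := &metricsCache{}
	go cache.refreshLoop(60 * time.Second)
	http.Handle("/metrics", cache)
	log.Fatal(http.ListenAndServe(":9180", nil))
}
```

With this shape, a scrape returns in milliseconds and the staleness is bounded by the refresh interval.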

@Hybrid512
Contributor

Well, I don't know about your specific situation, but in our case we discovered that the exporter was scraping metrics from empty domains created by Heat.
These are leftovers from Heat stacks that were not properly purged, and the problem is that the exporter scrapes each domain one by one even when it is empty, which takes quite a lot of time, especially when, as in our case, you have thousands of them.
This is what I already described in #110.
We're working on a PR for this; I hope we can submit it asap.
It is not a performance fix, just a way to filter out the domains you don't want to scrape, but in our case that is enough to resolve the issue.

@engel75

engel75 commented Mar 1, 2023

We are running an OpenStack environment with 3 AZs, a lot of customers (domains), and thousands of assets. After tuning (disabling a lot of metrics and using the probe endpoint to scrape service by service [compute, network, volume, ...]) we still hit timeouts, and some scrapes take more than 2 minutes.
We will try to work around this by triggering the scrape from a script and writing the result to a static web page, which then becomes our scrape target. So basically we "implement" caching ourselves.
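Roughly, that script could be as small as the sketch below (the exporter URL, timeout, and output path are assumptions about our setup, not fixed values):

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	// Allow plenty of time: the collection behind /metrics can take minutes.
	client := &http.Client{Timeout: 5 * time.Minute}

	resp, err := client.Get("http://localhost:9180/metrics") // exporter address is an assumption
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Write to a file served by an existing web server; Prometheus scrapes that URL instead.
	f, err := os.Create("/var/www/html/openstack_metrics.prom")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```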

It would be awesome if someone extended the exporter to let us define and use caching.

For example:

--enable-cache
--cache-ttl=

So the result (metrics) would be cached for the configured TTL, and the next scrape would still return the outdated data but kick off a fresh metric scan in the background. The outdated data would be served until fresh data is available.

What do you think?
