Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/translate/promremotewrite] downscale exponential histograms for prometheus #24026

Conversation

krajorama
Copy link
Contributor

Description:

Implement down-scaling of exponential histograms to Prometheus native histograms in Prometheus remote writer.
Configuration of down-scaling is TBD.

Link to tracking Issue:

Fixes: #17565

Testing:

Unit tests.

Documentation:

TBD

@krajorama krajorama requested review from a team, Aneurysm9 and kovrus as code owners July 7, 2023 06:21
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 7, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@kovrus
Copy link
Member

kovrus commented Jul 7, 2023

@krajorama please sing the CLA

@krajorama krajorama changed the title promremotewrite: Fix histogram unit test observation count [pkg/translate/promremotewrite] downscale exponential histograms for prometheus Jul 7, 2023
@krajorama

This comment was marked as outdated.

@krajorama
Copy link
Contributor Author

After some optimizations I got this benchmark compared to baseline:

goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/prometheusremotewrite
cpu: AMD Ryzen 7 4700U with Radeon Graphics         
                            │ bench-orig.txt │           bench-new3.txt           │
                            │     sec/op     │   sec/op     vs base               │
ConvertBucketLayout/gap_0-8      17.30µ ± 1%   18.18µ ± 1%  +5.07% (p=0.000 n=10)
ConvertBucketLayout/gap_1-8      16.05µ ± 1%   16.91µ ± 1%  +5.34% (p=0.000 n=10)
ConvertBucketLayout/gap_2-8      15.61µ ± 1%   16.94µ ± 1%  +8.55% (p=0.000 n=10)
ConvertBucketLayout/gap_3-8      18.58µ ± 1%   19.47µ ± 1%  +4.76% (p=0.000 n=10)
geomean                          16.85µ        17.84µ       +5.92%

Looking at the profile, half the time is spent in reallocating the deltas and probably spans slices. That could be something to look at , although for small numbers it's probably fine.

@krajorama krajorama marked this pull request as ready for review July 9, 2023 17:14
Copy link
Member

@kovrus kovrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified the tests, checking that the changes aren't breaking.

@jpkrohling
Copy link
Member

CI is failing with an unrelated change, I'll rebase once the fix is merged.

krajorama and others added 10 commits July 12, 2023 11:41
Add unit test to check that upscaling is still rejected and downscaling
happens above scale 8. Test edges 8 and 9.
Further detailed tests to follow in TestConvertBucketsLayout.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Add a new parameter that will be used to tell the function how many
buckets to merge when converting the bucket layouts.
Not use yet. Add disabled test cases.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Saves about 10%.


Signed-off-by: György Krajcsovits <[email protected]>
We don't need to carry nextBucketIdx outside the loop since we can easily
recalculate its value.

Signed-off-by: György Krajcsovits <[email protected]>
@jpkrohling jpkrohling force-pushed the promremotewrite-downscale-histograms-17565 branch from 70ae915 to af9e160 Compare July 12, 2023 14:41
Copy link
Member

@Aneurysm9 Aneurysm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits and a question. I'd approve but for the question about why apparently unrelated test changes seem to have been made.

@jpkrohling jpkrohling merged commit cfeecd8 into open-telemetry:main Jul 17, 2023
89 checks passed
@github-actions github-actions bot added this to the next release milestone Jul 17, 2023
@Dogacel
Copy link

Dogacel commented Jul 17, 2023

Hoping this will resolve our issues with prometheus histograms 🙏 Putting the error we have seen here for visibility.

Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: cannot convert exponential to native histogram. Scale must be <= 8 and >= -4, was 20;

krajorama added a commit to grafana/mimir that referenced this pull request Jul 18, 2023
krajorama added a commit to grafana/mimir that referenced this pull request Jul 18, 2023
* vendor update opentelemetry-collector-contrib/pkg/translator/prometheusremotewrite

Get open-telemetry/opentelemetry-collector-contrib#24026

Signed-off-by: György Krajcsovits <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Charles Korn <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[pkg/translator/prometheusremotewrite] Implement Exponential Histogram downscaling
8 participants