[exporter/loadbalancing] Refactor how metrics are split and then re-joined after load-balancing #33293

RichieSams · 2024-05-29T16:58:44Z

Description: The previous code split an incoming pmetric.Metrics into individual pmetric.Metrics instances at the granularity of pmetric.Metric. It then used the various routingID functions to create a map of booleans that defined how the metrics should be routed. Finally, it merged the metrics by routing key and exported them by concatenating them all together. While this worked, it was somewhat hard to follow and inefficient for most of the routingIDs. In a future PR, we'd like to add a new routingID that requires splitting at the datapoint level. With the previous design, this would add a lot of extra work for the other routingIDs, which don't care about specific datapoints.

Therefore, the new code has a dedicated splitting function for each routingID. Each function directly returns a map[string]pmetric.Metrics, i.e., a map from each routing key to its metrics. These functions can be unit tested directly, and they make the logic in ConsumeMetrics() very easy to follow. Lastly, when combining metrics for routing, the new code uses the MergeMetrics() helper function from internal/exp/metrics. This merges the metrics and removes duplicate ResourceMetrics / ScopeMetrics instances, which saves compute and bandwidth for serialization downstream.
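To make the split-then-merge idea concrete, here is a minimal sketch. It is not the code from this PR: `Metric`, `ResourceMetrics`, and `Batch` are simplified, hypothetical stand-ins for the pdata types, and `splitByService`, `splitByMetricName`, and `mergeByResource` are illustrative names only. The real implementation works on pmetric.Metrics and uses MergeMetrics(), whose exact signature is not shown in this description.

```go
package main

import "fmt"

// Metric, ResourceMetrics, and Batch are hypothetical, simplified stand-ins
// for pmetric.Metric, pmetric.ResourceMetrics, and pmetric.Metrics.
type Metric struct {
	Name string
}

type ResourceMetrics struct {
	Service string // stands in for a resource attribute such as the service name
	Metrics []Metric
}

type Batch []ResourceMetrics

// splitByService groups whole resources by service name.
// This routing key never needs to look at individual metrics.
func splitByService(in Batch) map[string]Batch {
	out := make(map[string]Batch)
	for _, rm := range in {
		out[rm.Service] = append(out[rm.Service], rm)
	}
	return out
}

// splitByMetricName has to look inside each resource, so it splits one
// level deeper and keeps the owning resource with every metric.
func splitByMetricName(in Batch) map[string]Batch {
	out := make(map[string]Batch)
	for _, rm := range in {
		for _, m := range rm.Metrics {
			out[m.Name] = append(out[m.Name], ResourceMetrics{
				Service: rm.Service,
				Metrics: []Metric{m},
			})
		}
	}
	return out
}

// mergeByResource combines batches bound for the same endpoint and folds
// entries that share a resource, so each resource is emitted only once.
func mergeByResource(batches ...Batch) Batch {
	index := make(map[string]int) // service name -> position in out
	var out Batch
	for _, b := range batches {
		for _, rm := range b {
			i, ok := index[rm.Service]
			if !ok {
				i = len(out)
				index[rm.Service] = i
				out = append(out, ResourceMetrics{Service: rm.Service})
			}
			out[i].Metrics = append(out[i].Metrics, rm.Metrics...)
		}
	}
	return out
}

func main() {
	in := Batch{
		{Service: "checkout", Metrics: []Metric{{Name: "latency"}, {Name: "errors"}}},
		{Service: "cart", Metrics: []Metric{{Name: "latency"}}},
	}

	byService := splitByService(in)
	byName := splitByMetricName(in)
	fmt.Println("service keys:", len(byService), "metric-name keys:", len(byName))

	// Suppose both metric-name keys resolve to the same endpoint:
	// merge them so the "checkout" resource is not sent twice.
	merged := mergeByResource(byName["latency"], byName["errors"])
	fmt.Println("resources after merge:", len(merged))
}
```

The point of the sketch is that each routing key only descends as far into the data as it needs to, and that the merge step removes the duplicated resource entries that fine-grained splitting creates.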

Link to tracking Issue: 32513

Testing: I created a full suite of tests for each routingID enum, covering both single-endpoint and multi-endpoint load balancing.

Documentation: The code is documented in comments. I added a changelog entry to explain the changes for users.

jpkrohling · 2024-05-30T07:56:52Z

@RichieSams, is this ready for review?

RichieSams · 2024-05-30T21:52:26Z

It's ready for technical review. I just need to add a changelog and fix up the CI tests

fatsheep9146 · 2024-06-06T05:31:12Z

please fix the conflicts

RichieSams · 2024-06-06T13:41:03Z

> please fix the conflicts

Fixed. This PR is ready for final review and merge

fatsheep9146

There are still unresolved failed checks:

--- a/exporter/loadbalancingexporter/go.mod
+++ b/exporter/loadbalancingexporter/go.mod
@@ -74,7 +74,6 @@ require (
 	github.com/imdario/mergo v0.3.6 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/josharian/intern v1.0.0 // indirect
-	github.com/json-iterator/go v1.1.12 // indirect
 	github.com/klauspost/compress v1.17.8 // indirect
 	github.com/knadh/koanf/maps v0.1.1 // indirect
 	github.com/knadh/koanf/providers/confmap v0.1.0 // indirect
go.mod/go.sum deps changes detected, please run "make gotidy" and commit the changes in this PR.

RichieSams · 2024-06-12T12:58:17Z

Tests are failing on an unrelated issue:

         	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver/e2e_test.go:83
        	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver/e2e_test.go:54
        	Error:      	Condition never satisfied
        	Test:       	TestE2E
        	Messages:   	failed to receive 10 entries,  received 0 metrics in 3 minutes

All other tests are green

fatsheep9146 · 2024-06-13T13:11:36Z

> Tests are failing on an unrelated issue:
>
>          	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver/e2e_test.go:83
>         	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver/e2e_test.go:54
>         	Error:      	Condition never satisfied
>         	Test:       	TestE2E
>         	Messages:   	failed to receive 10 entries,  received 0 metrics in 3 minutes
>
> All other tests are green

yes, this is a known bug

RichieSams · 2024-06-14T15:52:12Z

Woohoo! Everything is green again. Are we OK to merge now?

RichieSams · 2024-06-17T19:50:30Z

@jpkrohling Is this ok to merge?

mx-psi · 2024-06-18T09:39:10Z

Thanks! Since we are in the middle of a release, I will put a reminder to merge this right after this release happens (so this would be available on v0.104.0)

mx-psi · 2024-06-19T15:41:57Z

@RichieSams Can you fix the merge conflicts so that I can merge this? Thanks!

RichieSams · 2024-06-19T16:43:01Z

> @RichieSams Can you fix the merge conflicts so that I can merge this? Thanks!

Updated. Thanks!

jpkrohling · 2024-06-20T09:04:57Z

@RichieSams, sorry I wasn't able to review this before it got merged. Would you be willing to add benchmarks, assessing that this change is at least on par with the previous version in terms of performance?

RichieSams · 2024-06-20T13:06:31Z

Sure thing. Before this change:

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^(BenchmarkConsumeMetrics_1E100T|BenchmarkConsumeMetrics_1E1000T|BenchmarkConsumeMetrics_5E100T|BenchmarkConsumeMetrics_5E500T|BenchmarkConsumeMetrics_5E1000T|BenchmarkConsumeMetrics_10E100T|BenchmarkConsumeMetrics_10E500T|BenchmarkConsumeMetrics_10E1000T)$ github.com/open-telemetry/opentelemetry-collector-contrib/exporter/loadbalancingexporter

goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/loadbalancingexporter
cpu: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
BenchmarkConsumeMetrics_1E100T-24      	    6747	    189668 ns/op	   72946 B/op	    1522 allocs/op
BenchmarkConsumeMetrics_1E1000T-24     	     633	   1715175 ns/op	  713044 B/op	   15028 allocs/op
BenchmarkConsumeMetrics_5E100T-24      	    5209	    198354 ns/op	   74243 B/op	    1557 allocs/op
BenchmarkConsumeMetrics_5E500T-24      	    1438	    762235 ns/op	  360968 B/op	    7567 allocs/op
BenchmarkConsumeMetrics_5E1000T-24     	     504	   1996786 ns/op	  719125 B/op	   15072 allocs/op
BenchmarkConsumeMetrics_10E100T-24     	    6991	    160781 ns/op	   76704 B/op	    1610 allocs/op
BenchmarkConsumeMetrics_10E500T-24     	    1448	    767864 ns/op	  362681 B/op	    7630 allocs/op
BenchmarkConsumeMetrics_10E1000T-24    	     624	   1641427 ns/op	  720533 B/op	   15140 allocs/op

After:

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^(BenchmarkConsumeMetrics_1E100T|BenchmarkConsumeMetrics_1E1000T|BenchmarkConsumeMetrics_5E100T|BenchmarkConsumeMetrics_5E500T|BenchmarkConsumeMetrics_5E1000T|BenchmarkConsumeMetrics_10E100T|BenchmarkConsumeMetrics_10E500T|BenchmarkConsumeMetrics_10E1000T)$ github.com/open-telemetry/opentelemetry-collector-contrib/exporter/loadbalancingexporter

goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/loadbalancingexporter
cpu: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
BenchmarkConsumeMetrics_1E100T-24      	  336166	      4425 ns/op	    1476 B/op	      29 allocs/op
BenchmarkConsumeMetrics_1E1000T-24     	  257836	      4008 ns/op	    1479 B/op	      29 allocs/op
BenchmarkConsumeMetrics_5E100T-24      	   77956	     15485 ns/op	    5744 B/op	     130 allocs/op
BenchmarkConsumeMetrics_5E500T-24      	   59664	     18696 ns/op	    5759 B/op	     130 allocs/op
BenchmarkConsumeMetrics_5E1000T-24     	   71922	     15106 ns/op	    5694 B/op	     130 allocs/op
BenchmarkConsumeMetrics_10E100T-24     	   27265	     40060 ns/op	   12907 B/op	     268 allocs/op
BenchmarkConsumeMetrics_10E500T-24     	   27732	     37414 ns/op	   12892 B/op	     268 allocs/op
BenchmarkConsumeMetrics_10E1000T-24    	   40455	     29029 ns/op	   12957 B/op	     268 allocs/op

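It's significantly faster and uses significantly less memory: time per op drops by roughly 4x (10E100T) to 428x (1E1000T), and bytes per op by roughly 6x to 482x.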
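For readers who want the per-benchmark ratios, the following small sketch computes them from the ns/op and B/op columns of the two runs above. The numbers are copied from that output; the `result` type and `speedup` helper are only illustrative names and are not part of the exporter.

```go
package main

import "fmt"

// result holds the before/after ns/op and B/op values copied from the
// benchmark output in this comment.
type result struct {
	name              string
	nsBefore, nsAfter float64
	bBefore, bAfter   float64
}

var results = []result{
	{"1E100T", 189668, 4425, 72946, 1476},
	{"1E1000T", 1715175, 4008, 713044, 1479},
	{"5E100T", 198354, 15485, 74243, 5744},
	{"5E500T", 762235, 18696, 360968, 5759},
	{"5E1000T", 1996786, 15106, 719125, 5694},
	{"10E100T", 160781, 40060, 76704, 12907},
	{"10E500T", 767864, 37414, 362681, 12892},
	{"10E1000T", 1641427, 29029, 720533, 12957},
}

// speedup returns how many times smaller the "after" value is.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	for _, r := range results {
		fmt.Printf("%-9s time %6.1fx  memory %6.1fx\n",
			r.name,
			speedup(r.nsBefore, r.nsAfter),
			speedup(r.bBefore, r.bAfter))
	}
}
```

These are single runs on one machine, so the ratios are best read as orders of magnitude rather than precise figures.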

RichieSams · 2024-06-20T13:08:15Z

My next change updates the benchmarks to also test different combinations of ResourceMetrics, ScopeMetrics, Metrics, and DataPoints

jpkrohling · 2024-06-20T13:20:18Z

Wonderful, thank you!

RichieSams · 2024-06-20T14:01:04Z

#33676

…ined after load-balacing (open-telemetry#33293)

Co-authored-by: Pablo Baeyens

[exporter/loadbalacing] Refactor how metrics are split and then re-jo…

9edc779

…ined after load-balacing

RichieSams requested review from jpkrohling, a team, mx-psi and dmitryax as code owners May 29, 2024 16:58

RichieSams marked this pull request as draft May 29, 2024 16:58

github-actions bot assigned fatsheep9146 May 29, 2024

github-actions bot added cmd/configschema configschema command exporter/loadbalancing labels May 29, 2024

RichieSams added 2 commits June 3, 2024 14:02

chore: Add str const variables for the routingKeys

ca0556f

So we can rely on the compiler to make sure we don't have typos

Create changelog

abf10cf

RichieSams marked this pull request as ready for review June 3, 2024 18:06

github-actions bot assigned bogdandrutu Jun 3, 2024

RichieSams added 2 commits June 3, 2024 14:07

Merge branch 'main' into loadbalancingexporter_refactor

c5c3fc9

fix tests

38ebe4e

fatsheep9146 reviewed Jun 6, 2024

.chloggen/loadbalancer_exporter_refactor.yaml

RichieSams added 2 commits June 6, 2024 08:55

Merge branch 'main' into loadbalancingexporter_refactor

98b66ae

tidy

0a09e7e

RichieSams requested a review from fatsheep9146 June 6, 2024 13:41

Merge branch 'main' into loadbalancingexporter_refactor

0757626

fatsheep9146 reviewed Jun 11, 2024

RichieSams added 3 commits June 11, 2024 15:41

Make lint and tidy

be36f01

Merge branch 'main' into loadbalancingexporter_refactor

3575637

Merge branch 'main' into loadbalancingexporter_refactor

d6a216f

Merge branch 'main' into loadbalancingexporter_refactor

e770a6d

Merge branch 'main' into loadbalancingexporter_refactor

058ed29

Merge branch 'main' into loadbalancingexporter_refactor

2d995a7

fatsheep9146 approved these changes Jun 14, 2024

RichieSams added 2 commits June 19, 2024 12:40

Merge branch 'main' into loadbalancingexporter_refactor

60232cb

make tidy

238ac83

make gotidy

892ff42

mx-psi merged commit d6eaca8 into open-telemetry:main Jun 19, 2024
154 checks passed

github-actions bot added this to the next release milestone Jun 19, 2024

RichieSams deleted the loadbalancingexporter_refactor branch June 20, 2024 12:58
