[pkg/pdatautil] Fixed panic in nested map hashing #18912

cpheps · 2023-02-24T16:10:15Z

Description:
Fixes #18910

Changed how keysBuf works so that nested maps append their keys to the buffer for the duration of their recursion and remove them before the recursive call exits. This keeps a single buffer while ensuring recursion doesn't pollute the buffer order.

Link to tracking Issue: #18910

Testing: Added a unit test that could replicate the panic in the old code.

runforesight · 2023-02-24T16:11:29Z

Foresight Summary

Major Impacts

TestStartAndShutdownRemote ❌ failed 2 times in 13 runs (15% fail rate).

build-and-test duration (20 minutes 1 second) has decreased by 54 minutes 20 seconds compared to the main branch average (1 hour 14 minutes 21 seconds).


⭕ build-and-test-windows workflow has finished in 7 seconds (43 minutes 2 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
windows-unittest-matrix	- 🔗	N/A	See Details
windows-unittest	- 🔗	N/A	See Details

✅ check-links workflow has finished in 54 seconds (1 minute 45 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
changed files	- 🔗	N/A	See Details
check-links	- 🔗	N/A	See Details

✅ telemetrygen workflow has finished in 1 minute 1 second (2 minutes 17 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
build-dev	- 🔗	N/A	See Details
publish-latest	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details

✅ changelog workflow has finished in 1 minute 37 seconds (1 minute 9 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
changelog	- 🔗	N/A	See Details

✅ prometheus-compliance-tests workflow has finished in 3 minutes 20 seconds (6 minutes 7 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
prometheus-compliance-tests	- 🔗	✅ 21 ❌ 0 ⏭ 0 🔗	See Details

✅ load-tests workflow has finished in 7 minutes 3 seconds (11 minutes 17 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
loadtest (TestTraceAttributesProcessor)	- 🔗	✅ 3 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestIdleMode)	- 🔗	✅ 1 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestMetric10kDPS\|TestMetricsFromFile)	- 🔗	✅ 6 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceNoBackend10kSPS\|TestTrace1kSPSWithAttrs)	- 🔗	✅ 8 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestMetricResourceProcessor\|TestTrace10kSPS)	- 🔗	✅ 12 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceBallast1kSPSWithAttrs\|TestTraceBallast1kSPSAddAttrs)	- 🔗	✅ 10 ❌ 0 ⏭ 0 🔗	See Details
setup-environment	- 🔗	N/A	See Details
loadtest (TestBallastMemory\|TestLog10kDPS)	- 🔗	✅ 18 ❌ 0 ⏭ 0 🔗	See Details

✅ e2e-tests workflow has finished in 13 minutes 17 seconds (3 minutes 11 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
kubernetes-test (v1.26.0)	- 🔗	N/A	See Details
kubernetes-test (v1.25.3)	- 🔗	N/A	See Details
kubernetes-test (v1.24.7)	- 🔗	N/A	See Details
kubernetes-test (v1.23.13)	- 🔗	N/A	See Details

build-and-test workflow has finished in 12 minutes 37 seconds (1 hour 1 minute 44 seconds less than `main` branch avg.) and finished at 27th Feb, 2023.

Job	Failed Steps	Tests
unittest-matrix (1.19, receiver-0)	N/A	✅ 2580 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.20, receiver-0)	N/A	✅ 2580 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, exporter)	N/A	✅ 2456 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, receiver-1)	N/A	✅ 1937 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.20, receiver-1)	N/A	✅ 1937 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.20, exporter)	N/A	✅ 2456 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, other)	N/A	✅ 4772 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.20, other)	N/A	✅ 4772 ❌ 0 ⏭ 0 🔗	See Details


cpheps · 2023-02-27T14:33:59Z

I don't think PR failures are related to my changes.

djaglowski

Rather than throw out the optimization altogether, can we find a way to handle recursion correctly?

At a minimum, I think we could pass a bool to writeMapHash that indicates whether it is a recursive call or not. MapHash can pass false, while writeValueHash passes true. Then at least we keep the optimization for non-recursive maps, which is likely the majority of cases.

Possibly an even better solution would have nested maps request their own hashWriter from the pool and use that, rather than polluting the existing one. This would require a bit more work, so maybe can be a separate issue.
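To make the pool idea concrete, here is one possible shape, sketched under assumptions: `sync.Pool` from the standard library, `hash/fnv` as a stand-in hash, and hypothetical names `hashWriter`, `pool`, and `hashMap`. Folding the nested digest into the parent hash is a choice made for this sketch and would produce different hash values than writing nested keys inline.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"sort"
	"sync"
)

type hashWriter struct {
	h       hash.Hash64
	keysBuf []string
}

// pool hands out writers; a nested map takes its own while the parent's
// writer is still checked out, so their buffers never overlap.
var pool = sync.Pool{
	New: func() any { return &hashWriter{h: fnv.New64a()} },
}

func hashMap(m map[string]any) uint64 {
	hw := pool.Get().(*hashWriter)
	defer pool.Put(hw)
	hw.h.Reset()
	hw.keysBuf = hw.keysBuf[:0]

	for k := range m {
		hw.keysBuf = append(hw.keysBuf, k)
	}
	sort.Strings(hw.keysBuf)

	for _, k := range hw.keysBuf {
		hw.h.Write([]byte("k:" + k))
		if nested, ok := m[k].(map[string]any); ok {
			// Hash the nested map with a separate writer, then fold
			// its 64-bit digest into this writer's hash.
			var digest [8]byte
			binary.LittleEndian.PutUint64(digest[:], hashMap(nested))
			hw.h.Write(digest[:])
			continue
		}
		fmt.Fprintf(hw.h, "v:%v", m[k])
	}
	return hw.h.Sum64()
}

func main() {
	a := map[string]any{"outer": map[string]any{"x": 1, "y": 2}, "z": 3}
	b := map[string]any{"z": 3, "outer": map[string]any{"y": 2, "x": 1}}
	fmt.Println(hashMap(a) == hashMap(b)) // prints "true"
}
```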

cpheps · 2023-02-27T17:13:40Z

@djaglowski I reworked this to use a single buffer and some slice tricks to focus only on the current recursion's key set. Let me know if this seems like a better solution.


djaglowski

LGTM


cpheps requested review from a team and dmitryax as code owners February 24, 2023 16:10

github-actions bot assigned Aneurysm9 Feb 24, 2023

github-actions bot added the pkg/pdatautil label Feb 24, 2023

cpheps force-pushed the fix/pdata-util-nested-map-panic branch from 3a6e8b1 to 4f9ba25 Compare February 27, 2023 13:22

djaglowski reviewed Feb 27, 2023


djaglowski changed the title ~~Fixed panic in pdatautil nested map hashing~~ [pkg/pdatautil] Fixed panic in nested map hashing Feb 27, 2023

djaglowski reviewed Feb 27, 2023


Corbin Phelps added 4 commits February 27, 2023 12:44

Fixed panic in pdatautil map hashing

c2dbea8

Signed-off-by: Corbin Phelps <[email protected]>

Corrected spelling/grammer in comments

3ae4adc

Signed-off-by: Corbin Phelps <[email protected]>

Changed pdatautil to use slicing of a single buffer during recursive …

82483b1

…writes Signed-off-by: Corbin Phelps <[email protected]>

Condensed tests and added comments

33fe06f

Signed-off-by: Corbin Phelps <[email protected]>

cpheps force-pushed the fix/pdata-util-nested-map-panic branch from 65537c8 to 33fe06f Compare February 27, 2023 17:44

Updated comment to include non-recursive case

fc03814

Signed-off-by: Corbin Phelps <[email protected]>

djaglowski approved these changes Feb 27, 2023


dmitryax approved these changes Feb 27, 2023


djaglowski merged commit 848486f into open-telemetry:main Feb 27, 2023

newly12 pushed a commit to newly12/opentelemetry-collector-contrib that referenced this pull request Feb 28, 2023

[pkg/pdatautil] Fixed panic in nested map hashing (open-telemetry#18912)

0bbf7ec

* Fixed panic in pdatautil map hashing Signed-off-by: Corbin Phelps <[email protected]>

cpheps deleted the fix/pdata-util-nested-map-panic branch February 28, 2023 13:02
