Race condition in testbed load generator when parallel is used #32326

Closed
lahsivjar opened this issue Apr 11, 2024 · 1 comment · Fixed by #32351
Labels
bug Something isn't working testbed

Comments


lahsivjar commented Apr 11, 2024

Component(s)

testbed

What happened?

Description

If ProviderSender#Start is called such that it runs more than one goroutine, a race condition occurs on the shared field ProviderSender#prevError.
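For illustration, a minimal standalone sketch of the racy pattern (names here are simplified and hypothetical; the real field lives on ProviderSender in testbed/testbed/load_generator.go):

```go
package main

import (
	"errors"
	"sync"
)

// Sketch of the racy pattern, not the actual testbed code: every worker
// goroutine reads and overwrites the same prevError field without
// synchronization, which the race detector flags as a data race.
type providerSender struct {
	prevError error // shared across goroutines
}

func (ps *providerSender) generateItem() {
	err := errors.New("send failed") // stand-in for a real send error
	if ps.prevError == nil || ps.prevError.Error() != err.Error() {
		ps.prevError = err // unsynchronized write racing with other goroutines
	}
}

func main() {
	ps := &providerSender{}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // LoadOptions.Parallel > 1
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				ps.generateItem()
			}
		}()
	}
	wg.Wait()
}
```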

Steps to Reproduce

  1. Write a test or edit an existing one that calls ProviderSender#Start with load options having a LoadOptions#Parallel value greater than 1. Example:
diff --git a/testbed/testbed/mock_backend_test.go b/testbed/testbed/mock_backend_test.go
index 98882d87cb..4763a1f1ba 100644
--- a/testbed/testbed/mock_backend_test.go
+++ b/testbed/testbed/mock_backend_test.go
@@ -48,7 +48,7 @@ func TestGeneratorAndBackend(t *testing.T) {
 			assert.EqualValues(t, 0, lg.DataItemsSent())
 
 			// Generate at 1000 SPS
-			lg.Start(LoadOptions{DataItemsPerSecond: 1000})
+			lg.Start(LoadOptions{DataItemsPerSecond: 1000, Parallel: 2})
 			// ensure teardown in failure scenario
 			t.Cleanup(lg.Stop)
  2. Run the test using `make test` or `go test -v -race ./...`
  3. Observe the output for the race condition.

Expected Result

No race condition and the tests pass.

Actual Result

Race conditions found by the race-detector.

Collector version

v0.98.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

==================
WARNING: DATA RACE
Write at 0x00c000466160 by goroutine 59:
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).generateTrace()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:250 +0x2d0
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).generate.func1()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:218 +0x1f0

Previous write at 0x00c000466160 by goroutine 60:
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).generateTrace()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:250 +0x2d0
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).generate.func1()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:218 +0x1f0

Goroutine 59 (running) created at:
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).generate()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:209 +0x23c
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).Start.func2()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:119 +0x34

Goroutine 60 (running) created at:
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).generate()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:209 +0x23c
  github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*ProviderSender).Start.func2()
      /Users/lahsivjar/Projects/elastic/opentelemetry-collector-contrib/testbed/testbed/load_generator.go:119 +0x34
==================

Additional context

Possible solution: make prevError local to the generate{Log,Metrics,Trace} functions and remove it from ProviderSender -- not sure if I am missing something here, but that should be enough. If prevError needs to remain a field on ProviderSender, we can use atomic.Value instead.
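For example, a rough sketch of the atomic.Value variant (hypothetical names, not the actual testbed implementation; storing the message string rather than the error itself sidesteps atomic.Value's requirement that every stored value has the same concrete type):

```go
package testbedsketch

import (
	"log"
	"sync/atomic"
)

// providerSender is a stand-in for the real ProviderSender.
type providerSender struct {
	// Last logged error message, stored as a string so atomic.Value
	// always sees the same concrete type.
	prevErrMsg atomic.Value
}

// logIfNew logs err only if its message differs from the last one stored.
// This removes the data race, though two goroutines may still both log
// the same new error once; a mutex would be needed for strict dedup.
func (ps *providerSender) logIfNew(err error) {
	msg := err.Error()
	if prev, _ := ps.prevErrMsg.Load().(string); prev != msg {
		ps.prevErrMsg.Store(msg)
		log.Printf("load generator error: %v", msg)
	}
}
```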

lahsivjar added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Apr 11, 2024

Pinging code owners:

  • testbed: @open-telemetry/collector-approvers

See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 removed the needs triage (New item requiring triage) label on Apr 15, 2024
codeboten pushed a commit that referenced this issue May 1, 2024
Fixes a race condition in testbed's default load generator
(`ProviderSender`). The fix changes how error logging works for the
`ProviderSender`. Before this PR, the code logged an error only if it
differed from the previously seen error, but that previous error was
tracked in a single field shared across goroutines. Now each goroutine
tracks its own previous error and logs an error only if it differs from
that goroutine's previous one.

Alternatively, we could build a logger that uses a Bloom filter to try to
log each error only once, though I am not sure that would be required.
This PR offers a quick fix while keeping the behavior reasonably close to
the existing behavior.

Closes
#32326

**Link to tracking Issue:**
#32326

**Testing:** Follow the steps in the tracking issue.
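For illustration, a rough standalone sketch of the per-goroutine behavior the commit message above describes (hypothetical names and structure; the actual change is in #32351):

```go
package main

import (
	"errors"
	"log"
	"sync"
)

// sendOne stands in for generating and sending a single item.
func sendOne() error {
	return errors.New("send failed")
}

func main() {
	const parallel = 2 // LoadOptions.Parallel
	var wg sync.WaitGroup
	for i := 0; i < parallel; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var prevErr error // local to this goroutine: nothing is shared
			for j := 0; j < 1000; j++ {
				if err := sendOne(); err != nil {
					// Log only when the error changes, per goroutine.
					if prevErr == nil || prevErr.Error() != err.Error() {
						log.Printf("load generator error: %v", err)
					}
					prevErr = err
				}
			}
		}()
	}
	wg.Wait()
}
```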
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jul 11, 2024