Flaky test: TestTracingGoldenData/otlp-opencensus port already in use #27295

Open
songy23 opened this issue Oct 2, 2023 · 15 comments
Labels: bug, flaky test, never stale, testbed

Comments

@songy23
Member

songy23 commented Oct 2, 2023

Component(s)

testbed

What happened?

See https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6381486879/job/17318149920?pr=27291:

panic: cannot start pipelines: listen tcp 127.0.0.1:44455: bind: address already in use

goroutine 229 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*inProcessCollector).Start.func1()
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/testbed/in_process_collector.go:89 +0x99
created by github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*inProcessCollector).Start
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/testbed/in_process_collector.go:85 +0x59e
exit status 2
FAIL	github.com/open-telemetry/opentelemetry-collector-contrib/testbed/correctnesstests/traces	3.165s
cat: results/TESTRESULTS.md: No such file or directory
make: Leaving directory '/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed'
make: *** [Makefile:37: run-correctness-traces-tests] Error 1
Error: Process completed with exit code 2.

Collector version

mainline

Environment information

No response

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "127.0.0.1:44455"
exporters:
  opencensus:
    endpoint: "127.0.0.1:44455"
    tls:
      insecure: true
processors:
  
  batch:
    send_batch_size: 1024



extensions:

service:
  extensions:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [opencensus]

Log output

No response

Additional context

No response

@crobert-1
Member

Looks like a pretty simple bug from what I can tell. The config uses the same port, 44455, twice, which results in the error shown. This is a test issue.

The test gets two available local addresses, one for the receiver and one for the sender (exporter), and then uses them in its final running configuration. Because it requests the two addresses independently, it sometimes gets the same address back twice. A comment in the code even calls this out as a possibility. The ports aren't actually in use until the entire configuration is put together and the testbed runner is started, which is why the same port can be returned twice.
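
To illustrate why nothing reserves the port, here is a minimal sketch of how a "get an available address" helper commonly works. The name `availableLocalAddress` and its body are assumptions for illustration, not the testbed's actual code: it binds to port 0, reads back the address the OS chose, and closes the listener before returning.

```go
package main

import (
	"fmt"
	"net"
)

// availableLocalAddress is a hypothetical stand-in for the testbed helper.
// It binds to port 0 so the OS picks a free port, records the address,
// and closes the listener before returning.
func availableLocalAddress() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	addr := ln.Addr().String()
	// Closing releases the port; nothing reserves it for the caller.
	if err := ln.Close(); err != nil {
		return "", err
	}
	return addr, nil
}

// canBind reports whether addr can be bound right now.
func canBind(addr string) bool {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

func main() {
	addr, err := availableLocalAddress()
	if err != nil {
		panic(err)
	}
	// The returned port is free again, so a second, independent call
	// is allowed to hand out the same one.
	fmt.Println(addr, "still bindable:", canBind(addr))
}
```

Since the address is bindable again as soon as the helper returns, two back-to-back calls have no way to know about each other's result.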

I think the simplest option is to check that the data receiver's address differs from the sender's when they're generated. The sender exposes GetEndpoint(), which could be parsed to get the port, and the receiver has a Port property that could be compared against it. If they match, we could simply re-create the receiver or sender in a loop until the ports differ.
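
A rough sketch of that retry loop, under stated assumptions: `portOf` and `pickDistinct` are hypothetical helpers, and the `next` callback stands in for whatever re-creates the receiver or sender and reports its endpoint. The real testbed types and signatures are not shown in this thread.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portOf extracts the numeric port from a "host:port" endpoint string.
func portOf(endpoint string) (int, error) {
	_, p, err := net.SplitHostPort(endpoint)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(p)
}

// pickDistinct calls next (a hypothetical endpoint generator) until it
// returns an endpoint whose port differs from taken, giving up after
// maxAttempts so a misbehaving generator cannot loop forever.
func pickDistinct(taken string, next func() string, maxAttempts int) (string, error) {
	takenPort, err := portOf(taken)
	if err != nil {
		return "", err
	}
	for i := 0; i < maxAttempts; i++ {
		candidate := next()
		p, err := portOf(candidate)
		if err != nil {
			return "", err
		}
		if p != takenPort {
			return candidate, nil
		}
	}
	return "", fmt.Errorf("no endpoint with a port other than %d after %d attempts", takenPort, maxAttempts)
}

func main() {
	// Fake generator: repeats the clashing port once, then moves on.
	endpoints := []string{"127.0.0.1:44455", "127.0.0.1:44456"}
	i := 0
	next := func() string {
		e := endpoints[i%len(endpoints)]
		i++
		return e
	}
	got, err := pickDistinct("127.0.0.1:44455", next, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println("receiver endpoint:", got)
}
```

The attempt cap is a design choice added here, not something proposed in the comment; it keeps a test from hanging if the generator keeps returning the same port.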

There are some alternatives that could work as well. One would be to somehow mark the port as used before it's actually used. Another would be to plumb the first received port down the call stack, so that the GetAvailableLocalAddress method doesn't return it again. Yet another would be to make the GetAvailableLocalAddress method take an additional argument, such as addressCount, with which the user specifies how many available addresses they need. The method could then check internally that it isn't returning duplicates, and return an array of addresses.
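
The addressCount alternative could look roughly like the sketch below. `availableLocalAddresses` is a hypothetical function, not an existing API. One assumed way to guarantee distinct results is to keep every listener open until all addresses have been collected, so the OS cannot hand out the same port twice within one call.

```go
package main

import (
	"fmt"
	"net"
)

// availableLocalAddresses is a hypothetical multi-address variant of the
// helper. It keeps each listener open until all count addresses have been
// collected, so every returned port is distinct, then releases them all.
func availableLocalAddresses(count int) ([]string, error) {
	listeners := make([]net.Listener, 0, count)
	defer func() {
		for _, ln := range listeners {
			ln.Close()
		}
	}()

	addrs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		ln, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, ln)
		addrs = append(addrs, ln.Addr().String())
	}
	return addrs, nil
}

func main() {
	addrs, err := availableLocalAddresses(2)
	if err != nil {
		panic(err)
	}
	fmt.Println("receiver:", addrs[0], "sender:", addrs[1])
}
```

This only rules out duplicates within a single call. The ports are still released before the caller binds them, so another process could in principle claim one in between.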

All of the alternatives end up being a lot of extra work and impact for what is simply a test issue. That's why I think my main suggestion makes the most sense, even though it's not the most "thorough" solution.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Oct 30, 2023
Contributor

github-actions bot commented Jan 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • testbed: @open-telemetry/collector-approvers

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 1, 2024
@songy23
Member Author

songy23 commented Jan 2, 2024

@songy23 songy23 added the never stale label and removed the Stale label Jan 2, 2024
@dmitryax
Member

One more: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8430916144/job/23087554817

Seems like an issue with the correctness tests framework, not a particular test.

@crobert-1
Member

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8558802306/job/23454150989?pr=32173

(Some panics are caused by the port being in use, some by a timeout. Not sure whether it's the same issue.)

3 participants