Azure Data Explorer Exporter permanently fails on init due to lack of retries in dependency #22771

Closed
ben-childs-docusign opened this issue May 24, 2023 · 3 comments
Labels
bug Something isn't working exporter/azuredataexplorer needs triage New item requiring triage

Comments

@ben-childs-docusign
Contributor

Component(s)

exporter/azuredataexplorer

What happened?

Description

When using the Azure Data Explorer exporter on a machine with a flaky network (in this case, a Windows container that was seeing network issues on launch), I saw the exporter fail permanently: a network call made during initialization failed, and that failure was memoized and never retried.

Steps to Reproduce

Try to initialize the Azure Data Explorer exporter in an environment with a flaky network.

Expected Result

Initialization eventually succeeds.

Actual Result

The exporter gets stuck in a permanently failed state because the failed call is memoized in the azure-kusto-go component.
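A one-time initializer that stores its error behaves this way. The following is a minimal sketch of that pattern using `sync.Once`; the `metadataCache` type and the `flakyFetch` helper are hypothetical names for illustration and are not taken from azure-kusto-go, whose actual code is not shown in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// metadataCache is a hypothetical stand-in for a client that memoizes the
// result of a one-time metadata fetch. It is NOT the azure-kusto-go type.
type metadataCache struct {
	once  sync.Once
	value string
	err   error
	fetch func() (string, error)
}

// Get runs fetch at most once. sync.Once considers itself done even when
// fetch fails, so the first error is returned on every later call.
func (c *metadataCache) Get() (string, error) {
	c.once.Do(func() {
		c.value, c.err = c.fetch()
	})
	return c.value, c.err
}

// flakyFetch returns a fetch function that fails for its first `failures`
// calls and succeeds afterwards, simulating a network that recovers after
// startup. The returned pointer counts how often fetch was invoked.
func flakyFetch(failures int) (func() (string, error), *int) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("TLS handshake timeout")
		}
		return "metadata", nil
	}, &calls
}

func main() {
	fetch, calls := flakyFetch(1)
	c := &metadataCache{fetch: fetch}
	for i := 0; i < 3; i++ {
		_, err := c.Get()
		fmt.Printf("attempt %d: err=%v (fetch calls: %d)\n", i+1, err, *calls)
	}
}
```

In this sketch every call to `Get` returns the first error and `fetch` runs only once, even though a second call would have succeeded.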

Collector version

v0.77.0

Environment information

Environment

OS: Windows 2019
Compiler(if manually compiled): go 1.19.8

OpenTelemetry Collector configuration

No response

Log output

Kind(KInternal): Error while getting token : Get \"https://****.eastus.kusto.windows.net/v1/rest/auth/metadata\": net/http: TLS handshake timeout

Additional context

This issue was logged in azure-kusto-go and fixed in 0.13.1
Azure/azure-kusto-go#188

I have tested the fix locally and confirmed the issue is resolved.

@ben-childs-docusign ben-childs-docusign added bug Something isn't working needs triage New item requiring triage labels May 24, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ag-ramachandran
Contributor

Hello @ben-childs-docusign, thanks for reporting the issue. I will have a look at this.
I will check whether retries can be initiated in case of network flakiness, and will keep this issue updated based on our internal discussions.
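As a rough illustration of what retrying initialization could look like, here is a minimal sketch of a retry loop with exponential backoff. The `retryInit` helper and `errFlaky` value are hypothetical and are not part of the exporter or the SDK. Note the assumption that `initFn` really re-executes the network call on each attempt; if the failure is memoized inside a dependency, a wrapper like this cannot help on its own.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFlaky is a hypothetical transient error used for the demonstration.
var errFlaky = errors.New("TLS handshake timeout")

// retryInit calls initFn until it succeeds or attempts are exhausted,
// doubling the delay between attempts. It is an illustrative helper only.
func retryInit(attempts int, delay time.Duration, initFn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = initFn(); err == nil {
			return nil
		}
		// Sleep only if another attempt will follow.
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("init failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryInit(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errFlaky // simulate two failed handshakes
		}
		return nil
	})
	fmt.Printf("calls=%d err=%v\n", calls, err)
}
```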

ag-ramachandran added a commit to ag-ramachandran/opentelemetry-collector-contrib that referenced this issue May 30, 2023
@ag-ramachandran
Contributor

Hello @codeboten, @mx-psi. We've made a small PR, #22933, with the SDK update and fixes for all the deprecations related to it. Kindly review when you get a chance!

dmitryax pushed a commit that referenced this issue Jun 2, 2023
The underlying SDK has an update that fixes an issue seen when network connections are flaky. To get the metadata needed to connect to the service, an HTTP metadata call is issued. If this call fails, it is not retried, and initialization stays broken until restart. The SDK fixes this by resetting the lock that was causing this behavior.
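The idea behind such a fix can be sketched as a cache that memoizes only successful results, so a failed fetch leaves the state unset and the next caller tries again. This is a minimal sketch under that assumption; `retryableCache` and `errTransient` are hypothetical names, and the code does not reproduce the actual change in azure-kusto-go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTransient is a hypothetical transient error used for the demonstration.
var errTransient = errors.New("TLS handshake timeout")

// retryableCache memoizes only successful fetches. It illustrates the idea
// and is NOT the azure-kusto-go implementation.
type retryableCache struct {
	mu     sync.Mutex
	loaded bool
	value  string
	fetch  func() (string, error)
}

// Get returns the cached value if one was loaded; otherwise it calls fetch.
// On failure nothing is cached, so a later call retries the fetch.
func (c *retryableCache) Get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loaded {
		return c.value, nil
	}
	v, err := c.fetch()
	if err != nil {
		// Leave loaded false so the next caller retries.
		return "", err
	}
	c.value, c.loaded = v, true
	return v, nil
}

func main() {
	calls := 0
	c := &retryableCache{fetch: func() (string, error) {
		calls++
		if calls == 1 {
			return "", errTransient // first call hits the flaky network
		}
		return "metadata", nil
	}}
	for i := 0; i < 3; i++ {
		v, err := c.Get()
		fmt.Printf("attempt %d: value=%q err=%v\n", i+1, v, err)
	}
}
```

Here the first call fails, the second call re-runs the fetch and succeeds, and later calls are served from the cache.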
@dmitryax dmitryax closed this as completed Jun 2, 2023
Caleb-Hurshman pushed a commit to observIQ/opentelemetry-collector-contrib that referenced this issue Jul 6, 2023
…telemetry#22933)
