
[exporter/datadogexporter] Retry per network call #6412

Merged

Conversation

mx-psi
Member

@mx-psi mx-psi commented Nov 23, 2021

Description:

Our Push[Metric/Trace]Data functions sometimes make several network calls per OTLP payload.
In the case of metrics, these calls are safe to retry, so we now retry each network call individually.
In the case of traces, it is unclear when a retry is safe, so retries are disabled for now.
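
For illustration only, here is a minimal sketch (not this PR's actual code) of what retrying a single network call with exponential backoff can look like using github.com/cenkalti/backoff/v4; the helper name sendWithRetry, the GET method, the status-code handling, and the interval values are hypothetical choices.

package retryexample

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"github.com/cenkalti/backoff/v4"
)

// sendWithRetry is a hypothetical helper: it retries one network call with
// exponential backoff until it succeeds, hits a permanent error, or the
// backoff gives up. Each payload submission would go through a call like this.
func sendWithRetry(ctx context.Context, client *http.Client, url string) error {
	operation := func() error {
		// Build a fresh request per attempt so no request state is reused.
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return backoff.Permanent(err) // malformed request: never retryable
		}
		resp, err := client.Do(req)
		if err != nil {
			return err // network error: retryable
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 500 {
			return fmt.Errorf("server error: %d", resp.StatusCode) // retryable
		}
		if resp.StatusCode >= 400 {
			return backoff.Permanent(fmt.Errorf("client error: %d", resp.StatusCode)) // do not retry
		}
		return nil
	}

	// Illustrative backoff values; in practice these come from configuration.
	expBackoff := backoff.NewExponentialBackOff()
	expBackoff.InitialInterval = 5 * time.Second
	expBackoff.MaxElapsedTime = 2 * time.Minute
	return backoff.Retry(operation, backoff.WithContext(expBackoff, ctx))
}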

Link to tracking Issue: n/a

Testing: Added unit tests. Manual tests to be done are:

  • Test that metadata payload is sent correctly
  • Test that traces and metrics can be received under normal conditions
  • Cut off network connection and check that the retry mechanism works for metrics.

Documentation: Added a comment about retry settings on traces.

@mx-psi mx-psi force-pushed the mx-psi/retries-per-network-call branch from a4c6d66 to 2366398 on November 23, 2021 12:38
Contributor

@KSerrania KSerrania left a comment


LGTM

exporter/datadogexporter/internal/utils/http_test.go (review comment, now outdated and resolved)

@mx-psi mx-psi marked this pull request as ready for review December 1, 2021 10:09
@mx-psi mx-psi requested review from a team and jpkrohling December 1, 2021 10:09
@@ -156,7 +156,8 @@ func createMetricsExporter(
pushMetricsFn,
// explicitly disable since we rely on http.Client timeout logic.
exporterhelper.WithTimeout(exporterhelper.TimeoutSettings{Timeout: 0 * time.Second}),
exporterhelper.WithRetry(cfg.RetrySettings),
Member


This looks like a breaking change. Could you add it to the changelog? Make sure to also update the README. Is there a way to detect whether users customized the retry settings, and to warn them if so?

Member Author


Retry settings will still be used here:

expBackoff := backoff.ExponentialBackOff{
	InitialInterval:     r.cfg.InitialInterval,
	RandomizationFactor: backoff.DefaultRandomizationFactor,
	Multiplier:          backoff.DefaultMultiplier,
	MaxInterval:         r.cfg.MaxInterval,
	MaxElapsedTime:      r.cfg.MaxElapsedTime,
	Stop:                backoff.Stop,
	Clock:               backoff.SystemClock,
}

What this PR changes is that we now retry each network call we make, instead of retrying the whole ConsumeMetrics function (so we disable the exporterhelper retry mechanism). I believe this change should be transparent to users and will probably align more closely with what one would expect from the retry settings.
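
As an illustrative sketch only (the retrier type, its cfg field, and DoWithRetry are hypothetical names, not necessarily this PR's code), this is roughly how a backoff built from the user's exporterhelper.RetrySettings can wrap a single network call:

package retryexample

import (
	"context"

	"github.com/cenkalti/backoff/v4"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// retrier is an illustrative wrapper around the user's RetrySettings.
type retrier struct {
	cfg exporterhelper.RetrySettings
}

// DoWithRetry retries one network call (not the whole ConsumeMetrics) using
// the exponential backoff built from the configured RetrySettings.
func (r *retrier) DoWithRetry(ctx context.Context, call func(context.Context) error) error {
	expBackoff := backoff.ExponentialBackOff{
		InitialInterval:     r.cfg.InitialInterval,
		RandomizationFactor: backoff.DefaultRandomizationFactor,
		Multiplier:          backoff.DefaultMultiplier,
		MaxInterval:         r.cfg.MaxInterval,
		MaxElapsedTime:      r.cfg.MaxElapsedTime,
		Stop:                backoff.Stop,
		Clock:               backoff.SystemClock,
	}
	// Tie the backoff to the caller's context so cancellation stops retries.
	b := backoff.WithContext(&expBackoff, ctx)
	return backoff.Retry(func() error { return call(ctx) }, b)
}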

Member


Got it! Makes sense then.

@codeboten codeboten merged commit 13e7a7a into open-telemetry:main Dec 2, 2021
jamesmoessis pushed a commit to atlassian-forks/opentelemetry-collector-contrib that referenced this pull request Dec 8, 2021
* [exporter/datadogexporter] Refactor metadata retrier to take RetrySettings into account

* [exporter/datadogexporter] Do retries per network call on the metrics side

* Address review comment

* Fix lint
povilasv referenced this pull request in coralogix/opentelemetry-collector-contrib Dec 19, 2022