googlecloudexporter: retries for traces #29607

Closed
monikeu opened this issue Dec 1, 2023 · 4 comments
Assignees
Labels
exporter/googlecloud question Further information is requested

Comments

monikeu commented Dec 1, 2023

Component(s)

No response

Describe the issue you're reporting

TL;DR: I'm looking for a way to configure retries for traces, but there doesn't seem to be one: retry_on_failure was removed recently, and the remaining option, 'max_backoff', applies only to metrics export.

Details:
Commit 5ab2db6 recently removed retry_on_failure. The changelog entry was:
googlecloudexporter: remove retry_on_failure from the googlecloud exporter. The exporter itself handles retries, and retrying can cause issues. (#57233)

Looking at the doc, it seems it was removed because of problems related to Google Cloud Monitoring.

Q1: Does it mean exporter -> Google Cloud Trace connection was not having similar problems as Monitoring?

I looked for trace export settings in opentelemetry-operations-go, but I found only the 'max_backoff' config parameter, which applies only to metrics export.

Q2: Is there any config for trace retries available?

Q3: Are there retries for traces available at all?

Please forgive me if I missed some part of the docs/configs ;)

@monikeu monikeu added the needs triage New item requiring triage label Dec 1, 2023
@crobert-1 crobert-1 added question Further information is requested exporter/googlecloud labels Dec 1, 2023
Contributor

github-actions bot commented Dec 1, 2023

Pinging code owners for exporter/googlecloud: @aabmass @dashpole @jsuereth @punya @damemi @psx95. See Adding Labels via Comments if you do not have permissions to add labels yourself.

Contributor

dashpole commented Dec 2, 2023

> Q1: Does it mean exporter -> Google Cloud Trace connection was not having similar problems as Monitoring?

It does not. We removed retries from the exporter because it was duplicating work done by the client library, and produced strange retry behavior. The client library also already has more intelligent retry behavior, which is customized by return code, and by function called.

> Q2: Is there any config for trace retries available?

There is not currently. I believe we could provide such configuration to override the defaults: https://github.com/googleapis/google-cloud-go/blob/e371f8f00ee60be27379fe363603b8cf6b43929c/trace/apiv2/trace_client.go#L61

> Q3: Are there retries for traces available at all?

Yes. Retries are enabled by default, with the config linked above (100ms initial delay, doubling after each failure, up to 30 seconds).
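To make those numbers concrete, here is a minimal Go sketch of the pause schedule they imply. It is an illustration only: `backoffSchedule` is a hypothetical helper, not part of the exporter or the client library, and it omits the random jitter that client libraries usually apply, so each value should be read as an upper bound on the wait.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the upper bound of the pause before each of the
// first n retries: it starts at initial, is multiplied by multiplier after
// every failure, and never exceeds maxPause.
// Hypothetical helper for illustration; no jitter is applied.
func backoffSchedule(initial, maxPause time.Duration, multiplier float64, n int) []time.Duration {
	pauses := make([]time.Duration, 0, n)
	cur := initial
	for i := 0; i < n; i++ {
		pauses = append(pauses, cur)
		cur = time.Duration(float64(cur) * multiplier)
		if cur > maxPause {
			cur = maxPause
		}
	}
	return pauses
}

func main() {
	// Values quoted in the comment above: 100ms initial, 2x growth, 30s cap.
	for i, p := range backoffSchedule(100*time.Millisecond, 30*time.Second, 2, 10) {
		fmt.Printf("retry %d: wait up to %v\n", i+1, p)
	}
}
```

With these inputs the pause doubles from 100ms until the ninth retry (25.6s) and is capped at 30s from the tenth retry onward.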

@dashpole dashpole removed the needs triage New item requiring triage label Dec 2, 2023
@dashpole dashpole self-assigned this Dec 2, 2023
Author

monikeu commented Dec 4, 2023

> I believe we could provide such configuration to override the defaults

That would be great.

We saw an increase in the 'otelcol_exporter_send_failed_spans' metric, together with log lines like:

"failed to export to Google Cloud Trace: context deadline exceeded"

which is why we wanted to look into the exporter's timeouts.

"otelcol_exporter_send_failed_spans" is described as

"Sustained rates of otelcol_exporter_send_failed_spans and
otelcol_exporter_send_failed_metric_points indicate that the Collector is not
able to export data as expected.
It doesn't imply data loss per se since there could be retries but a high rate
of failures could indicate issues with the network or backend receiving the
data."

We want to make sure that we lose no traces (or as few as possible). Do you know whether the Google Cloud exporter increments these metrics only on a permanent send failure (when all retries have failed), or on each failed retry attempt?

Contributor

dashpole commented Dec 4, 2023

> failed to export to Google Cloud Trace: context deadline exceeded

Tuning retry wouldn't help in this case, as it isn't hitting the retry limit. It is hitting the 12s default timeout, which you can configure by setting timeout.
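The difference between running out of retries and running out of time can be sketched in Go as follows. This is a simplified model, not the exporter's code: `retryUntilDeadline`, `simulate`, and `errUnavailable` are hypothetical names, the durations are shortened so the program finishes quickly, and jitter is left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errUnavailable stands in for a retryable backend error (hypothetical).
var errUnavailable = errors.New("backend unavailable")

// retryUntilDeadline calls op until it succeeds or ctx expires, doubling the
// pause after each failure up to maxPause (no jitter, for brevity). It
// returns the number of attempts made and the final error.
func retryUntilDeadline(ctx context.Context, initial, maxPause time.Duration, op func() error) (int, error) {
	pause := initial
	for attempts := 1; ; attempts++ {
		if err := op(); err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			// The overall deadline fired while waiting to retry: the caller
			// sees "context deadline exceeded", not the backend error.
			return attempts, ctx.Err()
		case <-time.After(pause):
		}
		pause *= 2
		if pause > maxPause {
			pause = maxPause
		}
	}
}

// simulate runs an always-failing export under the given overall timeout.
func simulate(timeout time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return retryUntilDeadline(ctx, 5*time.Millisecond, 40*time.Millisecond, func() error {
		return errUnavailable
	})
}

func main() {
	attempts, err := simulate(60 * time.Millisecond)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

In this model a longer or more generous backoff changes nothing about the outcome: the loop always ends when the overall deadline fires, so only raising the timeout gives the retries more room.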
