
[exporter/datadog] datadog exporter should fail fast if api key validation fails #7402

Closed
freyayunfu opened this issue Jan 27, 2022 · 6 comments · Fixed by #9426
Labels
enhancement (New feature or request), exporter/datadog (Datadog components)

Comments

freyayunfu commented Jan 27, 2022

Describe the bug
Currently, if the Datadog exporter fails to validate the API key, it only emits warning logs like the ones below:

{"level":"info","ts":1642812421.1986244,"caller":"utils/api.go:32","msg":"Validating API key.","kind":"exporter","name":"datadog"}
{"level":"info","ts":1642812421.5067706,"caller":"utils/api.go:39","msg":"API key validation successful.","kind":"exporter","name":"datadog"}

The OpenTelemetry Collector still starts even when the Datadog exporter failed to start, and the send_failed_metric_points and send_failed_spans metrics do not count the data points that could not be sent to Datadog because the exporter never started properly. In short, the Datadog exporter fails silently, which makes this hard to monitor proactively; people only find out something is wrong when they notice metrics or traces missing in Datadog.

I understand it is hard to make send_failed_metric_points and send_failed_spans count the metrics/traces that fail to be sent because of a Datadog exporter start-up failure: these metrics are only recorded when an export operation completes, and in this case no export operation ever happens. So I think it would be better to have the OpenTelemetry Collector fail fast during startup if the API key cannot be validated successfully.
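
For illustration, the following is a minimal Go sketch of the fail-fast idea, not the exporter's actual code: a start-up hook validates the key and returns an error when the key is positively rejected, which is what would make the collector abort instead of starting with a broken exporter. The endpoint path, header name, function names, and error-handling policy are assumptions made for the sake of the example.

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// validateAPIKey asks Datadog whether the key is accepted. The /api/v1/validate
// path and DD-API-KEY header are assumptions in this sketch.
func validateAPIKey(ctx context.Context, site, apiKey string) error {
	url := fmt.Sprintf("https://api.%s/api/v1/validate", site)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("DD-API-KEY", apiKey)

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		// Transport error: the key could not be checked either way, so a real
		// implementation might only warn here to stay backwards compatible.
		return fmt.Errorf("could not reach the validation endpoint: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusForbidden {
		// The key was positively rejected.
		return fmt.Errorf("API key rejected (HTTP 403)")
	}
	return nil
}

// start stands in for a component start hook: returning a non-nil error is
// what would make the collector fail fast during startup.
func start(ctx context.Context, site, apiKey string) error {
	if err := validateAPIKey(ctx, site, apiKey); err != nil {
		return fmt.Errorf("datadog exporter: %w", err)
	}
	return nil
}

func main() {
	if err := start(context.Background(), "datadoghq.com", "<redacted>"); err != nil {
		fmt.Println("startup failed:", err)
	}
}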

freyayunfu added the bug label Jan 27, 2022
jpkrohling assigned mx-psi and unassigned mx-psi Jan 27, 2022
jpkrohling (Member) commented:

cc @KSerrania @mx-psi @gbbr @knusbaum

mx-psi added the exporter/datadog and enhancement labels and removed the bug label Jan 27, 2022
mx-psi (Member) commented Jan 27, 2022

Hey there! Thanks for taking the time to open the issue :) I think this would be a nice addition, provided we only fail when we are able to verify the API key is invalid. We need to be careful about backwards compatibility, but off the top of my head I think it's doable. I will add this to our backlog (and of course PRs are welcome!)

mx-psi (Member) commented Jan 27, 2022

Also, I changed this from a bug to an enhancement, but feel free to disagree with me if you think bug is more appropriate

sodabrew (Contributor) commented Feb 7, 2022

I would ask that this be an across-the-board decision for the collector and all exporters. If you configure multiple exporters and one out of three has an expired API key, should the collector fail? Start with the two that work and log the error? Ensure that at least one configured exporter is working, and otherwise bomb out?

Here's my suggestion, which I hope illustrates why I think it should be an across-the-board decision:

exporters:
  supermetricscorp:
    api: 12345
    allowfail: true
  awesomemonitoringinc:
    api: abcdefg
    allowfail: false
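
Building on that config, here is a purely hypothetical Go sketch of how such a per-exporter policy could gate the fail-fast behavior. "allowfail" is the setting proposed above, not an existing collector option, and the function and variable names are made up for illustration.

package main

import (
	"errors"
	"log"
)

// errInvalidKey stands in for the API key validation failure discussed above.
var errInvalidKey = errors.New("API key rejected (HTTP 403)")

// startWithPolicy gates fail-fast behavior on a per-exporter "allowfail" flag:
// returning a non-nil error would abort collector startup, while logging keeps
// the collector running with the other exporters.
func startWithPolicy(allowFail bool, validationErr error) error {
	if validationErr == nil {
		return nil
	}
	if allowFail {
		// Fail open: log the problem and keep going.
		log.Printf("exporter start-up check failed, continuing anyway: %v", validationErr)
		return nil
	}
	// Fail closed: abort collector startup.
	return validationErr
}

func main() {
	// With allowfail: true the collector keeps running; with false it aborts.
	if err := startWithPolicy(false, errInvalidKey); err != nil {
		log.Fatalf("startup failed: %v", err)
	}
}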

abarganier commented:

I am in support of this - send_failed_metric_points{exporter="datadog"} and send_failed_spans{exporter="datadog"} should increment when telemetry fails to export to Datadog.

I've noticed that if I have an otel collector up and running with a Datadog exporter successfully exporting metrics, and then I revoke the API key, otelcol_exporter_send_failed_metric_points{exporter="datadog"} does not increment or give any indication of a 403. The otel collector's logs don't give any indication of a problem, either.

It would be nice to not only account for these failures at initialization of the exporter/collector, but also throughout the lifecycle of the Datadog exporter. If I'm sending metrics without problems, and then my API key is revoked, I would like there to be a signal (otel metrics or logs?) to indicate that. Currently, the only way to tell is by looking in Datadog itself, where you'd see discontinuity of metrics.

mx-psi (Member) commented Apr 21, 2022

I've noticed that if I have an otel collector up and running with a Datadog exporter successfully exporting metrics, and then I revoke the API key, otelcol_exporter_send_failed_metric_points{exporter="datadog"} does not increment or give any indication of a 403. The otel collector's logs don't give any indication of a problem, either.

This part sounds like a different issue from the one posted by OP; what I would expect in this situation (at least for metrics, the part that I work on) is the following:

  1. In the logs, an info log should be reported here, followed by an error log from the exporterhelper once retries have been exhausted.
  2. The exporterhelper should increase the metric (here).

Do you see no logs of either type? Are you configuring retries in some specific way?
