Datadog query returns unexpected nil values: Enable Datadog API call and response in debug logs #1217

Open
jonnylangefeld opened this issue Jun 9, 2022 · 2 comments

Comments

@jonnylangefeld
Contributor

Describe the bug

We run Flagger as a central service that multiple other services use for progressive delivery. These services run Datadog queries via a MetricTemplate. The other day we had an outage in which one of the services could not progress because it received only nil values in the Datadog response:

{"level":"info","ts":"2022-06-06T17:14:07.153Z","caller":"controller/events.go:45","msg":"Halt service.namespace advancement slo 0.00 < 90","canary":"service.namespace"}

This same canary has worked flawlessly in the past. During the incident, several retries of the rollout failed. Restarting the Flagger pod and retrying the rollout once more resolved the issue for now.

We ran the same query that Flagger receives via the MetricTemplate in the Datadog UI and also as a curl command (to replicate more closely what Flagger should be doing behind the scenes), using the approximate timestamps of the progressive delivery. In both cases the Datadog response contained legitimate values above 90, so the delivery should have progressed.

At this point it is unclear where the issue occurred (FWIW, it could be a faulty Datadog query that returns unexpected responses for the timestamps Flagger uses). We therefore suggest adding debug output to Flagger that prints the Datadog query, including all headers (with the token hidden), and the response. This would let us reproduce the exact query Flagger sends under the hood, with the exact timestamps, and debug why Flagger did not receive any data from Datadog. Because it would be debug output only, it would not affect any other operations.

To Reproduce

So far we have not been able to reproduce the issue, hence the suggestion to add debug logs for the Datadog query and the Datadog response.

Expected behavior

The delivery should have progressed because the values we observed via the Datadog UI and API were all within good ranges (above 90).

Additional context

  • Flagger version: 1.15.0
  • Kubernetes version: 1.21.11
  • Service Mesh provider: istio
  • Ingress provider: istio
@jonnylangefeld jonnylangefeld changed the title from "Datadog query returns unexpected nil values" to "Datadog query returns unexpected nil values: Enable Datadog API call and response in debug logs" on Jun 9, 2022
@stefanprodan
Member

@jonnylangefeld I’m OK with adding debug logs as long as we mask the token. Would you like to contribute this?

@ccystephenclinton

@jonnylangefeld did you ever get to the bottom of this issue? I'm still seeing it on queries that use default_zero in Datadog. The problem is that if you don't set it, you get 'No Data'.
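One thing worth ruling out when a metric shows up as 0.00 is whether a null data point is being read as zero. In Go, decoding a JSON `null` into a plain `float64` leaves the value at 0 without returning an error. The sketch below is not Flagger's decoder: `queryResponse`, `lastValue`, and `errNoData` are hypothetical names, and the response shape (a `series` array whose `pointlist` holds `[timestamp, value]` pairs with possibly null values) is an assumption about the Datadog timeseries query response. It uses `*float64` so that a missing value stays distinguishable from a real zero.

```go
package main

import (
	"encoding/json"
	"errors"
)

// queryResponse models only the fields this sketch needs (assumed shape).
type queryResponse struct {
	Series []struct {
		// Each point is [timestamp, value]. Using *float64 keeps a JSON null
		// (nil pointer) distinct from a genuine 0 value.
		Pointlist [][]*float64 `json:"pointlist"`
	} `json:"series"`
}

// errNoData signals that the response carried no usable data point.
var errNoData = errors.New("no data points in response")

// lastValue returns the most recent non-null value of the first series,
// or errNoData if there is no series or every value is null.
func lastValue(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if len(r.Series) == 0 {
		return 0, errNoData
	}

	pts := r.Series[0].Pointlist
	// Walk backwards from the newest point and skip null values.
	for i := len(pts) - 1; i >= 0; i-- {
		if len(pts[i]) == 2 && pts[i][1] != nil {
			return *pts[i][1], nil
		}
	}
	return 0, errNoData
}
```

Whether falling back to an earlier non-null point is the right behavior for a canary analysis is a judgment call. The point of the sketch is only that "no data" surfaces as an error instead of as a 0 that fails the threshold check.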
