Istio 1.22.0 Breaking Datadog tracing. #51430

Closed
2 tasks done
akshaysgithub opened this issue Jun 6, 2024 · 10 comments

@akshaysgithub

akshaysgithub commented Jun 6, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

Hello,

Recently we upgraded Istio to version 1.22.0. Since then, tracing on the Datadog side has been broken. When we revert to Istio 1.20.2, it works fine.

After the upgrade, every proxyv2 (sidecar) container attached to our microservices starts emitting this log:

2024-06-05T04:25:18.263562Z error envoy tracing external/envoy/source/extensions/tracers/datadog/logger.cc:23 Unexpected Remote Configuration status 503 with body (if any, starts on next line) upstream connect error or disconnect/reset before headers. reset reason: connection termination thread=14

These messages appear in such volume that we cannot even keep the service running.

Datadog is up to date and running, the Datadog service is up and running, and, as mentioned, everything works with Istio 1.20.2.

Without tracing, which is how we run on dev, it works fine; with tracing enabled, it does not.

I tried many different approaches, but I cannot find what is wrong. Since we use the operator, I even tried installing with the Helm chart, but hit the same issue. I am attaching our Istio Operator file for review as well. It is renamed to .txt, since .yaml attachments are not allowed.
platform.txt

The Datadog svc is in the datadog namespace and is named datadog-agent.

I even tried the fully qualified name datadog-agent.datadog.svc.cluster.local, with the same result.

Please help. I have been stuck on this for two weeks.

Version

This output is from dev, since I had to revert the Istio version on staging due to these issues:

istioctl version                         
client version: 1.22.0
control plane version: 1.22.0
data plane version: 1.22.0 (340 proxies)

Additional Information

No response

@tylermichael

tylermichael commented Jun 11, 2024

Also running into this issue. We are getting Istio and service traces in Datadog, but we see this log from all pods.

I just upgraded to Istio 1.22.1 and it also did not help.

Turning on or off mTLS makes no difference for us. I will report back here with more details as I can find them.

@tylermichael

I found this issue which seems relevant. I will try to apply the suggestion tomorrow to see if it helps.

zirain removed the area/environments and area/upgrade labels on Jun 11, 2024

@tylermichael

Adding this to my values.yaml file solved this issue for me so far:

defaults:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        DD_REMOTE_CONFIGURATION_ENABLED: 'false'
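
For installs managed through an IstioOperator resource, as in the original report, the same proxy environment variable would presumably be set under spec.meshConfig rather than in a Helm values file. The following is a minimal sketch under that assumption; the resource name is hypothetical and nothing here is taken from the attached platform.txt.

```yaml
# Sketch only: IstioOperator equivalent of the Helm values fragment.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: example-istiocontrolplane   # hypothetical name
  namespace: istio-system
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # proxyMetadata entries become environment variables on the sidecar,
        # so the value is quoted to keep it a string.
        DD_REMOTE_CONFIGURATION_ENABLED: "false"
```

Because defaultConfig is applied when the sidecar is injected, already-running workloads would likely need a restart before the setting takes effect.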

@akshaysgithub
Author

This worked. Thanks a lot.

@tylermichael

I feel like this setting should be disabled by default, either in code or with this variable set to false. Having this error message happen out of the box caused me to spend a lot of time researching what was happening. The Datadog client API usually exposes methods to configure these settings in code.

@zirain (Tagging you as you were the last to have some activity on this thread)

@zirain
Member

zirain commented Jun 15, 2024

Should Datadog provide a way/API to configure it instead of using an environment flag?

@akshaysgithub
Author

Just FYI, we had already disabled remote configuration at the DD operator level as well as via the UI at the org level. Istio still kept printing those messages.

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    remoteConfiguration:
      enabled: false

That's why we didn't bother checking remoteConfiguration as a likely cause as well. We suspected it had something to do with DD address syntax.
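
The agent-side switch evidently does not reach the tracer embedded in the sidecar, which is consistent with the fix reported earlier in this thread being a proxy-side environment variable. To try that variable on a single workload before changing mesh-wide defaults, Istio's proxy.istio.io/config pod annotation accepts a proxyMetadata map. This is a sketch only; the Deployment name, labels, and image are hypothetical.

```yaml
# Sketch only: per-workload override via the proxy.istio.io/config annotation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service              # hypothetical name
spec:
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
      annotations:
        # Merged into the sidecar's proxy config at injection time.
        proxy.istio.io/config: |
          proxyMetadata:
            DD_REMOTE_CONFIGURATION_ENABLED: "false"
    spec:
      containers:
        - name: app
          image: example/image:tag   # hypothetical image
```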

@tylermichael

I'm unfamiliar with C++ (and also not sure that's the language where the tracer interaction happens), but I know from experience that the clients for other languages can configure these types of settings through their API.

@alex-k27

alex-k27 commented Jul 2, 2024

Adding this to my values.yaml file solved this issue for me so far:

defaults:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        DD_REMOTE_CONFIGURATION_ENABLED: 'false'

Could you specify where exactly you added this config? Did you configure it via the Istio operator? I use Helm to install Istio and tried the parameter
"meshConfig.defaultConfig.proxyMetadata.DD_REMOTE_CONFIGURATION_ENABLED" = false
but that didn't solve the 503 issue for me. Could you advise, please?

@tylermichael

@alex-k27 Try doing defaults.meshConfig.defaultConfig.proxyMetadata.DD_REMOTE_CONFIGURATION_ENABLED. defaults is the root YAML object.
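
One further thing that may be worth checking, as an assumption not confirmed in this thread: Helm's --set parses a bare false as a boolean, whereas proxyMetadata is a map of string values. Supplying the setting through a values file with the value quoted (or using --set-string on the command line) keeps it a string. The fragment below is a sketch using the un-prefixed path from the question above.

```yaml
# Sketch only: values.yaml passed to Helm with -f values.yaml.
meshConfig:
  defaultConfig:
    proxyMetadata:
      # Quoted so Helm passes a string rather than a boolean.
      DD_REMOTE_CONFIGURATION_ENABLED: "false"
```

If the chart version in use wraps its values in a root defaults object, as described in the reply above, the same block would be nested one level deeper under defaults.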


5 participants