
Tracing is broken with v1.7.16 #10358

Open
ugur99 opened this issue Jun 18, 2024 · 8 comments

@ugur99

ugur99 commented Jun 18, 2024

Description

When we upgraded containerd from 1.7.15 to 1.7.16, we noticed that our tracing config was broken: the containerd log contains errors implying that no endpoint is configured, even though the same config works with the previous version.

Steps to reproduce the issue

Describe the results you received and expected

Related part of the containerd config:

  [plugins."io.containerd.internal.v1.tracing"]
    sampling_ratio = 1.0
    service_name = "containerd"

  [plugins."io.containerd.tracing.processor.v1.otlp"]
    endpoint = "0.0.0.0:4317"
    insecure = true
    protocol = "grpc"

containerd 1.7.15 logs

Jun 18 12:58:19 node1 containerd[136137]: time="2024-06-18T12:58:19.949185303Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
Jun 18 12:58:19 node1 containerd[136137]: time="2024-06-18T12:58:19.949558011Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1

containerd 1.7.16 logs

Jun 18 12:49:10 node1 containerd[1368693]: time="2024-06-18T12:49:10.388645358Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
Jun 18 12:49:10 node1 containerd[1368693]: time="2024-06-18T12:49:10.388679435Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="skip plugin: tracing endpoint not configured" type=io.containerd.tracing.processor.v1
Jun 18 12:49:10 node1 containerd[1368693]: time="2024-06-18T12:49:10.388697817Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
Jun 18 12:49:10 node1 containerd[1368693]: time="2024-06-18T12:49:10.388722405Z" level=info msg="skip loading plugin \"io.containerd.internal.v1.tracing\"..." error="skip plugin: tracing endpoint not configured" type=io.containerd.internal.v1

I checked the changelog and saw that there was some work done to deprecate this config; could it be related to this commit?

What version of containerd are you using?

1.7.16

Any other relevant information

No response

Show configuration if it is related to CRI plugin.

No response

@dmcgowan
Member

Agreed, this was unintended. The backport should have included checks for the deprecated configs, not just skipped the plugin when the envs aren't set. To re-enable tracing, use the environment variables, which I believe is more standard anyway. Hopefully the environment variables work for you rather than needing to downgrade, but we should fix this in the 1.7 branch.
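
For example, the environment-variable equivalent of the TOML section above would look roughly like the following (standard OpenTelemetry SDK variables; the collector address is just a placeholder):

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://<collector-host>:4317"
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc"
OTEL_TRACES_SAMPLER="traceidratio"
OTEL_TRACES_SAMPLER_ARG="1.0"
OTEL_SERVICE_NAME="containerd"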

@ugur99
Author

ugur99 commented Jun 18, 2024

Thanks for the explanation and the quick reply @dmcgowan! I tried that too; this time containerd does not give any error, but it still does not send any tracing data :( Do you have any idea what I am missing?

I followed the OTel documentation and added an EnvironmentFile reference to the systemd unit:

[Service]
EnvironmentFile=-/etc/containerd/containerd.env
..

And the /etc/containerd/containerd.env file is as follows:

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://0.0.0.0:4317"
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc"
OTEL_EXPORTER_OTLP_INSECURE="true"
OTEL_TRACES_SAMPLER="traceidratio"
OTEL_TRACES_SAMPLER_ARG="1.0"
OTEL_SERVICE_NAME="containerd"

@dmcgowan
Member

Let me see if OTel experts @cpuguy83 or @vvoland might have an idea.

@cpuguy83
Member

Yes, it looks like #8645 was backported exactly as-is to 1.7 in #9992; it should have fallen back to using the config (or perhaps used the config to set the envs).

As to why your envs are not working, it looks like containerd is not seeing them at all.
I see you are ignoring errors from your EnvironmentFile setting (the leading - in EnvironmentFile=- tells systemd to ignore a missing or unreadable file).
Possibly an issue there.

I'll work on a patch to 1.7 to restore support for config from toml.
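
For illustration, a minimal sketch of what such a fallback could look like; the struct and helper names here are assumptions for this example, not the actual containerd code:

package main

import "os"

// OTLPConfig mirrors the legacy [plugins."io.containerd.tracing.processor.v1.otlp"]
// TOML section (hypothetical field names, for illustration only).
type OTLPConfig struct {
	Endpoint string
	Protocol string
	Insecure bool
}

// applyLegacyConfig copies the TOML values into the OTEL_* environment
// variables when those variables are not already set, so the env-based
// exporter setup introduced by the backport still sees an endpoint.
func applyLegacyConfig(cfg OTLPConfig) {
	set := func(key, value string) {
		if value != "" && os.Getenv(key) == "" {
			os.Setenv(key, value)
		}
	}
	set("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", cfg.Endpoint)
	set("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL", cfg.Protocol)
	if cfg.Insecure {
		set("OTEL_EXPORTER_OTLP_INSECURE", "true")
	}
}

func main() {
	// Values taken from the TOML config shown in the issue description.
	applyLegacyConfig(OTLPConfig{Endpoint: "0.0.0.0:4317", Protocol: "grpc", Insecure: true})
}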

@ugur99
Author

ugur99 commented Jun 19, 2024

@cpuguy83 thanks for the help on this! Actually, the containerd process seems to have the right environment variables, and I can confirm that the tracing plugins are loaded without any problems; but the logs do not give any more information, even in debug mode. So I'm not sure what the problem is here.

$ cat /proc/<containerd_process_id>/environ
LANG=en_US.UTF-8PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/binNOTIFY_SOCKET=/run/systemd/notifyINVOCATION_ID=63edadb7e440476dad9f93027ca636d0JOURNAL_STREAM=8:495932382SYSTEMD_EXEC_PID=3109917OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://0.0.0.0:4317OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpcOTEL_EXPORTER_OTLP_INSECURE=trueOTEL_TRACES_SAMPLER=traceidratioOTEL_TRACES_SAMPLER_ARG=1.0OTEL_SERVICE_NAME=containerd
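
For readability, the NUL-separated environ output can be split onto one line per variable, for example:

$ tr '\0' '\n' < /proc/<containerd_process_id>/environ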

@cpuguy83
Member

@ugur99 Here's the code where that log message originates from:

Here we look up both OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_TRACES_ENDPOINT to determine whether we should enable the trace exporter.

if os.Getenv(otlpEndpointEnv) == "" && os.Getenv(otlpTracesEndpointEnv) == "" {
	return fmt.Errorf("%w: tracing endpoint not configured", plugin.ErrSkipPlugin)
}

Are you sure you are getting that log message every time?
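
Assuming containerd runs under systemd as the containerd unit, something like this should show whether the skip message is still being logged:

$ journalctl -u containerd | grep "skip plugin: tracing endpoint not configured"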

@ugur99
Author

ugur99 commented Jun 20, 2024

Hi @cpuguy83; sorry, I could not understand what you meant. After setting the environment variables I can see that the tracing plugins are loaded without any problems or warnings in debug mode. But I still don't see any tracing data from this containerd instance, and I believe all the environment variables are set correctly according to the OpenTelemetry documentation. Could you share a sample configuration that works for you, just so I can make sure the problem is with my setup?

containerd[3871192]: time="2024-06-20T07:34:45.549575545Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
containerd[3871192]: time="2024-06-20T07:34:45.549984707Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1

@cpuguy83
Member

My guess would be the 0.0.0.0 in your env var (assuming that's what you actually have).
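
0.0.0.0 is a wildcard listen address rather than an address a client can usefully dial, so the exporter most likely never reaches a collector. Pointing the endpoint at the address where the collector actually listens would look something like this (localhost:4317 is just an assumption for this example):

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317"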
