Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove pilot flag PILOT_TRACE_SAMPLING #51792

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

zirain
Copy link
Member

@zirain zirain commented Jun 28, 2024

Please provide a description of this PR:

there's three ways to config trace sampling rate today, it's time to remove pilot flag.

@zirain zirain requested review from a team as code owners June 28, 2024 13:40
@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 28, 2024
Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the widespread usage of this this may need a deprecation period to migrate users off of it. This is not just any feature flag, it is pretty extensively used -- and even has first class support in helm (.Values.pilot.traceSampling)

@@ -577,8 +576,7 @@ func configureSampling(hcmTracing *hcm.HttpConnectionManager_Tracing, providerPe
}

func proxyConfigSamplingValue(config *meshconfig.ProxyConfig) float64 {
// PILOT_TRACE_SAMPLING
sampling := features.TraceSampling
sampling := 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can this be turned off? config.Tracing.Sampling != 0.0 { does not allow it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean turned off? sampling = 0.0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The questions is what is the recommended way to configure trace sampling for pilot. I assumed using the OTel env - https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/ - but this method is confusing me, shouldn't you remove this method and let OTel do its own config ?

@zirain
Copy link
Member Author

zirain commented Jun 29, 2024

Given the widespread usage of this this may need a deprecation period to migrate users off of it. This is not just any feature flag, it is pretty extensively used -- and even has first class support in helm (.Values.pilot.traceSampling)

what should I do for this? add a deprecation announcement in v1.23 releate notes?

@howardjohn
Copy link
Member

/test integ-ambient-ipv6_istio

Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lGTM. maybe should deprecate sampling from ProxyConfig too. ProxyConfig was designed not to work as an CR, since we have a replacement, it should end its life

@zirain zirain changed the title remote pilot flag PILOT_TRACE_SAMPLING remove pilot flag PILOT_TRACE_SAMPLING Jun 29, 2024
@zirain
Copy link
Member Author

zirain commented Jun 29, 2024

lGTM. maybe should deprecate sampling from ProxyConfig too. ProxyConfig was designed not to work as an CR, since we have a replacement, it should end its life

I don't think so, it work with defaultProviders as default value.

@hzxuzhonghu
Copy link
Member

Telemetry cannot define default value?

@zirain
Copy link
Member Author

zirain commented Jun 29, 2024

Telemetry cannot define default value?

#46711

@zirain zirain added the do-not-merge Block automatic merging of a PR. label Jul 2, 2024
content: |
If you are using the pilot environment flag `PILOT_TRACE_SAMPLING`, you should
replace it with `defaultConfig.tracing.sampling` in the [MeshConfig](https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#ProxyConfig)
or `randomSamplingPercentage`in [Telemetry API](https://istio.io/latest/docs/reference/config/telemetry/).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I don't see any code using Telemetry API to program Istiod itself.
  • TelemetryAPI is based on label selectors - Istiod can be external or run in another managed environment
  • The MeshConfig.tracing.sampling is supposed to be replaced by Telemetry API for sidecars - I don't think we should provide multiple options, we already have too many. If you want to use Telemetry API - we should just tell users to use the v1 API, not the alpha legacy one.
  • also the needs/expectations for Istiod are quite different from the 'default for all envoys in the mesh'

IMO we should just use the env variables defined by OTel - they are considered stable and that's how users expect to configure any app using the OTel SDK.

Telemetry API would be the right thing to use for HTTP/gRPC straces - if Istiod was behind a Waypoint or Gateway. I am trying to bring back that mode ( I had a very old PR), Istiod is the perfect user of ambient and/or having a Gatweay in front, that would take care of this kind of tracing - but we still have the 'native' tracing using the SDK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any code using Telemetry API to program Istiod itself.

PILOT_TRACE_SAMPLING is not about programming Istiod-internal traces, its about setting the trace sampling for Envoys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is misleading, I need to pause every time I read this code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why I am even surprised by this after spending days trying to figure out the cert signing interfaces...

Can you put in some comment "for setting pilot tracing, we are using XXX ( or link to some doc - I remember John added OTLP tracing for pilot but forgot how to set it up...)" - and in the release note just mention the v1 telemetry API ( not mesh config )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to find out thing about OTLP tracing for pilot(I never know about it).

meshconfig is still useful when users use defaultProviers only(without Telemetry resource).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/extensions and telemetry do-not-merge Block automatic merging of a PR. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants