
telemetry: adding custom tags generates duplicated metrics #39772

Open
sschepens opened this issue Jul 4, 2022 · 16 comments
Assignees: zirain
Labels: area/extensions and telemetry, lifecycle/staleproof

sschepens (Contributor) commented Jul 4, 2022

Bug Description

When adding custom tags to Istio standard metrics I get duplicated metrics.

Telemetry config:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: custom-tags
  namespace: external-istiod
spec:
  metrics:
  - overrides:
    - match:
        metric: ALL_METRICS
        mode: CLIENT
      tagOverrides:
        destination_x:
          value: upstream_peer.labels['x'].value
        source_x:
          value: labels['x'].value
    - match:
        metric: ALL_METRICS
        mode: SERVER
      tagOverrides:
        destination_x:
          value: upstream_peer.labels['x'].value
        source_x:
          value: downstream_peer.labels['x'].value
    providers:
    - name: prometheus

The output of /stats/prometheus:

istio_requests_total{response_code="500",reporter="destination",source_workload="SRC_WORKLOAD",source_workload_namespace="default",source_principal="spiffe:https://cluster.local/ns/default/sa/default",source_app="SRC_APP",source_version="SRC_VERSION",source_cluster="SRC_CLUSTER",destination_workload="DST_WORKLOAD",destination_workload_namespace="default",destination_principal="spiffe:https://cluster.local/ns/default/sa/default",destination_app="DST_APP",destination_version="DST_VERSION",destination_service="DST_SERVICE",destination_service_name="DST_SERVICE_NAME",destination_service_namespace="default",destination_cluster="DST_CLUSTER",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="SRC_CANONICAL_SERVICE",destination_canonical_service="DST_CANONICAL_SERVICE",source_canonical_revision="SRC_CANONICAL_REVISION",destination_canonical_revision="DST_CANONICAL_REVISION",destination_x="X",source_x="X"} 480
istio_requests_total{response_code="500",reporter="destination",source_workload="SRC_WORKLOAD",source_workload_namespace="default",source_principal="spiffe:https://cluster.local/ns/default/sa/default",source_app="SRC_APP",source_version="SRC_VERSION",source_cluster="SRC_CLUSTER",destination_workload="DST_WORKLOAD",destination_workload_namespace="default",destination_principal="spiffe:https://cluster.local/ns/default/sa/default",destination_app="DST_APP",destination_version="DST_VERSION",destination_service="DST_SERVICE",destination_service_name="DST_SERVICE_NAME",destination_service_namespace="default",destination_cluster="DST_CLUSTER",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="SRC_CANONICAL_SERVICE",destination_canonical_service="DST_CANONICAL_SERVICE",source_canonical_revision="SRC_CANONICAL_REVISION",destination_canonical_revision="DST_CANONICAL_REVISION"} 480

As you can see, there are two separate metrics with the exact same value; one has the custom tags and the other does not.

Is this intended behaviour? I was expecting to have only one metric with the custom tags.

Version

$ istioctl version
client version: 1.14.1
control plane version: 1.14.1
data plane version: 1.14.1
$ kubectl version --short
Client Version: v1.22
Server Version: v1.22

Additional Information

No response

zirain (Member) commented Jul 5, 2022

@sschepens can you share the listener config dump of the pod?

zirain self-assigned this Jul 5, 2022
zirain (Member) commented Jul 5, 2022

IstioOperator YAML:

# export mesh_id cluster_id in metrics
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: dup-metrics
  namespace: istio-system
spec:
  values:
    global:
      meshID: mesh1
  meshConfig:
    defaultConfig:
      extraStatTags:
        - destination_x
        - source_x

Telemetry resource:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: custom-tags
  namespace: istio-system
spec:
  metrics:
    - overrides:
        - match:
            metric: ALL_METRICS
            mode: CLIENT
          tagOverrides:
            destination_x:
              value: upstream_peer.labels['app'].value
        - match:
            metric: ALL_METRICS
            mode: SERVER
          tagOverrides:
            source_x:
              value: downstream_peer.labels['app'].value
      providers:
        - name: prometheus

Prometheus output:

istio_requests_total{response_code="200",reporter="source",source_workload="sleep",source_workload_namespace="default",source_principal="spiffe:https://cluster.local/ns/default/sa/sleep",source_app="sleep",source_version="unknown",source_cluster="Kubernetes",destination_workload="httpbin",destination_workload_namespace="default",destination_principal="spiffe:https://cluster.local/ns/default/sa/httpbin",destination_app="httpbin",destination_version="v1",destination_service="httpbin.default.svc.cluster.local",destination_service_name="httpbin",destination_service_namespace="default",destination_cluster="Kubernetes",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="unknown",source_canonical_service="sleep",destination_canonical_service="httpbin",source_canonical_revision="latest",destination_canonical_revision="v1",destination_x="httpbin"} 1
istio_requests_total{response_code="200",reporter="source",source_workload="sleep",source_workload_namespace="default",source_principal="spiffe:https://cluster.local/ns/default/sa/sleep",source_app="sleep",source_version="unknown",source_cluster="Kubernetes",destination_workload="httpbin",destination_workload_namespace="default",destination_principal="spiffe:https://cluster.local/ns/default/sa/httpbin",destination_app="httpbin",destination_version="v1",destination_service="httpbin.default.svc.cluster.local",destination_service_name="httpbin",destination_service_namespace="default",destination_cluster="Kubernetes",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="unknown",source_canonical_service="sleep",destination_canonical_service="httpbin",source_canonical_revision="latest",destination_canonical_revision="v1"} 1

listener dump:

- address:
    socketAddress:
      address: 0.0.0.0
      portValue: 8000
  bindToPort: false
  continueOnListenerFiltersTimeout: true
  defaultFilterChain:
    filterChainMatch: {}
    filters:
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        config:
          configuration:
            '@type': type.googleapis.com/google.protobuf.StringValue
            value: '{"metrics":[{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"requests_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_duration_milliseconds"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_closed_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_opened_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_received_bytes_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_sent_bytes_total"}]}'
          rootId: stats_outbound
          vmConfig:
            allowPrecompiled: true
            code:
              local:
                filename: /etc/istio/extensions/stats-filter.compiled.wasm
            runtime: envoy.wasm.runtime.v8
            vmId: tcp_stats_outbound
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/udpa.type.v1.TypedStruct
        typeUrl: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        value:
          config:
            configuration:
              '@type': type.googleapis.com/google.protobuf.StringValue
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio"
                }
            root_id: stats_outbound
            vm_config:
              allow_precompiled: true
              code:
                local:
                  filename: /etc/istio/extensions/stats-filter.compiled.wasm
              runtime: envoy.wasm.runtime.v8
              vm_id: tcp_stats_outbound
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        cluster: PassthroughCluster
        idleTimeout: 0s
        statPrefix: PassthroughCluster
    name: PassthroughFilterChain
  filterChains:
  - filterChainMatch:
      applicationProtocols:
      - http/1.1
      - h2c
      transportProtocol: raw_buffer
    filters:
    - name: envoy.filters.network.http_connection_manager
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        httpFilters:
        - name: istio.metadata_exchange
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            config:
              configuration:
                '@type': type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange
              vmConfig:
                allowPrecompiled: true
                code:
                  local:
                    filename: /etc/istio/extensions/metadata-exchange-filter.compiled.wasm
                runtime: envoy.wasm.runtime.v8
        - name: istio.alpn
          typedConfig:
            '@type': type.googleapis.com/istio.envoy.config.filter.http.alpn.v2alpha1.FilterConfig
            alpnOverride:
            - alpnOverride:
              - istio-http/1.0
              - istio
              - http/1.0
            - alpnOverride:
              - istio-http/1.1
              - istio
              - http/1.1
              upstreamProtocol: HTTP11
            - alpnOverride:
              - istio-h2
              - istio
              - h2
              upstreamProtocol: HTTP2
        - name: envoy.filters.http.fault
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
        - name: envoy.filters.http.cors
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
        - name: istio.stats
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: '{"metrics":[{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"requests_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_duration_milliseconds"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_closed_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_opened_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_received_bytes_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_sent_bytes_total"}]}'
              rootId: stats_outbound
              vmConfig:
                allowPrecompiled: true
                code:
                  local:
                    filename: /etc/istio/extensions/stats-filter.compiled.wasm
                runtime: envoy.wasm.runtime.v8
                vmId: stats_outbound
        - name: istio.stats
          typedConfig:
            '@type': type.googleapis.com/udpa.type.v1.TypedStruct
            typeUrl: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            value:
              config:
                configuration:
                  '@type': type.googleapis.com/google.protobuf.StringValue
                  value: |
                    {
                      "debug": "false",
                      "stat_prefix": "istio"
                    }
                root_id: stats_outbound
                vm_config:
                  allow_precompiled: true
                  code:
                    local:
                      filename: /etc/istio/extensions/stats-filter.compiled.wasm
                  runtime: envoy.wasm.runtime.v8
                  vm_id: stats_outbound
        - name: envoy.filters.http.router
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        normalizePath: true
        pathWithEscapedSlashesAction: KEEP_UNCHANGED
        rds:
          configSource:
            ads: {}
            initialFetchTimeout: 0s
            resourceApiVersion: V3
          routeConfigName: "8000"
        requestIdExtension:
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.request_id.uuid.v3.UuidRequestIdConfig
            useRequestIdForTraceSampling: true
        statPrefix: outbound_0.0.0.0_8000
        streamIdleTimeout: 0s
        tracing:
          clientSampling:
            value: 100
          customTags:
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_allow_shadow_effective_policy_id
            tag: istio.authorization.dry_run.allow_policy.name
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_allow_shadow_engine_result
            tag: istio.authorization.dry_run.allow_policy.result
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_deny_shadow_effective_policy_id
            tag: istio.authorization.dry_run.deny_policy.name
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_deny_shadow_engine_result
            tag: istio.authorization.dry_run.deny_policy.result
          - literal:
              value: latest
            tag: istio.canonical_revision
          - literal:
              value: sleep
            tag: istio.canonical_service
          - literal:
              value: mesh1
            tag: istio.mesh_id
          - literal:
              value: default
            tag: istio.namespace
          overallSampling:
            value: 100
          randomSampling:
            value: 1
        upgradeConfigs:
        - upgradeType: websocket
        useRemoteAddress: false
  listenerFilters:
  - name: envoy.filters.listener.tls_inspector
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  - name: envoy.filters.listener.http_inspector
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.filters.listener.http_inspector.v3.HttpInspector
  listenerFiltersTimeout: 0s
  name: 0.0.0.0_8000
  trafficDirection: OUTBOUND

cc @douglas-reid looks like we need to remove the duplicate istio.stats filter when telemetry v2 is enabled?

douglas-reid (Contributor) commented

@zirain Telemetry API is fundamentally incompatible with the telemetry-focused EnvoyFilters that get applied. If one wants to use Telemetry API, the first step is really to delete those Filters. IIRC, this used to be documented somewhere, but I can't find it now. I think for the transition period, we need to handle mapping the default providers appropriately based on flags -- and properly handle upgrades with existing filters.

zirain (Member) commented Jul 14, 2022

@zirain Telemetry API is fundamentally incompatible with the telemetry-focused EnvoyFilters that get applied. If one wants to use Telemetry API, the first step is really to delete those Filters. IIRC, this used to be documented somewhere, but I can't find it now. I think for the transition period, we need to handle mapping the default providers appropriately based on flags -- and properly handle upgrades with existing filters.

I'm thinking of doing something to avoid this; most users will not read the docs before hitting errors.
One thought is something like operation: INSERT_BEFORE_IF_NOEXIST, which would not apply the istio.stats filter when one with the same name is already present.
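
For context, the legacy telemetry v2 EnvoyFilter inserts istio.stats with a plain INSERT_BEFORE, so an istio.stats filter generated for the Telemetry API simply lands next to it. A simplified sketch of that stats filter patch, abridged to the fields relevant here and based on the payload visible in the listener dump above:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: stats-filter-1.14
  namespace: istio-system
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      # Unconditional insert: nothing checks whether an istio.stats
      # filter is already present, hence the duplicate in the dump above.
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              root_id: stats_outbound
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio"
                  }
              vm_config:
                vm_id: stats_outbound
                runtime: envoy.wasm.runtime.v8
                code:
                  local:
                    filename: /etc/istio/extensions/stats-filter.compiled.wasm
                allow_precompiled: true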

istio-policy-bot added the lifecycle/stale label Oct 12, 2022
sschepens (Contributor, Author) commented

@zirain can we unstale this? Are there any plans for deprecating the old EnvoyFilter-based telemetry? Or is there a way to disable them when installing Istio?

kyessenov (Contributor) commented

Yeah, disable EnvoyFilters by setting telemetry.v2.enabled to false: https://github.com/istio/istio/blob/master/manifests/profiles/default.yaml#L142

The question should be when we make that the default; that's probably 1.17.
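
For anyone landing here, a minimal IstioOperator sketch of the flag kyessenov mentions (values.telemetry.v2.enabled); with it set to false, istiod no longer generates the legacy stats EnvoyFilters, so only the Telemetry API configuration applies:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: disable-telemetry-v2   # hypothetical name
  namespace: istio-system
spec:
  values:
    telemetry:
      # Keep the telemetry component itself enabled; only the v2
      # EnvoyFilter-based stats generation is turned off.
      enabled: true
      v2:
        enabled: false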

istio-policy-bot removed the lifecycle/stale label Oct 21, 2022
iPapanya commented

I have the same problem (#39932 (comment))

iPapanya commented

And one more side effect: if you have added custom metrics in the mesh config and then add the same metrics via Telemetry, Istio metrics are double-counted (i.e. for each request, istio_requests_total is incremented by 2).

zirain added the lifecycle/staleproof label Nov 29, 2022
andrew-shevchyk commented

@zirain Telemetry API is fundamentally incompatible with the telemetry-focused EnvoyFilters that get applied. If one wants to use Telemetry API, the first step is really to delete those Filters. IIRC, this used to be documented somewhere, but I can't find it now. I think for the transition period, we need to handle mapping the default providers appropriately based on flags -- and properly handle upgrades with existing filters.

I deleted all stats-filter-1.* EnvoyFilters in the istio-system namespace and metrics like 'istio_request_*' disappeared. Is it possible to get them back and use the Telemetry API after that?

Istio version:
client version: 1.16.3
control plane version: 1.16.1
data plane version: 1.16.1

zirain (Member) commented Jul 5, 2023

There's no magic behind it: you can generate the manifests by running istioctl manifest generate -f your_iop_file > temp.yaml, and you will see the EnvoyFilter configuration in that file.
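
For completeness: once the legacy filters are gone, the standard metrics can come from the Telemetry API instead. A minimal mesh-wide sketch, assuming the built-in prometheus provider from the default mesh config:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  # Hypothetical name; any name works, but the resource must live in the
  # root namespace (istio-system) to apply mesh-wide.
  name: mesh-default
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus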

imnmo commented Oct 16, 2023

Can this issue be closed, @zirain?

zirain (Member) commented Oct 16, 2023

Will review again after the Telemetry API is promoted to beta.

ryant1986 commented Jan 9, 2024

I am running into an issue (or several) here, and I get the sense they are the same as what's going on in this bug. I want to confirm that, and ask a few more questions to make sure I'm understanding the issue correctly, what I can do, and what to expect in our current and future versions.

I want to use the "Telemetry API", which as I understand it means creating a resource like this to expose some more Envoy properties in my Prometheus stats for TCP traffic (in my specific case, istio_tcp_sent_bytes_total):

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: destination-endpoints
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: TCP_SENT_BYTES
      tagOverrides:
        source_address:
          value: source.address
        destination_address:
          value: destination.address
        upstream_address:
          value: upstream.address
        upstream_local_address:
          value: upstream.local_address
        requested_server_name:
          value: connection.requested_server_name
        upstream_cluster_id:
          value: upstream_peer.cluster_id
        upstream_mesh_id:
          value: upstream_peer.mesh_id
        upstream_controller_revision_hash:
          value: upstream_peer.labels['controller-revision-hash'].value
        upstream_host:
          value: upstream.host
        downstream_local_address:
          value: downstream.local_address
        downstream_remote_address:
          value: downstream.remote_address
        connection_mtls:
          value: connection.mtls

My specific goal is to understand the remote address of a TCP connection from the source side (info I can see when access logs are dumped for these same connections, so I assume it is "knowable" by the proxy).

The problem I am currently seeing is that certain fields are not consistently populated for certain metrics, and I don't really understand why. Some info here may provide insight into the types of traffic we have that may not work: https://techblog.cisco.com/istio-mixerless-telemetry#istio-telemetry-v2

  • Out of mesh telemetry is not fully supported: some metrics are missing (the traffic source or destination is not injected by the sidecar).
  • Egress gateway telemetry is not supported.
  • TCP telemetry is only supported with mTLS.
  • Black Hole telemetry for TCP and HTTP protocols is not supported.
  • Histogram buckets are significantly different from the ones based on Mixer.
  • Custom metrics support is experimental and limited.

Some of our traffic is out of mesh, but some is not (in-mesh mTLS), and in either case I intermittently see labels populated with real values vs. with "unknown" (even for mTLS traffic between the same two services, or the same processes!). E.g. sometimes these four are populated:

  • connection_mtls
  • destination_address
  • upstream_address
  • upstream_local_address

Sometimes everything is "unknown" besides destination_canonical_service.

One other observation: it seems any metric series created with values that aren't "unknown" also has no non-zero values for the metric I'm looking at (TCP sent bytes). Maybe a coincidence.

Here's the relevant part of my operator config:

    defaultProviders:
      metrics:
        - prometheus
...
    telemetry:
      enabled: true
      v2:
        enabled: false
        metadataExchange: {}
        prometheus:
          enabled: false
        stackdriver:
          configOverride: {}
          enabled: false
          logging: false
          monitoring: false
          topology: false

I understand that this is experimental, but what I'm trying to find out is:

  • Is disabling everything under telemetry.v2 and using defaultProviders the correct configuration in 1.14 for what I'm trying to achieve? (See the sketch after this comment.)
  • What can I actually expect for TCP traffic? Are the labels I'm seeing that aren't "unknown" the best I can expect for in-mesh mTLS traffic?
  • Have expectations changed between 1.14.3 and now pertaining to the issues I'm seeing? We know we need to upgrade; we don't know where these specific issues would be addressed.

control plane version: 1.14.3
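
For reference, a sketch of the combination asked about in the first bullet above: defaultProviders driving the Telemetry API, with the legacy v2 filters off, in a single IstioOperator (field paths as of the 1.14 operator API):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # Route metrics through the Telemetry API's default provider...
    defaultProviders:
      metrics:
      - prometheus
  values:
    telemetry:
      enabled: true
      v2:
        # ...and stop generating the legacy stats EnvoyFilters.
        enabled: false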

SamuelRosenqvist commented

I'm seeing a similar issue, but in my case the metric is completely duplicated: both series carry the custom tag but have different values.

We have a Telemetry resource changing a metric like so:

      overrides:
        - match:
            metric: REQUEST_COUNT
            mode: CLIENT
          tagOverrides:
            custom_label:
              value: "request.headers['SOME-HEADER'].contains('some-value') ? request.headers['SOME-HEADER'] : 'unknown'"

and of course the Istiod config has the old EnvoyFilters disabled:

    telemetry:
      enabled: true
      v2:
        enabled: false

We end up with these duplicate metrics, which causes issues for the metrics collector: it sometimes picks one and sometimes the other. Maybe we could find a fix on that side, but the ideal solution would be to not have duplicate metrics in the first place. We don't see the issue if the Telemetry resource is removed.

istio_requests_total{reporter="source",source_workload="istio-ingressgateway-pub",source_canonical_service="istio-ingressgateway-pub",source_canonical_revision="latest",source_workload_namespace="istio-system",source_principal="spiffe:https://cluster.local/ns/istio-system/sa/istio-ingressgateway-pub",source_app="istio-ingressgateway",destination_workload="public-api",destination_workload_namespace="destination-namespace",destination_principal="spiffe:https://cluster.local/ns/destination-namespace/sa/default",destination_app="public-api",destination_service="public-api.destination-namespace.svc.cluster.local",destination_canonical_service="public-api",destination_canonical_revision="latest",destination_service_name="public-api",destination_service_namespace="destination-namespace",destination_cluster="Kubernetes",request_protocol="http",response_code="400",grpc_response_status="",custom_label="unknown"} 2064
istio_requests_total{reporter="source",source_workload="istio-ingressgateway-pub",source_canonical_service="istio-ingressgateway-pub",source_canonical_revision="latest",source_workload_namespace="istio-system",source_principal="spiffe:https://cluster.local/ns/istio-system/sa/istio-ingressgateway-pub",source_app="istio-ingressgateway",destination_workload="public-api",destination_workload_namespace="destination-namespace",destination_principal="spiffe:https://cluster.local/ns/destination-namespace/sa/default",destination_app="public-api",destination_service="public-api.destination-namespace.svc.cluster.local",destination_canonical_service="public-api",destination_canonical_revision="latest",destination_service_name="public-api",destination_service_namespace="destination-namespace",destination_cluster="Kubernetes",request_protocol="http",response_code="400",grpc_response_status="",custom_label="unknown"} 1434

This is on Istio 1.19.7.

nicole-lihui (Member) commented May 16, 2024

@SamuelRosenqvist IMO, 1.19.7 can't auto-remove the stats EnvoyFilter; you can manually verify that the EnvoyFilter is deleted.

SamuelRosenqvist commented May 16, 2024

@nicole-lihui Thank you for the reply. I've already removed the stats and tcp-stats EnvoyFilters, as those caused all metrics to be duplicated; unfortunately, in my current setup it's only the metric overridden by the Telemetry resource that appears to have this issue. 😞

edit:
I went back and tested my setup a bit and found the issue. It turned out to be quite unrelated to Istio and was more a result of the Common Expression Language, but I'll briefly explain it here in case someone else encounters it.

"request.headers['SOME-HEADER'].contains('some-value') ? request.headers['SOME-HEADER'] : 'unknown'" creates a separate metric dimension for each request containing some-value in the SOME-HEADER header value. If the header doesn't contain some-value, the label is just set to unknown; but if there is some error evaluating the CEL, it is also set to unknown. For whatever reason these two unknowns get their own dimensions that look identical.

I noticed that the CEL fails to evaluate if the header is missing a value, e.g. curl -H "SOME-HEADER: " ..., but there could be other reasons. I opted to just change my default value to not known.
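
A more defensive expression can guard against the header being absent, so the CEL cannot fail to evaluate. A sketch, assuming the header key is looked up lowercase since Envoy normalizes header names (the resource name and the 'not known' fallback are just illustrative choices):

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: custom-label   # hypothetical name
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: REQUEST_COUNT
        mode: CLIENT
      tagOverrides:
        custom_label:
          # Check key presence first; CEL's && short-circuits, so the map
          # lookup only runs when the header actually exists.
          value: "'some-header' in request.headers && request.headers['some-header'].contains('some-value') ? request.headers['some-header'] : 'not known'"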
