Panic Error in istiod discovery container: Slice bounds out of range #51610

Closed
2 tasks done
mugioka opened this issue Jun 18, 2024 · 6 comments

@mugioka

mugioka commented Jun 18, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

I am seeing a panic in the istiod discovery container. The runtime error reports that slice bounds are out of range. The error log is below:

panic: runtime error: slice bounds out of range [-2:]

goroutine 19414 [running]:
github.com/envoyproxy/go-control-plane/envoy/config/core/v3.(*SocketAddress).MarshalToSizedBufferVTStrict(0xc0070b7730, {0xc0013f0000, 0x8, 0x15be})
	github.com/envoyproxy/[email protected]/envoy/config/core/v3/address_vtproto.pb.go:191 +0x415
github.com/envoyproxy/go-control-plane/envoy/config/core/v3.(*Address_SocketAddress).MarshalToSizedBufferVTStrict(0xa47?, {0xc0013f0000, 0x8, 0x0?})
	github.com/envoyproxy/[email protected]/envoy/config/core/v3/address_vtproto.pb.go:507 +0x94
github.com/envoyproxy/go-control-plane/envoy/config/core/v3.(*Address).MarshalToSizedBufferVTStrict(0xc00608cc80, {0xc0013f0000, 0x8, 0x15be})
	github.com/envoyproxy/[email protected]/envoy/config/core/v3/address_vtproto.pb.go:490 +0x19a
github.com/envoyproxy/go-control-plane/envoy/config/listener/v3.(*Listener).MarshalToSizedBufferVTStrict(0xc00304ad00, {0xc0013f0000, 0x15be, 0x15be})
	github.com/envoyproxy/[email protected]/envoy/config/listener/v3/listener_vtproto.pb.go:782 +0x12e6
github.com/envoyproxy/go-control-plane/envoy/config/listener/v3.(*Listener).MarshalVTStrict(0xc00304ad00)
	github.com/envoyproxy/[email protected]/envoy/config/listener/v3/listener_vtproto.pb.go:378 +0x56
istio.io/istio/pilot/pkg/util/protoconv.marshal({0x36f5060?, 0xc00304ad00?})
	istio.io/istio/pilot/pkg/util/protoconv/protoconv.go:48 +0x73
istio.io/istio/pilot/pkg/util/protoconv.MessageToAnyWithError({0x36f5060, 0xc00304ad00})
	istio.io/istio/pilot/pkg/util/protoconv/protoconv.go:32 +0x25
istio.io/istio/pilot/pkg/util/protoconv.MessageToAny({0x36f5060, 0xc00304ad00})
	istio.io/istio/pilot/pkg/util/protoconv/protoconv.go:57 +0x27
istio.io/istio/pilot/pkg/xds.LdsGenerator.Generate({{0x3744220?, 0xc000f08000?}}, 0xc006d1bb80, 0xc00394d068?, 0xc0040bd720)
	istio.io/istio/pilot/pkg/xds/lds.go:109 +0x116
istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).pushDeltaXds(0xc000e17b00, 0xc0041fc5a0, 0xc004a628c0, 0xc0040bd720)
	istio.io/istio/pilot/pkg/xds/delta.go:504 +0x30b5
istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).pushConnectionDelta(0xc000e17b00, 0xc0041fc5a0, 0xc0041fc5a0?)
	istio.io/istio/pilot/pkg/xds/delta.go:174 +0x1c5
istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).StreamDeltas(0xc000e17b00, {0x37481d0, 0xc007ab9eb0})
	istio.io/istio/pilot/pkg/xds/delta.go:141 +0x970
istio.io/istio/pilot/pkg/xds.(*DiscoveryServer).DeltaAggregatedResources(0x551e440?, {0x37481d0?, 0xc007ab9eb0?})
	istio.io/istio/pilot/pkg/xds/ads.go:753 +0x1d
github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3._AggregatedDiscoveryService_DeltaAggregatedResources_Handler({0x31f3f40?, 0xc000e17b00}, {0x3741218, 0xc007026690})
	github.com/envoyproxy/[email protected]/envoy/service/discovery/v3/ads.pb.go:324 +0xd8
google.golang.org/grpc.(*Server).processStreamingRPC(0xc00045fa00, {0x37383f0, 0xc0068dc030}, {0x374b7c0, 0xc004668f00}, 0xc004080360, 0xc000fe45d0, 0x5560780, 0x0)
	google.golang.org/[email protected]/server.go:1663 +0x1208
google.golang.org/grpc.(*Server).handleStream(0xc00045fa00, {0x374b7c0, 0xc004668f00}, 0xc004080360)
	google.golang.org/[email protected]/server.go:1784 +0xe3a
google.golang.org/grpc.(*Server).serveStreams.func2.1()
	google.golang.org/[email protected]/server.go:1019 +0x8b
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 19474
	google.golang.org/[email protected]/server.go:1030 +0x125

The error is intermittent: it occurs only occasionally, not every time.
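For readers unfamiliar with this message, here is a minimal Go sketch of how a "fill the buffer from the end" marshaler can hit a negative slice index. It is an illustration only, not the go-control-plane code: `writeBackward` is a hypothetical helper, and the 8-byte buffer and 10-byte field are made-up values chosen so the index becomes -2. The thread does not establish why the buffer was too small in the real case.

```go
package main

import "fmt"

// writeBackward is a hypothetical helper that copies field into the tail of
// buf, the way sized-buffer marshalers fill a pre-sized buffer from the end.
// It assumes buf was sized for field; if field is longer than buf, the start
// index goes negative and the slice expression panics. The panic is recovered
// here only so the message can be shown.
func writeBackward(buf []byte, field string) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	i := len(buf)
	i -= len(field)      // negative when the field does not fit
	copy(buf[i:], field) // panics with "slice bounds out of range" if i < 0
	return len(buf) - i, nil
}

func main() {
	// Field fits: 3 bytes are written at the end of the buffer.
	n, err := writeBackward(make([]byte, 8), "abc")
	fmt.Println(n, err)

	// Field is 2 bytes longer than the buffer: the index becomes -2.
	_, err = writeBackward(make([]byte, 8), "0123456789")
	fmt.Println(err)
}
```

The second call fails in the same way as the first frame of the trace above, where the buffer passed to `MarshalToSizedBufferVTStrict` has length 8.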

Version

$ istioctl version
client version: 1.22.1
control plane version: 1.22.1
data plane version: 1.22.1 (583 proxies)

$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.12-gke.1115000
WARNING: version difference between client (1.29) and server (1.27) exceeds the supported minor version skew of +/-1

Additional Information

No response

@mugioka
Author

mugioka commented Jun 18, 2024

error log

error	Observed a panic: "descriptor mismatch: stats.PluginConfig != envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy" (descriptor mismatch: stats.PluginConfig != envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy)

@mugioka
Author

mugioka commented Jun 18, 2024

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: idle-timeout-example
  namespace: istio-system
spec:
  configPatches:
    - applyTo: NETWORK_FILTER
      match:
        listener:
          filterChain:
            sni: example.com
      patch:
        operation: MERGE
        value:
          name: envoy.filters.network.tcp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
            idle_timeout: 3s

After deleting an EnvoyFilter like the one above, I confirmed that istiod no longer terminates abnormally.
However, as I understand it, an EnvoyFilter is needed to set the TCP idle timeout, so I will continue to investigate why istiod terminates abnormally.

@howardjohn
Member

Without the EnvoyFilter:

  filterChains:
  - filterChainMatch:
      serverNames:
      - foo.com
    filters:
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/stats.PluginConfig
        disableHostHeaderFallback: true
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        cluster: outbound|80||echo.sidecar.svc.cluster.local
        statPrefix: outbound|80||echo.sidecar.svc.cluster.local
    transportSocket:
      name: envoy.transport_sockets.tls
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        commonTlsContext:
          alpnProtocols:
          - h2
          - http/1.1
          tlsCertificateSdsSecretConfigs:
          - name: kubernetes:https://sds-credential
            sdsConfig:
              ads: {}
              resourceApiVersion: V3
        requireClientCertificate: false

With the EnvoyFilter:

  filterChains:
  - filterChainMatch:
      serverNames:
      - foo.com
    filters:
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/stats.PluginConfig
        disableHostHeaderFallback: true
        metrics:
        - {}
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        cluster: outbound|80||echo.sidecar.svc.cluster.local
        idleTimeout: 3s
        statPrefix: outbound|80||echo.sidecar.svc.cluster.local
    transportSocket:
      name: envoy.transport_sockets.tls
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        commonTlsContext:
          alpnProtocols:
          - h2
          - http/1.1
          tlsCertificateSdsSecretConfigs:
          - name: kubernetes:https://sds-credential
            sdsConfig:
              ads: {}
              resourceApiVersion: V3
        requireClientCertificate: false

After removing the EnvoyFilter, the listener stays permanently in a broken state:

  filterChains:
  - filterChainMatch:
      serverNames:
      - foo.com
    filters:
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/stats.PluginConfig
        disableHostHeaderFallback: true
        metrics:
        - {}
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        cluster: outbound|80||echo.sidecar.svc.cluster.local
        statPrefix: outbound|80||echo.sidecar.svc.cluster.local
    transportSocket:
      name: envoy.transport_sockets.tls
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        commonTlsContext:
          alpnProtocols:
          - h2
          - http/1.1
          tlsCertificateSdsSecretConfigs:
          - name: kubernetes:https://sds-credential
            sdsConfig:
              ads: {}
              resourceApiVersion: V3
        requireClientCertificate: false

@howardjohn
Member

It works fine like this:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: idle-timeout-example
  namespace: istio-system
spec:
  configPatches:
    - applyTo: NETWORK_FILTER
      match:
        listener:
          filterChain:
            sni: foo.com
            filter:
              name: envoy.filters.network.tcp_proxy
      patch:
        operation: MERGE
        value:
          name: envoy.filters.network.tcp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
            idle_timeout: 3s

Note the added filter match. Maybe we should make this more robust against misconfiguration.
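To make the difference concrete, here is a simplified Go model of why the extra filter match matters. It is a sketch under stated assumptions, not Istio's patching code: `Filter`, `Match`, and `applyMerge` are hypothetical names, and the model only tracks the filter name and type URL. The list of mismatched filters it returns is one possible shape for a misconfiguration check, not an existing Istio feature.

```go
package main

import "fmt"

// Filter is a simplified stand-in for one network filter in a filter chain.
type Filter struct {
	Name    string
	TypeURL string
}

// Match is a simplified stand-in for the EnvoyFilter listener match.
// An empty FilterName means the patch is not restricted to a named filter.
type Match struct {
	FilterName string
}

func (m Match) selects(f Filter) bool {
	return m.FilterName == "" || m.FilterName == f.Name
}

// applyMerge returns a patched copy of chain. Every selected filter takes the
// patch's name, as seen in the "With the EnvoyFilter" dump above. Selected
// filters whose type differs from the patch are reported in mismatched.
func applyMerge(chain []Filter, m Match, patch Filter) (out []Filter, mismatched []string) {
	out = append(out, chain...) // copy so the input chain is left untouched
	for i := range out {
		if !m.selects(out[i]) {
			continue
		}
		if out[i].TypeURL != patch.TypeURL {
			mismatched = append(mismatched, out[i].Name)
		}
		out[i].Name = patch.Name
	}
	return out, mismatched
}

func main() {
	const statsType = "type.googleapis.com/stats.PluginConfig"
	const tcpType = "type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy"

	chain := []Filter{
		{Name: "istio.stats", TypeURL: statsType},
		{Name: "envoy.filters.network.tcp_proxy", TypeURL: tcpType},
	}
	patch := Filter{Name: "envoy.filters.network.tcp_proxy", TypeURL: tcpType}

	// No filter match: the stats filter is selected too and gets renamed.
	broad, bad := applyMerge(chain, Match{}, patch)
	fmt.Println(broad[0].Name, bad)

	// With the filter match: only the tcp_proxy filter is touched.
	narrow, bad := applyMerge(chain, Match{FilterName: patch.Name}, patch)
	fmt.Println(narrow[0].Name, bad)
}
```

In this model, the unrestricted match renames `istio.stats` and flags it as a type mismatch, while the restricted match leaves it alone.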

@mugioka
Author

mugioka commented Jun 20, 2024

It works fine like this:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: idle-timeout-example
  namespace: istio-system
spec:
  configPatches:
    - applyTo: NETWORK_FILTER
      match:
        listener:
          filterChain:
            sni: foo.com
            filter:
              name: envoy.filters.network.tcp_proxy
      patch:
        operation: MERGE
        value:
          name: envoy.filters.network.tcp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
            idle_timeout: 3s

Note the added match. Maybe we should make this more robust against misconfig

Thank you for investigating. I have confirmed this as well.

In my environment, istio-proxy also had a memory leak, as shown below.
image

I expect these issues to be resolved by applying the correct EnvoyFilter.
If anything else comes up, I will open a new issue.
Thank you.

@mugioka mugioka closed this as completed Jun 20, 2024