LDS ACK error for headless service instance #17748

Closed
hzxuzhonghu opened this issue Oct 10, 2019 · 11 comments

@hzxuzhonghu
Member

Bug description

A headless service instance NACKs the LDS push with a duplicate listener error.

	ADS:LDS: ACK ERROR 10.32.0.18:44890 sidecar~10.32.0.18~redis-2.default~default.svc.cluster.local-10 (redis-2.default) version_info:"2019-10-08T06:48:02Z/1" node:<id:"sidecar~10.32.0.18~redis-2.default~default.svc.cluster.local" cluster:"redis-cart.default" metadata:<fields:<key:"CLUSTER_ID" value:<string_value:"Kubernetes" > > fields:<key:"CONFIG_NAMESPACE" value:<string_value:"default" > > fields:<key:"EXCHANGE_KEYS" value:<string_value:"NAME,NAMESPACE,INSTANCE_IPS,LABELS,OWNER,PLATFORM_METADATA,WORKLOAD_NAME,CANONICAL_TELEMETRY_SERVICE,MESH_ID,SERVICE_ACCOUNT" > > fields:<key:"INCLUDE_INBOUND_PORTS" value:<string_value:"6379" > > fields:<key:"INSTANCE_IPS" value:<string_value:"10.32.0.18" > > fields:<key:"INTERCEPTION_MODE" value:<string_value:"REDIRECT" > > fields:<key:"ISTIO_PROXY_SHA" value:<string_value:"istio-proxy:e383776139e4c69b49237bad84882fb972718307" > > fields:<key:"ISTIO_VERSION" value:<string_value:"master-20191004-09-15" > > fields:<key:"LABELS" value:<struct_value:<fields:<key:"app" value:<string_value:"redis-cart" > > fields:<key:"controller-revision-hash" value:<string_value:"redis-85d5755949" > > fields:<key:"statefulset.kubernetes.io/pod-name" value:<string_value:"redis-2" > > > > > fields:<key:"NAME" value:<string_value:"redis-2" > > fields:<key:"NAMESPACE" value:<string_value:"default" > > fields:<key:"OWNER" value:<string_value:"kubernetes:https://api/apps/v1/namespaces/default/statefulsets/redis" > > fields:<key:"POD_NAME" value:<string_value:"redis-2" > > fields:<key:"POD_PORTS" value:<string_value:"[{\"containerPort\":6379,\"protocol\":\"TCP\"}]" > > fields:<key:"SERVICE_ACCOUNT" value:<string_value:"default" > > fields:<key:"WORKLOAD_NAME" value:<string_value:"redis" > > fields:<key:"app" value:<string_value:"redis-cart" > > fields:<key:"controller-revision-hash" value:<string_value:"redis-85d5755949" > > fields:<key:"statefulset.kubernetes.io/pod-name" value:<string_value:"redis-2" > > > locality:<> build_version:"e383776139e4c69b49237bad84882fb972718307/1.12.0-dev/Clean/RELEASE/BoringSSL" > type_url:"type.googleapis.com/envoy.api.v2.Listener" response_nonce:"7aaef486-ecc6-40cc-9e14-0b87bc4267a5" error_detail:<code:13 message:"Error adding/updating listener(s) 10.32.0.18_6379: duplicate listener 10.32.0.18_6379 found" > 

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version)

How was Istio installed?

Environment where bug was observed (cloud vendor, OS, etc)

Additionally, please consider attaching a cluster state archive (dump file) to this issue.

@hzxuzhonghu
Member Author

Without a deep dive, I can say that since 10.32.0.18 is the instance IP, Pilot generates two listeners named 10.32.0.18_6379: one outbound and one inbound.
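
For illustration, a minimal Go sketch of how such a name collision can arise, assuming the listener name is simply derived from the bind address and port as seen in the NACK above (the helper is hypothetical, not Pilot's actual code):

    package main

    import "fmt"

    // listenerName mirrors the "<ip>_<port>" naming convention seen in the
    // error message above; the helper itself is hypothetical.
    func listenerName(ip string, port int) string {
        return fmt.Sprintf("%s_%d", ip, port)
    }

    func main() {
        podIP := "10.32.0.18"
        inbound := listenerName(podIP, 6379)  // inbound listener for the pod's own port
        outbound := listenerName(podIP, 6379) // outbound listener for the headless service instance, which is this very pod
        fmt.Println(inbound == outbound)      // true -> Envoy rejects the update as a duplicate listener
    }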

@hzxuzhonghu
Member Author

hzxuzhonghu commented Oct 10, 2019

@istio/wg-networking-maintainers

What should we do?

  1. Do not generate the outbound listener for the proxy's own instance. For normal proxies (with HTTP protocol), the filter chain looks like:
            {
                "filterChainMatch": {
                    "prefixRanges": [
                        {
                            "addressPrefix": "10.32.0.13",
                            "prefixLen": 32
                        }
                    ]
                },
                "filters": [
                    {
                        "name": "envoy.tcp_proxy",
                        "typedConfig": {
                            "@type": "type.googleapis.com/envoy.config.filter.network.tcp_proxy.v2.TcpProxy",
                            "statPrefix": "BlackHoleCluster",
                            "cluster": "BlackHoleCluster"
                        }
                    }
                ]
            },

We do not allow a pod to access itself via its pod IP.

  2. Generate the outbound listener, but rename it. However, that listener would still always proxy the traffic to BlackHoleCluster (see the sketch below).
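
A minimal sketch of the renaming idea in option 2, assuming a simple suffix is enough to avoid the name collision (hypothetical helper, not Pilot code):

    package main

    import "fmt"

    // renameIfDuplicate returns a distinct name for the outbound listener when
    // an inbound listener with the same "<ip>_<port>" name already exists.
    // Hypothetical helper, not part of Pilot.
    func renameIfDuplicate(name string, existing map[string]bool) string {
        if existing[name] {
            return name + "_outbound"
        }
        return name
    }

    func main() {
        inbound := map[string]bool{"10.32.0.18_6379": true}
        fmt.Println(renameIfDuplicate("10.32.0.18_6379", inbound)) // 10.32.0.18_6379_outbound
    }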

@lambdai
Contributor

lambdai commented Oct 10, 2019

Solution 3: remove the inbound listener 10.32.0.18_6379. It should not get traffic handed off from the 15001 listener anyway.
This might need a slight ordering change in ListenerBuilder: we need to do the 15001 aggregation and remove the inbound 10.32.0.18_6379 before generating the outbound 10.32.0.18_6379 (see the sketch below).
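
A hedged sketch of that ordering idea, assuming the builder first aggregates the virtual listener, then trims the redundant per-pod-IP inbound listener, and only then generates outbound listeners (the type and method below are hypothetical, not Pilot's real ListenerBuilder):

    package main

    import "fmt"

    // listenerBuilder is a toy stand-in for Pilot's ListenerBuilder.
    type listenerBuilder struct {
        inbound  []string // per-instance inbound listener names (bind_to_port = false)
        outbound []string // outbound listener names
    }

    // build aggregates the virtual listener first, drops the redundant inbound
    // podIP_port listener, and only then appends the outbound listeners, so the
    // outbound 10.32.0.18_6379 no longer collides with an inbound one.
    func (b *listenerBuilder) build(podIP string, port int) []string {
        podIPPort := fmt.Sprintf("%s_%d", podIP, port)
        listeners := []string{"virtual_15001"} // inbound traffic is handled here, not by podIP_port
        for _, l := range b.inbound {
            if l == podIPPort {
                continue // redundant: the virtual listener already covers this traffic
            }
            listeners = append(listeners, l)
        }
        return append(listeners, b.outbound...)
    }

    func main() {
        b := &listenerBuilder{
            inbound:  []string{"10.32.0.18_6379"},
            outbound: []string{"10.32.0.18_6379", "0.0.0.0_80"},
        }
        fmt.Println(b.build("10.32.0.18", 6379))
        // [virtual_15001 10.32.0.18_6379 0.0.0.0_80] -- only one 10.32.0.18_6379 remains
    }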

@hzxuzhonghu
Member Author

hzxuzhonghu commented Oct 10, 2019

There is also an inconsistency: a TCP service instance can reach itself via podIP:port, but an HTTP one cannot.

EDIT: ignore this; the cause is that I tested with nc, i.e. plain TCP.

@hzxuzhonghu
Member Author

remove the inbound listener 10.32.0.18_6379.

@lambdai I cannot quite follow. I can see the same virtualHosts in both the 15006 and the podip_port listeners. Does that mean inbound traffic will only flow through the virtualInbound 15006 listener?

@rshriram
Member

So I think I know the problem. It's happening because we are generating listeners for each service instance of the headless service in the listener code. We need to fix that code to skip the pod's own service instance [or more specifically, skip instances where instance.address == node.address].
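
A minimal sketch of that skip, assuming the generator can compare each instance address against the proxy's own IPs (the types and helper below are hypothetical, not Pilot's actual model):

    package main

    import "fmt"

    // serviceInstance is a toy stand-in for Pilot's service instance model.
    type serviceInstance struct {
        Address string
        Port    int
    }

    // outboundInstances drops the proxy's own instance so that no outbound
    // listener is generated on its own podIP:port; the inbound side already
    // covers that address.
    func outboundInstances(instances []serviceInstance, nodeIPs []string) []serviceInstance {
        own := make(map[string]bool, len(nodeIPs))
        for _, ip := range nodeIPs {
            own[ip] = true
        }
        var out []serviceInstance
        for _, inst := range instances {
            if own[inst.Address] { // instance.address == node.address: skip the pod itself
                continue
            }
            out = append(out, inst)
        }
        return out
    }

    func main() {
        instances := []serviceInstance{
            {Address: "10.32.0.16", Port: 6379},
            {Address: "10.32.0.18", Port: 6379},
        }
        fmt.Println(outboundInstances(instances, []string{"10.32.0.18"}))
        // [{10.32.0.16 6379}] -- the pod's own instance is skipped
    }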

@lambdai
Contributor

lambdai commented Oct 10, 2019

remove the inbound listener 10.32.0.18_6379.

@lambdai I cannot quite follow. I can see the same virtualHosts in both the 15006 and the podip_port listeners. Does that mean inbound traffic will only flow through the virtualInbound 15006 listener?

Yes... Since 1.3, none of the bind_to_port = false inbound listeners actually get traffic handed off from the 15006 listener.

@lambdai
Contributor

lambdai commented Oct 10, 2019

So I think I know the problem. It's happening because we are generating listeners for each service instance of the headless service in the listener code. We need to fix that code to skip the pod's own service instance [or more specifically, skip instances where instance.address == node.address].

Are the new per-instance listeners supposed to be inbound or outbound? In any case, listeners are expensive in Envoy.

@hzxuzhonghu
Member Author

They are outbound.

@phenixblue

Is there a planned release that will include the fix merged in #17791?

@hzxuzhonghu
Member Author

It will be in release-1.4, and I think this should go into 1.3 as well; I will cherry-pick it.
