RDS Stale since 1.22.1 upgrade #51612
Comments
The most likely cause of this is the move to delta XDS. I'm not sure why, but that is the main change in this area. I'll need to look into it some more. If you can get
This may be related to some other NACKing issues with delta that I've been looking into. I need to bump with the Envoy folks
@howardjohn Will add it. Annoyingly, they've self-resolved now, so we'll need it to come back!
I'll run with
Argh, scratch that, I have to do it on the proxies and not pilot?
No, it needs to be on the proxies. But you can configure that cluster-wide in meshConfig as defaultConfig.proxyMetadata
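For anyone following along, a mesh-wide setting of this kind might look roughly like the sketch below. The thread does not spell out the variable name at this point; `ISTIO_DELTA_XDS` is an assumption based on the 1.22 upgrade notes, and wrapping it in an `IstioOperator` resource is just one possible way to set meshConfig, so adapt it to however your mesh is installed.

```yaml
# Sketch only: mesh-wide proxy metadata set via meshConfig.defaultConfig.
# Assumption: ISTIO_DELTA_XDS is the opt-out variable being discussed;
# the thread itself does not name it here.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Values are strings, so the boolean is quoted.
        ISTIO_DELTA_XDS: "false"
```

Because this is proxy-side configuration, existing workloads would most likely need a restart before they pick up the change.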
Does meshConfig.proxyMetadata merge with any pod-level proxyMetadata? Feature request: for flags like this, it'd be really great if they were control-plane level. Historically we've always been able to configure push behaviour there. If this were a production-impacting incident, it'd be much nicer for us to be able to opt out of the behaviour at the pilot level than to redeploy thousands of workloads. The fact that there's even an opt-out signifies that there was enough concern that it might break something.
In this case it cannot really be controlled at the control-plane level, since the protocol is initiated by the proxy.
Ah, OK.
If it helps, the one thing I've noticed is that it happens during the day, when there's more cluster churn (things being deployed). Overnight, it goes away. So it's certainly related to deployments causing config pushes (not material changes to the Istio spec, just new pods).
Is this the right place to submit this?
Bug Description
Hi,
I upgraded some of our clusters from 1.21 to 1.22.1 today, and our alerts picked up RDS marked as STALE:
Istio docs say: "STALE means that Pilot has sent an update to Envoy but has not received an acknowledgement. This usually indicates a networking issue between Envoy and Pilot or a bug with Istio itself."
There's nothing telling in the istiod logs or in the proxy logs for these apps:
Version
Additional Information
What I find interesting is that it seems to be the same subset (about 6) of applications on each cluster (700 apps on the clusters). There is nothing unique about them that I'm aware of compared to the other apps (they're all built from the same Helm chart, so they have loosely the same configuration).
These clusters are completely isolated/unique, by the way, and the same app is deployed on them.