Namerd thrift interface strands items in the active cache #1778
Comments
@adleong, why do you want to use reference counting? I am not sure what you have in mind. I thought about it, and in my opinion a solution similar to the one used for the inactive cache would be enough: I could change the implementation of the active cache and add a timeout. What do you think?
@mmrozek a simple timeout would not work for the active cache. The active cache must guarantee that any observation with a pending request is NOT removed from it. A timeout could evict observations that still have pending requests, which would cause those requests to hang forever.
@adleong our dtab looks something like this:

This allows our containers proxying HTTP traffic through linkerd to hit a service, e.g. http://service-name, through Consul first if the service is found there; otherwise it tries to use DNS to resolve an address on the network with any port allowed, defaulting to port 80 if not specified. We have a dstPrefix /api on our router. This evaluates to the following for an unknown service:

But we do have a binding:

What we are seeing is that these items never leave thriftNameInterpreter/bindingcache/active. This is a problem because the observers keep checking for the addresses and logging messages in our namerd log every 5 seconds, for example: 22:53:12

Is there a configuration I am missing to stop or reduce this infinite logging, or is it related to the stranding in the active cache described in this issue?
The Namerd thrift interface uses two caches: an active cache for observations that currently have pending requests on them, and an inactive cache for observations that have none. When Namerd receives a request for an observation, it moves that observation to the active cache. When the value of an observation changes, all pending requests on that observation are satisfied and the observation is moved to the inactive cache.
However, when a client of Namerd cancels a pending request for an observation, that observation is not moved to the inactive cache. This means that if Linkerd stops observing a service (for example, if the service is reaped or if Linkerd is shut down), the observation will stay in Namerd's active cache.
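
To make the failure mode concrete, here is a toy Python model of the behaviour described above. It is only a sketch of the described semantics, not Namerd's actual (Scala) implementation: the class `ObservationCaches`, its method names, and the path `/svc/reaped` are all hypothetical.

```python
class ObservationCaches:
    """Toy model of the active/inactive caches (hypothetical, not Namerd code)."""

    def __init__(self):
        self.active = {}       # observation name -> list of pending request ids
        self.inactive = set()  # observation names with no pending requests

    def request(self, name, request_id):
        # A new request moves the observation into the active cache.
        self.inactive.discard(name)
        self.active.setdefault(name, []).append(request_id)

    def value_changed(self, name):
        # An update satisfies every pending request and deactivates the observation.
        if name not in self.active:
            return []
        satisfied = self.active.pop(name)
        self.inactive.add(name)
        return satisfied

    def cancel(self, name, request_id):
        # The reported behaviour: the request goes away, but nothing moves
        # the observation to the inactive cache.
        pending = self.active.get(name, [])
        if request_id in pending:
            pending.remove(request_id)


caches = ObservationCaches()
caches.request("/svc/reaped", 1)   # a Linkerd starts observing the service
caches.cancel("/svc/reaped", 1)    # ... then goes away and cancels its request

# The observation has no pending requests, yet it is still in the active cache.
stranded = "/svc/reaped" in caches.active and not caches.active["/svc/reaped"]
print(stranded)
```

In this model, only `value_changed` ever moves an entry out of `active`, so an observation whose value never changes again stays there indefinitely once its last request is cancelled.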
We should use some form of reference counting so that when there are no pending requests left for an observation (i.e. they have all been cancelled), we can move the observation to the inactive cache.
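
One possible shape for such reference counting, sketched in Python as a toy model rather than as Namerd's actual (Scala) code. The class `RefCountedCaches`, its methods, and the path `/svc/web` are hypothetical names chosen for illustration. The idea is that each pending request holds one reference, and the observation leaves the active cache only when the count reaches zero.

```python
class RefCountedCaches:
    """Toy reference-counted active cache (hypothetical, not Namerd code)."""

    def __init__(self):
        self.active = {}       # observation name -> number of pending requests
        self.inactive = set()  # observation names with no pending requests

    def request(self, name):
        # Each pending request takes one reference on the observation.
        self.inactive.discard(name)
        self.active[name] = self.active.get(name, 0) + 1

    def cancel(self, name):
        # A cancelled request gives up its single reference.
        self._release(name, 1)

    def value_changed(self, name):
        # An update satisfies all pending requests, releasing every reference.
        self._release(name, self.active.get(name, 0))

    def _release(self, name, count):
        if name not in self.active:
            return
        remaining = self.active[name] - count
        if remaining > 0:
            self.active[name] = remaining
        else:
            # No pending requests remain: safe to deactivate.
            del self.active[name]
            self.inactive.add(name)


caches = RefCountedCaches()
caches.request("/svc/web")
caches.request("/svc/web")
caches.cancel("/svc/web")
still_active = "/svc/web" in caches.active    # one request is still pending
caches.cancel("/svc/web")
now_inactive = "/svc/web" in caches.inactive  # last reference released
print(still_active, now_inactive)
```

Unlike a timeout, this approach never removes an observation from the active cache while a request is still pending on it, which is the guarantee the active cache needs. A real implementation would also have to make the count updates safe under concurrent requests and cancellations, which this sketch ignores.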