Namerd thrift interface strands items in the active cache #1778

Open
adleong opened this issue Jan 12, 2018 · 3 comments


adleong commented Jan 12, 2018

The Namerd thrift interface uses two caches: an active cache for observations that currently have pending requests on them and an inactive cache for observations that do not. When Namerd receives a request for an observation, it moves that observation to the active cache. When the value of an observation changes, all pending requests on that observation are satisfied and the observation is moved to the inactive cache.

However, when a client of Namerd cancels a pending request for an observation, that observation is not moved to the inactive cache. This means that if Linkerd stops observing a service (for example, if the service is reaped or if Linkerd is shut down), the observation will stay in Namerd's active cache.

We should use some form of reference counting so that when there are no pending requests for an observation (i.e. they have all been cancelled) then we can move the observation to the inactive cache.
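To make the proposal concrete, here is a minimal sketch of a reference-counted two-tier cache. It is not Namerd's actual code: the class `RefCountedBindingCache`, its method names, and the `make_observation` factory are all hypothetical, and it ignores concurrency and inactive-cache eviction. It only illustrates the bookkeeping: each request increments a per-key count, each cancellation decrements it, and an observation moves to the inactive cache as soon as the count reaches zero.

```python
from collections import defaultdict


class RefCountedBindingCache:
    """Toy two-tier cache (hypothetical, not Namerd's implementation)."""

    def __init__(self):
        self.active = {}                 # key -> observation with >= 1 pending request
        self.inactive = {}               # key -> observation with no pending requests
        self.pending = defaultdict(int)  # key -> number of pending requests

    def request(self, key, make_observation):
        """Register one pending request, promoting or creating the observation."""
        if key in self.inactive:
            self.active[key] = self.inactive.pop(key)
        elif key not in self.active:
            self.active[key] = make_observation(key)
        self.pending[key] += 1
        return self.active[key]

    def cancel(self, key):
        """A client cancelled one pending request on this observation."""
        if self.pending.get(key, 0) == 0:
            return  # nothing pending; ignore a stray cancel
        self.pending[key] -= 1
        self._demote_if_idle(key)

    def update(self, key):
        """The observed value changed, so every pending request is satisfied."""
        if key in self.active:
            self.pending[key] = 0
            self._demote_if_idle(key)

    def _demote_if_idle(self, key):
        """Move the observation to the inactive cache once nothing is pending."""
        if self.pending[key] == 0:
            del self.pending[key]
            if key in self.active:
                self.inactive[key] = self.active.pop(key)
```

With this bookkeeping, two requests followed by two cancellations leave the observation in `inactive` rather than stranded in `active`, while an observation with even one outstanding request is never demoted.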


mmrozek commented May 17, 2018

@adleong, why do you want to use reference counting? I am not sure what you have in mind. I thought about it, and in my opinion a solution similar to the one used in the inactive cache would be enough. I could change the implementation of the active cache and add a timeout. What do you think?


adleong commented May 17, 2018

@mmrozek a simple timeout would not work for the active cache. The active cache must guarantee that any observation with a pending request is never removed from it. A timeout could remove observations that still have pending requests, which would cause those requests to hang forever.
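The hazard can be shown with a small sketch. The `TimeoutActiveCache` class below is hypothetical and far simpler than Namerd's cache; time is passed in explicitly as `now` so the behaviour is deterministic. It evicts entries purely by age, so a pending request registered before the eviction is silently dropped and is never notified when the value later changes.

```python
class TimeoutActiveCache:
    """Hypothetical active cache that evicts by age and ignores pending requests."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (inserted_at, list of pending callbacks)

    def request(self, key, callback, now):
        """Register a pending request that should fire on the next update."""
        _, callbacks = self.entries.setdefault(key, (now, []))
        callbacks.append(callback)

    def evict_expired(self, now):
        """Drop every entry older than ttl, along with its pending callbacks."""
        expired = [k for k, (t, _) in self.entries.items() if now - t >= self.ttl]
        for key in expired:
            del self.entries[key]

    def update(self, key, value):
        """Satisfy whatever pending requests are still attached to the entry."""
        _, callbacks = self.entries.pop(key, (None, []))
        for callback in callbacks:
            callback(value)


received = []
cache = TimeoutActiveCache(ttl=10)
cache.request("/svc/a", received.append, now=0)
cache.evict_expired(now=10)             # entry removed while a request is pending
cache.update("/svc/a", "new-address")   # nobody is left to notify
```

After this sequence `received` is still empty: the request registered at `now=0` never completes, which is the "hang forever" case described above.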

@RogerReed

@adleong our dtab looks something like this:
/ph => /$/io.buoyant.rinet
/api => /ph/80
/api => /$/io.buoyant.porthostPfx/ph
/api => /#/io.l5d.consul/default

This allows our containers that proxy HTTP traffic through linkerd to reach a service, e.g. http://service-name, via Consul first if the service is found there. Otherwise linkerd tries to resolve the address on the network through DNS, with any port allowed, defaulting to port 80 if none is specified. We have a dstPrefix of /api on our router.

This evaluates to the following for an unknown service:
service name: /api/unknown-service-name
resolutions considered:
/#/io.l5d.consul/default/app-public/unknown-service-name (neg)
/$/io.buoyant.porthostPfx/ph/unknown-service-name (neg)
/$/io.buoyant.rinet/80/unknown-service-name (neg)

But we do have a binding:
/$/io.buoyant.rinet/80/unknown-service-name /ph => /$/io.buoyant.rinet Namer Match
[/$/io.buoyant.rinet/80/unknown-service-name] (residual: /) Bound Path

What we are seeing is that these items never leave thriftNameInterpreter/bindingcache/active.

This is causing a problem because the observers just keep checking for the addresses and logging messages in our namerd log every 5 seconds, for example:

22:53:12
D 0410 22:53:12.083 UTC THREAD26 TraceId:c6eb8a03e8f4da03: Failed to resolve service-name-unknown. Error java.net.UnknownHostException: service-name-unknown: Name or service not known
22:53:12
D 0410 22:53:12.083 UTC THREAD26 TraceId:c6eb8a03e8f4da03: Resolution failed for all hosts in ArraySeq((service-name-unknown,80,Map()))

Is there a configuration I am missing that would stop or reduce this infinite logging, or is this related to the issue described here of items being stranded in the active cache?
