Memory leak when client cache is evicting #1377

Open
klingerf opened this issue Jun 9, 2017 · 7 comments

@klingerf (Member) commented Jun 9, 2017

First reported by @ashald and @edio.

In a demo setup where the number of destinations exceeds the bindingCache size, the linkerd process leaks memory as it adds and removes clients to and from the cache.

The easiest way to reproduce is with this docker-compose setup:

client-memory-issue.zip

Start the demo with:

docker-compose build && docker-compose up -d

Then load the linkerd admin dashboard on Docker port 9990. The initial setup has 10 destinations and a client cache size of 10. You can see from the dashboard that the heap size for the process hovers at ~85MB.
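
If you'd rather watch the heap from a terminal than from the dashboard, the admin server also exposes the same numbers as JSON. Something along these lines should work; the metric key comes from TwitterServer's JVM stats and may be named differently in your build, so check /admin/metrics.json for the exact key:

curl -s http://localhost:9990/admin/metrics.json | grep -o '"jvm/mem/current/used":[0-9]*'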

Next, stop the demo:

docker-compose stop && docker-compose rm -vf

Then edit linkerd.yml to shrink the binding cache size from 10 to 9 (see the sketch below for the relevant stanza), restart the demo, and reload the admin dashboard. You'll see the heap size start to creep up over time. Left running for about 30 minutes, the heap grows to ~400MB.
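
For reference, the cache being shrunk here is the router-level bindingCache in linkerd.yml. A minimal sketch of the edit, assuming the stanza layout used by the attached demo config; key names and the surrounding router settings should be checked against the actual linkerd.yml:

# linkerd.yml (excerpt); sketch only, see the attached demo config for the full file
routers:
- protocol: http
  bindingCache:
    # with 10 destinations but only 9 cached clients, linkerd continually
    # evicts and rebuilds clients, which is what exposes the leak
    clients: 9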

@chrisgoffinet (Contributor)

I have confirmed this is a real leak, and it exists in the latest build as well. I've taken a heap dump and am working on a patch to fix this.
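
For anyone who wants to capture a comparable dump: linkerd 1.x runs on the JVM, so a heap dump can be taken with the JDK's jmap tool, assuming a JDK is available where linkerd runs (<pid> below is a placeholder for the linkerd process id):

jmap -dump:live,format=b,file=linkerd-heap.hprof <pid>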

@dtacalau (Contributor)

@adleong Please assign this issue to me.

@dtacalau (Contributor)

I've tried to reproduce the memory leak with the latest linkerd/namerd and had to modify the demo setup. Here are the changes to the initial demo setup:

  1. I had to remove all Prometheus-related code from server.go, because building the app with the golang:onbuild image failed with a Go module dependency error (a stripped-down sketch of the resulting server is included at the end of this comment):
Building app01
Step 1/2 : FROM golang:onbuild
# Executing 3 build triggers
 ---> Using cache
 ---> Running in df941fedd080
+ exec go get -v -d
github.com/prometheus/client_golang (download)
github.com/beorn7/perks (download)
github.com/cespare/xxhash (download)
package github.com/cespare/xxhash/v2: cannot find package "github.com/cespare/xxhash/v2" in any of:
    /usr/local/go/src/github.com/cespare/xxhash/v2 (from $GOROOT)
    /go/src/github.com/cespare/xxhash/v2 (from $GOPATH)
github.com/golang/protobuf (download)
github.com/prometheus/client_model (download)
github.com/prometheus/common (download)
github.com/matttproud/golang_protobuf_extensions (download)
github.com/prometheus/procfs (download)

But removing Prometheus did not matter; the leak still reproduced on linkerd 1.0.1.

  2. Upped the versions: linkerd to 1.7.0, namerd to latest, slow_cooker to 1.2.0.
  3. Updated linkerd.yaml to have the linkerd admin interface bind on all interfaces; without this you get a connection refused error when accessing the admin page from the Docker host's browser at https://localhost:9990.

I've attached the updated demo setup files.
client-memory-issue-mod.zip
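
As mentioned in item 1, here is a stripped-down sketch of what the demo backend reduces to without the Prometheus instrumentation. It is illustrative only, not the actual demo code; APP_NAME and the port are assumptions, used here just so each of the 10 destinations can identify itself:

// server.go (illustrative sketch)
package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    // assumed env var so each destination can name itself in responses
    name := os.Getenv("APP_NAME")
    if name == "" {
        name = "app"
    }

    // plain HTTP server with no metrics endpoint; that is all linkerd
    // needs from a destination for this repro
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "hello from %s\n", name)
    })

    log.Printf("%s listening on :8080", name)
    log.Fatal(http.ListenAndServe(":8080", nil))
}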

@dtacalau (Contributor)

Results of trying to reproduce the leak with the latest linkerd/namerd: screenshots of heap size and CPU load after a 30-minute run.
pics.zip

After 30 minutes, the heap size had increased only a little, nowhere near the ~400MB you get when reproducing with the old linkerd 1.0.1. So the leak is probably no longer an issue, but I'd let the demo setup run for longer before reaching a conclusion.

But I noticed another issue: the CPU load is very high after shrinking the binding cache size from 10 to 9. Before shrinking I saw at most about 30% load; after shrinking it averaged about 150%. This would be another issue to investigate.

@dtacalau (Contributor)

@ashald and @edio, it seems the leak no longer reproduces with the latest linkerd 1.7.0. Do you want to try to reproduce it on your end? According to @adleong, the high CPU load is expected after shrinking the binding cache.

@ashald (Member) commented Oct 22, 2019

I don't think this issue has been observed recently, so most likely you're right.
I no longer work there, so if anyone can confirm this, it would be @edio.

@koiuo (Contributor) commented Oct 24, 2019

Frankly speaking, I have a hard time recalling any details, but as @ashald mentioned, we have not observed any related symptoms for quite a while.
I guess this issue can be closed.
