Memory leak when client cache is evicting #1377

Open
klingerf opened this issue Jun 9, 2017 · 7 comments

@klingerf (Member) commented Jun 9, 2017

First reported by @ashald and @edio.

In a demo setup where the number of destinations exceeds the bindingCache size, the linkerd process leaks memory as it adds and removes clients to and from the cache.

The easiest way to reproduce is with this docker-compose setup:

client-memory-issue.zip

Start the demo with:

docker-compose build && docker-compose up -d

Then load the linkerd admin dashboard on Docker port 9990. The initial setup has 10 destinations and a client cache size of 10. You can see from the dashboard that the heap size for the process hovers at ~85MB.
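
If you'd rather watch the heap from a terminal than from the dashboard, the admin server also exposes the same numbers as JSON. Something along these lines should work; the metric key comes from TwitterServer's JVM stats and may be named differently in your build, so check /admin/metrics.json for the exact key:

curl -s http://localhost:9990/admin/metrics.json | grep -o '"jvm/mem/current/used":[0-9]*'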

Next, stop the demo:

docker-compose stop && docker-compose rm -vf

Then edit linkerd.yml to shrink the binding cache size from 10 to 9 (see the sketch below for the relevant stanza), restart the demo, and reload the admin dashboard. You'll see the heap size start to creep up over time. Left running for about 30 minutes, the heap grows to ~400MB.
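
For reference, the cache being shrunk here is the router-level bindingCache in linkerd.yml. A minimal sketch of the edit, assuming the stanza layout used by the attached demo config; key names and the surrounding router settings should be checked against the actual linkerd.yml:

# linkerd.yml (excerpt); sketch only, see the attached demo config for the full file
routers:
- protocol: http
  bindingCache:
    # with 10 destinations but only 9 cached clients, linkerd continually
    # evicts and rebuilds clients, which is what exposes the leak
    clients: 9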

@chrisgoffinet (Contributor)

I have confirmed this is a real leak, and it exists in the latest build as well. I've taken a heap dump and am working on a patch to fix this.
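
For anyone who wants to capture a comparable dump: linkerd 1.x runs on the JVM, so a heap dump can be taken with the JDK's jmap tool, assuming a JDK is available where linkerd runs (<pid> below is a placeholder for the linkerd process id):

jmap -dump:live,format=b,file=linkerd-heap.hprof <pid>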

@dtacalau (Contributor)

@adleong Please assign this issue to me.

@dtacalau (Contributor)

I've tried to reproduce the memory leak with the latest linkerd/namerd and had to modify the demo setup. Here are the changes to the initial demo setup:

  1. I had to remove all Prometheus-related code from server.go, because building the app with the golang:onbuild image failed with a Go module dependency error (a stripped-down sketch of the resulting server is included at the end of this comment):
Building app01
Step 1/2 : FROM golang:onbuild
# Executing 3 build triggers
 ---> Using cache
 ---> Running in df941fedd080
+ exec go get -v -d
github.com/prometheus/client_golang (download)
github.com/beorn7/perks (download)
github.com/cespare/xxhash (download)
package github.com/cespare/xxhash/v2: cannot find package "github.com/cespare/xxhash/v2" in any of:
    /usr/local/go/src/github.com/cespare/xxhash/v2 (from $GOROOT)
    /go/src/github.com/cespare/xxhash/v2 (from $GOPATH)
github.com/golang/protobuf (download)
github.com/prometheus/client_model (download)
github.com/prometheus/common (download)
github.com/matttproud/golang_protobuf_extensions (download)
github.com/prometheus/procfs (download)

But removing Prometheus did not matter; the leak still reproduced on linkerd 1.0.1.

  2. Upped the versions: linkerd to 1.7.0, namerd to latest, slow_cooker to 1.2.0.
  3. Updated linkerd.yaml to have the linkerd admin interface bind on all interfaces; without this you get a connection refused error when accessing the admin page from the Docker host's browser at https://localhost:9990.

I've attached the updated demo setup files.
client-memory-issue-mod.zip
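
As mentioned in item 1, here is a stripped-down sketch of what the demo backend reduces to without the Prometheus instrumentation. It is illustrative only, not the actual demo code; APP_NAME and the port are assumptions, used here just so each of the 10 destinations can identify itself:

// server.go (illustrative sketch)
package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    // assumed env var so each destination can name itself in responses
    name := os.Getenv("APP_NAME")
    if name == "" {
        name = "app"
    }

    // plain HTTP server with no metrics endpoint; that is all linkerd
    // needs from a destination for this repro
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "hello from %s\n", name)
    })

    log.Printf("%s listening on :8080", name)
    log.Fatal(http.ListenAndServe(":8080", nil))
}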

@dtacalau (Contributor)

Results of trying to reproduce the leak with the latest linkerd/namerd: screenshots of heap size and CPU load after a 30-minute run.
pics.zip

After 30 minutes, the heap size had increased only a little, nowhere near the ~400MB you get when reproducing with the old linkerd 1.0.1. So the leak is probably no longer an issue, but I'd let the demo setup run for longer before reaching a conclusion.

But I noticed another issue: the CPU load is very high after shrinking the binding cache size from 10 to 9. Before shrinking I saw at most about 30% load; after shrinking it averaged about 150%. This would be another issue to investigate.

@dtacalau (Contributor)

@ashald and @edio, it seems the leak no longer reproduces with the latest linkerd 1.7.0. Do you want to try to reproduce it on your end? According to @adleong, the high CPU load is expected after shrinking the binding cache.

@ashald (Member) commented Oct 22, 2019

I don't think this issue has been observed recently, so most likely you're right.
I no longer work there, so if anyone can confirm this, it would be @edio.

@koiuo (Contributor) commented Oct 24, 2019

Frankly speaking, I have a hard time recalling any details, but as @ashald mentioned, we have not observed any related symptoms for quite a while.
I guess this issue can be closed.
