[1.1.1] Linkerd OOMs on too many [long] distinct names #1486

Open
ashald opened this issue Jul 10, 2017 · 6 comments
@ashald
Member

ashald commented Jul 10, 2017

I was chasing another issue in v1.1.0 but hit this one with the following test case:

admin:
  port: ${PORT_ADMIN}

namers:
- kind: io.l5d.consul
  useHealthCheck: true
  consistencyMode: stale
  prefix: /consul

routers:
- protocol: http
  label: egress
  maxRequestKB: 51200
  maxResponseKB: 51200
  maxInitialLineKB: 10
  maxHeadersKB: 65
  dstPrefix: /http
  identifier:
    - kind: io.l5d.header.token
      header: Host
  interpreter:
    kind: io.l5d.mesh
    dst: /#/consul/.local/namerd-grpc
    root: /default
    experimental: true
  servers:
    - port: ${PORT_HTTP}
      ip: 0.0.0.0

Then just send requests with increasing length of the URI:

$ for n in $(seq 1 65000); do echo $n; curl -x localhost:${PORT_HTTP} -m1 "https://$(printf 'x%.0s' $(seq 1 $n)).com" &>/dev/null; done

Eventually linkerd crashes with messages like this one:

Exception in thread "finagle/netty3-9" java.lang.OutOfMemoryError: Java heap space
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "finagle/netty3-9"

VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
java.lang.OutOfMemoryError: Java heap space
VM error: Java heap space
Exception in thread "RawZipkinTracer-ShutdownHook" java.lang.InternalError: BMH.reinvoke=Lambda(a0:L/SpeciesData<L>,a1:L,a2:L,a3:L)=>{
    t4:L=MethodHandleImpl.array();
    t5:L=Species_L.argL0(a0:L);
    t6:L=MethodHandle.invokeBasic(t5:L,a1:L,a2:L,a3:L,t4:L);t6:L}
	at java.lang.invoke.MethodHandleStatics.newInternalError(MethodHandleStatics.java:127)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RawZipkinTracer-ShutdownHook"
java.lang.OutOfMemoryError: Java heap space
@adleong
Member

adleong commented Jul 10, 2017

This is probably a duplicate of #1377, right?

@ashald
Member Author

ashald commented Jul 10, 2017

I can't say with 100% certainty, but I have a feeling that this might be a different issue. I ran a modified version of the bash snippet above that generates a lot of distinct names, all of them short. I kept the test running until it had resolved >20k names, and Linkerd still hadn't crashed. So this might have something to do with long names...
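
For reference, a variant that generates many distinct but short names could look roughly like the sketch below. The exact modification is not shown in this thread, so the `short_name` and `send_short_names` helpers and the `n<N>.com` naming pattern are assumptions made purely for illustration.

```shell
# Hypothetical helper (not from the original report): builds a short,
# distinct hostname for iteration N, e.g. "n42.com".
short_name() {
  printf 'n%d.com' "$1"
}

# Hypothetical driver: sends COUNT requests through the proxy, one per
# distinct short name. PORT_HTTP is the router port from the config above.
send_short_names() {
  local count="$1" n
  for n in $(seq 1 "$count"); do
    echo "$n"
    curl -x "localhost:${PORT_HTTP}" -m1 "https://$(short_name "$n")" >/dev/null 2>&1
  done
}
```

Calling `send_short_names 20000` would then cover roughly the number of names mentioned above.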

@ashald
Member Author

ashald commented Jul 11, 2017

Given the migration to Netty 4, this might not be an issue anymore. Let me re-run the test with v1.1.1...

@ashald
Member Author

ashald commented Jul 11, 2017

Was able to reproduce the issue with 1.1.1, between 2000 and 2100 iterations.
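
For a sense of scale, the name sizes involved at that point can be worked out with a small sketch. It assumes the hostname at iteration N is N `x` characters followed by `.com`, as in the test loop; the helper names are hypothetical. It counts only the characters of the names themselves and says nothing about how much heap Linkerd allocates per name.

```shell
# Length of the hostname sent at iteration N: N 'x' characters plus ".com".
name_length() {
  echo $(( $1 + 4 ))
}

# Total characters across all hostnames sent in iterations 1..N:
# N*(N+1)/2 for the 'x' runs plus 4*N for the ".com" suffixes.
total_name_chars() {
  echo $(( $1 * ($1 + 1) / 2 + 4 * $1 ))
}
```

For example, `name_length 2100` prints `2104` and `total_name_chars 2100` prints `2214450`, i.e. about 2.2 million characters of hostnames in total by iteration 2100.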

@ashald ashald changed the title [1.1.0] Linkerd OOMs on too many distinct names [1.1.1] Linkerd OOMs on too many distinct names Jul 11, 2017
@ashald ashald changed the title [1.1.1] Linkerd OOMs on too many distinct names [1.1.1] Linkerd OOMs on too many [long] distinct names Jul 11, 2017
@hawkw hawkw added the consul label Dec 8, 2017
@dtacalau
Contributor

dtacalau commented Oct 22, 2019

@ashald I know it's been a long time, but do you remember what config you used for namerd, or any other bits and pieces needed to reproduce this issue?

I've tried to reproduce it with the latest linkerd 1.7.0 and I don't get the same behaviour as reported initially.

My setup:

  • Consul v1.5.1, single node install on my dev machine
  • start Linkerd 1.7.0 with the config from this issue.
  • start Namerd 1.7.0 with a basic config w/o consul. I've also tried with a config w/ consul configured as namer, but got the same results.
  • start 10 instances of the test script to send requests with increasing length of the URI;
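
Starting several instances of a test script in parallel, as in the last step, can be sketched as below. The `run_parallel` helper and the `./test.sh` script name are assumptions for illustration; the scripts actually used are not reproduced in this thread.

```shell
# Hypothetical helper: starts COUNT background copies of a command and
# waits for all of them to finish.
run_parallel() {
  local count="$1" i=0
  shift
  while [ "$i" -lt "$count" ]; do
    "$@" &
    i=$((i + 1))
  done
  wait
}
```

With a script saved as `./test.sh` (hypothetical name), `run_parallel 10 ./test.sh` would launch ten copies at once.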

My results:

  • the memory usage shown on the admin page slowly increases over time; after a 1h run (>25k requests) it varies between 400 and 1100 MB;
  • when I stop the test script, it stabilizes at ~400 MB;
  • during the test, Linkerd keeps logging an error for each request:
    E 1021 19:51:04.143 EEST THREAD32 TraceId:2db63468bf3a208a: service failure: com.twitter.finagle.CancelledConnectionException
  • the admin page shows 100% failures.

So, there's an increase in memory usage, but no OOM, so maybe something is missing from my setup.

I've attached all files used to reproduce this.
config-files-1489.zip
Screenshots from admin page:
pics.zip

@adleong
Member

adleong commented Oct 22, 2019

This test script makes requests to all distinct names, causing massive binding cache churn and a huge rate of allocations. It's not surprising that GC is not able to keep up with this rate of allocations.

Linkerd is not intended to be able to deal with this type of traffic. I'm inclined to close this as won't fix unless there is a valid use case in here somewhere.
