Linkerd CPU Hotspots and Thread Usage #2382
Comments
@cpretzer since you were helping us earlier this year :)
thanks @j0sh3rs I'll have a look!
@j0sh3rs we've been looking into whether the recent netty and finagle updates would address this issue. So far, I haven't been able to get a test environment running to reproduce it. Can you tell me more about the jmeter tests? Do they hit your application in a scripted way, or do they just throw load at it?
@cpretzer unfortunately, I've changed roles and am no longer with Ping Identity, so I no longer have the context to troubleshoot the jmeter behaviors. I'm not sure who, if anyone, has taken this over from me, so this issue may go stale and should probably be closed.
@j0sh3rs thanks for the update! I hope your new role is going well
Issue Type:
What happened:
After roughly a week of running performance-test load through Linkerd (via jmeter), we hit a state where Linkerd shows a sharp increase in CPU usage and a jump in thread count, primarily related to netty UnboundedFuturePool usage.
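As a sanity check on the thread growth, a quick count of live pool threads over time makes the leak easy to graph. This is a minimal sketch, not from the original report: it assumes the pool threads carry "UnboundedFuturePool" in their names, and it only sees the JVM it runs in, so in practice the same measurement would come from a jstack dump of the linkerd process or over remote JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Counts live threads whose names contain a marker string.
// Note: this inspects its own JVM; attach JMX to linkerd to use it for real.
public class PoolThreadCount {
    public static void main(String[] args) {
        // "UnboundedFuturePool" is an assumption about the thread naming.
        String marker = args.length > 0 ? args[0] : "UnboundedFuturePool";
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long matching = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadName().contains(marker)) {
                matching++;
            }
        }
        System.out.printf("%s threads: %d of %d total%n",
                marker, matching, mx.getThreadCount());
    }
}
```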
When sampling against the profiles, the CPU hotspots line up with the same UnboundedFuturePool activity.
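For the CPU side, a rough substitute for a sampling profiler is to diff per-thread CPU time over a short window via ThreadMXBean; the threads that dominate the window are the hotspots. Same caveat as above: this is a sketch that inspects its own JVM, not the tooling used in the report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Ranks threads by CPU time consumed during a 5-second sampling window.
public class HotThreads {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.err.println("Per-thread CPU time not supported on this JVM");
            return;
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // nanoseconds, -1 if dead
        }
        Thread.sleep(5_000); // sampling window
        List<long[]> deltas = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start != null && start >= 0 && end >= 0) {
                deltas.add(new long[] { end - start, id });
            }
        }
        deltas.sort((a, b) -> Long.compare(b[0], a[0])); // busiest first
        for (long[] d : deltas.subList(0, Math.min(10, deltas.size()))) {
            ThreadInfo info = mx.getThreadInfo(d[1]);
            String name = info == null ? "<terminated>" : info.getThreadName();
            System.out.printf("%-50s %6d ms CPU%n", name, d[0] / 1_000_000);
        }
    }
}
```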
The issue resolves only after restarting Linkerd (by patching the daemonset pods), and then it reappears.
What you expected to happen:
Linkerd's thread and CPU usage remain appropriate for the load it is receiving.
How to reproduce it (as minimally and precisely as possible):
Run the nightly jmeter load test for 7-10 days. Note: the issue is also observed in an environment where no jmeter test runs, suggesting it is not specifically tied to jmeter usage.
Anything else we need to know?:
We attempted to work around the issue, suspecting it could be related to #2268, but still saw the same behavior while running with BiasedLocking (-XX:+UseBiasedLocking) enabled.
Some core configs of our jmeter setup include "Use KeepAlive" checked on the jobs.
Environment:
Linkerd 1.7.1 (running on the default Java 8)
Config:
Kubernetes 1.15.9 running on Ubuntu 16.04
AWS m5.4xlarge instance type with EBS optimizations