gRPC unary throughput and latency issues #1379
We are getting reports of linkerd adding ~10ms of latency overhead at p50 under high concurrency.
I'm not able to reproduce the 10ms overhead at p50, but I have found a throughput ceiling of 12.5k RPS, which is hit at a concurrency level of 20. Beyond that point, adding more clients doesn't improve throughput and only hurts latency. CPU utilization doesn't go above 85% of the total available CPU, on a process pinned to 6 cores via `taskset`.
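For reference, `taskset -c 0-5 <cmd>` is the shell form of the pinning described above. The same restriction can be applied programmatically on Linux; this is a minimal sketch (the core IDs are illustrative, not necessarily the ones used in the benchmark):

```python
import os

def pin_to_cores(cores):
    """Restrict the current process (pid 0 = self) to the given CPU core IDs.

    Linux-only: uses sched_setaffinity, the same syscall taskset wraps.
    Returns the resulting affinity set for verification.
    """
    os.sched_setaffinity(0, set(cores))
    return os.sched_getaffinity(0)
```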
Here are a few representative lines from `strest-client` against linkerd:

```
Concurrency: 5
2017-06-12T09:46:53-07:00 0.0B 91011/0 10s L: 0 [ 0 2 ] 27 J: 0 0
Concurrency: 10
2017-06-12T09:48:44-07:00 0.0B 115573/0 10s L: 0 [ 2 3 ] 17 J: 0 0
Concurrency: 20
2017-06-12T09:50:00-07:00 0.0B 127035/0 10s L: 0 [ 4 6 ] 26 J: 0 0
Concurrency: 50
2017-06-12T09:51:09-07:00 0.0B 126739/0 10s L: 0 [ 12 21 ] 65 J: 0 0
Concurrency: 100
2017-06-12T09:52:38-07:00 0.0B 116632/0 10s L: 0 [ 35 55 ] 106 J: 0 0
```
Compare this with querying `strest-server` directly:

```
Concurrency: 5
2017-06-12T09:59:58-07:00 0.0B 312830/0 10s L: 0 [ 0 0 ] 1 J: 0 0
Concurrency: 10
2017-06-12T10:00:10-07:00 0.0B 511413/0 10s L: 0 [ 0 0 ] 7 J: 0 0
Concurrency: 20
2017-06-12T10:00:30-07:00 0.0B 702391/0 10s L: 0 [ 0 1 ] 7 J: 0 0
Concurrency: 50
2017-06-12T10:12:24-07:00 0.0B 1010429/0 10s L: 0 [ 1 2 ] 30 J: 0 0
Concurrency: 100
2017-06-12T10:12:01-07:00 0.0B 1247139/0 10s L: 0 [ 2 3 ] 33 J: 0 0
```
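The headline numbers can be pulled straight from the peak runs in the two sets of results (this sketch assumes the strest output format is `<bytes> <successes>/<failures> <window>`, so the second column over a 10s window gives requests per second):

```python
# Peak throughput from the runs above: linkerd tops out at 127035
# requests per 10s window (concurrency 20), while strest-server served
# directly reaches 1247139 per window (concurrency 100).
linkerd_peak_rps = 127035 / 10
direct_peak_rps = 1247139 / 10

drop = 1 - linkerd_peak_rps / direct_peak_rps
print(f"peak throughput drop through linkerd: {drop:.0%}")  # ~90%
```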
Linkerd is causing roughly a 90% drop in peak throughput (127k vs. 1.25M requests per 10s window) due to some bottleneck. I don't have a smoking gun yet, but this smells like Amdahl's law in action.
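As a back-of-the-envelope check on the Amdahl's-law hunch, one can fit a serial fraction to the client numbers above. This is a rough sketch under stated assumptions: a two-point fit, and treating client concurrency as if it mapped directly to parallel workers, neither of which is measured here.

```python
# Amdahl's law: speedup over one worker with serial fraction s is
#   S(n) = 1 / (s + (1 - s) / n)
# Going from 5 to 100 clients, linkerd throughput only scaled ~1.28x
# (91011 -> 116632 requests per 10s window). Find the s consistent
# with that ratio; the single-worker baseline cancels out.

def speedup(n, s):
    return 1.0 / (s + (1.0 - s) / n)

observed_ratio = 116632 / 91011  # ~1.28x

# The modeled ratio S(100)/S(5) decreases monotonically in s,
# so bisect for the serial fraction that matches the observation.
lo, hi = 0.0, 1.0
for _ in range(60):
    s = (lo + hi) / 2
    if speedup(100, s) / speedup(5, s) > observed_ratio:
        lo = s  # too little serial work modeled; raise s
    else:
        hi = s

print(f"implied serial fraction ~{s:.0%}, max possible speedup ~{1/s:.1f}x")
```

If the model held, a serial fraction near 40% would cap scaling at about 2.5x no matter how many clients are added, which is consistent with the observed throughput ceiling.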
The best evidence I have so far is the two attached flame graphs: one showing linkerd routing gRPC unary traffic from `strest-client`, and one showing it routing gRPC streaming traffic. What stands out in the unary routing is a very deep (> 40 frames) filter pipeline. So far that's the only major difference, but I'm still digging into whether there's monitor contention or some other mistuned parameter.

Attachment: grpc_svgs.zip