
Consul Namer doesn't update Mesh Interpreter when the Consul payload is significantly large #2363

Closed
cpretzer opened this issue Dec 17, 2019 · 5 comments


@cpretzer
Contributor

Issue Type:

  • Bug report
  • Feature request

What happened:

With version 1.7.0 of Linkerd and Namerd, the two lose connectivity during an update when the Consul namer processes a large number of replicas and sends a request with a large body.

In this scenario, Namerd is configured to use the Consul namer and Linkerd is configured to use the io.l5d.mesh interface.

When the Consul store changes, Namerd will trigger an update to Linkerd through the mesh interpreter. Namerd will send a few data payloads, which Linkerd receives, and then the traffic just stops altogether.

What you expected to happen:

The update from Namerd should complete successfully, and Linkerd should have an exact copy of the list of replicas stored in Namerd and Consul.

How to reproduce it (as minimally and precisely as possible):

Reproducing this is a multi-step process which requires setting up Consul, Linkerd, and Namerd.

Requires Java 8 or 9

  1. Create a working directory
    1. mkdir consul-namer-repro, for example
  2. Download Consul
    1. Get the version for your OS: https://www.consul.io/downloads.html
    2. Make sure that you can run it from the working directory
  3. Download Linkerd and Namerd
    1. https://github.com/linkerd/linkerd/releases/download/1.7.0/namerd-1.7.0-exec
    2. https://github.com/linkerd/linkerd/releases/download/1.7.0/linkerd-1.7.0-exec
    3. Copy the files to the working directory
    4. Make the files executable: chmod 755 *1.7.0-exec
  4. Download configuration files
    1. https://gist.github.com/cpretzer/36682f0485abb106c8ca3ae028f1580c
    2. Click "Download ZIP"; the tgz file from the gist comes compressed inside another zip file, so extract both into the working directory
  5. Start the services
    Open a terminal window for each of the commands below (You can background the processes, but it'll be more difficult to read the logs)
    ./start_consul.sh: The consul logs should report that five services have been registered
    ./start_namerd.sh
    ./start_linkerd.sh
    docker run --rm --name nginx_one -p 8080:8080 nginx
    docker run --rm --name nginx_two -p 8081:8080 nginx
  6. Check the current replica counts
    This shows the number of replicas currently stored by Namerd and Linkerd, and they should all read 0
    ./admin-counts.sh
  7. Curl the Linkerd service to get the replicas
    curl https://localhost:4140/cat
  8. Check the updated replica counts
    The counts should now be at 5
    ./admin-counts.sh
  9. Increase the number of replicas in consul
    cp long.json conf-dir/ && consul reload
    At this point the Namerd and Linkerd logs will show communication with each other. There may be some ChannelClosedException messages, but they are not related to updating the Consul services.
  10. Check the updated replica counts
    ./admin-counts.sh
    The counts will be out of sync (see the sketch after this list for one way to compare them):
    namer_state will be around 6000
    client_state and interpreter_state will be equal to each other, but much lower
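
For reference, here is a rough sketch of one way to compare those gauges programmatically. It assumes the default admin ports (9990 for Linkerd, 9991 for Namerd) and that the gauges show up in each process's /admin/metrics.json; the gist's admin-counts.sh may do this differently, so treat the ports, paths, and matching logic below as assumptions rather than a description of that script.

import scala.io.Source

// Rough sketch: fetch the admin metrics from Namerd and Linkerd and print any
// entries matching the gauges discussed above. Ports, paths, and the crude
// string matching are assumptions; adjust them to match the gist's configs.
object AdminCounts extends App {
  def metrics(url: String): String =
    Source.fromURL(url).mkString

  def gaugeLines(json: String, needle: String): Seq[String] =
    json.split(",").filter(_.contains(needle)).toSeq

  val namerd  = metrics("https://localhost:9991/admin/metrics.json")
  val linkerd = metrics("https://localhost:9990/admin/metrics.json")

  Seq("namer_state" -> namerd, "interpreter_state" -> linkerd, "client_state" -> linkerd).foreach {
    case (name, json) => gaugeLines(json, name).foreach(line => println(s"$name: $line"))
  }
}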

Anything else we need to know?:

Any subsequent updates to the Consul replicas, such as removing the long.json file or adding the short-2.json file, have no effect. Namerd does not attempt to communicate with Linkerd, which makes me think that the connection is in an unexpected state.

I have some tcpdump files with the packet information from the steps above.

To see the consul updates working normally:

  • Remove long.json from conf-dir
    • rm -rf conf-dir/long.json
  • consul reload
  • Use ctrl-c to stop the Linkerd process, then restart it
  • cp short-2.json conf-dir/ && consul reload
  • Observe the communication between Linkerd and Namerd
  • ./admin-counts.sh will show that the counts are all in sync
  • rm -rf conf-dir/short-2.json && consul reload
  • ./admin-counts.sh will show that the counts are all in sync again

Environment:

@zaharidichev
Member

zaharidichev commented Dec 17, 2019

So far I know two things for sure:

  • as I expected, this does not have much to do with Consul. If you take Consul out of the equation and simply use a static file for the resolution data, you will see the same result whenever the file is significantly large (has many endpoints)
  • this is not a 1.7 bug. It manifests itself with 1.6.0 as well

What I think:

  • I think this is a bug with the H2 implementation or at least something wrong with how the H2 client is used by the mesh interface.

@cpretzer
Contributor Author

@zaharidichev thanks for taking a look at this

I'll continue to look at how the H2 client is used by the mesh interface.

A couple of things that I'm suspicious of are:

  • the WINDOW_UPDATE calls
  • the payloads sent from Namerd to Linkerd contain duplicate entries, almost as if Namerd is sending multiple payloads, each containing the entirety of its known replica sets at the time. I've observed that Namerd doesn't get the entire payload from Consul in one go, so I wonder if there is a race condition in how the data is being sent or read (see the toy sketch below).
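
To make that second suspicion concrete, here is a small, purely hypothetical toy sketch (none of these names come from linkerd) of how re-emitting the full accumulated set on every partial update from Consul would produce duplicate entries on the receiving side, if the receiver appends payloads instead of replacing its state:

// Toy illustration of the duplicate-entries suspicion, not linkerd code:
// the sender re-emits everything it knows on each partial update, so a
// receiver that concatenates payloads rather than replacing state sees
// the same replicas more than once.
object DuplicatePayloadSketch extends App {
  var known = Set.empty[String]                                   // sender's accumulated replica set
  val received = scala.collection.mutable.Buffer.empty[String]    // receiver naively appends

  def partialUpdate(newReplicas: Seq[String]): Unit = {
    known ++= newReplicas
    received ++= known  // emits the entire known set, not just the delta
  }

  partialUpdate(Seq("10.0.0.1:8080", "10.0.0.2:8080"))
  partialUpdate(Seq("10.0.0.3:8080"))

  println(s"distinct replicas: ${received.distinct.size}, entries received: ${received.size}")
  // prints: distinct replicas: 3, entries received: 5
}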

@zaharidichev
Member

I do not think the WINDOW_UPDATE calls are a reason for concern. I am currently debugging that, and it seems to me that only one H2 data frame is ever sent from namerd to linkerd no matter how big the payload is, which might have to do with the fact that this is always a UnaryToUnary H2 mode. I'm not sure; this is all hypothetical at the moment. I will try to get to the bottom of it, but this is almost certainly a problem in the H2 interface. Will keep you posted :)

@cpretzer
Contributor Author

@zaharidichev I continued to debug this today and made progress that confirms our conversation with @adleong about this being a Codec issue.

The communication is definitely UnaryToStream, based on this snippet from the generated resolver.pb.scala file:

...
trait Resolver {
  def getReplicas(req: io.linkerd.mesh.ReplicasReq): com.twitter.util.Future[io.linkerd.mesh.Replicas]
  def streamReplicas(req: io.linkerd.mesh.ReplicasReq): io.buoyant.grpc.runtime.Stream[io.linkerd.mesh.Replicas]
}
...
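
For reference, here is a rough sketch of how a caller might drain that streaming call. It assumes the runtime's io.buoyant.grpc.runtime.Stream exposes a recv() that yields one Releasable message (a value plus a release() callback) at a time; this is only an illustration of the call pattern, not the interpreter's actual code, and error handling for stream end or reset is omitted.

import com.twitter.util.Future
import io.buoyant.grpc.runtime.Stream

// `resolver` is any implementation of the generated Resolver trait shown above,
// e.g. the generated client. The assumed shape of Stream.Releasable is a value
// plus a release() callback that acknowledges the message.
def drainReplicas(resolver: Resolver, req: io.linkerd.mesh.ReplicasReq): Future[Unit] = {
  val replicas: Stream[io.linkerd.mesh.Replicas] = resolver.streamReplicas(req)

  def loop(): Future[Unit] =
    replicas.recv().flatMap { case Stream.Releasable(update, release) =>
      // Each update should be a complete snapshot of the replica set.
      println(s"received a Replicas update: $update")
      release().before(loop())
    }

  loop()
}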

I didn't get a chance to test with the Scala server, but a different test that I ran was to configure the [H2.server](https://github.com/linkerd/linkerd/blob/master/namerd/iface/mesh/src/main/scala/io/buoyant/namerd/iface/MeshIfaceInitializer.scala#L103) in the MeshIfaceInitializer with a different readTimeout value for the transport. In this case, I saw that the interface continuously sent requests to the interpreter.

This indicates to me that the interface is not getting an ACK from the interpreter for the payload that causes the failure. In my tests, this TCP payload is never larger than 40 KB, so I don't think that this is an aggregate size issue; again, it seems more like a problem with how the Codec is decoding the streams and the frames.

Here is a snippet of the configuration that you can use in [MeshIfaceInitializer](https://github.com/linkerd/linkerd/blob/master/namerd/iface/mesh/src/main/scala/io/buoyant/namerd/iface/MeshIfaceInitializer.scala#L103) to reproduce that particular behavior:

...
      H2.server
        .withTracer(NullTracer)
        .configuredParams(tlsParams)
        .configuredParams(socketOptParams)
        .withStatsReceiver(stats1)
        .withTransport.verbose
        .withTransport.readTimeout(1.minute)
        .serve(addr, dispatcher)
...

The 1.minute value is arbitrary and you can shorten it. The logs will show the attempts to resend at any Duration that you set.

I tried similar configurations of the [H2.client](https://github.com/linkerd/linkerd/blob/master/interpreter/mesh/src/main/scala/io/buoyant/interpreter/MeshInterpreterInitializer.scala#L87) on the MeshInterpreterInitializer side, and the behavior did not change.

This evening, I've been looking through [DecodingStream](https://github.com/linkerd/linkerd/blob/master/grpc/runtime/src/main/scala/io/buoyant/grpc/runtime/DecodingStream.scala), [Codec](https://github.com/linkerd/linkerd/blob/master/grpc/runtime/src/main/scala/io/buoyant/grpc/runtime/Codec.scala), and some of the related classes to make sure that the frames and streams are being completely read and properly drained.

There may be an issue with the frames not being assembled in the right order, but I haven't found the code that handles that. @adleong can you point me in the right direction?
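
For context on what that Codec has to handle: the gRPC wire format prefixes every message with a 5-byte header (a 1-byte compression flag followed by a 4-byte big-endian length), and a single message can span multiple H2 DATA frames, so the decoder has to buffer partial frames until a whole message is available. The standalone sketch below only illustrates that framing; it is not linkerd's Codec, and all of the names in it are made up.

import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Toy decoder for gRPC's length-prefixed message framing. Bytes may arrive in
// arbitrary chunks (H2 DATA frames); a complete message is emitted only once
// all of its bytes have been buffered.
class ToyGrpcFramer {
  private var buf = Array.emptyByteArray

  /** Feed one chunk of bytes; returns any complete message payloads. */
  def offer(chunk: Array[Byte]): Seq[Array[Byte]] = {
    buf = buf ++ chunk
    val out = ArrayBuffer.empty[Array[Byte]]
    var more = true
    while (more) {
      if (buf.length < 5) more = false                // header not complete yet
      else {
        val bb = ByteBuffer.wrap(buf)
        val compressed = bb.get()                     // 1-byte compression flag
        val len = bb.getInt()                         // 4-byte big-endian length
        require(compressed == 0, "compressed messages not handled in this sketch")
        if (buf.length < 5 + len) more = false        // message spans a later frame
        else {
          out += buf.slice(5, 5 + len)                // one complete message
          buf = buf.drop(5 + len)                     // keep any trailing bytes
        }
      }
    }
    out.toSeq
  }
}

If partial frames are dropped, or stitched together in the wrong order, a decoder doing this kind of buffering would either stall waiting for bytes that never arrive or emit corrupt messages, either of which would look a lot like the stream of updates simply stopping.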

@cpretzer
Contributor Author

This is confirmed to be fixed in Linkerd 1.7.1
