Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enabling zipkin b3 propagation leads to possible incorrect spans #2333

Closed
dtacalau opened this issue Sep 26, 2019 · 3 comments
Closed

enabling zipkin b3 propagation leads to possible incorrect spans #2333

dtacalau opened this issue Sep 26, 2019 · 3 comments
Assignees

Comments

@dtacalau
Copy link
Contributor

This could be a bug, not sure, was found while implementing and testing #2114.

What happened:
Problem is linkerd generates a new spanid and a new parentspanid for each router which has configured property "tracePropagator: kind: io.l5d.zipkin", but the parentspanid is different from the previous spanid.

What you expected to happen:
I'd expect parentspanid of a new span to be the same as spanid of the previous span. We need to make sure the tracing spans are generated correctly w/ and w/o propagation enabled.

How to reproduce it (as minimally and precisely as possible):
Here is my setup:

  1. Start linkerd using a config similar with the one from https://linkerd.io/2016/05/17/distributed-tracing-for-polyglot-microservices . Linkerd config defines three routers listening on three different ports. The router on port 9000 forwards to the router on port 9001, which forwards to the router on port 9002, which forwards to the Linkerd admin service on port 9990.
  2. Start zipkin service with SCRIBE transport enabled: export SCRIBE_ENABLED=true; java -jar zipkin.jar
  3. Start wireshark capture on all interfaces
  4. Use a http client from finale to send a request to linkerd 9000 port: git/finagle/doc/src/sphinx/code/quickstart$ ./sbt 'runMain Client', which by default will contain a traceid and a spanid using "x-b3-" headers

Scenario 1 (router on port 9000 has tracePropagator io.l5d.zipkin enabled, the other routers do not have tracePropagator) (linkerd config: 1-linkerd-one-router-with-propagation.yaml, capture: 1-capture-one-router-with-propagation.pcapng)
Packet 44
- Client send a HTTP request to l5d 9000 port, containing (X-B3-SpanId: 74d95541b82adc24; X-B3-TraceId: 74d95541b82adc24)
- x-b3- headers present, first span, spanid=traceid

Packet 49
- l5d forwards the request to l5d 9001 port, (x-b3-spanid: 9f387655f61ca970; x-b3-parentspanid: 135fe09bbbd2206e; x-b3-traceid: 74d95541b82adc24)
- new spanid different from packet 44
- parentspanid != spanid of packet 44????
- keeps the traceid
Packet 54 - l5d forwars the request to l5d 9002 port, (x-b3-spanid: 9f387655f61ca970; x-b3-parentspanid: 135fe09bbbd2206e; x-b3-traceid: 74d95541b82adc24)
- keeps all b3 trace unchanged (same spanid, same parentid, same traceid)

Scenario 2: (all routers have tracePropagator io.l5d.zipkin enabled) (linkerd config: 2-linkerd-all-three-routers-with-propagation.yaml capture: 2-capture-all-three-routers-with-propagation.pcapng)
Packet 41
- request from client to port 9000 with traceid=spanid
Packet 47
- request fwd to port 9001
- new spanid, new parentspanid, same traceid
Packet 52
- request fwd to port 9002
- new spanid, new parentspanid, same traceid
Packet 57
- request fwd to port 9990
- new spanid, new parentspanid, same traceid

Anything else we need to know?:
Attached 2 scenarios of testing with linkerd config and captures.

Environment:

  • linkerd/namerd version, config files:
    1.7.0
  • Platform, version, and config files (Kubernetes, DC/OS, etc):
    Ubuntu/localhost
  • Cloud provider or hardware configuration:
@dtacalau
Copy link
Contributor Author

Adding attachments: config files and captures for the 2 scenarios described above

linkerd-configs-and-captures.zip

@dtacalau
Copy link
Contributor Author

@adriancole Timing is interesting indeed, thanks for mentioning spring-cloud/spring-cloud-sleuth#1452 in my other PR, will take a look. In the meantime I was investigating this issue. What do you think, any resource I could use to make sure spans are generated correctly apart from reviewing the code and following traceIds?

@dtacalau
Copy link
Contributor Author

Not a bug, the behavior reported in issue description (my first comment) is expected behaviour.

I have verified Linkerd tracing in two ways:

  • with Zipkin trace propagation enabled, having in config tracePropagator io.l5d.zipkin, for one router and for all three routers;
  • with Zipkin trace propagation disabled, so Linkerd uses l5d-ctx-trace for trace propagation instead of Zipkin b3 headers.

It seems Linkerd has the same pattern of generating tracing spans in both cases, w/ and w/o Zipking tracing propagation,but, the catch is that you cannot see all generated spans using just console logging, you have to check Zipkin UI which shows all of them.

So. whenever a request reaches a router, Linkerd will create a server span, using the trace context from the request as starting point. If there's tracing context in the request, l5d-trace-context or zipkin, the router server span will be created as a span child of the request original tracing context. If there's no tracing context, the router span will have no parent span. Next, when the request is about to leave the router, it will create a client span, using the previous server span as parent. This
client span will be saved in the request headers before the request leaves the router.

For a router, you can see both the server span and client span using Zipkin UI. Using console logging you can print only the client span when this is set in the request headers.

Here's a diagram of the logic generating server/client spans and where you have access to see them:

x--> packet enters router1
(console logger) W 0927 09:31:06.075 EEST THREAD24: get traceid Some(ca75819b4fff3acc.ca75819b4fff3acc<:ca75819b4fff3acc)
1----> span1 - router1 creates server span
(zipkin ui only) traceId ca75819b4fff3acc spanId 6c33825db069e4e1 parentId ca75819b4fff3acc
2----> span2 - router1 creates client span
(console logger & zipkin ui) W 0927 09:31:06.243 EEST THREAD27 TraceId:ca75819b4fff3acc: set traceid ca75819b4fff3acc.26bbce9caebf53ac<:6c33825db069e4e1

x-----> packet leaves router1

x-----> packet enters router2
(console logger) W 0927 09:31:06.255 EEST THREAD28: get traceid Some(ca75819b4fff3acc.26bbce9caebf53ac<:6c33825db069e4e1)
3--------> span3 - router2 creates server span
(zipkin ui only) traceId ca75819b4fff3acc spanId 0602680015fde24a parentId 26bbce9caebf53ac
4----------> span4 - router2 creates client span
(console logger & zipkin ui) W 0927 09:31:06.320 EEST THREAD29 TraceId:ca75819b4fff3acc: set traceid ca75819b4fff3acc.d81ce149fd1e735e<:0602680015fde24a
x------------> packet leaves router2

x------------> packet enters router3
(console logger) W 0927 09:31:06.325 EEST THREAD30: get traceid Some(ca75819b4fff3acc.d81ce149fd1e735e<:0602680015fde24a)
5--------------> span5 - router3 creates server span
(zipkin ui only) traceId ca75819b4fff3acc spanId 0ecfa9b7cc7e4795 parentId d81ce149fd1e735e
6----------------> span6 - router3 creates client span
(console logger & zipkin ui) W 0927 09:31:06.371 EEST THREAD32 TraceId:ca75819b4fff3acc: set traceid ca75819b4fff3acc.bb740a6eea5b22e6<:0ecfa9b7cc7e4795
x----------------> packet leaves router3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant