-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for trace logs for all outbound and inbound communication in linkerd and namerd #2245
Comments
@thedebugger I've begun working on this and it looks the k8s namer has some reach logging statements that are available. For example, The log lines below show Linkerd with a k8s namer set to trace logging. The namer is watching a service whose endpoints refresh every 30 seconds. D 0517 15:37:19.813 PDT THREAD105 TraceId:36560018d6e610fa: k8s event on '/api/v1/watch/namespaces/default/endpoints/hello1' received 'EndpointsModified(Endpoints(Some(List(EndpointSubset(None,Some(List(EndpointAddress(10.1.1.55,Some(docker-for-desktop),Some(ObjectReference(Some(Pod),Some(default),Some(hello1-5c4b78f69-tkqld),Some(5317b137-78f4-11e9-b3b7-025000000001),None,Some(173776),None))))),Some(List(EndpointPort(7777,Some(http),Some(TCP))))))),Some(Endpoints),Some(ObjectMeta(Some(hello1),None,Some(default),Some(/api/v1/namespaces/default/endpoints/hello1),Some(b7e56b41-78ec-11e9-b3b7-025000000001),Some(173777),None,Some(2019-05-17T21:42:50Z),None,None,None)),Some(v1)))
D 0517 15:37:19.813 PDT THREAD105 TraceId:36560018d6e610fa: k8s watch on '/api/v1/watch/namespaces/default/endpoints/hello1' registered event with same or newer resource version Some(173777) (older resource version was Some(173759))
D 0517 15:37:19.813 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 modified endpoints
D 0517 15:37:19.813 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 added Endpoint(/10.1.1.55,Some(docker-for-desktop))
D 0517 15:37:19.813 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 mapped port http to 7777
D 0517 15:37:47.726 PDT THREAD105 TraceId:36560018d6e610fa: k8s event on '/api/v1/watch/namespaces/default/endpoints/hello1' received 'EndpointsModified(Endpoints(None,Some(Endpoints),Some(ObjectMeta(Some(hello1),None,Some(default),Some(/api/v1/namespaces/default/endpoints/hello1),Some(b7e56b41-78ec-11e9-b3b7-025000000001),Some(173833),None,Some(2019-05-17T21:42:50Z),None,None,None)),Some(v1)))
D 0517 15:37:47.726 PDT THREAD105 TraceId:36560018d6e610fa: k8s watch on '/api/v1/watch/namespaces/default/endpoints/hello1' registered event with same or newer resource version Some(173833) (older resource version was Some(173777))
D 0517 15:37:47.726 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 modified endpoints
D 0517 15:37:47.727 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 removed Endpoint(/10.1.1.55,Some(docker-for-desktop))
D 0517 15:37:47.727 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 removed port mapping from http to 7777
D 0517 15:37:49.696 PDT THREAD105 TraceId:36560018d6e610fa: k8s event on '/api/v1/watch/namespaces/default/endpoints/hello1' received 'EndpointsModified(Endpoints(Some(List(EndpointSubset(None,Some(List(EndpointAddress(10.1.1.56,Some(docker-for-desktop),Some(ObjectReference(Some(Pod),Some(default),Some(hello1-6b8f7994c8-d5kpp),Some(6515bb6b-78f4-11e9-b3b7-025000000001),None,Some(173843),None))))),Some(List(EndpointPort(7777,Some(http),Some(TCP))))))),Some(Endpoints),Some(ObjectMeta(Some(hello1),None,Some(default),Some(/api/v1/namespaces/default/endpoints/hello1),Some(b7e56b41-78ec-11e9-b3b7-025000000001),Some(173845),None,Some(2019-05-17T21:42:50Z),None,None,None)),Some(v1)))
D 0517 15:37:49.696 PDT THREAD105 TraceId:36560018d6e610fa: k8s watch on '/api/v1/watch/namespaces/default/endpoints/hello1' registered event with same or newer resource version Some(173845) (older resource version was Some(173833))
D 0517 15:37:49.697 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 modified endpoints
D 0517 15:37:49.697 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 added Endpoint(/10.1.1.56,Some(docker-for-desktop))
D 0517 15:37:49.697 PDT THREAD105 TraceId:36560018d6e610fa: k8s ns default service hello1 mapped port http to 7777
... Note: There is a new log line ( What we would need to add is logging between Linkerd and Namerd's hop, which is everything from points 3 to 6. Thoughts? |
Hi @dadjeibaah, thanks for taking a look. The existing logging for 1 and 2 looks good for our purposes. The biggest ongoing problem we have is linkerd state diverging from namerd. A namerd release with [points 3 and 4] logs identifying the route and downstream linkerd might be the key. |
@cheriot, got it. I wonder if you could infer which downstream Linkerd Namerd is sending discovery information by running tracing logging in the D 0521 11:35:55.692 PDT THREAD95: [L:/127.0.0.1:4321 R:/127.0.0.1:64000] write: DATA
D 0521 11:35:55.693 PDT THREAD115: [L:/127.0.0.1:4321 R:/127.0.0.1:64000] wrote: DATA: Return(())
D 0521 11:35:55.693 PDT THREAD115: [L:/127.0.0.1:4321 R:/127.0.0.1:64000] wrote: DATA: Return(()) In the example above, you can infer the downstream Linkerd client, which in this case is the ip at |
I don't think the H2 frames will get us there. Is there any identifier available that's more clear than ip? That will be difficult to trace back (in some datacenters there's a NAT). Let me describe the problem more. The two biggest issues are:
In both cases, we've been identifying the stale linkerds from metrics and restarting them. With the linkerd per host model that's pretty impactful. What we need on the namerd side is some identifier for the linkerd receiving the update, the route being updated, and an indication of whether this is a dtab change, an endpoint change, or something else. |
Thanks for your help improving the project!
Getting Help
Github issues are for bug reports and feature requests. For questions about
Linkerd, how to use it, or debugging assistance, start by
asking a question in the forums or join us on
Slack.
Full details at CONTRIBUTING.md.
Filing a Linkerd issue
Issue Type:
What happened:
We are seeing numerous cases where 1) namerd state diverge from k8s apiserver - dtabs and endpoint 2) linkerd state diverge from namerd e.g. linkerd thinks only 4 pods are available, but namerd sees 10. We suspect events are getting dropped in namerd and linkerd. To figure out where things are going wrong, linkerd and namerd should trace all inbound and outbound watch (or relevant) events.
What you expected to happen:
Namerd and linkerd should log all inbound and outbound events. E.g.
If we can also log on namerd that linkerd with "hostname" is connected to namerd, that would be really helpful too. IPs are NATed. So we lose the observability what linkerd is connected to what namerd.
How to reproduce it (as minimally and precisely as possible):
N/A
Anything else we need to know?:
Let me know if you know more information
Environment:
The text was updated successfully, but these errors were encountered: