Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading and confusing linkerd errors #2150

Closed
negz opened this issue Oct 12, 2018 · 5 comments
Closed

Misleading and confusing linkerd errors #2150

negz opened this issue Oct 12, 2018 · 5 comments
Assignees

Comments

@negz
Copy link
Contributor

negz commented Oct 12, 2018

Issue Type: Feature request

What happened: While triaging #2148 I saw several exceptions such as the one below. I now understand this is a normal symptom of linkerd stopping a watch on the Kubernetes API, but at first glance I interpreted it as "linkerd is having issues connecting to the Kubernetes API".

I run linkerd at WARNING log level by default (due to a few components that are quite noisy at INFO level). When running at WARNING level linkerd doesn't provide a lot of context; you just see that there was an exception and a 'failure' relating to localhost/127.0.0.1:8001. When I bump up the logging to debug the Failed mid-stream. Terminating stream, closing connection line provides a better hint at what's going on.

W 1011 06:05:08.402 UTC THREAD38 TraceId:4b87accf7f787c46: Exception propagated to the default monitor (upstream address: /127.0.0.1:56278, downstream address: localhost/127.0.0.1:8001, $
abel: client).
Failure(k8s observation released, flags=0x02) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:8001, Downstream lab$
l: client, Trace Id: 4b87accf7f787c46.4b87accf7f787c46<:4b87accf7f787c46
D 1011 06:05:08.402 UTC THREAD43 TraceId:4b87accf7f787c46: Failed mid-stream. Terminating stream, closing connection                                                                      
com.twitter.finagle.ChannelClosedException: ChannelException at remote address: localhost/127.0.0.1:8001. Remote Info: Not Available 

What you expected to happen:
I'm not entirely sure what the best fix is here. This seems like something that is worth logging, but it may be possible to better represent that it's not necessarily problematic. As someone who does not come from the Finagle world I almost wish log lines like this contained links to the finagle docs to explain what the exceptions mean. Perhaps a linkerd documentation page containing the most common Finagle and linkerd exceptions and their meanings would help?

How to reproduce it (as minimally and precisely as possible):

  1. Embark on a technical career that does not involve working at Twitter. ;)
  2. Run linkerd.
@adleong
Copy link
Member

adleong commented Oct 13, 2018

I've been messing around with various combinations of starting and stopping watches and I haven't been able to reproduce the Exception propagated to the default monitor (upstream address: /127.0.0.1:56278, downstream address: localhost/127.0.0.1:8001, $ abel: client). error. Do you have repro steps for how to trigger this error message?

@negz
Copy link
Contributor Author

negz commented Oct 13, 2018

Do you have repro steps for how to trigger this error message?

Sadly no. I've seen it happen at least twice; once in #2148 and another time during a similar issue we saw in production, but I'm not able to reproduce it.

@adleong
Copy link
Member

adleong commented Oct 15, 2018

I'll keep an eye out for this message but if you discover a way to reproduce it, please update this ticket.

@negz
Copy link
Contributor Author

negz commented Oct 23, 2018

Not terribly useful, but we're seeing this again in production.

W 1023 01:12:07.071 UTC THREAD34 TraceId:da817311cab52ccf: Exception propagated to the default monitor (upstream address: /127.0.0.1:53730, downstream address: localhost/127.0.0.1:8001, label: client).
Failure(k8s observation released, flags=0x02) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:8001, Downstream label: client, Trace Id: da817311cab52ccf.da817311cab52ccf<:da817311cab52ccf

I see correlating log lines in the kubectl sidecar:

I1023 01:09:56.468980       1 logs.go:49] http: proxy error: context canceled
I1023 01:11:14.271047       1 logs.go:49] http: proxy error: context canceled
I1023 01:13:19.468649       1 logs.go:49] http: proxy error: context canceled

Honestly I'm not sure how to repro this or what more information to give you.

@adleong
Copy link
Member

adleong commented Feb 4, 2019

I'm going to close this until we have a reliable way to reproduce it. Please re-open if you are able to trigger this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants