Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message for unresponsive DNS Resolver #8262

Closed
elaine-jackson opened this issue Oct 3, 2019 · 23 comments
Closed

Improve error message for unresponsive DNS Resolver #8262

elaine-jackson opened this issue Oct 3, 2019 · 23 comments

Comments

@elaine-jackson
Copy link

elaine-jackson commented Oct 3, 2019

When attempting to send a request with http/client today inside of Docker, the OpenVPN Client sets a route of 0.0.0.0/0, this breaks connections to Docker DNS 127.0.0.11, when making a request to a host, I receive the error message: (ERROR) Socket::Addrinfo::Error getaddrinfo: Temporary failure in name resolution /usr/share/crystal/src/socket/tcp_socket.cr:75:15 in 'initialize', after a while I realized that this error only happened in containers where OpenVPN was running.

For those curious, an OpenVPN Client inside the Docker contain allows our developers to copy production data into local host for testing how of code works against our production databases. For security reasons our MySQL and PostgreSQL instances are not exposed to the public internet and require users to be connected to our internal VPN. This was tricky to debug until we realized production wasn't affected for some reason and discovered that OpenVPN was the culprit.

It would be helpful to my efforts if the error message included something like "The DNS Resolver x.x.x.x is not responding, check your system configuration.", it would of saved many hours. I consider myself a noob when it comes to networking and can imagine this might confuse other users too. Thoughts?

Error trace

08:48:40 Error: 500 | (ERROR) Socket::Addrinfo::Error getaddrinfo: Temporary failure in name resolution /usr/share/crystal/src/socket/tcp_socket.cr:75:15 in 'initialize'
/usr/share/crystal/src/socket/tcp_socket.cr:27:3 in 'new'
/usr/share/crystal/src/http/client.cr:755:5 in 'socket'
/usr/share/crystal/src/http/client.cr:643:19 in 'send_request'
/usr/share/crystal/src/http/client.cr:580:5 in 'exec_internal_single'
/usr/share/crystal/src/http/client.cr:567:5 in 'exec_internal'
/usr/share/crystal/src/http/client.cr:563:5 in 'exec'
/usr/share/crystal/src/http/client.cr:685:5 in 'exec'
/usr/share/crystal/src/http/client.cr:389:3 in 'post:body'
lib/CrystalProxmox/src/access/access.cr:13:5 in 'create_ticket'
lib/CrystalProxmox/src/CrystalProxmox.cr:21:22 in 'initialize'
lib/CrystalProxmox/src/CrystalProxmox.cr:13:5 in 'new'
src/models/virtual_machine.cr:159:12 in 'proxmox_connect'
src/models/virtual_machine.cr:163:23 in 'start_vm'
src/controllers/virtual_machine_controller.cr:19:41 in 'start'
config/routes.cr:84:9 in '->'
lib/amber/src/amber/router/route.cr:255:3 in 'call'
lib/amber/src/amber/router/context.cr:78:5 in 'process_request'
lib/amber/src/amber/pipes/controller.cr:8:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
src/pipes/user_authenticate.cr:15:11 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/csrf.cr:18:11 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/flash.cr:7:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/session.cr:8:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/logger.cr:15:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/error.cr:9:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/citrine-i18n/src/citrine-i18n/handler.cr:16:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/client_ip.cr:26:9 in 'call'
/usr/share/crystal/src/http/server/handler.cr:26:7 in 'call_next'
lib/amber/src/amber/pipes/powered_by_amber.cr:6:9 in 'call'
lib/amber/src/amber/pipes/pipeline.cr:20:11 in 'call'
/usr/share/crystal/src/http/server/request_processor.cr:48:11 in 'process'
/usr/share/crystal/src/http/server/request_processor.cr:22:3 in 'process'
/usr/share/crystal/src/http/server.cr:497:5 in 'handle_client'
/usr/share/crystal/src/http/server.cr:463:13 in '->'
/usr/share/crystal/src/fiber.cr:255:3 in 'run'
/usr/share/crystal/src/fiber.cr:48:34 in '->'
???
@bcardiff
Copy link
Member

bcardiff commented Oct 3, 2019

I guess that if the message could be expanded with the host information available so far it would be an improvement. But since there could probably be a number of reasons for an unexpected name resolution I would not put probable causes in the error message. Only data.

@elaine-jackson
Copy link
Author

@bcardiff Data such as connection to 127.0.0.11 timed out would be helpful to include.

@bcardiff
Copy link
Member

bcardiff commented Oct 3, 2019

I agree with that. But not with something like "check your system configuration" since it might not be a miss-configuration.

@elaine-jackson
Copy link
Author

Fair and that might error be a bit too vague anyways. The more data on what went wrong the better.

@rvictory
Copy link

So, I looked into the code and that particular exception message is just passed up from the POSIX getaddrinfo method (a C bound method) and not Crystal code. Unfortunately, it appears that's about as good as you can get since the actual return from getaddroinfo for your example would likely have been

EAI_AGAIN
The name server returned a temporary failure indication. Try
again later.

per the docs (https://man7.org/linux/man-pages/man3/getaddrinfo.3.html). This was just a long way of saying I don't think what you're asking for is going to be possible without using a different method for name resolution, which would just make things messier than shunting down to getaddrinfo.

@rdp
Copy link
Contributor

rdp commented Nov 16, 2019

OK with this diff https://gist.github.com/rdp/26f5cde11e8e886b71df5bd42a8730d0

running this:

require "socket"
client = TCPSocket.new("localhosttt", 1234)

results in this message:

Unhandled exception: No address found for localhosttt:1234 over TCP when attempting a DNS lookup (Socket::Addrinfo::Error)

Is that better? Accurate? :)

@elaine-jackson
Copy link
Author

elaine-jackson commented Nov 16, 2019 via email

@rdp
Copy link
Contributor

rdp commented Nov 16, 2019

It is attempting to lookup the IP Address when it fails :) It'll have a different message if it can't resolve it, not sure what (basically the first part of that string will change...) :)

@elaine-jackson
Copy link
Author

elaine-jackson commented Nov 16, 2019 via email

@rdp
Copy link
Contributor

rdp commented Nov 16, 2019

Unfortunately it's just making a call out to "C code land" so it doesn't know exactly which DNS it may end up using, it just uses whatever the kernel does underneath... :)

@elaine-jackson
Copy link
Author

Unfortunately it's just making a call out to "C code land" so it doesn't know exactly which DNS it may end up using, it just uses whatever the kernel does underneath... :)

Does the kernel not give detailed responses? How does the C program dig know which server responded from the kernel?

% dig crystal-lang.org

; <<>> DiG 9.10.6 <<>> crystal-lang.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63949
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;crystal-lang.org.		IN	A

;; ANSWER SECTION:
crystal-lang.org.	59	IN	A	13.32.238.247
crystal-lang.org.	59	IN	A	13.32.238.64
crystal-lang.org.	59	IN	A	13.32.238.160
crystal-lang.org.	59	IN	A	13.32.238.38

;; Query time: 31 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sat Nov 16 13:11:00 EST 2019
;; MSG SIZE  rcvd: 109

@rdp
Copy link
Contributor

rdp commented Nov 16, 2019 via email

@elaine-jackson
Copy link
Author

dig is probably not using getaddrinfo (that crystal uses), it's probably hitting the DNS servers directly, itself, instead of having the kernel look it up for it...

On Sat, Nov 16, 2019 at 11:11 AM Nathaniel Suchy @.***> wrote: Unfortunately it's just making a call out to "C code land" so it doesn't know exactly which DNS it may end up using, it just uses whatever the kernel does underneath... :) Does the kernel not give detailed responses? How does the C program dig know which server responded from the kernel? % dig crystal-lang.org ; <<>> DiG 9.10.6 <<>> crystal-lang.org ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63949 ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;crystal-lang.org. IN A ;; ANSWER SECTION:crystal-lang.org. 59 IN A 13.32.238.247crystal-lang.org. 59 IN A 13.32.238.64crystal-lang.org. 59 IN A 13.32.238.160crystal-lang.org. 59 IN A 13.32.238.38 ;; Query time: 31 msec ;; SERVER: 8.8.8.8#53(8.8.8.8) ;; WHEN: Sat Nov 16 13:11:00 EST 2019 ;; MSG SIZE rcvd: 109 — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#8262?email_source=notifications&email_token=AAADBUFP5HURZN7FX76L3WDQUAZU7A5CNFSM4I45DEBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEHXHNA#issuecomment-554660788>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAADBUBNYMFLRQRJ4SOB72TQUAZU7ANCNFSM4I45DEBA .

How does it know my system's DNS resolver is 8.8.8.8?

@rdp
Copy link
Contributor

rdp commented Nov 16, 2019

I don't know it probably looks it up using some command line program or config files etc? https://unix.stackexchange.com/questions/28941/what-dns-servers-am-i-using/28958 You could check its source and tell us! :) Or is there a better error message possible maybe?

@elaine-jackson
Copy link
Author

I don't know it probably looks it up using some command line program or config files etc? https://unix.stackexchange.com/questions/28941/what-dns-servers-am-i-using/28958 You could check its source and tell us! :) Or is there a better error message possible maybe?

What about cases where you have more than one DNS Resolver configured? What if 8.8.8.8 doesn't respond to 8.8.4.4 is required? Config files can't tell you that.

@rdp
Copy link
Contributor

rdp commented Nov 17, 2019

I presume that 'command line programs' can (nmcli) but not sure how dig discovers its list...per se. Anyway I don't think there's an easy C "method call" to lookup the DNS servers, and it's so far distant from the actual c call getaddrinfo (unfortunately) that I still think it's going to be hard to actually add DNS info into the error message. At least for me... :) Let me know if the previously proposed is good enough and I could do a PR for it :)

@elaine-jackson
Copy link
Author

elaine-jackson commented Nov 17, 2019 via email

rdp added a commit to rdp/crystal that referenced this issue Nov 20, 2019
@rdp
Copy link
Contributor

rdp commented Nov 20, 2019

If getaddrinfo eventually has drop-in replacements at some point that themselves lookup the DNS servers then yeah, we'd have access to more info to return. refs: #8376 #4236 #2660 https://stackoverflow.com/a/2157622/32453

My latest incantation is this message:

getaddrinfo: No address found for badhostname:80 over IP when attempting host address resolution. Hint: check hostname, check resolution system (including DNS). (Socket::Addrinfo::Error)

("including" since it can also includes .local domains, etc/hosts, etc...)

Any objections? If not then I'll do a PR. Thanks!

@elaine-jackson
Copy link
Author

elaine-jackson commented Nov 21, 2019 via email

@RX14
Copy link
Contributor

RX14 commented Nov 27, 2019

How does the C program dig know which server responded from the kernel?

@rdp is correct - dig contacts the DNS servers directly. getaddrinfo uses the hosts file, and several other configurable plugins (man nsswitch.conf) to resolve DNS. We don't want to switch away from getaddrinfo, since it would break configurations which use these plugins (docker). Please keep in mind that no DNS servers may be contacted when looking up an address (/etc/hosts).

How does it know my system's DNS resolver is 8.8.8.8?

Probably by reading /etc/resolv.conf. dig doesn't need to have robust detection of which DNS server to use, since it's only a debugging program and you're expencted to provide it with dig foo @8.8.4.4 if required.

A trick you might want to know when debugging DNS is to test if getaddrinfo and dig are agreeing. You can use getent hosts google.com to quickly look up a hostname with getaddrinfo.

Yeah, this is probably not what you expect from DNS resolution on UNIX, but it is what it is unfortunately.

rdp added a commit to rdp/crystal that referenced this issue Dec 4, 2019
rdp added a commit to rdp/crystal that referenced this issue Dec 4, 2019
@rdp
Copy link
Contributor

rdp commented Feb 15, 2020

OK this is live in 0.33.0 (I tried to add as much as they'd let me get away with :)

@thelinuxlich
Copy link

Why this is throwing a name resolution failure?

https://play.crystal-lang.org/#/r/9ae2

@Sija
Copy link
Contributor

Sija commented Jun 18, 2020

@thelinuxlich Probably because play.crystal-lang.org runs within a sandbox without access to the external network? Did you try to run this code on your local machine?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants