
nslookup fails in Alpine 3.11.3 #539

Open
jgoeres opened this issue Jan 20, 2020 · 39 comments
@jgoeres

jgoeres commented Jan 20, 2020

We just switched to Alpine 3.11.3 and now nslookup is failing for us unless we explicitly specify the DNS server IP (which is of course not an option), e.g.

foo@/#nslookup abs
Server:         127.0.0.11
Address:        127.0.0.11:53
** server can't find abs.<OUR_INTRANET_DOMAIN>.: NXDOMAIN
** server can't find abs.<OUR_INTRANET_DOMAIN>.: NXDOMAIN
[...]

versus

foo@/#nslookup abs 127.0.0.11
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:

Non-authoritative answer:
Name:   abs
Address: 172.27.0.12

Ping etc. work flawlessly.
Alas, we are using nslookup in some of our start scripts to defer starting the actual application inside the container until another container shows up in DNS (because the 3rd-party tool we are using considers a failed DNS lookup a non-recoverable error...).
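
A common way to sidestep nslookup's exit-code quirks in a start script like this is to gate on getent instead, whose exit status depends only on whether the name resolved. A hedged sketch (the wait_for_dns helper name and its timeout handling are illustrative, not something from this thread):

```shell
#!/bin/sh
# Illustrative wait-for-DNS gate for a container start script.
# getent(1) (available in both glibc and musl/BusyBox images) exits 0
# only when the lookup succeeds, so it is a safer gate than nslookup here.
wait_for_dns() {
    host=$1
    timeout=${2:-60}    # seconds to keep retrying
    while [ "$timeout" -gt 0 ]; do
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "timed out waiting for $host to appear in DNS" >&2
    return 1
}

wait_for_dns localhost 5 && echo "resolved"
```

In the scenario above one would call wait_for_dns zookeeper before exec-ing the real application.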

Could this be related to enabling the nslookup feature "FEATURE_NSLOOKUP_BIG" as mentioned here:
#476

@ncopa
Collaborator

ncopa commented Jan 20, 2020

This is most likely related to the FEATURE_NSLOOKUP_BIG change, yes.

Does it work if you use a trailing . (dot)? E.g. nslookup abs.

It seems that nslookup will append the search domain if there are no dots in the hostname.

@jgoeres
Author

jgoeres commented Jan 21, 2020

Alas, adding a dot doesn't help (the following is running on Kubernetes, not plain Docker, therefore the DNS IP is different, but the result is the same).

foo@/#nslookup abs.
Server:         10.43.0.10
Address:        10.43.0.10:53

** server can't find abs.: NXDOMAIN
** server can't find abs.: NXDOMAIN

@jgoeres
Author

jgoeres commented Jan 29, 2020

I am wondering if this is now an acknowledged problem that will eventually be fixed or not.
Just to summarize: on a plain docker installation nslookup works when appending a dot to the hostname of the container:

/ # nslookup zookeeper.
Server:         127.0.0.11
Address:        127.0.0.11:53
Non-authoritative answer:
Non-authoritative answer:
Name:   zookeeper
Address: 192.168.80.20

on Kubernetes it doesn't:

/ # nslookup myns-zookeeper.
Server:         10.96.0.10
Address:        10.96.0.10:53
** server can't find myns-zookeeper.: NXDOMAIN
** server can't find myns-zookeeper.: NXDOMAIN

@ncopa
Collaborator

ncopa commented Jan 30, 2020

I am interested in fixing this, or at least reporting it upstream to the busybox bugtracker, but I am not sure what the expected response is. Apparently the Kubernetes DNS server gives a different response? Is it the same DNS server? Are there any other configs in /etc/resolv.conf?

It would be nice if we had a simple way to reproduce it, using publicly available internet servers.

What I know for sure is that "zookeeper" is not a valid hostname on the internet. Nor is it a top-level domain, so nslookup zookeeper. is sort of expected to fail.

@ncopa
Collaborator

ncopa commented Jan 30, 2020

It would also be helpful if you could report it upstream to https://bugs.busybox.net/

@ncopa
Collaborator

ncopa commented Jan 30, 2020

a tcpdump of the network activity would also be helpful.

@jgoeres
Author

jgoeres commented Jan 30, 2020

Hi
"zookeeper" is the internal DNS name of a Kubernetes service in our product, located in the same namespace (which is why it doesn't need to be a fully qualified name).
It could be the name of any K8s service in the same namespace in which the pod from which you do nslookup is running.

This is the content of resolv.conf in a Kubernetes environment:

/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search mynamespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Compare this to a plain Docker environment (boot2docker/Docker Toolbox):

/ # cat /etc/resolv.conf
search <my company's internal domain name here>
nameserver 10.0.2.3

On Alpine 3.11.2, when running nslookup without a dot, it works (in particular, the exit code is 0):

/ # nslookup zookeeper
nslookup: can't resolve '(null)': Name does not resolve

Name:      zookeeper
Address 1: 10.99.146.94 zookeeper.mynamespace.svc.cluster.local
/ # echo $?
0

Compared to Alpine 3.11.3:

/ # nslookup zookeeper
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find zookeeper.cluster.local: NXDOMAIN

Name:   zookeeper.mynamespace.svc.cluster.local
Address: 10.99.146.94

** server can't find zookeeper.cluster.local: NXDOMAIN
** server can't find zookeeper.svc.cluster.local: NXDOMAIN
** server can't find zookeeper.svc.cluster.local: NXDOMAIN

/ # echo $?
1

Observe that while in the Alpine 3.11.3 case the command apparently finds the proper IP address at some point and writes it into its output, its exit code is 1 instead of 0, and that breaks our start script.

Now with an attached dot, it fails in both 3.11.2 and 3.11.3, with slightly different output:

Alpine 3.11.2

/ # nslookup zookeeper.
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'zookeeper.': Try again
/ # echo $?
1

Alpine 3.11.3

/ # nslookup zookeeper.
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find zookeeper.: NXDOMAIN
** server can't find zookeeper.: NXDOMAIN

/ # echo $?
1

However, this is to be expected: AFAIK, adding a dot makes this a fully qualified name, so no lookup relative to the local search domains is performed, and so it has to fail.
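
The expansion behaviour described above can be sketched as a small shell function (a hypothetical helper; real resolvers derive the suffix list and ndots value from /etc/resolv.conf):

```shell
# Illustrative sketch of resolver candidate ordering: a trailing dot marks
# the name fully qualified (no search-list expansion); otherwise a name
# with fewer dots than ndots is tried with each search suffix appended
# before falling back to the literal name.
candidates() {
    name=$1; ndots=$2; shift 2      # remaining args: search suffixes
    case $name in
        *.) printf '%s\n' "${name%.}"; return ;;  # FQDN: tried as-is only
    esac
    dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s\n' "$name"                     # literal name tried first
    fi
    for suffix in "$@"; do
        printf '%s\n' "$name.$suffix"
    done
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s\n' "$name"                     # literal name tried last
    fi
}

candidates zookeeper 5 mynamespace.svc.cluster.local svc.cluster.local cluster.local
```

With ndots:5 this yields the three suffixed queries, which is why a bare zookeeper can resolve via the namespace suffix while zookeeper. cannot.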

This is the result of tcpdump when running nslookup on 3.11.3 (without trailing dot)

13:41:17.319069 IP foo-7f8cfbddd4-g8jv2.34512 > kube-dns.kube-system.svc.cluster.local.53: 9450+ A? zookeeper.mynamespace.svc.cluster.local. (57)
13:41:17.319213 IP foo-7f8cfbddd4-g8jv2.34512 > kube-dns.kube-system.svc.cluster.local.53: 10823+ A? zookeeper.svc.cluster.local. (51)
13:41:17.319228 IP foo-7f8cfbddd4-g8jv2.34512 > kube-dns.kube-system.svc.cluster.local.53: 12073+ A? zookeeper.cluster.local. (47)
13:41:17.319289 IP foo-7f8cfbddd4-g8jv2.34512 > kube-dns.kube-system.svc.cluster.local.53: 13317+ AAAA? zookeeper.mynamespace.svc.cluster.local. (57)
13:41:17.319321 IP foo-7f8cfbddd4-g8jv2.34512 > kube-dns.kube-system.svc.cluster.local.53: 19907+ AAAA? zookeeper.svc.cluster.local. (51)
13:41:17.319329 IP foo-7f8cfbddd4-g8jv2.34512 > kube-dns.kube-system.svc.cluster.local.53: 21050+ AAAA? zookeeper.cluster.local. (47)
13:41:17.319499 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.34512: 19907 NXDomain*- 0/1/0 (144)
13:41:17.319885 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.34512: 9450*- 1/0/0 A 10.99.146.94 (112)
13:41:17.320040 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.34512: 10823 NXDomain*- 0/1/0 (144)
13:41:17.320142 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.34512: 12073 NXDomain*- 0/1/0 (140)
13:41:17.320270 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.34512: 13317*- 0/1/0 (150)
13:41:17.320412 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.34512: 21050 NXDomain*- 0/1/0 (140)
13:41:17.866646 IP kube-dns.kube-system.svc.cluster.local.53 > foo-7f8cfbddd4-g8jv2.47457: 30617 ServFail- 0/0/0 (41)

And this is the output for 3.11.2 (again, without the trailing dot):

13:44:05.480641 IP foo2-6df94fb567-g84fc.41771 > kube-dns.kube-system.svc.cluster.local.53: 48903+ A? zookeeper.mynamespace.svc.cluster.local. (57)
13:44:05.480680 IP foo2-6df94fb567-g84fc.41771 > kube-dns.kube-system.svc.cluster.local.53: 49401+ AAAA? zookeeper.mynamespace.svc.cluster.local. (57)
13:44:05.481085 IP kube-dns.kube-system.svc.cluster.local.53 > foo2-6df94fb567-g84fc.41771: 49401*- 0/1/0 (150)
13:44:05.481216 IP kube-dns.kube-system.svc.cluster.local.53 > foo2-6df94fb567-g84fc.41771: 48903*- 1/0/0 A 10.99.146.94 (112)
13:44:05.481516 IP foo2-6df94fb567-g84fc.37388 > kube-dns.kube-system.svc.cluster.local.53: 42577+ PTR? 94.146.99.10.in-addr.arpa. (43)
13:44:05.481864 IP kube-dns.kube-system.svc.cluster.local.53 > foo2-6df94fb567-g84fc.37388: 42577*- 1/0/0 PTR zookeeper.mynamespace.svc.cluster.local. (121)

@jgoeres
Author

jgoeres commented Jan 31, 2020

The problem seems to be the additional search domains - if one of them fails, the command is considered failed.

If I remove the extra domains from resolv.conf and only leave

nameserver 10.96.0.10
search mynamespace.svc.cluster.local
options ndots:5

it works:

/ # nslookup zookeeper
Server:         10.96.0.10
Address:        10.96.0.10:53

Name:   zookeeper.mynamespace.svc.cluster.local
Address: 10.99.146.94

/ # echo $?
0

@ncopa
Collaborator

ncopa commented Feb 1, 2020

The problem seems to be the additional search domains - if one of them fails, the command is considered failed.

That is what I suspected. Thank you for confirming that. We should have enough info to be able to fix this.

Next step will be to report it to busybox developers. https://bugs.busybox.net/

I am sorry that I have not had time to prioritize this, but I believe we will be able to have a fix for this for 3.11.4.

Thanks!

@jgoeres
Author

jgoeres commented Feb 3, 2020

Just to clarify - should we report this to busybox devs (mainly by pointing them to this issue), or will you?

@ncopa
Collaborator

ncopa commented Feb 3, 2020

Just to clarify - should we report this to busybox devs (mainly by pointing them to this issue), or will you?

I was hoping you could help me with that, while I work on a fix ;)

Thanks!

@ncopa
Collaborator

ncopa commented Feb 4, 2020

I have reported it upstream: https://bugs.busybox.net/show_bug.cgi?id=12541

algitbot pushed a commit to alpinelinux/aports that referenced this issue Feb 4, 2020
@ncopa
Collaborator

ncopa commented Feb 4, 2020

I have pushed a fix to alpine edge. Can you please test if it solves your issue? Use alpine:edge and do apk upgrade -U -a to get busybox-1.31.1-r10.

@jgoeres
Author

jgoeres commented Feb 5, 2020

I tested it, on Kubernetes it works as expected:

$ kubectl run foo -i -t --image=alpine:edge --rm=true -- sh
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
/ # apk upgrade -U -a
fetch http://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/APKINDEX.tar.gz
(1/2) Upgrading busybox (1.31.1-r9 -> 1.31.1-r10)
Executing busybox-1.31.1-r10.post-upgrade
(2/2) Upgrading ssl_client (1.31.1-r9 -> 1.31.1-r10)
Executing busybox-1.31.1-r10.trigger
OK: 6 MiB in 14 packages
/ # nslookup mynamespace-zookeeper
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find mynamespace-zookeeper.cluster.local: NXDOMAIN

Name:   mynamespace-zookeeper.mynamespace.svc.cluster.local
Address: 10.101.67.109

** server can't find mynamespace-zookeeper.svc.cluster.local: NXDOMAIN

** server can't find mynamespace-zookeeper.cluster.local: NXDOMAIN

** server can't find mynamespace-zookeeper.svc.cluster.local: NXDOMAIN

/ # echo $?
0

The IP gets resolved against one of the domains found in /etc/resolv.conf and the exit code is 0.

Alas, it still doesn't work in plain Docker environments as it did before, unless I append a dot:

[myself@docker01 ~]$ docker run -it --rm --network mydockernetwork alpine:edge sh
/ # apk upgrade -U -a
fetch http://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/APKINDEX.tar.gz
(1/2) Upgrading busybox (1.31.1-r9 -> 1.31.1-r10)
Executing busybox-1.31.1-r10.post-upgrade
(2/2) Upgrading ssl_client (1.31.1-r9 -> 1.31.1-r10)
Executing busybox-1.31.1-r10.trigger
OK: 6 MiB in 14 packages
/ # ping zookeeper
PING zookeeper (192.168.192.21): 56 data bytes
64 bytes from 192.168.192.21: seq=0 ttl=64 time=0.418 ms
64 bytes from 192.168.192.21: seq=1 ttl=64 time=0.218 ms
64 bytes from 192.168.192.21: seq=2 ttl=64 time=0.232 ms
^C
--- zookeeper ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.218/0.289/0.418 ms
/ # nslookup zookeeper
Server:         127.0.0.11
Address:        127.0.0.11:53

** server can't find zookeeper.<mycompany.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.ame.<mycompany3.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.<mycompany3.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.<mycompany2.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.<mycompany3.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.<mycompany2.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.<mycompany3.internaldomain.com>.: NXDOMAIN

** server can't find zookeeper.<mycompany3.internaldomain.com>.: NXDOMAIN

/ # echo $?
1
/ # nslookup zookeeper.
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
Non-authoritative answer:
Name:   zookeeper
Address: 192.168.192.21

/ # cat /etc/resolv.conf
search <mycompany2.internaldomain.com>. ame.<mycompany3.internaldomain.com>. <mycompany3.internaldomain.com>. <mycompany3.internaldomain.com>.
nameserver 127.0.0.11
options ndots:0
/ #

As you can see, ping can resolve the name just fine, while nslookup without an appended dot fails. Also see the content of resolv.conf above.

@BretFisher

I can confirm it still doesn't work in alpine:edge for docker as of 3/5/2020.

alpine@sha256:13d22f83f248957d0a553f14154d5f3fd413b6c0c595ebb094b0e12cbac71797

How I reproduced:

$ docker network create mynet
ac5d340dc87a0833ba86926cbeb50cc68bb98ed35d5dc9b01ab28a27e9c5b95b

$ docker run -d --network mynet --name website nginx
de6f0284a2a071d891f499bd4485535ec391fcd7dc9fef3bc1010a3cba3d384d

$ docker run --rm --network mynet alpine:edge nslookup website
Server:         127.0.0.11
Address:        127.0.0.11:53

** server can't find website.51ur3jppi0eupdptvsj42kdvgc.bx.internal.cloudapp.net: NXDOMAIN
** server can't find website.51ur3jppi0eupdptvsj42kdvgc.bx.internal.cloudapp.net: NXDOMAIN

Works in alpine:3.11.2

$ docker run --rm --network mynet alpine:3.11.2 nslookup website
nslookup: can't resolve '(null)': Name does not resolve

Name:      website
Address 1: 172.20.0.2 website.mynet

@ncopa
Collaborator

ncopa commented Apr 1, 2020

Works with latest alpine:edge for me:

$ docker run --rm --network mynet alpine:edge nslookup website
Server:		127.0.0.11
Address:	127.0.0.11:53

Non-authoritative answer:
Non-authoritative answer:
Name:	website
Address: 172.19.0.2

@ncopa
Collaborator

ncopa commented Apr 1, 2020

As you can see, ping can resolve the name just fine, while nslookup without an appended dot fails. Also see the content of resolv.conf above.

@jgoeres can you please test with latest edge and latest stable 3.11.5, and compare with nslookup from the bind-tools package (e.g. apk add bind-tools)?

@weibeld

weibeld commented May 20, 2020

@ncopa I tested with edge, 3.11.5 and 3.11.2 on Kubernetes and compared with nslookup from the bind-tools package:

alpine:edge

/ # nslookup conncheck-service
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find conncheck-service.svc.cluster.local: NXDOMAIN

Name:   conncheck-service.conncheck.svc.cluster.local
Address: 10.111.127.249

** server can't find conncheck-service.svc.cluster.local: NXDOMAIN

** server can't find conncheck-service.cluster.local: NXDOMAIN

** server can't find conncheck-service.cluster.local: NXDOMAIN

** server can't find conncheck-service.eu-central-1.compute.internal: NXDOMAIN

** server can't find conncheck-service.eu-central-1.compute.internal: NXDOMAIN

/ # echo $?
0

Long output, exit code is 0 if at least one of the queries succeeds (desired behaviour).

alpine:edge with nslookup from bind-tools

/ # nslookup conncheck-service
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   conncheck-service.conncheck.svc.cluster.local
Address: 10.111.127.249

/ # echo $?
0

alpine:3.11.5

/ # nslookup conncheck-service
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find conncheck-service.cluster.local: NXDOMAIN

Name:   conncheck-service.conncheck.svc.cluster.local
Address: 10.111.127.249


** server can't find conncheck-service.svc.cluster.local: NXDOMAIN

** server can't find conncheck-service.svc.cluster.local: NXDOMAIN

** server can't find conncheck-service.cluster.local: NXDOMAIN

** server can't find conncheck-service.eu-central-1.compute.internal: NXDOMAIN

** server can't find conncheck-service.eu-central-1.compute.internal: NXDOMAIN

/ # echo $?
1

Long output, exit code is 1 if any of the queries fails (undesired behaviour).

alpine:3.11.5 with nslookup from bind-tools

/ # nslookup conncheck-service
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   conncheck-service.conncheck.svc.cluster.local
Address: 10.111.127.249

/ # echo $?
0

alpine:3.11.2

/ # nslookup conncheck-service
nslookup: can't resolve '(null)': Name does not resolve

Name:      conncheck-service
Address 1: 10.111.127.249 conncheck-service.conncheck.svc.cluster.local
/ # echo $?
0

Short output, exit code 0 if at least one of the queries succeeds (desired behaviour).

alpine:3.11.2 with nslookup from bind-tools

/ # nslookup conncheck-service
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   conncheck-service.conncheck.svc.cluster.local
Address: 10.111.127.249

/ # echo $?
0

weibeld added a commit to weibeld/k8s-conncheck that referenced this issue May 20, 2020
This is because in newer versions, nslookup has a different behaviour:
if the DNS lookup uses the "search" suffixes in /etc/resolv.conf, and if
any of them does not succeed, the command as a whole returns a non-zero
exit code. The exit code is only 0 if all the queries (for all the
"search" suffixes) succeed, which is usually not the case.

nslookup in Alpine 3.11.2 returns 0 if any of the queries succeeds, that
is, if the name can actually be resolved, and a non-zero exit code only
if no query at all succeeds. This is the desired behaviour.

See gliderlabs/docker-alpine#539
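
The two exit-code policies that commit message contrasts can be sketched in shell (run_query is a hypothetical stub standing in for a real DNS query):

```shell
# Illustrative sketch of the two exit-code policies. run_query is a stub
# that "resolves" only the fully expanded service name.
run_query() { [ "$1" = "zookeeper.mynamespace.svc.cluster.local" ]; }

# Desired policy (Alpine 3.11.2 behaviour): exit 0 if ANY query succeeds.
any_succeeds() {
    ok=1
    for q in "$@"; do
        if run_query "$q"; then ok=0; fi
    done
    return $ok
}

# Broken policy (busybox 1.31.1-r9 behaviour): exit non-zero if ANY query fails.
all_succeed() {
    for q in "$@"; do
        run_query "$q" || return 1
    done
    return 0
}

any_succeeds zookeeper.mynamespace.svc.cluster.local zookeeper.svc.cluster.local
echo "any_succeeds: $?"
all_succeed zookeeper.mynamespace.svc.cluster.local zookeeper.svc.cluster.local
echo "all_succeed: $?"
```

With this stub the first policy reports 0 and the second reports 1, matching the exit codes seen in the 3.11.2 vs 3.11.3 transcripts above.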
brocaar added a commit to brocaar/chirpstack-network-server that referenced this issue Jul 16, 2020
brocaar added a commit to chirpstack/chirpstack-gateway-bridge that referenced this issue Jul 16, 2020
brocaar added a commit to brocaar/chirpstack-application-server that referenced this issue Jul 16, 2020
@shaun-earsom

shaun-earsom commented Oct 19, 2020

We're now on Alpine 3.12.0 if you grab alpine:latest. It looks like nslookup is working fine. So this "ticket" should be closed.

@SnorreSelmer

SnorreSelmer commented Oct 20, 2020

We're now on Alpine 3.12.0 if you grab alpine:latest. It looks like nslookup is working fine. So this "ticket" should be closed.

user@server:~$ docker container run --rm --net dnsrr alpine nslookup search
Server:         127.0.0.11
Address:        127.0.0.11:53

** server can't find search.u01: NXDOMAIN

** server can't find search.u01: NXDOMAIN

user@server:~$ docker container run --rm --net dnsrr alpine nslookup search.
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
Non-authoritative answer:
Name:   search
Address: 172.18.0.3
Name:   search
Address: 172.18.0.2

This is on alpine:latest

@exanup

exanup commented Dec 13, 2020

Working for me totally fine with alpine:latest.

$ docker run --rm --name alpine -it --network net alpine nslookup web
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
Non-authoritative answer:
Name:   web
Address: 172.20.0.2

@freimer

freimer commented Jan 1, 2021

Fails for me. Interesting that Alpine 3.11.2 and 3.11.3 both say busybox is the same version (1.31.1), and the binaries are the same file size. However, the 3.11.2 one has a date of Dec 18, 2019, while the 3.11.3 one has a date of Jan 15, 2020, and the sha256 hashes differ. The only dynamic library is libc.musl-x86_64.so.1, and those have the same date and hash. Copying the busybox from 3.11.2 to 3.11.3 makes it work. The APK for busybox is 1.31.1-r8 on 3.11.2 and 1.31.1-r9 on 3.11.3.

What's the diff between r8 and r9?:

diff --git a/main/busybox/busyboxconfig b/main/busybox/busyboxconfig
index 63dd9c6e7f..53e00e266f 100644
--- a/main/busybox/busyboxconfig
+++ b/main/busybox/busyboxconfig
@@ -925,8 +925,8 @@ CONFIG_NETSTAT=y
 CONFIG_FEATURE_NETSTAT_WIDE=y
 CONFIG_FEATURE_NETSTAT_PRG=y
 CONFIG_NSLOOKUP=y
-# CONFIG_FEATURE_NSLOOKUP_BIG is not set
-# CONFIG_FEATURE_NSLOOKUP_LONG_OPTIONS is not set
+CONFIG_FEATURE_NSLOOKUP_BIG=y
+CONFIG_FEATURE_NSLOOKUP_LONG_OPTIONS=y
 CONFIG_NTPD=y
 CONFIG_FEATURE_NTPD_SERVER=y
 CONFIG_FEATURE_NTPD_CONF=y

I'm not an expert on busybox, but it looks like these are compile-time options, so we can't "fix" this by a configuration change. The best option may be for the maintainer of the Alpine Linux package to revert this change. Basically, what the change does is turn on the internal busybox resolver rather than using the standard library. If there is an issue with the busybox resolver code, then of course that should be fixed. However, it was a change in the Alpine Linux package that turned this feature on and "broke" it. Can we get it turned back off?

An ltrace of the r8 and r9 versions clearly shows the r8 version calling the standard library resolver, where r9 does not. It also shows the r9 version (with the internal busybox resolver) string-comparing for the domain, search, and nameserver keywords in resolv.conf, but not options. Look at the busybox source file nslookup.c: it has no ability to parse options, and hence ndots. Please revert this.
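
For comparison, parsing the options line is a small job; a hypothetical sketch of extracting ndots from a resolv.conf-style file (defaulting to 1, the glibc/musl default):

```shell
# Illustrative resolv.conf option parsing: pull the ndots value out of an
# "options" line, defaulting to 1 when absent (the glibc/musl default).
get_ndots() {
    ndots=1
    while read -r keyword rest; do
        if [ "$keyword" = "options" ]; then
            for opt in $rest; do
                case $opt in
                    ndots:*) ndots=${opt#ndots:} ;;
                esac
            done
        fi
    done < "$1"
    echo "$ndots"
}

conf=$(mktemp)
printf 'nameserver 10.96.0.10\nsearch mynamespace.svc.cluster.local\noptions ndots:5\n' > "$conf"
get_ndots "$conf"
rm -f "$conf"
```

A resolver that skips this line effectively treats ndots as unset, which would explain the search-expansion differences reported in this thread.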

Oh, and it is also broken in alpine:latest, which uses busybox-1.31.1-r19, ltrace shows the same behavior.

@leonboot

I am experiencing this issue since alpine:3.13 as well, albeit under different circumstances than mentioned here, but likely related nonetheless. I'm using Docker on my development machine through Dinghy. It seems the resolving mechanism doesn't play nice with its DNS server (which is used to resolve *.docker addresses to its own IP address and forward all other queries to the host's resolver). Running an nslookup actually returns the IP address of the requested hostname, but ends with an NXDOMAIN error. Here are my findings:

  • On Alpine images up to 3.9 an nslookup works fine; it returns the IP addresses and has a zero exit code but does display a warning on the first line: nslookup: can't resolve '(null)': Name does not resolve (regardless of a trailing dot)
  • On Alpine 3.10 it works as well, but is notably slower than previous versions
  • On 3.11 the requested IP addresses are returned, but the command ends with an NXDOMAIN error and has an exit code of 1
  • The 3.12 image has the same issues as 3.11, except the exit code, which is zero in this case.
  • The 3.13 image has the same issues as 3.11, including the exit code of 1.

The biggest issue is that commands such as apk add [package] fail, because the APK repository hostname cannot be resolved. I've tried the following command with Alpine images from 3.9 to 3.13:

docker run --rm -ti alpine:3.11 sh -c 'apk --no-cache add curl && curl -I https://www.google.com/'

Up to 3.12, the output is as follows:

fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/4) Installing ca-certificates (20191127-r4)
(2/4) Installing nghttp2-libs (1.41.0-r0)
(3/4) Installing libcurl (7.69.1-r3)
(4/4) Installing curl (7.69.1-r3)
Executing busybox-1.31.1-r19.trigger
Executing ca-certificates-20191127-r4.trigger
OK: 7 MiB in 18 packages
HTTP/2 200 
[...]

The 3.13 image, however, produces the following output:

fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/main: DNS lookup error
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/community: DNS lookup error
ERROR: unable to select packages:
  curl (no such package):
    required by: world[curl]

The errors are very specific to my installation; I've tried these commands on an Ubuntu-based Docker installation without any issues. Perhaps the cause of the issues @gaby is having is related. Could it be that the resolving mechanism has issues with certain resolvers? Could it be an IPv6 issue? There's an issue over on the Dinghy repository that might be related.

@sdwerwed

sdwerwed commented Apr 2, 2021

Today, 2 April 2021, I can also confirm this issue with nslookup (ping is able to resolve the DNS name).

Alpine images 3.14.0_alpha20210212 and 3.13.4, nslookup kubernetes.default:
[screenshot]
Debian, nslookup kubernetes.default:
[screenshot]

@thotypous

thotypous commented May 7, 2021

On Kubernetes, nslookup works fine for me, but every other software fails to resolve DNS:

[screenshot]

This is both on latest and on edge.

Removing the search line from /etc/resolv.conf "solves" the issue.

@project-administrator

Also, removing options ndots:5 (or setting the value to "1") helps

@aliask

aliask commented Sep 10, 2021

I also ran into this issue on Alpine 3.14.2 - nslookup worked fine, but curl, apk, ping etc would fail.
Some of my nameservers were returning NXDOMAIN because my VPN forces non-VPN DNS queries to respond NXDOMAIN instead of blocking the request.
Removing the public nameservers from resolv.conf "fixed" the issue, but this is frustrating because I can't maintain a resolv.conf with VPN and regular DNS servers with graceful fallback.

@from-nibly

from-nibly commented Jun 16, 2022

Seeing this issue on 3.16.0 while on a vpn

resolv.conf looks like this

# Generated by resolvconf
search <my-company1>.com <my-company2>.com lan
nameserver <internal-ip-on-vpn>
nameserver <internal-ip-on-vpn>
nameserver <my-router>
options edns0

When using nslookup everything works fine.

However, when curling an internal company domain I get one successful call and then only failures, unless I wait a minute or so and try again. It's very strange.

@VinceCui

Any update? Is this problem so difficult to solve? Alpine's image is small and light, and our team likes it, but this problem confuses us.

@wilbit

wilbit commented Apr 7, 2023

I also ran into this issue on Alpine 3.14.2 - nslookup worked fine, but curl, apk, ping etc would fail. Some of my nameservers were returning NXDOMAIN because my VPN forces non-VPN DNS queries to respond NXDOMAIN instead of blocking the request. Removing the public nameservers from resolv.conf "fixed" the issue, but this is frustrating because I can't maintain a resolv.conf with VPN and regular DNS servers with graceful fallback.

I've run into a similar issue.
My Docker gitlab-runner is connected to GitLab via VPN.
ping, nslookup and (more importantly to me) ssh cannot resolve a domain name.

$ cat /etc/resolv.conf
# DNS requests are forwarded to the host. DHCP DNS options are ignored.
nameserver 192.168.65.7

alpine:3.13.0 and later do not work for me.
The latest version that works for me is alpine:3.12.12.
The errors look like:

$ ping -c 4 $SSH_HOST
ping: bad address '<hidden>.local'
$ nslookup $SSH_HOST
Server:		192.168.65.7
Address:	192.168.65.7:53
Non-authoritative answer:
Name:	<hidden>.local
Address: 172.16.1.1
*** Can't find <hidden>: No answer
$ ssh -v $SSH_USER@$SSH_HOST "echo '!!!done!!!'"
OpenSSH_9.1p1, OpenSSL 3.0.8 7 Feb 2023
debug1: Reading configuration data /etc/ssh/ssh_config
ssh: Could not resolve hostname <hidden>.local: Try again

@TheDevilDan

TheDevilDan commented Apr 22, 2024

When I add bind-tools, everything works:

apk add bind-tools

nslookup works fine. Is this just a workaround? Is it a busybox-only bug?

@gaby

gaby commented Apr 22, 2024

@TheDevilDan This was fixed in recent alpine releases. Forgot which version

@TheDevilDan

TheDevilDan commented Apr 22, 2024

I use the latest 8.1-fpm-alpine image, and nslookup exits with code 1 when the name I query is not fully qualified:

MyServer# kubectl exec -n mynamespace -it AlpineBindUtils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

MyServer# kubectl exec -n mynamespace -it AlpineWithoutBindUtils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

command terminated with exit code 1

@gaby

gaby commented Apr 22, 2024

That bug was fixed in 3.18, and your image is based on that according to Docker Hub

@TheDevilDan

Very strange. I have the problem in all Alpine pods with busybox inside. I tested with Traefik v3.0 RC5 and hit the same problem. I have to install bind-tools, and after that it works perfectly:

MyServer# kubectl exec -n mynamespace -it traefik-7b595bc5d6-5kcmm -- /bin/sh

/ # nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN


/ # apk list --installed
WARNING: opening from cache https://dl-cdn.alpinelinux.org/alpine/v3.19/main: No such file or directory
WARNING: opening from cache https://dl-cdn.alpinelinux.org/alpine/v3.19/community: No such file or directory
alpine-baselayout-3.4.3-r2 x86_64 {alpine-baselayout} (GPL-2.0-only) [installed]
alpine-baselayout-data-3.4.3-r2 x86_64 {alpine-baselayout} (GPL-2.0-only) [installed]
alpine-keys-2.4-r1 x86_64 {alpine-keys} (MIT) [installed]
apk-tools-2.14.0-r5 x86_64 {apk-tools} (GPL-2.0-only) [installed]
busybox-1.36.1-r15 x86_64 {busybox} (GPL-2.0-only) [installed]
busybox-binsh-1.36.1-r15 x86_64 {busybox} (GPL-2.0-only) [installed]
ca-certificates-20230506-r0 x86_64 {ca-certificates} (MPL-2.0 AND MIT) [installed]
ca-certificates-bundle-20230506-r0 x86_64 {ca-certificates} (MPL-2.0 AND MIT) [installed]
libc-utils-0.7.2-r5 x86_64 {libc-dev} (BSD-2-Clause AND BSD-3-Clause) [installed]
libcrypto3-3.1.4-r5 x86_64 {openssl} (Apache-2.0) [installed]
libssl3-3.1.4-r5 x86_64 {openssl} (Apache-2.0) [installed]
musl-1.2.4_git20230717-r4 x86_64 {musl} (MIT) [installed]
musl-utils-1.2.4_git20230717-r4 x86_64 {musl} (MIT AND BSD-2-Clause AND GPL-2.0-or-later) [installed]
scanelf-1.3.7-r2 x86_64 {pax-utils} (GPL-2.0-only) [installed]
ssl_client-1.36.1-r15 x86_64 {busybox} (GPL-2.0-only) [installed]
tzdata-2024a-r0 x86_64 {tzdata} (Public-Domain) [installed]
zlib-1.3.1-r0 x86_64 {zlib} (Zlib) [installed]

/ # apk add bind-tools
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
(1/14) Installing fstrm (0.6.1-r4)
(2/14) Installing krb5-conf (1.0-r2)
(3/14) Installing libcom_err (1.47.0-r5)
(4/14) Installing keyutils-libs (1.6.3-r3)
(5/14) Installing libverto (0.3.2-r2)
(6/14) Installing krb5-libs (1.21.2-r0)
(7/14) Installing json-c (0.17-r0)
(8/14) Installing nghttp2-libs (1.58.0-r0)
(9/14) Installing protobuf-c (1.4.1-r7)
(10/14) Installing libuv (1.47.0-r0)
(11/14) Installing xz-libs (5.4.5-r0)
(12/14) Installing libxml2 (2.11.7-r0)
(13/14) Installing bind-libs (9.18.24-r1)
(14/14) Installing bind-tools (9.18.24-r1)
Executing busybox-1.36.1-r15.trigger
OK: 18 MiB in 31 packages

/ # nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1
