Fix IPv6 getaddrinfo on win/mac, and selftest hang when IPv6 n/a on Blue Waters system #9892

Closed
eschnett opened this issue Jan 23, 2015 · 15 comments
Labels
domain:io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@eschnett
Contributor

Blue Waters is a large HPC system at the NCSA. This is a Cray system running a standard Linux kernel. When I run Julia's self-tests on the front end, they abort with the following error:

$ /mnt/a/u/sciteam/eschnett/SIMFACTORY/julia-master/src/julia-180a6c5e57f0d03fdce044b9374c3edb8b00f4d6/usr/bin/julia --check-bounds=yes -f ./runtests.jl socket
     * socket              exception on 1: ERROR: LoadError: bind: address family not supported (EAFNOSUPPORT)
 in bind at ./socket.jl:443
 in runtests at /mnt/a/u/sciteam/eschnett/SIMFACTORY/julia-master/src/julia-180a6c5e57f0d03fdce044b9374c3edb8b00f4d6/test/testdefs.jl:67
 in anonymous at ./multi.jl:643
 in run_work_thunk at ./multi.jl:604
 in remotecall_fetch at ./multi.jl:677
 in remotecall_fetch at ./multi.jl:692
 in anonymous at ./task.jl:1621
while loading socket.jl, in expression starting on line 128
ERROR: LoadError: LoadError: bind: address family not supported (EAFNOSUPPORT)
 in bind at ./socket.jl:443
 in runtests at /mnt/a/u/sciteam/eschnett/SIMFACTORY/julia-master/src/julia-180a6c5e57f0d03fdce044b9374c3edb8b00f4d6/test/testdefs.jl:67
 in anonymous at ./multi.jl:643
 in run_work_thunk at ./multi.jl:604
 in remotecall_fetch at ./multi.jl:677
 in remotecall_fetch at ./multi.jl:692
 in anonymous at ./task.jl:1621
while loading socket.jl, in expression starting on line 128
while loading /mnt/a/u/sciteam/eschnett/SIMFACTORY/julia-master/src/julia-180a6c5e57f0d03fdce044b9374c3edb8b00f4d6/test/runtests.jl, in expression starting on line 42

It looks as if the system doesn't support IPv6. Indeed, when I run ifconfig -a, only IPv4 addresses are listed.

If this is the case, then this test should be disabled on systems that don't support IPv6.

@kshyatt
Contributor

kshyatt commented Jan 23, 2015

👍 Having Julia work on Blue Waters is my heart's desire

@ihnorton
Member

The IPv6 sections of getaddrinfo are currently commented out. Not sure why, but the first step is probably to uncomment those lines and test getaddrinfo("::1"). Then we may be able to enable getaddrinfo for both address types, and use getaddrinfo("::1") as the conditional for the tests.

@Keno

@eschnett
Contributor Author

This fails from the REPL:

julia> bind(UDPSocket(), ip"::1", uint16(2001))
ERROR: bind: address family not supported (EAFNOSUPPORT)
 in bind at ./socket.jl:443

Couldn't this be used to detect IPv6 support?
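
A minimal sketch of that kind of probe, assuming a current Julia where the socket API lives in the Sockets standard library (this thread predates it and uses the older uint16 spelling). The helper name ipv6_loopback_available is hypothetical and not part of Julia; whether a failed bind is the right signal on every platform is an assumption.

```julia
using Sockets

# Hypothetical helper (not part of Julia): report whether a UDP socket can be
# bound to the IPv6 loopback address. Port 0 asks the OS for any free port.
function ipv6_loopback_available()
    sock = UDPSocket()
    try
        # bind returns true on success and false for a few "address
        # unavailable" style errors; other failures are thrown.
        return bind(sock, ip"::1", 0)
    catch err
        # An unsupported address family surfaces as an IOError from bind.
        err isa Base.IOError || rethrow()
        return false
    finally
        close(sock)
    end
end

if ipv6_loopback_available()
    println("IPv6 loopback usable; IPv6 socket tests can run")
else
    println("IPv6 loopback not usable; skip IPv6 socket tests")
end
```

A test file could wrap its IPv6-specific cases in a check like this, so that hosts without IPv6 skip them instead of aborting.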

@StefanKarpinski
Sponsor Member

@Keno is crucial to so many things – I think we may need to spend some time working on cloning technology.

@timholy
Sponsor Member

timholy commented Jan 23, 2015

At JuliaCon2014, I swiped his beer while he wasn't looking and extracted a little bit of DNA. You should see all little Kenos running around their playpens in my lab; so cute.

@eschnett
Contributor Author

@ihnorton The IPv6 lines in socket.jl's _uv_hook_getaddrinfo are commented out because they would introduce a test failure. On OS X, they lead to a hang while testing socket.

@ViralBShah
Member

Do we have a way to run tests without running them in parallel (I am sure there must be a way)? On systems with firewalls, it is nice to be able to do so.

@eschnett
Contributor Author

make testall1 runs the tests on a single core. This sets an environment variable to achieve this; you can also set it manually.

@ihnorton ihnorton added the domain:io Involving the I/O subsystem: libuv, read, write, etc. label Jan 26, 2015
@ihnorton ihnorton self-assigned this Jan 26, 2015
@ViralBShah
Member

Perhaps @amitmurthy could help here.

@ihnorton
Member

I can reproduce the hang on Windows when the IPv6 getaddrinfo lines are uncommented. The hang is here. Maybe we are failing to clean up sufficiently on Windows and Mac after the getaddrinfo call, because accept does not seem directly related.

@ihnorton
Member

Minimal example (win64 local Cygwin build, current master; it works fine when those lines are commented out in getaddrinfo):

julia> defaultport = rand(2000:4000)
3441
julia> port, server = listenany(defaultport)
(0x0d71,TCPServer(active))
julia> @async connect("localhost", 3091)
Breakpoint 1, jl_getaddrinfo (loop=0x1b4dac0 <uv_default_loop_>,
    host=0x81c6fe70 "localhost", service=0x0, cb=0x8278b5e0) at jl_uv.c:728
728     {
(gdb) c
Continuing.
Task (queued) @0x0000000081cd8a80
julia> accept(server)
# ... Bueller?

@ihnorton ihnorton changed the title from "Selftest aborts on Blue Waters system, probably IPv6 problem" to "Fix IPv6 getaddrinfo on win/mac, and selftest hang when IPv6 n/a on Blue Waters system" Feb 23, 2015
ihnorton added a commit to ihnorton/julia that referenced this issue Feb 25, 2015
@ihnorton
Member

The issue with the test is that getaddrinfo("localhost") returns ::1, so we end up listening on IPv4 and trying to connect on IPv6. I can't say I entirely follow the control flow, but we get two callbacks on _uv_hook_getaddrinfo, one for each address available. Fixing the test is easy, but fixing the API is harder.

The fact that getaddrinfo enumerates all addresses is problematic right now, because the parent functions can only ever return one IPAddr. I think we should change getaddrinfo to return only the first enumerated address, and add another function (e.g. getaddresses) that returns a vector of addresses.
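
The mismatch described in this comment (listening on IPv4 while connecting to an IPv6 address) can be sidestepped in a test by pinning both ends to one explicit address. A minimal sketch, assuming a current Julia with the Sockets standard library; loopback_roundtrip and the port hint 8000 are hypothetical choices for illustration, not the fix that was merged.

```julia
using Sockets

# Listen and connect on one explicit loopback address so both ends agree on
# the address family. Returns the text the server side received.
function loopback_roundtrip(addr::IPAddr, port_hint::Integer = 8000)
    # listenany picks the first free port at or after port_hint on addr.
    port, server = listenany(addr, port_hint)
    try
        client_task = @async begin
            sock = connect(addr, port)   # same address the server listens on
            write(sock, "ping")
            close(sock)
        end
        conn = accept(server)
        data = read(conn, String)        # reads until the client closes
        close(conn)
        wait(client_task)
        return data
    finally
        close(server)
    end
end

println(loopback_roundtrip(ip"127.0.0.1"))
```

Passing a literal address instead of "localhost" keeps name resolution out of the test entirely, so the result does not depend on which address family the resolver lists first.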

@amitmurthy
Contributor

How about getaddrinfo(host; addr=IPv4)?

By default it returns only the first IPv4 address. The first IPv6 address, if any, must be explicitly requested.
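
The selection rule proposed here can be sketched independently of any resolver. This is an illustration only, assuming a current Julia with the Sockets standard library; first_of_family is a hypothetical helper, and the address list is written out by hand where a real caller would get it from a lookup.

```julia
using Sockets

# Hypothetical helper: return the first address of the requested family
# (IPv4 or IPv6) from a list of resolved addresses, or nothing if absent.
function first_of_family(addrs::AbstractVector{<:IPAddr}, ::Type{T}) where {T<:IPAddr}
    idx = findfirst(a -> a isa T, addrs)
    return idx === nothing ? nothing : addrs[idx]
end

# Hand-written stand-in for a resolver result that lists IPv6 first.
addrs = IPAddr[ip"::1", ip"127.0.0.1"]

println(first_of_family(addrs, IPv4))   # default choice under the proposal
println(first_of_family(addrs, IPv6))   # must be requested explicitly
```

Making the family an explicit argument means a caller that listens on IPv4 never silently receives an IPv6 address just because the resolver happened to list it first.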

@vtjnash
Sponsor Member

vtjnash commented Oct 6, 2017

fixed by #23596

@vtjnash vtjnash closed this as completed Oct 6, 2017
@vtjnash vtjnash reopened this Oct 6, 2017
vtjnash added a commit that referenced this issue Oct 6, 2017
ensures that listenany and connect are using the same ipaddr
@KristofferC
Sponsor Member

Why reopen?

@vtjnash vtjnash closed this as completed in 4f6abd7 Oct 9, 2017