This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Problem with nested connections to a unix domain socket #146

Open
welchwilmerck opened this issue Jul 12, 2018 · 2 comments


welchwilmerck commented Jul 12, 2018

I use NG to traverse a tree where the recursion is implemented by invoking a wrapper script as a subprocess at each node of the tree. The wrapper script detects when it is launched from the command line and starts the NG server. Each nested invocation of the script then takes the other path and runs the NG client, connecting back to the server for each of its children. My current traversal goes 4 levels deep.
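The two paths through the wrapper can be sketched roughly as below. This is only an illustration of the dispatch described above, not the actual script: the `NG_NESTED` marker variable, the `NG_SOCKET` variable, and the `pn_mode`/`pn_plan` function names are all hypothetical, and the sketch prints the action it would take instead of launching anything.

```shell
#!/bin/sh
# Sketch only: NG_NESTED, NG_SOCKET, pn_mode and pn_plan are hypothetical
# names for illustration, not taken from the real wrapper script.
NG_SOCKET="${NG_SOCKET:-/dev/shm/pnsocket}"

# Decide which path an invocation takes.
# A top-level launch has no marker in its environment; nested launches
# are assumed to inherit NG_NESTED from the invocation that spawned them.
pn_mode() {
    if [ -n "${NG_NESTED:-}" ]; then
        echo client
    else
        echo server
    fi
}

# Print (rather than run) the action the wrapper would take for one node.
pn_plan() {
    node="$1"
    case "$(pn_mode)" in
        server) echo "start NGServer on local:${NG_SOCKET}, then run ${node}" ;;
        client) echo "connect to local:${NG_SOCKET} and run ${node}" ;;
    esac
}
```

With this layout every nested client connects back to the same server socket while its parent's connection is still open, which is the nesting the report is about.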

This works fine with the server listening on TCP, but the server throws the exception below when using a Unix domain socket. It does seem to be a timing issue: on a loaded machine the NG server might throw one or two exceptions per thousand nodes, while on an otherwise idle machine I see ~50 per thousand.

I will work up an example I can share, but it will take me a while.

I've tried various combinations of:
RHEL 6/7
JDK 8/10
JNA 4.4.0/4.5.1

Jul 11, 2018 2:40:08 PM com.martiansoftware.nailgun.NGCommunicator lambda$startBackgroundReceive$1
WARNING: Nailgun client read future raised an exception
java.io.IOException: com.sun.jna.LastErrorException: [104] Connection reset by peer
        at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.doRead(NGUnixDomainSocket.java:127)
        at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.read(NGUnixDomainSocket.java:98)
        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
        at com.martiansoftware.nailgun.NGCommunicator.readChunkImpl(NGCommunicator.java:482)
        at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:465)
        at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:191)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: com.sun.jna.LastErrorException: [104] Connection reset by peer
        at com.martiansoftware.nailgun.NGUnixDomainSocketLibrary.read(Native Method)
        at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.doRead(NGUnixDomainSocket.java:125)
        ... 9 more
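A rough way to put a number on the per-thousand rate is to count the warning lines in a captured server log. The sketch below assumes the server's output was redirected to a file and that each failure produces exactly one "Nailgun client read future raised an exception" line, as in the log above; the function names `count_read_warnings` and `per_thousand` are made up for this example.

```shell
#!/bin/sh
# Count the warnings the server logged for failed client reads.
# $1 is the path to a file holding the captured server output.
count_read_warnings() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status.
    grep -c 'Nailgun client read future raised an exception' "$1" || true
}

# Integer number of failures per thousand nodes.
# $1 is the failure count, $2 is the number of nodes traversed.
per_thousand() {
    failures="$1"
    nodes="$2"
    echo $(( failures * 1000 / nodes ))
}
```

For example, `per_thousand "$(count_read_warnings server.log)" 1000` would report the rate for a 1000-node traversal, where `server.log` is a placeholder file name.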

welchwilmerck (Author) commented

I was able to reproduce with a simple class and a stack of scripts:

processnesting.tar.gz

Put the scripts in a directory on your PATH.
Add the new class to nailgun-examples and package it.
Start the server with the hardcoded path to the socket that is in the scripts.
Run the wrapper script (maybe more than once; sometimes it's fine).

java -cp \
nailgun-examples-0.9.3-SNAPSHOT.jar\
:nailgun-server-0.9.3-SNAPSHOT.jar\
:jna-4.5.1.jar \
com.martiansoftware.nailgun.NGServer local:/dev/shm/pnsocket

$ pnwrap pn1
pnwrap of pn1
************* IN pn1 temp **********
pn1
pnwrap of pn2
************* IN pn2 temp **********
pn2
pnwrap of pn3
************* IN pn3 temp **********
pn3
pnwrap of pn4
************* IN pn4 temp **********
pn4
pnwrap of pnleaf
************* IN pnleaf temp **********
inside leaf
------------- OUT pnleaf temp ----------------
out of pn
------------- OUT pn4 temp ----------------
out of pn
------------- OUT pn3 temp ----------------
out of pn
------------- OUT pn2 temp ----------------
out of pn
------------- OUT pn1 temp ----------------
out of pn

NGServer 0.9.3-SNAPSHOT started on local socket /dev/shm/pnsocket.
Jul 16, 2018 5:43:07 PM com.martiansoftware.nailgun.NGCommunicator lambda$startBackgroundReceive$1
WARNING: Nailgun client read future raised an exception
java.io.IOException: com.sun.jna.LastErrorException: [104] Connection reset by peer
	at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.doRead(NGUnixDomainSocket.java:127)
	at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.read(NGUnixDomainSocket.java:98)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at com.martiansoftware.nailgun.NGCommunicator.readChunkImpl(NGCommunicator.java:482)
	at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:465)
	at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:191)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.sun.jna.LastErrorException: [104] Connection reset by peer
	at com.martiansoftware.nailgun.NGUnixDomainSocketLibrary.read(Native Method)
	at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.doRead(NGUnixDomainSocket.java:125)
	... 9 more
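Because a single run is sometimes clean, it can help to repeat the wrapper a number of times before looking at the server output. The helper below is a minimal sketch; `repeat_runs` is a hypothetical name, and it only tallies non-zero exit codes of the command it is given.

```shell
#!/bin/sh
# Run a command N times, discarding its output, and print how many
# of the runs exited with a non-zero status.
# Usage: repeat_runs N command [args...]
repeat_runs() {
    n="$1"
    shift
    i=0
    failed=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}
```

For instance, `repeat_runs 20 pnwrap pn1` runs the traversal twenty times. The warnings in this report are logged on the server side, so a result of 0 here does not by itself show that no connection was reset; the server output still needs to be checked.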

sbalabanov-zz (Contributor) commented

At the moment we do not have cycles to investigate this; the recursive scenario is far beyond the traditional usage of ng. PRs are still gladly accepted :-)
