Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NFS mounting fails when I re-launch dinghy #272

Open
hezhizhen opened this issue Mar 20, 2018 · 33 comments
Open

NFS mounting fails when I re-launch dinghy #272

hezhizhen opened this issue Mar 20, 2018 · 33 comments

Comments

@hezhizhen
Copy link

hezhizhen commented Mar 20, 2018

I have been facing the problem these days. I start dinghy and then stop it using dinghy halt, and then when I start it again, the logs will be like as follows:

Starting the dinghy VM...
Starting NFS daemon, this will require sudo
Waiting for NFS daemon...
NFS mounting failed
Starting the FsEvents daemon
Starting DNS and HTTP proxy

The only solution is to restart my laptop. But once I re-launch dinghy, NFS always fails to mount. Is there anything wrong with my settings and workflow? What should I do to fix it?

  • MacOS: 10.13.3
  • dinghy: 4.6.3
@m-bymike
Copy link

m-bymike commented Jun 12, 2018

Same for me, here's some output from /usr/local/Cellar/dinghy/4.6.4/local/var/dinghy-NFS.output

UNFS3 unfsd 0.9.23 (C) 2009, Pascal Schmidt <[email protected]>
bind: Address already in use
Couldn't bind to udp port 19091

I tried through all variants like dinghy halt, dinghy restart or even dinghy destroy. Even after a simple halt, ps auxwww | grep nfs doesn't list a single process. In addition, netstat does not list the port to be in use.

dinghy: 4.6.4
OSX: 10.13.4

Edit 2018-06-15: netstat actually lists the port as in use, but the process behind isn't alive anymore. I was unable to find a way to close it, even shutting down the network interface didn't work.

@codekitchen
Copy link
Owner

Well, darn. We originally used to select a random port to run NFS on with each start, exactly because of this problem. But I could no longer reproduce the problem back in December and it was causing problems for some people who run stricter firewalls, so I removed that randomization and the port is just hard-coded now. Sounds like it's still a problem for some.

I'm not sure what the best fix is here, it may involve making changes to unfsd to allow address reuse.

@m-bymike
Copy link

@codekitchen Thank you for your update. Do you have an Idea which versions of UNFS3 did close the port successful back in December? It would be a good starting point for further investigations...

@codekitchen
Copy link
Owner

I don't believe it has changed in unfs3, it just appears to affect some people more often than others. I never see the problem anymore when doing a restart or halt/start, for example. I don't have any theory as to why, yet.

@hezhizhen
Copy link
Author

@codekitchen maybe you need to buy an old macbook...

@codekitchen
Copy link
Owner

Haha, I don't know what counts as "old" but I use a 2014 Macbook Pro and a 2013 Mac Pro.

Something that just occurred to me, what VM provider are you both using? I've been using xhyve for a couple years now, maybe this is more prevalent with Virtualbox? I'll do some tests.

@hezhizhen
Copy link
Author

hezhizhen commented Jun 26, 2018

@codekitchen It seems that my macbook is a newer one (early-2015 MacBook Pro)...

I used docker first and then switched to dinghy, so the VM provider is VirtualBox, whose version is 5.1.6 r110634 (Qt5.5.1).

Maybe you could create a poll to see the effect of VM providers and their versions.

@m-bymike
Copy link

Late 2013 13-inch Macbook Pro... But running Virtualbox here...

@m-bymike
Copy link

m-bymike commented Jun 26, 2018

Funny enough: The problem doesn't show with xhyve. Just tried it, and I'm able to restart dinghy as often as I like...

@codekitchen
Copy link
Owner

Dang, I was afraid of that. This may be a Virtualbox bug, then. I'll see if I can dig up any more information.

We could potentially bring back the port randomization as a configurable option, though that kinda sucks having to make people set configuration to work around this.

@m-bymike
Copy link

m-bymike commented Jul 2, 2018

@codekitchen Short update: out of curiosity I wen't back to Virtualbox. But now I'm able to shutdown/restart the box w/o any nfs port issues. Since it has been a few days, I'm not sure what changed. No update of Virtualbox for sure, but I installed as mentioned xhyve... So I haven't got a Solution or something close, except that now I'm part of the 'works on my machine' group...

@alanjc7
Copy link

alanjc7 commented Jul 9, 2018

I’ve also hit this error on Dinghy 4.6.3, MacOS 10.13.3. Restarting was the only workaround.

@codekitchen
Copy link
Owner

@alanjc7 are you using the Virtualbox backend as well?

@alanjc7
Copy link

alanjc7 commented Jul 10, 2018

This turned out to be an issue with my Firewall config anyway, apologies for the false alarm!

@Ghazgkull
Copy link

@alanjc7 I'm encountering this same issue. Can you please say more about what the problem was and how you addressed it?

@hezhizhen hezhizhen changed the title NFS mounting failed when I re-launch dinghy NFS mounting fails when I re-launch dinghy Jul 12, 2018
@alanjc7
Copy link

alanjc7 commented Jul 12, 2018

I turned on a Firewall which was blocking unfsd connections - removing it allowed me to build without the NFS mounting issues.

@Ghazgkull
Copy link

Thanks for the info, @alanjc7. Digging around a bit, this looks like a different issue than the one I and others on my team saw last week when setting up dinghy for the first time. In all our cases, our firewalls were updated to allow unfsd. Using netstat and lsof, we were able to see that port 19091 was being held open by an invalid (dead?) process id. Not knowing how to manually close a port in this state, we each ended up having to reboot our machines to get past the problem.

I wonder if there's something happening during the initial process of creating a new VM (dinghy create --provider virtualbox) which is leaving a dead process holding this port?

@k0pernicus
Copy link

Unfortunately, same issue for me:

bind: Address already in use
Couldn't bind to udp port 19091

Also, my firewall is allowing unfsd.
As @Ghazgkull said, the port 19091 is opened by an invalid process id...

@mvgijssel
Copy link

Same for me!

@codekitchen
Copy link
Owner

@k0pernicus and @mvgijssel are you both using the Virtualbox back-end as well?

@mvgijssel
Copy link

Yes!

@k0pernicus
Copy link

I used the xhyve back-end.
For information, I get back to a GNU/Linux system since a few days...
So, unfortunately, I can't test any solution anymore to solve this issue - sorry in advance.

@v-stickykeys
Copy link

@codekitchen I'm using xhyve and getting the same issue as @k0pernicus

Firewall is accepting unfsd

dinghy-NFS.output is telling me

bind: Address already in use
Couldn't bind to udp port 19091

I use $ netstat -vp udp | grep 19091 to get the pid but when I try $ kill 11615 the result is

kill: (11615) - No such process

Haven't found a solution yet.

@codekitchen
Copy link
Owner

Darn. I've tried many times, but I can't reproduce it myself, so I'll need some help debugging the issue.

As I understand it, if the UDP port isn't being freed after the process exits, that means the process is a zombie that hasn't been fully cleaned up, usually either because it has a child process still running, or because the parent process failed to clean up after it.

Because we daemonize unfsd, the parent process is launchd (PID 1), so cleaning up after it should always work. On the other hand, I don't think that unfsd ever spawns child processes, so that rules out both possibilities and it seems likely that I'm missing something crucial. 😬

The next time this happens, can you use that netstat command to find the defunct PID, and then look for any information on the parent or children processes? I think ps -axf | grep <PID> should give relevant info, including parent PID, or you could look in Activity Monitor.

@k0pernicus
Copy link

k0pernicus commented Aug 15, 2019

Hi @codekitchen

I got this issue again.

I used this netstat command to get the PID, thanks to this stackoverflow post:

netstat -vanp udp | 
  awk 'BEGIN { l=0 }
       { if (l == 0) { print $0 }
         else if (l == 1) { print $0, "process" }
         else { "ps -o comm " $9 "| tail -1" | getline line;
                print $0, line }
         l++; }'

Unfortunately, I can't retrieve the associated relevant informations using ps...
I looked also in Activity Monitor, and still no process :/

@jordanranson
Copy link

This might not help but I've been having this issue too, usually happens after waking up my computer from sleeping. Rebooting doesn't fix it, the only thing that does is completely destroying and recreating the image, which as you imagine is not ideal.

My specs:
2017 MacBook Pro running macOS 10.14.3 (18D42)

@codekitchen
Copy link
Owner

@jordanranson Can you confirm any errors that appear in your /usr/local/Cellar/dinghy/*/local/var/dinghy-NFS.output? I can't imagine how a bind: Address already in use error could persist across a reboot and require actually destroying and re-creating the image, so I'm wondering if you are seeing the same symptoms but a different underlying cause.

@mashalifshin
Copy link

I'm seeing this issue as well, with Mojave 10.4.6 on a 2018 15inch Macbook pro, running dinghy with xhyve driver. This is the first time I'm seeing it, occurred after the macbook crashed with a kernel panic (but that's a separate story). Here is a step by step of what happened, in the hopes this will be helpful for debugging.

Trying to restart the VM:

$ dinghy restart                                                                                                                                             
The VM 'dinghy' does not exist in docker-machine.
Run `dinghy create` to create the VM, `dinghy help create` to see available options.

Okay, try to create it again:

$ dinghy create                                                                                                                                              
Creating the dinghy VM...
Docker machine "dinghy" already exists
/usr/local/Cellar/dinghy/4.6.5/cli/dinghy/system.rb:8:in `system_print': Failure calling `docker-machine create -d xhyve --xhyve-memory-size 2048 --xhyve-cpu-count 4 --xhyve-disk-size 20000 dinghy` (System::Failure)
	from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:130:in `system_print'
	from /usr/local/Cellar/dinghy/4.6.5/cli/dinghy/machine.rb:16:in `create'
	from /usr/local/Cellar/dinghy/4.6.5/cli/cli.rb:65:in `create'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/command.rb:27:in `run'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/invocation.rb:126:in `invoke_command'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor.rb:359:in `dispatch'
	from /usr/local/Cellar/dinghy/4.6.5/cli/thor/lib/thor/base.rb:440:in `start'
	from /usr/local/bin/_dinghy_command:12:in `<main>'

Okay I'll try to remove it then:

$ docker-machine rm dinghy
About to remove dinghy
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
Error removing host "dinghy": Unable to find 'xhyve' process by PID: 2729

Here s what I see in my dinghy-NFS.output:

=== Starting NFS at 2019-10-02T16:01:24-04:00 ===

UNFS3 unfsd 0.9.23 (C) 2009, Pascal Schmidt <[email protected]>
/Users/masha: ip 192.168.64.3 mask 255.255.255.255 options 6
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
unable to send RPC reply
...

So I cannot even get a udp port number.

Let me know if this is helpful and if you have ideas. I'll be trying to unstick this but will probably need to re-create the image soon. I'll update if I learn more.

@mashalifshin
Copy link

Actually that wasn't as bad as I thought it would be. After a restart I was able to successfully run a docker-machine rm dinghy. Hope the debugging info above can help.

@codekitchen
Copy link
Owner

@mashalifshin thanks for the logs. The unable to send RPC reply lines actually imply that UNFS was running, that's an unrelated warning that it likes to spit out sometimes while syncing files but doesn't seem to affect its working. It's possible that was from an earlier run, but I don't see any bind errors like the previous reports.

The fact that dinghy restart was saying the VM didn't exist when it still did is interesting, the output of docker-machine status dinghy might've indicated a bug in dinghy, we just check if that command succeeds to decide if the machine exists. Overall it sounds like something went wrong with docker-machine-driver-xhyve, they might be interested in those logs especially Error removing host "dinghy": Unable to find 'xhyve' process by PID: 2729. I'm glad a restart cleared it up.

@mashalifshin
Copy link

Thanks very much for your reply Brian. And a great suggestion for reaching out to the xhyve driver folks about the bug.

@charlesdg
Copy link

netstat -vp udp | grep 19091 and then kill the pid with sudo kill <pid> worked for me !

@klokie
Copy link

klokie commented Feb 6, 2021

Hi folks, looks like I'm a bit late to the party but wanted to share something that helped me get NFS shares to work with Dinghy on Mac OS 10.15.7.

In my case I had to grant "Full Disk Access" to /sbin/nfsd (under System Preferences > Security & Privacy > Privacy > Full Disk Access). As noted in laravel/homestead#1304 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests