Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On High Sierra, dinghy appears to mark files as modified #269

Closed
ms-ati opened this issue Feb 21, 2018 · 58 comments
Closed

On High Sierra, dinghy appears to mark files as modified #269

ms-ati opened this issue Feb 21, 2018 · 58 comments

Comments

@ms-ati
Copy link

ms-ati commented Feb 21, 2018

Hi all, I can't claim to know exactly what is causing it, but our team has identified a major issue impeding development on Dinghy since we upgraded to Mac OS X "High Sierra" recently.

When Dinghy is running, and a file in the shared folder is edited externally (e.g. in Emacs) and saved to disk. When user attempts to edit the file again, something unexpected happens:

  • editor reports that file was changed externally, and asks if it should be saved again?
  • on attempt to save, editor asks if we want to overwrite external changes?

It should be noted that the expected thing also happens: the app inside Dinghy observes the change.

We've verified this behavior occurring in all of: Emacs, Sublime Text, and Rubymine, for example. However, for Rubymine, there is a workaround by changing to nio2 file api.

Hypothesis: Is this due to an interaction between the Dinghy NFS server and the new Apple APFS file system, which has nanosecond modification times?

We've verified that dinghy stop allows the editors to save files unimpeded by this issue.

Any ideas? Anyone else seeing this?

@ms-ati
Copy link
Author

ms-ati commented Feb 21, 2018

Forthcoming: we are going to experiment with running on regular Docker on Vagrant and report back if that suffers the same problem.

@ms-ati
Copy link
Author

ms-ati commented Feb 21, 2018

Is it possible that it has anything to do with this aspect of how APFS behaves, or any other aspect of APFS in particular?

@ms-ati
Copy link
Author

ms-ati commented Feb 21, 2018

@ryansch
Copy link
Contributor

ryansch commented Feb 21, 2018

@ms-ati We (myself and coworkers) are all running High Sierra with dinghy and we haven't seen this issue across vim, macvim, neovim, atom, or vs code. Note: we all did a clean install instead of an upgrade.

@ms-ati
Copy link
Author

ms-ati commented Feb 21, 2018

@ryansch We are seeing this universally -- every developer. We all did the upgrades of existing fairly late-model Macbook Pros, rather than clean installs. We all had our drives converted to APFS.

Can you specify: In Disk Utility, is the disk where your code lives an APFS volume?

@ryansch
Copy link
Contributor

ryansch commented Feb 21, 2018

@ms-ati Yes all of my macos volumes are inside of an APFS container.

@mizhi
Copy link

mizhi commented Feb 21, 2018

@ryansch I'm in the same org as @ms-ati , one thing I just thought of, are your APFS volumes encrypted?

@ryansch
Copy link
Contributor

ryansch commented Feb 21, 2018

@mizhi They are!

@mizhi
Copy link

mizhi commented Feb 22, 2018

To add a bit more mystery to this, I setup a vagrant machine with docker and ran our setup in that, and did not experience the issue described above. To make sure I wasn't going crazy, I shut that down and started up everything using dinghy and immediately started getting the file modification issue.

Going to dig some more on the dinghy side to see if I can suss out more answers (or better observations).

@codekitchen
Copy link
Owner

I haven't seen this myself, but it's almost surely something to do with https://github.com/codekitchen/fsevents_to_vm which Dinghy uses to make filesystem events work across the NFS mount. So I'm not surprised that it doesn't happen with vagrant-flavored docker, since vagrant doesn't make any attempt to make fsevents work.

@ms-ati
Copy link
Author

ms-ati commented Feb 23, 2018

...to make the above even more specific. Is it reasonable @codekitchen to hypothesize the issue is with

https://github.com/codekitchen/fsevents_to_vm/blob/master/lib/fsevents_to_vm/ssh_emit.rb#L14

...specifically, that then unfsd is observing the touch inside the vm, and propagating back to the APFS file system a new value? Implying a hypothesis that the change to APFS in High Sierra, in some cases, is causing an unexpected mtime to propagate back out via NFS?

@codekitchen
Copy link
Owner

That does seem the most likely culprit, though I'm not up to speed on APFS and I haven't seen this myself, yet.

A quick glance through the unfs3 source code shows it uses utime for setting timestamps, which only has second-level precision, so it doesn't look likely that fsevents_to_vm can just switch to replicating over higher-resolution timestamps as a fix.

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

@codekitchen I've investigated a bit and this definitely makes sense -- but it doesn't compute how this could affect some, but not all users. How is it possible that not everyone using Dinghy on APFS is experiencing this?

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

Also, in principle a solution to this issue should be possible and in fact not too hard, right? I ask because, I observe that the filesystem inside dinghy is ext4, which DOES itself support nanosecond times, ultimately.

Strawman of the steps one could take here:

  1. Ensure Dinghy's base Linux vm image is configured exactly correct so that nanosecond timestamps in the ext4 filesystem are fully working
  2. Test-drive fsevents_to_vm, if host is on APFS, to propagate the full sub-second timestamp (this is probably just modifying the ruby code to read the fractional timestamp, and the touch command that we run inside the VM to write the fractional timestamp
  3. [HARD?] Patch unfsd to stat and to write nanosecond timestamps where supported

Of the above 3 steps, I feel least confident about (3).

Thoughts?

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

cc @michaeldiscala

@codekitchen
Copy link
Owner

Yeah that's why I dug into the unfs3 source first -- patching that would be the only difficult part, though hopefully not more than a few hours work. I don't know how much work on top of that it'd be to get the patch accepted upstream, or whether they'd require parity in the Windows back-end. It does look like NFSv3 supports nanosecond-precision timestamps, so it should hopefully be possible. It's not something I'll be able to tackle myself in the near future unfortunately.

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

@codekitchen Ok, lines up with my thinking as well. Our first step is to see if we can isolate and identify what is different in the setups of developers who are seeing this impact vs those who are not. You and I are discussing a reasonable hypothesis for how to fix a problem, but which doesn't seem to affect everyone.

Do you have any hypothesis on how the above could possibly not impact all users?

@paolomainardi
Copy link

@ms-ati @codekitchen i suggest you to just stick with standard nfs server implementation (es: https://github.com/adlogix/docker-machine-nfs) we are using that with Dinghy and it works fairly well (https://github.com/sparkfabrik/sparkdock)

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

@paolomainardi Thanks for those links, it looks like similar issues on High Sierra were faced in docker-machine-nfs in adlogix/docker-machine-nfs#79 , and they fixed it with parameters to the NFS daemon?

@paolomainardi
Copy link

It was another kind of problem, mainly related to an operating system bug then fixed

@ryansch
Copy link
Contributor

ryansch commented Feb 26, 2018

One thing that still confuses me: why aren't we all seeing this bug? What's different about your workflow/environment (etc)?

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

@ryansch Yes! We are furiously invalidating hypotheses as quickly as we think of them. We are (un-)lucky enough to have multiple developers who see, and who are not seeing, this issue. So we are working hard to prove any hypothesis about what is different between their setups.

@paolomainardi
Copy link

I am using APFS without the encryption + native nfs server and i can't face any issues here.

@ryansch
Copy link
Contributor

ryansch commented Feb 26, 2018

@ms-ati That's quite interesting that you have devs on both sides of the fence there. Feel free to ask for help testing any hypotheses that pan out!

@seckenrode
Copy link

Hey all! I wanted to chime in here with some info and reproduction as well. I similarly am on an APFS volume, with encryption resulting from an in-place upgrade to High Sierra.

When dinghy is not running, I do not see any file modified errors while saving files.

no-error

When dinghy is running, I do see file modified errors, if I am saving while typing.

error

This seems to lend credence to the fact that it's related to the timestamp differences between modified timestamps mentioned above. If I save a file while I am not actively typing, my editor does not prompt with any errors:

no-error-ii

Hope that helps, happy to provide more info as needed!

@ryansch
Copy link
Contributor

ryansch commented Feb 26, 2018

Wow @seckenrode that's excellent. There's no bug quite like a race condition!

@seckenrode
Copy link

seckenrode commented Feb 26, 2018

To clarify an additional thing that we're finding interesting:

The test_file.rb is in my /Users/seckenrode/Desktop directory. I do not have any docker containers actively running at the time of this test, and I do not have any docker volumes mounted AFAIK. The only difference between the first and second/third GIFs here is that I ran dinghy up between the two.

I'm not sure what that means, w.r.t. the fsevents_to_vm and the unsfd server here.

Thanks!

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

Another clarification: I thought that I wasn't seeing the same issue as @seckenrode. But in fact I am! It's just that I wasn't typing and saving as quickly together as he was.

Steps to reproduce on my machine:

  1. dinghy up
  2. Open a file anywhere under my home directory in Sublime Text 3
  3. While typing quickly, his Cmd-S to save, like "Hello World"

What I observe is, if I type quickly the Cmd-S, I see the same as @seckenrode gif above

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

It appears that the crucial reproduction step is: typing in the editor within 1 second after saving.

This appears to trigger the error in Sublime Text 3 every time. It appears the difference between our developers with and without the issue was more in their typing and saving speed and habits than in their machine configurations?

@ms-ati
Copy link
Author

ms-ati commented Feb 26, 2018

...it appears that Emacs has this worse, as by default it asks the user every time a file modification time is changed, not just when the buffer has changes that differ from disk.

This behavior can be changed by the user to do what Sublime Text 3 does (compare the buffer contents to the disk contents to avoid warning when no changes) through the use of Emacs lisp, I think:
https://emacs.stackexchange.com/questions/21137/fix-changed-on-disk-reread-from-disk-when-file-has-not-changed

@thierrymichel
Copy link

It seems that you are aware of the problem.
Just to say that we encounter the same issue in applications like Paw or Coderunner. But everything works fine with Visual Studio Code…

If it can help we also have strange behaviour with Dropbox.
Some files, like *.key (Keynote), are modified without any user actions, can not be opened by others or generates conflicted versions…

If you want more informations, just tell me.

And, last but not least, thanks for the hard work! 👍

@ms-ati
Copy link
Author

ms-ati commented Apr 16, 2018

Regarding a path forward for fixing UNFS3 on High Sierra to support the nanosecond timestamps: I've filed an "Is this project alive?" bug with UNFS3 on SourceForge:
https://sourceforge.net/p/unfs3/bugs/10/

If no response in one week, what would that mean? Well, it would mean that in addition to making and testing a High Sierra fix for UNFS3, we'd also have to become the de-facto upstream for the version published for Mac OS via Homebrew.

Please let me know ^ if you have thoughts on the above @codekitchen ?

@ms-ati
Copy link
Author

ms-ati commented Apr 24, 2018

Unfortunately, as you can see in the discussion with UNFS3 authors on SourceForge, that project appears to be effectively dead with no upstream authors to collaborate with.

Unless someone else emerges who wants to take on development of UNFS3 to the extent that they can collaborate with us in adding support for nanosecond timestamps on APFS on High Sierra, and take on responsibility for an upstream repository and Homebrew recipe, then we are probably not going to pursue this any further.

We are investigating whether Docker for Mac has improved sufficiently in recent months to be a viable replacement for Dinghy for our use cases.

@derfian
Copy link

derfian commented May 7, 2018

Despite @ms-ati's claims that there are no upstream developers to collaborate with,
I spent some time this weekend digging into this issue and came up with something. I've got a git branch of unfs3 with some code that should allow for microsecond or nanosecond resolution access/modify timestamps, on platforms where this is supported. I haven't really tested the code much yet, so beware of bugs.

@ms-ati
Copy link
Author

ms-ati commented May 7, 2018

Hi @derfian this is fantastic news! We're so happy to hear about this. What sort of collaboration might be most helpful to you to get your branch towards something ready for production? Is there a test suite we could adapt for it, for example?

@ms-ati
Copy link
Author

ms-ati commented May 7, 2018

During my lunch break, what I am trying today: Following the Homebrew instructions to make a proper pull request against Dinghy in order to use @derfian 's new branch for unfs3.

I'll report back here with

  1. Does it actually work on my High Sierra system
  2. A link to the PR in question to discuss further

Any thoughts?

@ms-ati
Copy link
Author

ms-ati commented May 7, 2018

Ok. To see my progress on testing @derfian UNFS3 changes in Dinghy thus far, please see this gist:
https://gist.github.com/ms-ati/aaafdf091e1b04c4621aeec18cbfab4c

I will update as I progress.

Based on dtruss -n unfsd output, I am seeing calls to lstat64, fstat64, and setattrlistat. I am not seeing any calls to utime, utimes, utimesat, nor clock_gettime, when I edit a file and watch the dtruss output.

In order to view the local MacOS nanosecond timestamps, I did brew install coreutils and gls --full-time.

@ms-ati
Copy link
Author

ms-ati commented May 8, 2018

Here's my branch of the Dinghy Homebrew Tap where I am developing, pulling in @derfian 's branch of UNFS3, and soon to pull in my own fork of Dinghy to get the necessary support on the fsevents_to_vm side:

https://github.com/ms-ati/homebrew-dinghy/tree/use-unfs3-branch-higher-resolution-setattr

Thoughts? Comments?

@ms-ati
Copy link
Author

ms-ati commented May 9, 2018

Update: It appears that the above-referenced branch of unfs3 by @derfian (Thanks! 👍) is

a. Correctly setting the nanosecond timestamps to the Mac OS server host, but
b. Incorrectly reading the nanosecond timestamps from the Mac OS server host -- they appear as zeros to the client Ubuntu Linux system.

Details reported on SourceForge for UNFS3 here:
https://sourceforge.net/p/unfs3/bugs/10/#f004

@ms-ati
Copy link
Author

ms-ati commented May 11, 2018

Hi @derfian -- just wondering if you thought you might have a chance to look at the above comments on your nanosecond-timestamps branch of UNFS3?

Have a great weekend and thanks again!

@ms-ati
Copy link
Author

ms-ati commented May 16, 2018

Great news everyone!

I've verified that @derfian latest code on his branch of unfs3 works, please see my latest comment on sourceforge for details and steps to reproduce correct nanosecond behavior with Dinghy.

What is left to do? Well, the correct behavior is only observed when fsevents_to_vm is disabled. When it is running, the timestamps are set to zero nanoseconds again.

So, the next step is to get fixes into the fsevents_to_vm stack so that nanosecond timestamps are preserved.

HUGE THANK YOU to @derfian -- do you have plans to cut a new official release of unfs3 from your branch code?

@ms-ati
Copy link
Author

ms-ati commented May 16, 2018

@codekitchen Is there anyone who can help investigate the fsevents_to_vm side with me?

@codekitchen
Copy link
Owner

That's awesome. I should have time to poke at fsevents_to_vm later next week. If anybody else wants to look before then, I think all that may need to change is the code that reads the mtime on this line. Unless I'm forgetting something, which is possible as it's been a long time.

@ms-ati
Copy link
Author

ms-ati commented May 16, 2018

Hi @codekitchen there's an additional wrinkle. I have a branch of fsevents_to_vm which correctly formats the mtime using Time#iso8601(9) which is the right way to get nanoseconds, and on a Ubuntu container setting using touch -d works correctly.

However, the boot2docker host that fsevents_to_vm is actually ssh'ing into is actually based on BusyBox, and so the touch command is just BusyBox rather than Gnu Coreutils touch. The point of this digression is that the touch command there does NOT support nanoseconds.

One path forward that I was going to try was to have dinghy run an additional container which does contain a nanoseconds-enabled touch command, just for the purpose of replacing the ssh-to-touch with docker exec to touch. Any thoughts on that?

Could the existing Proxy container be extended to (a) mount the NFS-share of the host home directory, and (b) support docker exec from fsevents_to_vm to run the touch command?

@ms-ati
Copy link
Author

ms-ati commented May 17, 2018

@codekitchen To restate more concisely what I was saying in previous comment. First, disable your fsevent_to_vm in .dinghy/preferences.yml or just kill it. Then try this:

A. WORKING: Ubuntu's touch (from gnu coreutils) accepts nanoseconds:

# Outside on host Mac OS
$ touch ~/test_file && gls --full-time ~/test_file
-rw-r--r-- 1 username staff 0 2018-05-17 10:39:00.592204127 -0400 /Users/username/test_file
$ docker run -it -v $HOME:/fstest ubuntu:16.04 bash

# Inside Ubuntu container
root@91e1e49f6bf2:/# touch -m -c -d 2018-05-17T14:41:04.730369000Z /fstest/test_file
root@91e1e49f6bf2:/# ls --full-time /fstest/test_file
-rw-r--r-- 1 501 dialout 0 2018-05-17 14:41:04.730369000 +0000 /fstest/test_file

# Outside on host Mac OS again
gls --full-time ~/test_file
-rw-r--r-- 1 username staff 0 2018-05-17 10:41:04.730369000 -0400 /Users/username/test_file

B: BROKEN: Boot2Docker's touch (from BusyBox) rejects nanoseconds:

# Outside on host Mac OS
$ dinghy ssh

# Inside Dinghy's Boot2Docker VM
docker@dinghy:~$ touch -m -c -d 2018-05-17T14:44:12.503968000Z /Users/username/test_file
touch: invalid date '2018-05-17T14:44:12.503968000Z'

@ms-ati
Copy link
Author

ms-ati commented May 17, 2018

So, given that we've replaced the format_time in fsevents_to_vm with time.utc.8601(9), which gives you a nanosecond representation in the right format:

In the solution space that I was referencing above, I'm aware of two main shapes to get this timestamp set correctly inside the VM to trigger the inotify events in Linux:

1. Continue to SSH to BusyBox

If we are continuing to ssh into the VM and run a command, we need something other than BusyBox's touch. Possibly we could download a static binary here or otherwise add the command we need to the BusyBox in the VM?

2. Exec inside a Docker container instead

Alternatively, we could use docker exec to run a command inside a Docker container which does have the gnu coreutils version of touch. Since dinghy already runs the http proxy container, I suggested that as one alternative; however we could just as easily run a separate small container just for this purpose.

Thoughts @codekitchen and others?

@ms-ati
Copy link
Author

ms-ati commented May 18, 2018

Hi @codekitchen I know you're busy and hate to be a bother, but would really appreciate your thoughts on which direction above so I don't waste my time on a PR that you wouldn't accept. Make sense?

@ms-ati
Copy link
Author

ms-ati commented May 22, 2018

Per agreement with @derfian on the unfs3 sourceforge issue here:

I have a few portability concerns about the new code that I'd like to review before merging to master. Making a new release isn't at the top of my priority list at the moment.

No worries, I understand. In the meantime, would you be willing to commit to keeping your existing Github branch available at the same path? Could we temporarily make dinghy depend on that branch to build, to get Mac OS docker users back into good shape, and later change the dependency to be your next release?

Sure, I will do that. I don't mind having the dinghy team depend on this branch until everything else settles into place.

I will be submitting a PR to homebrew-dinghy to officially switch the unfs3 dependency to his branch implementing nanosecond support, until such a time in the future as @derfian is able to finish testing cross-platform support and make an official release.

@codekitchen
Copy link
Owner

Hm yeah that's a bummer. Using the proxy container seems like a workable solution, my only concern would be whether there will be too much overhead doing a docker exec for every touch. If that turns out to be the case, we could possibly run sshd inside the proxy container or come up with some other mechanism for keeping open a long-running connection.

I wouldn't be opposed to option 1, the static binary, either, I'm just not sure about the specifics.

Sorry I haven't been more responsive, I've been traveling without my laptop. If you're going to work toward this I'll hold off on doing anything this week. Thanks!

@ms-ati
Copy link
Author

ms-ati commented May 22, 2018

@codekitchen No worries, I've just proven out an easy solution: literally copy the GNU coreutils touch binary from a ubuntu:16.04 container into the Dinghy VM. It works perfectly, same version of libc, etc. Will have a PR for you in next few days.

@ms-ati
Copy link
Author

ms-ati commented May 23, 2018

Ok @codekitchen, PRs are up to both homebrew-dinghy (to pull in the fixes from UNFS3) and fsevents_to_vm

codekitchen/homebrew-dinghy#2
codekitchen/fsevents_to_vm#9

The ball is now in your court to review, and eventually to release a new version of Dinghy to pull in fsevents_to_vm when it is merged! :)

@ms-ati
Copy link
Author

ms-ati commented May 25, 2018

Ping @codekitchen -- hope you get a chance to review this soon!
Have a great Memorial Day weekend (assuming you are based in US per your profile)!

@codekitchen
Copy link
Owner

Big thanks to @ms-ati for tackling this, it's never fun to wrangle changes in a dependency like unfs3. I've cut a new Dinghy release v4.6.4 with this fix, everything seems to be working well. Please report if you still see the problem or any other issues that may have cropped up.

@jscheel
Copy link
Contributor

jscheel commented Feb 7, 2020

@codekitchen I recently updated to Catlina (from Sierra) and I am still noticing this issue. I have dinghy 4.6.5 running. I can easily test by changing anything in an xcode project (for example, updating the version number). Because xcode autosaves the change, I see a warning that the file was modified by another process. When I stop dinghy, it works as expected.

@jscheel
Copy link
Contributor

jscheel commented Feb 7, 2020

@codekitchen ahh, it looks like I may have needed to install unfs from homebrew

@bigdrum
Copy link

bigdrum commented Aug 28, 2020

@jscheel I had the same problem, do you mind elaborating a bit more how you installed unfs from homebrew?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants