Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: cygwin and msys2 support (bash) #1291

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

W1M0R
Copy link

@W1M0R W1M0R commented Jun 11, 2024

This PR adds support for two common Linux-on-Windows environments: Cygwin (Bash) and MSYS2 (Git Bash).

These environments have lots in common with a few notable differences. It is good enough for this PR to see MSYS2 as a variant of Cygwin, but with different defaults.

The key difference (relevant to this PR) is in the way that these environments deal with paths in process arguments and environment variables. And the key similarity is that they both offer the cygpath utility to handle path conversions manually.

This PR uses the cygpath utility to:

  1. fix environment variables and process arguments for use with direnv's stdlib.sh (Unix-style paths)
  2. fix environment variables and process arguments for use with direnv's Go runtime on Windows (Windows-style paths)
  3. export environment variables according to the conventions of MSYS2
  4. export environment variables according to the conventions of Cygwin
  5. export the PATH environment variable in Unix-style
  6. enhance relevant stdlib.sh functions (such as PATH_add) for automatic conversions to Unix-style paths (MSYS2)
  7. prevent the Bash export hook from converting environment variables to Windows-style paths (MSYS2)

This PR was tested on Cygwin, MSYS2 "Minimal" (Git Bash), MSYS2 "Standard" (MINGW64) and WSL, with all unit tests passing and all direnv commands working as expected. More specifically, I am satisfied that the following direnv commands work well: hook, allow, deny, reload, export, edit, exec and status. I also tested this on an existing direnv project and am happy with the results.

The functionality in this PR is only activated in conventional MSYS2 environments and conventional Cygwin environments. Other environments revert back to the regular direnv behaviour. Care was taken to reduce the performance impact for non-cygpath environments by doing cheap calculations to skip the cygpath stack as early as possible. Relevant checks are in stdlib.sh and in cygpath.go.

@zimbatm Windows has been my primary development environment for a long time and I don't expect that to change anytime soon, so I'd be happy to maintain this contribution. I also spend a lot of time in the Windows Subsystem for Linux (WSL) and use the Nix Package Manager together with direnv whenever I have the opportunity (and have done so for quite a few projects), so I have a relatively good understanding of the typical direnv build and runtime environments, and it is easy enough for me to test (and maintain) this contribution in different contexts.

Related Issues

This PR will resolve the following issues (when using direnv inside an MSYS2 Git Bash or a Cygwin Bash environment):

  1. PATH gets mangled when using direnv from git-bash on Windows #253
  2. Incorrect path format is exported on Windows 10 with mintty / git bash, breaking the PATH and command resolution #796
  3. direnv mangles environment variables on windows #1274
  4. Quick question is it possible to support win natively? #1026
  5. How to run on windows? #343
  6. Windows compatibility #142
  7. Windows: path and path list separators #123

@pjeby also has a PR aimed towards Cygwin support: #630 (also see #638).

Future Work

  1. A future PR could aim to replace all cygpath calls with a native Go implementation based on the cygpath source code, the cygwin_conv_path function and/or the conv_path_list function. The MSYS2 runtime also has its own path conversion function that can be consulted. However, this could be tricky, and potentially complicate the direnv codebase a lot, and we may still end up needing cygpath as a fallback. It is also worth noting that one of the reasons to drop cygpath is to avoid the performance penalty of calling an external process on Windows, however, in the stdlib.sh we would just be replacing calls to cygpath with calls to direnv (unless we want to maintain two native cygpath implementations - one for Bash in the stdlib.sh and one for Go).
  2. Both MSYS2 and Cygwin provide GitHub actions. It may be a good idea to set something like this up.
  3. An MSYS2-specific build of direnv could be added to the MSYS2 package registry, to allow direnv to be installed on MSYS2 using pacman -S direnv (like fish and make for example).
  4. A Cygwin-specific build of direnv could be uploaded to the Cygwin package repository, to allow direnv to be installed on Cygwin using the Cygwin setup program.
  5. If other shells (Fish, Zsh, etc) are a thing in MSYS2/Cygwin, then support could be added for them.
  6. Minimal changes were made to the stdlib.sh. If usage on MSYS2/Cygwin gains traction, it might be helpful to write a stdlib extension with Cygwin/MSYS2 helper functions (e.g. is_msys2, is_cygwin, has_cygpath, etc). If an extension is not desired, then it could be included in the stdlib.
  7. This PR adds useful debug messages to identify calls to cygpath from the stdlib and from the Go runtime. If we need to squeeze more performance out of direnv by reducing cygpath calls even further, we can grep on debug messages that contain the word cygpath and then run a few scenarios to verify opportunities for optimization.

Implementation Notes

  1. The regular cross-platform build and deploy process of direnv is sufficient for this PR to work. The functionality in this PR is not dependent on direnv being built in an MSYS2/Cygwin environment.
  2. Environment variable normalisation is performed during direnv startup, and not during direnv export, to enable the direnv diffing algorithm to work its magic.
  3. Cygpath calls are reduced to a minimum by first using basic heuristics to determine whether the path manipulation requires cygpath.
  4. Since MSYS2 does automatic path conversions by default for process arguments and environment variables, we only need to fix the way the PATH environment is exported (Go runtime) and manipulated (stdlib.sh).
  5. Since Cygwin does not do automatic path conversions, we need to convert relevant arguments and environment variables from Unix to Windows style (Go runtime). We also fix PATH export.
  6. To export the automatic environment modifications made by MSYS2, just continue using eval "$(direnv export bash)". To export an environment that looks more like your .envrc file, use eval "$(MSYS2_ENV_CONV_EXCL="*" direnv export bash)". The eval "$(direnv hook bash)" command performs the export using the latter method, since it provides a familiar default experience to direnv users and is also faster since it skips a chunk of MSYS2's environment processing.
  7. The make test target in the GNUmakefile tests Bash support, but specifically excludes the tests for the other shells supported by direnv, e.g. Elvish, Fish, etc. This is because the conventional MSYS2/Cygwin environments use Bash, and I didn't want to make any guarantees about support for non-conventional environments.
  8. The GNUmakefile provides a target to setup a build environment using scoop, that can be used for building on MSYS2 (Git Bash) and Cygwin. However, using scoop is not the conventional approach on those environments. The minimal MSYS2 environment (Git Bash) has no package manager, so using scoop is sensible. The regular MSYS2 environment includes the pacman package manager for installing packages. The Cygwin environment does not have a package manager, but includes a setup executable that allows you to choose the packages you want in your environment. The difference between these approaches, is that scoop will install the native Windows packages as typically made available by the tool authors, whereas MSYS2 pacman and Cygwin setup.exe will install versions of the packages that are compiled in those environments with specific patches applied to make Windows tooling more POSIX-y (Cygwin) or Unix tooling more Windows-y (MSYS2).

Developer Notes

  1. bash.exe is an MSYS2 process (as opposed to direnv which is a native Windows executable) and expects a Unix-style PATH environment variable.
  2. The cygpath.exe utility is available by default in an MSYS2 environment and a Cygwin environment.
  3. Git Bash is a minimal MSYS2 MINGW64 environment that does not have the pacman package manager (which is included by default with the standard MSYS2 installation).

Comments

@zimbatm @tiymat @joshahussey While testing this PR in WSL, I came across the following code (introduced in #1242):

test_nonempty "$VIETUAL_ENV"

When running the unit tests, it looks like a silent failure, in that the tests report success, but looking at the output it seems like it actually fails. I suspected that this was a typo and renamed it to VIRTUAL_ENV, however, this caused the unit tests to fail hard, so I decided to leave it as is.

@garyo If you are still interested in using direnv in an MSYS2/Cygwin environment, then this PR can help. It should be as simple as cloning the PR, installing go, and running make followed by make install.

@W1M0R W1M0R marked this pull request as draft June 11, 2024 20:30
.gitattributes Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the tests require that these files be in LF. On Windows, the default on clone (on my system) is CRLF. This ensures that it uses EOL as expected by the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be enough to use * text=auto eol=lf?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would, I'll update it accordingly.

.gitignore Outdated Show resolved Hide resolved
GNUmakefile Outdated Show resolved Hide resolved
internal/cmd/env.go Outdated Show resolved Hide resolved
Copy link
Author

@W1M0R W1M0R Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On MSYS2, symlinks are full file copies. And on Windows, the modtime of state-A and state-B and symlink are all the same. I touch the files to ensure that their modtime is changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have some implications on the direnv reload logic. Can you check if direnv reload and direnv allow behave as expected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check and get back to you.

Copy link
Author

@W1M0R W1M0R Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They behave as expected. When I execute those commands, I expect the environment to reload, and in both cases it does. These results make sense, since direnv reload performs a touch that updates the modtime of the file and triggers the reload, and direnv allow performs a file write operation that also updates the modtime of the file.

One reason why the msys2 unit test needs to touch the files, is because both state-A and state-B have the same modtime (probably the time that the project was cloned), and the msys2 version of ln -fs does some kind of copy magic that makes exact copies of those files when creating the ./symlink file, keeping their modtime values in tact and equal. The unix version of ln probably updates the ./symlink modtime as expected.

The direnv export command (used by direnv_eval) probably does not touch the watched files, to avoid any weird side effects, but as indicated, both direnv reload and direnv allow do "touch" the watched files, so things work as expected on msys2.

I could update the test to perform a direnv allow instead of a touch if you prefer. A part of me thinks that if a symlink changes in such a way that it would cause another environment to be loaded (in this case only the value of STATE is changed but it just as well could have executed an rm -rf * statement), then it might be more correct to require that the user execute direnv allow first, before executing the export.

@W1M0R
Copy link
Author

W1M0R commented Jun 13, 2024

I'm happy with this PR now. This version of direnv works well for my projects on Git Bash.

@pjeby Would you be able to give this implementation a try on your system?

@W1M0R W1M0R marked this pull request as ready for review June 13, 2024 21:31
@pjeby
Copy link
Contributor

pjeby commented Jun 14, 2024

@pjeby Would you be able to give this implementation a try on your system?

It looks like this is msys specific, and I use Cygwin. So as long as the PR's changes don't affect Cygwin I should be good.

@W1M0R
Copy link
Author

W1M0R commented Jun 14, 2024

Thanks @pjeby. This PR should not activate on Cygwin environments.

From what I gather from these sources (and a few others), it looks like I am correctly distinguishing between Cygwin and MSYS2 environments using the MSYSTEM variable:

  1. https://cygwin.com/cygwin-ug-net/setup-env.html
  2. https://www.msys2.org/wiki/How-does-MSYS2-differ-from-Cygwin/

UPDATE: I installed Cygwin and verified that the MSYSTEM variable is not available there, so the MSYS2 functionality will not be triggered.

I also cloned direnv's master branch (i.e without this PR) in my Cygwin environment, but couldn't get the bash unit tests to pass, so I'm not too confident that direnv would work as expected in Cygwin. But as I understand it, you have your own fork overcoming those limitations.

@W1M0R W1M0R marked this pull request as draft June 14, 2024 07:00
@W1M0R W1M0R marked this pull request as ready for review June 14, 2024 14:03
@pjeby
Copy link
Contributor

pjeby commented Jun 14, 2024

Now that you've added all this cygpath usage, it might be useful to note that making direnv export do a cygpath -p on the PATH variable before output when running under Cygwin would make this also work for Cygwin.

Doesn't need to be in this PR or anything, but I thought it might be useful to know. I'm currently using a patched _direnv_hook function that does:

[[ "$PATH" != *'\'* ]] || export PATH=$(/usr/bin/cygpath -p "$PATH")

right after running the eval "$(direnv export bash)" bit.

So if direnv did the cygpath -p internally when running on Cygwin, it'd be compatible with Cygwin shells, at least to the extent my current setup is. (AFAIK Cygwin only mangles PATH, not command-line arguments or other environment variables.)

(Of course, this could also be done by changing the output of direnv hook, but then it'd need changing for every supported shell.)

@W1M0R
Copy link
Author

W1M0R commented Jun 14, 2024

Thanks for the tip! If this PR makes the cut, then I'll see if I can run with the momentum and get some basic Cygwin support into another PR.

My first implementation of this PR involved a similar approach where I resolved the PATH issue directly in the export, but this ended up working against the way direnv performs its environment diffing. This is not that big of an issue when you are only exporting the environment, but it does pose problems when you start using the shell hook, which is probably how most people use direnv anyway.

I'd probably investigate performing the cygpath call earlier in the stack for Cygwin, in a similar way I did for this PR.

internal/cmd/cmd_exec.go Outdated Show resolved Hide resolved
internal/cmd/look_path.go Outdated Show resolved Hide resolved
Copy link
Member

@zimbatm zimbatm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @W1M0R, thanks for the contribution.

Past the minor changes, the only blocker is that I need a Windows machine to test this. I will need your help to maintain some of that code. If you are ready to do this, we can proceed.

.gitattributes Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be enough to use * text=auto eol=lf?

if len(os.Getenv("MSYSTEM")) == 0 {
// This is Windows, but not an MSYS2 environment.
return m
}

if _, err := exec.LookPath("cygpath.exe"); err != nil {
// This is an MSYS2 environment...without cygpath. We require cygpath
// for some initial setup.
return m
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These requirements need to be added to the docs. Otherwise, it's hard for users to know whether direnv is working or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Do you want me to add entries to the installation.md file, the README.md file and the direnv.1.md file?

Just to be clear, this is the standard MSYS2 environment, so user's don't need to do anything to set this up. An msys2 environment would be broken if it didn't have the MSYSTEM variable set and cygpath.exe available in the path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code slightly to log an error as a hint to the user that something seems odd about their MSYS2 environment.

// toWinMixedPath uses cygpath.exe to convert the Unix input path to a Windows
// Mixed path.
func (m msys2) toWinMixedPath(input string) (string, error) {
return m.execCygpath("-m", input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much logic is it to port over if we wanted all the logic embedded in direnv?

I'm asking because Windows is a bit slow in spawning new programs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To implement cygpath, you would at minimum need to be able to parse /etc/fstab, which controls the drive mapping prefix for both Cygwin and msys (unless msys has an API for this that you can call?). See also the source code for the bit that doesn't involve /etc/fstab.

(Oh, and in order to find/etc/fstab you'll probably need to run cygpath first, as there are many possible install locations for msys and cygwin; my msys is in a subdirectory of ~/scoop/apps/, for example.)

Copy link
Author

@W1M0R W1M0R Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimbatm I agree, the spawning part can be slow. This PR already does some effort to reduce the amount of calls to cygpath. This specific function, for example, is called only during construction of the msys2 object (which happens once during a single execution of direnv).

But you are right, it can be improved by implementing the relevant parts of cygpath itself, directly in Go. If we don't need a fully equivalent cygpath implementation, then we might get away with a heuristic based solution. Here is an example: since the current implementation only cares to know where / and /tmp are mapped to, we could potentially resolve those paths without using cygpath, by looking at the TMP, TMPDIR or TEMP environment variable for the /tmp value and maybe the MSYSTEM_PREFIX value (or something similar) to derive a value for the / path.

I don't want to optimize too early though. I'm hoping that user feedback might identify other situations where we might need to use cygpath (or not).

It may even be possible to pull in the actual c source code of cygpath and use it together with CGO for a fully equivalent solution, but this will complicate builds. Maybe one day someone will take a stab at implementing cygpath entirely in Go.

To answer your question more directly: For how this PR uses cygpath, I think we could get away with a very basic version of it implemented in Go using the heuristic method proposed above, but I'm not convinced that the gains are worth the effort, although I am happy to implement a version of it if you prefer to keep cygpath out of the loop.

It is also worth mentioning that the stdlib.sh already has a reference to cygpath (in the source_env function) and uses it when it is available. With that in mind, since it will already be called for every source_env invocation on msys2, the two extra calls made in this PR will not add that much of an overhead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimbatm I've made a slight optimization: cygpath is now only called once at direnv startup, specifically to resolve the PATH variable. Maybe we can leave the "implement cygpath logic in Go" for another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have some implications on the direnv reload logic. Can you check if direnv reload and direnv allow behave as expected?

@W1M0R W1M0R force-pushed the msys2-1 branch 2 times, most recently from 19c38cf to 5886bda Compare June 19, 2024 12:34
@W1M0R
Copy link
Author

W1M0R commented Jun 19, 2024

Hey @W1M0R, thanks for the contribution.

Past the minor changes, the only blocker is that I need a Windows machine to test this. I will need your help to maintain some of that code. If you are ready to do this, we can proceed.

Thanks @zimbatm. Windows is my primary development environment (has been for a long time and I don't expect that to change anytime soon), so I'd be happy to maintain the relevant portions.

@W1M0R W1M0R force-pushed the msys2-1 branch 5 times, most recently from fe0b7dc to f49c709 Compare June 20, 2024 12:01
@W1M0R W1M0R marked this pull request as draft June 21, 2024 06:21
@W1M0R W1M0R changed the title feat: msys2 support (git bash) feat: cygwin (bash) and msys2 (git bash) support Jun 22, 2024
@W1M0R W1M0R requested a review from zimbatm June 22, 2024 20:37
@W1M0R W1M0R requested a review from pjeby June 22, 2024 20:37
@W1M0R W1M0R marked this pull request as ready for review June 24, 2024 06:09
@W1M0R
Copy link
Author

W1M0R commented Jun 24, 2024

@pjeby This PR now has support for Cygwin. It turns out there were a few additional environment variables that needed some special handling for direnv to function properly. Will you be able to review the PR to see if there is anything off about the Cygwin support?

@zimbatm I realized there was enough commonality between Cygwin and MSYS2, so I expanded this PR to cover both environments.

@W1M0R W1M0R marked this pull request as draft June 24, 2024 20:51
@W1M0R W1M0R marked this pull request as ready for review June 24, 2024 21:37
@pjeby
Copy link
Contributor

pjeby commented Jun 25, 2024

So I'm looking over the code, and maybe I'm reading this wrong due to limited context in-diff, but it looks to me at first glance like there's going to be multiple cygpath calls on every direnv execution, which is likely to visibly slow down prompts, loading new .bashrc, etc. From the code, it looks like it's there to make DIRENV_ stuff show up as unix paths in stdlib.sh, but I'm not sure why? My setup works fine without that. The only variable I do any conversion on is PATH -- and only on detecting a windows-format PATH coming back to Cygwin.

Note that Cygwin commands mostly accept Windows paths, they just mostly don't manipulate them well. And I think that expecting Cygwin and MSYS users to take the time to write their .envrc or scripts to use cygpath when managing those vars is a fine tradeoff, especially since it then only happens when an .envrc is being run. (Or alternately, only running cygpath when calling an .envrc, since that happens much less often.)

@W1M0R
Copy link
Author

W1M0R commented Jun 25, 2024

So I'm looking over the code, and maybe I'm reading this wrong due to limited context in-diff, but it looks to me at first glance like there's going to be multiple cygpath calls on every direnv execution, which is likely to visibly slow down prompts, loading new .bashrc, etc. From the code, it looks like it's there to make DIRENV_ stuff show up as unix paths in stdlib.sh, but I'm not sure why? My setup works fine without that. The only variable I do any conversion on is PATH -- and only on detecting a windows-format PATH coming back to Cygwin.

@pjeby Thanks for going through the code. Yes, every invocation of direnv on Cygwin will invoke a few cygpath calls, for converting key environment variables that come in as Unix paths, to Windows paths. Since direnv is a native Windows executable, the Go runtime expects Windows-style paths.

An easy way to see the affect of this, is during the LoadConfig phase of direnv. It looks like direnv is succeeding because LoadConfig is designed to "fail" silently. The unit tests helped in identifying this issue, but I got the clearest view of the problem by enabling debug logging, using a DIRENV_DEBUG=1 direnv export bash. The debug messages show that, without my fix, direnv fails to load the config correctly. With my fixes, all of those silent failures go away. Granted, one might still be able to fine tune the process and do some more optimizations.

Another place where this has an effect is in the diffing algorithm (when watched files are being checked for modtime changes). This becomes clear with you use the direnv hook, not so much if you are manually using direnv export. If Go can't find the files being watched (i.e. if the watched Unix paths are not converted to Windows paths before checking their mod stats), then the diffing algorithm doesn't work.

Note that Cygwin commands mostly accept Windows paths, they just mostly don't manipulate them well. And I think that expecting Cygwin and MSYS users to take the time to write their .envrc or scripts to use cygpath when managing those vars is a fine tradeoff, especially since it then only happens when an .envrc is being run. (Or alternately, only running cygpath when calling an .envrc, since that happens much less often.)

If by this you mean that users should manage the DIRENV_ stuff (using cygpath) themselves, then I'd disagree on that point: user's shouldn't need to fix (or know) about those direnv internals. For path variables in general, this PR does not call cygpath on your behalf in Cygwin (the stdlib.sh unit tests show they are no-ops), it does however call cygpath for MSYS2 systems when manipulating the PATH using the direnv stdlib. Without this feature, I had to modify my existing .envrc files to be MSYS2 aware, but with that feature, I could just use my existing .envrc files - this is a nice user experience.

The cygpath calls inside the stdlib will only be called when the direnv diffing algorithm determines that a .envrc change was made, so I'm not too concerned about the performance impact of that (since that file and its watches don't change that regularly). Although there is probably more room for optimization there (i.e. I see that the stdlib invokes direnv quite a few times for something like direnv exec). In my testing, I didn't notice a painful slowdown of direnv.

If the direnv hook is installed, then every rendering of the prompt will eval a direnv export, that is, direnv will load the environment, check the watches and then if needed diff the environment at every prompt render. This is probably the main bottleneck, and if cygpath calls can be limited here, then that would be first prize. I'll go through the code again and see where I can shave off some more cygpath calls.

If you have more time available, I'd appreciate it if you can clone this PR and test it on some of your direnv projects.

@pjeby
Copy link
Contributor

pjeby commented Jun 25, 2024

I'm a bit confused by some of your statements, see e.g.:

$ DIRENV_DEBUG=1 direnv export bash
direnv: export:04:53:06 cmd_export.go:22: start
direnv: export:04:53:06 cmd_export.go:35: loading RCs
direnv: export:04:53:06 cmd_export.go:43: updating RC
direnv: export:update:04:53:06 cmd_export.go:46: Determining action:
direnv: export:update:04:53:06 cmd_export.go:47: toLoad: "C:\\cygwin\\home\\pje\\obp\\.envrc"
direnv: export:update:04:53:06 cmd_export.go:48: loadedRC: &cmd.RC{path:"C:\\cygwin\\home\\pje\\obp\\.envrc", allowPath:"", times:cmd.FileTimes{list:(*[]cmd.FileTime)(0xc0001a0000)}, config:(*cmd.Config)(0xc0000fe000)}
direnv: export:update:04:53:06 file_times.go:192: getLatestStat: C:\cygwin\home\pje\obp\.envrc: Lstat: 1713897990, Stat: 1713897990 -> preferring Stat
direnv: export:update:04:53:06 file_times.go:140: Check: C:\cygwin\home\pje\obp\.envrc: up to date
direnv: export:update:04:53:06 file_times.go:192: getLatestStat: C:\cygwin\home\pje\.local\share\direnv\allow\eac2b675c00aba9923aa30237edfa3b70202f1d78df5398feba517f9b2b5fa62: Lstat: 1713897994, Stat: 1713897994 -> preferring Stat
direnv: export:update:04:53:06 file_times.go:140: Check: C:\cygwin\home\pje\.local\share\direnv\allow\eac2b675c00aba9923aa30237edfa3b70202f1d78df5398feba517f9b2b5fa62: up to date
direnv: export:update:04:53:06 file_times.go:192: getLatestStat: C:\home\pje\obp\.envrc: Lstat: 1713897990, Stat: 1713897990 -> preferring Stat
direnv: export:update:04:53:06 file_times.go:140: Check: C:\home\pje\obp\.envrc: up to date
direnv: export:update:04:53:06 cmd_export.go:60: no update needed

$ set | grep DIRENV
DIRENV_DIFF=eJzsV...,==
DIRENV_DIR='-C:\cygwin\home\pje\obp'
DIRENV_FILE='C:\cygwin\home\pje\obp\.envrc'
DIRENV_WATCHES=eJy...k6

This looks like it's working just fine on Cygwin with Windows paths, without needing any cygpath adjustments? Also, it appears that cygwin's dirname and basename commands work fine on Windows paths, at least for my quick checks. The above are all using direnv 2.32.3, installed via scoop on Windows, under Cygwin bash. (i.e., not using this PR.)

To be fair, I'm not making use of the stdlib on my PC, only basic exports, so if there's some issue there I'm not aware. I'm just not clear on how you're getting unix paths for DIRENV_ stuff in the first place.

Are you sure you're not running in an MSYS environment rather than a Cygwin one? MSYS is extremely aggressive about converting filename arguments and environment variables, where Cygwin is more laid back and only converts PATH by default.

@W1M0R
Copy link
Author

W1M0R commented Jun 25, 2024

Thank you for checking! Your feedback definitely indicates that I'm missing something. Let me give it another go.

@W1M0R W1M0R marked this pull request as draft June 25, 2024 11:21
@W1M0R
Copy link
Author

W1M0R commented Jun 25, 2024

I updated the code with more checks to prevent unnecessary calls to cygpath.

The unit tests helped in identifying this issue, but I got the clearest view of the problem by enabling debug logging, using a DIRENV_DEBUG=1 direnv export bash.

I was wrong about that statement, it was DIRENV_DEBUG=1 make test that helped me identify where direnv was tripping up because of unexpected Unix paths provided to the Go runtime. The unit test scripts set up certain variables (that would not be present in your example), like DIRENV_CONFIG, DIRENV_BASH, XDG_DATA_HOME and XDG_CONFIG_HOME, all in Unix format. That breaks direnv, so I have to temporarily convert those variables (if they are present) using cygpath.

With the added checks, cygpath usage is significantly reduced, practically down to the PATH environment variable. Thanks for highlighting the issue.

@pjeby Do you see any other red flags?

@W1M0R W1M0R changed the title feat: cygwin (bash) and msys2 (git bash) support feat: cygwin and msys2 support (bash) Jun 25, 2024
@W1M0R W1M0R marked this pull request as ready for review June 25, 2024 22:31
@pjeby
Copy link
Contributor

pjeby commented Jun 26, 2024

I'm mostly worried that you're changing a lot of existing direnv-on-cygwin semantics in ways that aren't sufficiently conservative. For example, you've got new logic around symlinks that seem they might be based on the assumption that Cygwin doesn't use real symlinks. But my setup uses CYGWIN=winsymlinks:nativestrict, meaning that it does, in fact use real Windows symlinks.

You've also got a lot of code that appears to check for cygpath as an indication of whether you're running on Cygwin/MSYS, but that's really not the way to go about it in stdlib.sh. OSTYPE=cygwin or OSTYPE=msys are how you tell if a bash script is being run in the corresponding OS.

So at this point I'm worried that while your tests work in your environment, that even if they were made to work on mine, they don't necessarily match other users' environments, and that I may not even be spotting everything that's an issue.

TBH, I ran into similar issues when I was working on this, which is why we ended up splitting the objective into more bite-sized chunks and got some of those in.

What I would strongly suggest at this point is first doing a PR that only addresses inbound translation of those variables that direnv directly uses for its own operations (locating configuration/etc), without affecting .envrc handling to start with.

Then, once some experience is gained with the limited use of cygpath, and there is feedback available from actual users, the next PR could address dealing with back-translation of specific outputs to be emitted from direnv export -- and only from output emission, not input translation.

I'm very concerned that the current approach is conflating these two (very different!) requirements in a way that's likely to lead to lots of bugs and corner cases. direnv should not be translating input vars that aren't for its own immediate use, as that runs the risk of making the environment mangling situation worse, not better. Both msys and cygwin have their own translation layers in each direction with Windows, and injecting another such layer (that AFAICT might be running twice?) doesn't seem like a great plan. For output, this should be fine, since you are dealing with the variables being sent back and reverting the final "unix to windows" translation (if applicable).

It's like there's currently this A -> B -> A -> B -> OUT process. I'm worried we're going to get A -> B -> B' -> A'' -> B'' -> OUT (or worse), instead of A -> B -> IN(direnv only) + -> A -> B ->OUT -> OUT'. Does that make sense?

Sorry to be so negative, but I'm trying to do a conservative reading because the diffs don't show a lot of context and it's been years since I last dove deeply enough into direnv's internals to have the full picture in my head. But from my recollection of my previous attempt the only clean way I could see to make this work properly on MSYS at least was to bypass its translation altogether (e.g. by having the .envrc runner output its environment vars to a pipe, and/or having the direnv hook function pipe exports into direnv export directly.) Cygwin OTOH is in a "it works OKish now, please don't break it" state, provided PATH is fixed in the output of direnv export. (Hence my inclination to only do output translation at the final phase of direnv export.)

I would actually almost suggest that it's cleaner to add the cygpath calls to the output of direnv hook because there you can directly check OSTYPE, except that's a bash-only variable. But provided the .envrc wrote its vars as output to a pipe rather than passing them via the environment, that would be the other good place to do it. Better still: running the .envrc handler from the hook would bypass environment translation altogether, and piping the result into direnv for formatting would remove another translation layer, so there'd never be any translation at all.

But in any case, having a PR that only addresses input conversion of direnv-use-only paths that doesn't affect the actual process environment passed down anywhere else would be a ton easier to read and verify safety of. And then doing the other parts as progressive enhancements would be more reviewable as well.

@W1M0R
Copy link
Author

W1M0R commented Jun 26, 2024

I'm mostly worried that you're changing a lot of existing direnv-on-cygwin semantics in ways that aren't sufficiently conservative. For example, you've got new logic around symlinks that seem they might be based on the assumption that Cygwin doesn't use real symlinks. But my setup uses CYGWIN=winsymlinks:nativestrict, meaning that it does, in fact use real Windows symlinks.

The current direnv-on-cygwin experience is very limited and should not be the standard to compare with. The unit tests fail and direnv exec is not supported. With this PR, unit tests succeeds and direnv exec works on Cygwin. In terms of the symlinks, the logic is very minimal (one or two lines) and only used in the unit test to deal with stock Cygwin environments (i.e. a CI-friendly environment).

You've also got a lot of code that appears to check for cygpath as an indication of whether you're running on Cygwin/MSYS, but that's really not the way to go about it in stdlib.sh. OSTYPE=cygwin or OSTYPE=msys are how you tell if a bash script is being run in the corresponding OS.

It's not that much code, maybe you were looking at an older commit. I also don't check for Cygwin/MSYS2 anymore, since that doesn't matter. What does matter is whether cygpath is available and whether the path you want to convert looks like a Windows path or not.

To clarify, I do distinguish between Cygwin and MSYS2 inside the unit tests, to show how these environments deal with things differently. However, in the actual stdlib and Go runtime, I don't distinguish between these environments on a logic/functionality level, only for debug logging purposes.

So at this point I'm worried that while your tests work in your environment, that even if they were made to work on mine, they don't necessarily match other users' environments, and that I may not even be spotting everything that's an issue.

I'd love to know if the unit tests pass in your environment.

TBH, I ran into similar issues when I was working on this, which is why we ended up splitting the objective into more bite-sized chunks and got some of those in.

I've also wondered how I can simplify this PR and break it into chunks. My goal is to have the unit tests pass on MSYS2/Cygwin, so any bite-sized contribution should satisfy that. If the chunk doesn't enable the tests to pass, then I don't see the value in the contribution. So currently, I think this is the smallest chunk that I could implement to satisfy that goal. I've already removed some stdlib stuff that I added originally, but found is not needed.

Another way I could try and simplify is by focusing on MSYS2 as a first PR (as per my initial attempt) and then wait for the feedback cycle and then look into Cygwin. With this approach I would have to re-introduce the code that distinguishes between Cygwin and MSYS2. I'm not too fond of this approach since I'm happy with the results as is, but I'm not against it.

What I would strongly suggest at this point is first doing a PR that only addresses inbound translation of those variables that direnv directly uses for its own operations (locating configuration/etc), without affecting .envrc handling to start with.

This alone would not make the unit tests pass.

Then, once some experience is gained with the limited use of cygpath, and there is feedback available from actual users, the next PR could address dealing with back-translation of specific outputs to be emitted from direnv export -- and only from output emission, not input translation.

This PR only exports PATH to the environment.

I'm very concerned that the current approach is conflating these two (very different!) requirements in a way that's likely to lead to lots of bugs and corner cases. direnv should not be translating input vars that aren't for its own immediate use, as that runs the risk of making the environment mangling situation worse, not better. Both msys and cygwin have their own translation layers in each direction with Windows, and injecting another such layer (that AFAICT might be running twice?) doesn't seem like a great plan. For output, this should be fine, since you are dealing with the variables being sent back and reverting the final "unix to windows" translation (if applicable).

It's like there's currently this A -> B -> A -> B -> OUT process. I'm worried we're going to get A -> B -> B' -> A'' -> B'' -> OUT (or worse), instead of A -> B -> IN(direnv only) + -> A -> B ->OUT -> OUT'. Does that make sense?

In this PR, direnv only translates input vars when they are for its own immediate use, and only for the duration and purpose of the specific direnv command. Except for PATH, which is always translated and exported.

Sorry to be so negative, but I'm trying to do a conservative reading because the diffs don't show a lot of context and it's been years since I last dove deeply enough into direnv's internals to have the full picture in my head. But from my recollection of my previous attempt the only clean way I could see to make this work properly on MSYS at least was to bypass its translation altogether (e.g. by having the .envrc runner output its environment vars to a pipe, and/or having the direnv hook function pipe exports into direnv export directly.) Cygwin OTOH is in a "it works OKish now, please don't break it" state, provided PATH is fixed in the output of direnv export. (Hence my inclination to only do output translation at the final phase of direnv export.)

This PR would transform the Cygwin experience from OKish to "as advertised".

I would actually almost suggest that it's cleaner to add the cygpath calls to the output of direnv hook because there you can directly check OSTYPE, except that's a bash-only variable. But provided the .envrc wrote its vars as output to a pipe rather than passing them via the environment, that would be the other good place to do it. Better still: running the .envrc handler from the hook would bypass environment translation altogether, and piping the result into direnv for formatting would remove another translation layer, so there'd never be any translation at all.

I considered OSTYPE, but found discussions online that suggested that it is not as a reliable method as one thinks. For one thing, OSTYPE worked in the stdlib.sh (at least for Cygwin), but for some reason was empty in the Go runtime. Regardless, as indicated earlier, it is no longer necessary to know whether we are using cygpath in an MSYS2 environment or in a Cygwin environment, it is good enough to know if cygpath is available and if the path in question is a Windows path or a Unix path.

Adding cygpath calls to the output of direnv hook is not enough, since it does not cater for how direnv deals with watched files and environment diffs. In my mind, this PR does the right thing.

But in any case, having a PR that only addresses input conversion of direnv-use-only paths that doesn't affect the actual process environment passed down anywhere else would be a ton easier to read and verify safety of. And then doing the other parts as progressive enhancements would be more reviewable as well.

This would not allow the unit tests to pass.

@pjeby
Copy link
Contributor

pjeby commented Jun 26, 2024

I'm not saying that it's not a good idea to do more, ever: I'm saying that this PR is currently hard for me to review and validate without re-reading most of the source of direnv, so breaking the PR into smaller pieces/phases would be immensely helpful in me being able to properly review it.

As it stands, the only level of review I can give right now is to look at stuff and go, "this looks concerning to me", because I don't have enough contiguous time available to me to load the entire program structure in my head - I'm able to devote minutes to it right now, not hours. If you want me to do an actual review and/or build this patch, it might be several weeks before you get a response. So I've been doing this by skimming and cross-referencing the PR in the interest of getting you faster/more feedback than, "I'll let you know in a few weeks."

If you'd rather wait for more detailed feedback or go without it from me, that's fine; I just assumed you'd rather have a low-fi review now than a more thorough one at some unspecified future date. Conversely, if you want a higher fidelity review from me now, that's only going to happen if the PR has a lot smaller scope, hence my suggestion to break it up. (As @zimbatm did with my last attempt at this, for I believe similar reasons.)

@W1M0R
Copy link
Author

W1M0R commented Jun 27, 2024

I'm not saying that it's not a good idea to do more, ever: I'm saying that this PR is currently hard for me to review and validate without re-reading most of the source of direnv, so breaking the PR into smaller pieces/phases would be immensely helpful in me being able to properly review it.

Understood. I was too excited. I think I can submit individual PRs that would introduce things gradually.

As it stands, the only level of review I can give right now is to look at stuff and go, "this looks concerning to me", because I don't have enough contiguous time available to me to load the entire program structure in my head - I'm able to devote minutes to it right now, not hours. If you want me to do an actual review and/or build this patch, it might be several weeks before you get a response. So I've been doing this by skimming and cross-referencing the PR in the interest of getting you faster/more feedback than, "I'll let you know in a few weeks."

And I appreciate your early feedback, it has guided me in several ways already.

If you'd rather wait for more detailed feedback or go without it from me, that's fine; I just assumed you'd rather have a low-fi review now than a more thorough one at some unspecified future date. Conversely, if you want a higher fidelity review from me now, that's only going to happen if the PR has a lot smaller scope, hence my suggestion to break it up. (As @zimbatm did with my last attempt at this, for I believe similar reasons.)

I'm not in a rush. If time is the issue, then take your time, I'm going away for a bit of a holiday anyway and won't be back for a week or so. But if the PR is too big, even if you have enough time, then let me know, and I'll do incremental PRs for things like the gitattributes file and other small stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants