Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix joinpath on Windows and improve joinpath on linux #33477

Merged
merged 4 commits into from
Oct 18, 2019

Conversation

musm
Copy link
Contributor

@musm musm commented Oct 5, 2019

Fix #33475 and #33455

@musm musm force-pushed the joinpath branch 2 times, most recently from f5d9e9d to 5142031 Compare October 6, 2019 03:00
base/path.jl Show resolved Hide resolved
base/path.jl Outdated Show resolved Hide resolved
base/path.jl Show resolved Hide resolved
base/path.jl Outdated
@@ -246,8 +246,70 @@ function splitpath(p::String)
return out
end

joinpath(path::AbstractString) = path
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well make the function type stable while you're at it and always return String.

@StefanKarpinski
Copy link
Sponsor Member

If you're concerned about efficiency and not creating lots of intermediary strings—which you seem to be—then you may want to use the sprint function to construct the output strings incrementally via printing to an IOBuffer. Of course, I don't think efficiency matters much for operations on file paths, but that would be the more efficient approach here.

base/path.jl Outdated
continue
end
# same drive in different case
result_drive = p_drive
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to normalize the drive name capitalization?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are UNC drives also case insensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to normalize the drive name capitalization?

Don't think so. We should just preserve the input capitalization.

Are UNC drives also case insensitive?

In this case it doesn't matter since if the user specifies a UNC path it is absolute.

@musm
Copy link
Contributor Author

musm commented Oct 9, 2019

@StefanKarpinski: Thank you very much for the detailed review. I went ahead and also removed the troublesome pathsep function, which apparently was only used by joinpath. As mentioned in #33455, this function behaves incorrectly due to an odd compiler(?) bug. (In particular, it gives the wrong answer when used in joinpath, but if you recompile the joinpath function it then gives the correct output). It was my mistake to wrongly assume this bug was due to the recursive nature of the function, which is simply not true and is my fault.

The current implementation of joinpath is correct and irrelevant to the issue in #33455, but there is still a bug due the issue in #33475, which would require using lowercase, in the current implementation.

Hence, it is probably possible to keep the current implementation and check drive letter lowercase issues on Windows with an additional if-clause. It's also possible, if there is a preference, to use the implementation in this PR. Both options I am comfortable with. The advantage of the current implementation is speed, but as you mention it probably doesn't matter for such a non-critical path API function. The advantage of the existing joinpath is conciseness and elegance (I haven't fully investigated what it would take to fix the drive casing issue but as mentioned I don't think too much effort is needed to fix #33475)

test/path.jl Outdated
@test joinpath(S("foo"), S(homedir())) == homedir()
@test joinpath(S(abspath("foo")), S(homedir())) == homedir()

if Sys.iswindows()
@test joinpath(S("foo"),S("bar:baz")) == "bar:baz"
@test joinpath(S("C:"),S("foo"),S("D:"),S("bar")) == "D:bar"
@test joinpath(S("C:"),S("foo"),S("D:bar"),S("baz")) == "D:bar$(sep)baz"

# relative folders and case-insensitive drive letters
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e"
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the earlier or later capitalization of the drive name be kept?

Copy link
Contributor Author

@musm musm Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preference is to keep the later capitalization. This follows the general rule to prefer p2 in joinpath(p1,p2), or if p2 is absolute to override p1 completely.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does Python do here? I would think that in the case where a is absolute and b is relative, if they are joinable (i.e. the drives are compatible) one would want joinpath(a, b) to start with a and end with b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does Python do here? I

Python keeps the later capitalization. But it doesn't really matter, since it is case-insentive.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. I was thinking that this might affect the definition of relpath but that doesn't seem to depend on joinpath at all in its logic, although I suspect it probably does have a bug in the case of drive letters with case mismatches.

Copy link
Contributor Author

@musm musm Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, honestly I haven't had a close look at relpath to tell how this affects relpath. My general thinking is that it would be a good idea to audit all of our FileSystem API, to check how we do with UNC paths and relative paths with drive specifications. We also don't support extended paths, which is another good extension in the future.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e"
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "C:\\a\\b\\c\\e"

And change this test.

@musm
Copy link
Contributor Author

musm commented Oct 10, 2019

@StefanKarpinski It seems that this function is also a candidate for early conversion. In other words I should define

    joinpath(path::AbstractString, paths::AbstractString...) = joinpath(String(path), String.(paths)...)

for performance reasons.

@StefanKarpinski
Copy link
Sponsor Member

Yes, this would be a good case for early conversion.

@musm
Copy link
Contributor Author

musm commented Oct 10, 2019

Great, and I also confirmed locally that this early conversion does help a lot with performance and reducing allocations.

@StefanKarpinski
Copy link
Sponsor Member

Excellent. It should also make things easier on the compiler which needs to specialize less code for "weird string types".

@StefanKarpinski
Copy link
Sponsor Member

This seems good to me, I just need to look into #33455 (comment) and understand what's going on there. If I forget, please ping me to remind me about this because we definitely want this.

@StefanKarpinski StefanKarpinski self-assigned this Oct 11, 2019
@musm
Copy link
Contributor Author

musm commented Oct 11, 2019

NP. And thanks for taking a look at this.

Yes the current bug in joinpath needs closer inspection, because the compiler is messing up the pathsep function.

@StefanKarpinski
Copy link
Sponsor Member

Do you think that if we weren't hitting that bug then the current joinpath implementation would work? If so, that alleviates my concern about the recursive identity and makes this PR a combination of compiler bug workaround and performance optimization. Is there any other functional behavior change that this PR makes?

@musm
Copy link
Contributor Author

musm commented Oct 11, 2019

If we weren't hitting that bug then the current implementation works for #33455 but not #33475 (it would require more tweaks to get it to fix that later bug)

So this PR works around the compiler bug by performance optimization (on all platforms) and in the Windows case the implementation in this PR also fixes #33475 by testing another lettercasing of drive volume names.

@StefanKarpinski
Copy link
Sponsor Member

Ok, thanks for the explanation and reassurances. That makes me feel pretty good about this. I will leave it open for now while I look into the compiler issue.

@musm
Copy link
Contributor Author

musm commented Oct 17, 2019

Maybe the compiler got smarter, or more likely my benchmarks might have been sloppy done, but now I see no performance difference between the AbstractString implementation and the one that does eager conversion to String.

@StefanKarpinski
Copy link
Sponsor Member

The only thing I would change at this point is keeping the earlier drive name when they have a case mismatch, since I would expect joinpath("C:\\foo", "c:baz") to produce C:\\foo\\baz.

base/path.jl Outdated
result_path = p_path
continue
end
result_drive = p_drive # same drive in different case
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result_drive = p_drive # same drive in different case

Just delete this line, I think.

test/path.jl Outdated
@test joinpath(S("foo"), S(homedir())) == homedir()
@test joinpath(S(abspath("foo")), S(homedir())) == homedir()

if Sys.iswindows()
@test joinpath(S("foo"),S("bar:baz")) == "bar:baz"
@test joinpath(S("C:"),S("foo"),S("D:"),S("bar")) == "D:bar"
@test joinpath(S("C:"),S("foo"),S("D:bar"),S("baz")) == "D:bar$(sep)baz"

# relative folders and case-insensitive drive letters
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e"
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "c:\\a\\b\\c\\e"
@test joinpath(S("C:\\a\\b"), S("c:c\\e")) == "C:\\a\\b\\c\\e"

And change this test.

@musm
Copy link
Contributor Author

musm commented Oct 18, 2019

@StefanKarpinski I see where you are coming from. At the same time, I think the current behavior is preferred, since in joinpath(a,b), the argument b has "preference". Thus I'm inclined to keep the current behavior if that's ok.

@StefanKarpinski
Copy link
Sponsor Member

I'm not sure I agree that b has preference. If has preference only if b is absolute, in which case, yes, we should take the drive value from b. If b is not absolute then a is the start of the result and b is the rest, in which case it makes much more sense to keep the drive from a.

@musm
Copy link
Contributor Author

musm commented Oct 18, 2019

Updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

joinpath bug on Windows for relative folders
2 participants