Fix is_url from splitting the scheme incorrectly when using PEP 440's direct references #6203

retpolanne · 2019-01-25T13:57:07Z

Hello,

This PR fixes #6202 and includes tests for this issue.
When installing a .whl from a remote URL following this example,
pip @ https:///somewhere/pip-1.3.1-py33-none-any.whl

is_url was splitting the scheme incorrectly and it wouldn't recognize the line as a URL. Pip would try (and fail) to reference a local .whl file instead.
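To make the failure concrete, here is a minimal sketch of the scheme check. It assumes is_url takes everything before the first colon as the scheme, which is consistent with the return line quoted in the review below. The sketch is simplified: vcs.all_schemes is left out, and the name is_url_sketch is hypothetical.

```python
# Simplified stand-in for the scheme list; the real check also adds vcs.all_schemes.
KNOWN_SCHEMES = ['http', 'https', 'file', 'ftp']


def is_url_sketch(name):
    """Return True if the text before the first ':' is a known scheme."""
    if ':' not in name:
        return False
    scheme = name.split(':', 1)[0].lower()
    return scheme in KNOWN_SCHEMES


# A bare URL is recognised.
print(is_url_sketch('https://example.com/pip-1.3.1-py33-none-any.whl'))  # True
# With a direct reference, the "scheme" becomes 'pip @ https', which is unknown.
print(is_url_sketch('pip @ https://example.com/pip-1.3.1-py33-none-any.whl'))  # False
```

Because the second call returns False, the line is not treated as a URL, and the remaining checks end up handling it as a local archive path.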

retpolanne · 2019-01-25T14:04:54Z

Sorry for the newbie fails on the linting :(

retpolanne · 2019-02-13T12:08:28Z

@uranusjr hi, could you please take a look at this PR?

uranusjr

I feel this approach is backwards: instead of treating @ in a URL-like string as a special case, the parser function should be able to exclude that case before the is_url check is even done.

I need to think about this more in detail to figure out what the right approach is, but this is probably not it.

retpolanne · 2019-02-14T14:47:40Z

@uranusjr would it be a better idea to call split_scheme_from_url before calling is_url?

Or maybe use a regex to strip a URL from the line? (So you won't treat just the @)

cjerdonek · 2019-02-14T14:48:53Z

src/pip/_internal/download.py

    return scheme in ['http', 'https', 'file', 'ftp'] + vcs.all_schemes


+def split_scheme_from_url(url):


I feel like generic parsing functions like this should go in misc.py with the other URL parsing functions. (Incidentally, I also think that path_to_url() and friends shouldn't be in download.py either.)

Looking at download.py, these URL-related functions should stay together, either in misc.py or in download.py. But misc.py looks really polluted to me. Maybe creating a utils package that contains a file for URL-related functions would be a better idea?

Yes, probably. :) But I don’t want to sidetrack this PR further. For functions other than new functions you’re adding here, it would need to be done as a separate PR. I’m also not sure what type of function you’ll wind up needing after your conversation with @uranusjr. (I haven’t thought about it myself.)

Great, I would really like to tackle this :)
I'll add this function to misc.py then, if I still have to use it.

uranusjr · 2019-02-14T14:55:30Z

I’m thinking maybe the function should be reorganised somehow. Instead of checking for URL-like, path-like, and finally as a name, it should check for name first (using PEP 508’s definition; maybe packaging.requirements would be useful?), and fall back to looking for a URL or path afterwards.

retpolanne · 2019-02-14T17:20:48Z

@uranusjr do you have any idea on how to do it this way? I could only think about doing it the other way around, by elimination (e.g. if something is not a file or a url, it is probably a name).

uranusjr · 2019-02-15T01:22:41Z

A formal syntax definition is included in the PEP 508 document, and (I believe) implemented by _vendor.packaging.requirements.Requirement. If that’s the case, you can

1. Split the markers (same).
2. Try to parse as a name, and catch the exception if that fails.
3. Try to parse as a URL, and then path (same).

I’m not sure if that would work, but it could be worth a try.

retpolanne · 2019-02-15T02:08:33Z

@uranusjr thanks for the explanation. I'll try to understand this class more in-depth. I see that there are many characters and regex that are probably used for parsing here, e.g.

pip/src/pip/_vendor/packaging/requirements.py

Line 41 in b6a2be0

URI = Regex(r"[^ ]+")("url")

but I don't understand how pyparsing works. Maybe the bug that I reported can be fixed here instead? (by adding proper validation)

uranusjr · 2019-02-15T04:03:04Z

I don’t believe that matters, since the rule is only used as part of the name@url syntax. I tried it out a bit:

>>> Requirement('foo@https://[email protected]')
<Requirement('foo@ https://[email protected]')>
>>> Requirement('https://[email protected]')
Traceback (most recent call last):
[snipped]
pip._vendor.packaging.requirements.InvalidRequirement: Parse error at "':https://user@'": Expected stringEnd

So I think you can do something like

try:
    Requirement(name)
except InvalidRequirement:
    pass    # Maybe a nameless URL or a path
else:
    return ...   # Create InstallRequirement from name

if is_url(name):
    return ...   # Create InstallRequirement from URL

return ...  # Create InstallRequirement from path

uranusjr · 2019-02-15T06:42:47Z

Err I read the parser code in whole, and it’s… a mess 😭

Let’s start over. So the code currently parses like this:

1. Does this look like a URL?
   a. Yes. Build a Link and go to 3.
   b. No. Go to 2.
2. Does this look like a path (contains a path separator or starts with .)?
   a. Yes. Build a Link and go to 3.
   b. No. Treat it as a requirement string and go to 4.
3. Parse the Link to get a package name, and go to 4.
4. Build a requirement out of the information gathered.

The problem now is that PEP 440 URL reqs should go 1b-2b-4, but currently fall into 1b-2a-3-4. So we need to find a distinctive characteristic between a path and a PEP 440 URL req (the name req variant poses no problems), and fix the condition in 2.

URL_REQ = NAME "@" SCHEME ":" URI

# According to RFC 3986.
SCHEME = ALPHA *( ALPHA | DIGIT | "+" | "-" | "." )

# According to PEP 508.
NAME = LETTER_OR_DIGIT IDENTIFIER_END*
IDENTIFIER_END = LETTER_OR_DIGIT | (('-' | '_' | '.' )* LETTER_OR_DIGIT)

# I can't find the definition, but according to common sense?
LETTER_OR_DIGIT = ALPHA | DIGIT

We can conclude: A URL req must contain at least one @, and the part before the first @ must not contain a path separator, and must not start with a dot (.).

Now the fix becomes clear. The condition near line 235 should be modified to something like this:

def _looks_like_path(name):
    return (
        os.path.sep in name or
        (os.path.altsep is not None and os.path.altsep in name) or
        name.startswith('.')
    )

if is_url(name):
    link = Link(name)
else:
    ...
    elif is_archive_file(p):
        if os.path.isfile(p):
            link = Link(path_to_url(p))
        else:
            url_req_parts = p.split('@', 1)
            if not _looks_like_path(url_req_parts[-1]):
                logger.warning(...)

I know, this change makes the code even messier than before, but this is the best I can come up with without taking the whole thing apart 😞
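The conclusion above (a URL req contains an @, and the part before the first @ neither contains a path separator nor starts with a dot) can be sketched as a standalone check. This is only an illustration under that assumption: may_be_url_req is a hypothetical helper name, not a function from pip, and looks_like_path mirrors the _looks_like_path sketch in the comment.

```python
import os


def looks_like_path(name):
    """Heuristic: contains a path separator or starts with a dot."""
    if os.path.sep in name:
        return True
    if os.path.altsep is not None and os.path.altsep in name:
        return True
    return name.startswith('.')


def may_be_url_req(name):
    """Hypothetical helper: True if the text could be a 'name @ url' requirement."""
    parts = name.split('@', 1)
    # There must be an '@', and the part before the first one must not look like a path.
    return len(parts) == 2 and not looks_like_path(parts[0])


print(may_be_url_req('pip @ https://example.com/pip-1.3.1-py33-none-any.whl'))
print(may_be_url_req('./downloads/pkg@1.0.whl'))
```

The first call is a candidate URL req because 'pip ' contains no separator; the second is not, because the text before the @ is clearly a path.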

retpolanne · 2019-02-17T23:06:07Z

I played with Requirement a little bit and couldn't find a way to use this

URL_REQ = NAME "@" SCHEME ":" URI

It looks like it parses named requirements here, which is what is expected
NAMED_REQUIREMENT = NAME + Optional(EXTRAS) + (URL_AND_MARKER | VERSION_AND_MARKER)

But when using the URL_REQ line, I kind of break it even more 😞 . Hopefully, it looks like unnamed requirements are passed as names, and not URLs.

I played a little bit with some validations:

(Pdb) URL_TEST = Optional(AT) + URI
(Pdb) URL_TEST.parseString('https://google.com')
(['https://google.com'], {'url': ['https://google.com']})
(Pdb) URL_TEST.parseString('google @ https://google.com')
(['google'], {'url': ['google']})
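The second result in the session above is likely explained by the URI rule itself: the pattern [^ ]+ accepts any run of non-space characters, so 'google' already satisfies it, and parseString does not require the whole input to be consumed by default. A small sketch with the standard re module (not pyparsing, and with a hypothetical helper name) shows the same effect:

```python
import re

# Same pattern as the URI rule quoted earlier: any run of non-space characters.
URI_PATTERN = re.compile(r"[^ ]+")


def first_uri_match(text):
    """Return what the URI pattern matches at the start of the text, if anything."""
    match = URI_PATTERN.match(text)
    return match.group(0) if match else None


print(first_uri_match('https://google.com'))           # https://google.com
print(first_uri_match('google @ https://google.com'))  # google
```

So the pattern cannot tell a name from a URL on its own; it only works in the grammar because it is used after the name and the @ have already been consumed.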

uranusjr · 2019-02-18T03:10:45Z

It occurs to me just now that we need another test case for URLs with authentication.

pip install https://user:[email protected]

cjerdonek

Some comments.

cjerdonek

A couple more quick comments.

cjerdonek · 2019-02-23T22:51:19Z

src/pip/_internal/req/constructors.py

+            "Directory %r is not installable. Neither 'setup.py' "
+            "nor 'pyproject.toml' found." % name
+        )
+    if is_archive_file(path):


You can use the "early return" pattern here again to reduce indentation by doing if not is_archive_file(path): and then returning None. Then the rest doesn't need to be indented.
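As an illustration of the "early return" suggestion, here is a generic before-and-after sketch. The function names and return strings are hypothetical and are not taken from the PR; a simple '.whl' suffix test stands in for is_archive_file.

```python
def describe_nested(path, exists):
    """Nested version: the main logic sits inside the archive check."""
    if path.endswith('.whl'):
        if exists:
            return 'installable archive'
        return 'missing archive'
    return None


def describe_early_return(path, exists):
    """Early-return version: bail out first, then continue unindented."""
    if not path.endswith('.whl'):
        return None
    if exists:
        return 'installable archive'
    return 'missing archive'


for args in [('pkg.whl', True), ('pkg.whl', False), ('pkg.txt', True)]:
    print(args, describe_nested(*args), describe_early_return(*args))
```

Both versions return the same values; the second simply keeps the main path at one indentation level.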

cjerdonek · 2019-02-24T05:37:18Z

tests/unit/test_req.py

+
+@patch('pip._internal.req.req_install.os.path.isdir')
+@patch('pip._internal.req.req_install.os.path.isfile')
+def test_get_path_to_archive_pep440_url(isdir_mock, isfile_mock):


Great to see you start writing these tests! A couple comments:

First, it's a helpful convention that when testing a function or method named my_method, the test function's name starts with test_my_method. That makes it easy to locate all the tests of a given function. So in this case, all of these should start with test_get_path_to_url_... (you don't need to include the leading underscore). Also, if you have more than one test function for a certain function, you can add a suffix describing the special case, like test_get_path_to_url__archive_pep440_url(). (I like to separate the function name portion from the suffix with a double underscore so someone can tell where the function name portion ends.)

Also, if you're testing multiple cases of a simple function, it helps to use @pytest.mark.parametrize to cut down on the amount of repetition. Take a look at test_make_vcs_requirement_url and the test functions following that for some examples. In this case, your inputs and outputs are strings (along with booleans to set your mocks), so it should be amenable to test parametrization.

One more comment: I like to put the test functions in the same order as the original functions appear in the module. This also makes it easier to locate test functions when you're scrolling around. The test module has a parallel structure to the module it's testing.

@cjerdonek is it ok to use a noqa on test names? Just in case they get too big.

(Turns out I didn't need it.)

I don't expect they would ever get too big. You can put the arguments on the next line if it ever started to get too long.

3e57673
There, I had some issues parametrizing the tests that had URLs though.

cjerdonek · 2019-03-03T01:11:42Z

@vinicyusmacedo Are you still working on your changes, or were you waiting for another review? I noticed at least one (easy) comment wasn't addressed, which is why I was waiting.

retpolanne · 2019-03-03T01:40:13Z

@cjerdonek sorry, I forgot about some of the comments. I'm pushing them right now and I think that's it :)

cjerdonek · 2019-03-03T04:38:19Z

@vinicyusmacedo Can you also review the pip docs to see if anything needs changing / updating? For example, there is this part from the section on Requirements Specifiers that looks like it needs to be updated:

pip does not support the url_req form of specifier at this time

Maybe you can add a paragraph after the "Since version 6.0," paragraph saying, "Since version 19.1," describing the change you're adding.

retpolanne · 2019-03-03T12:56:47Z

@cjerdonek requirements file format and examples need changing as well. Should I use Since version 19.1 as well in these parts?

cjerdonek · 2019-03-03T12:59:19Z

@vinicyusmacedo You can leave out mention of the version for now in those other sections.

cjerdonek · 2019-03-03T13:01:58Z

Does this mean you can also delete the parentheses here:

(pip does not support the url_req form of specifier at this time)

cjerdonek · 2019-03-04T00:14:36Z

@uranusjr Now that the code and tests for this PR are more in shape, and because @vinicyusmacedo followed the approach that you suggested, can you review this carefully, and also see if any test cases are missing or should be added? Like, would it be good to have any test cases anywhere with a space missing before and/or after the @ symbol to make sure it's not interpreted as a direct reference?

retpolanne · 2019-05-30T00:12:57Z

@cjerdonek hey, sorry for bothering, is there anything that I missed on this PR?
I think that sums up the fix.

retpolanne · 2019-07-21T01:10:01Z

Ping @uranusjr

uranusjr

This could use some squashing, but code-wise 👍

uranusjr · 2019-07-21T02:19:35Z

src/pip/_internal/req/constructors.py

+    if os.path.altsep is not None and os.path.altsep in name:
+        return True
+    if name.startswith('.'):
+        return True


I just realised this if does not work as intended, and probably should be removed. A ./whatever string would’ve been caught in previous checks. This only matters for strings like .whatever, which I guess still does look like a path…? (but then the docstring is not accurate)

@uranusjr that's exactly it. I don't really know why some package would start with ., but I'll add a test case for it and add it to the docstring.

Packages can’t start with a dot, so this check doesn’t really matter either way :p But it’s better to remove it since its mere existence can be confusing to future readers.

Nice. I have removed it then.

@uranusjr oops, actually you can use . to install a package. You can use it to install the current directory as a package if it has a setup.py file.

I still have a problem, though: the Windows tests might fail with the == since the path separator is different. I'll go with name.startswith then (I could make separate test cases for Windows, but that doesn't sound so good).

I believe the sep and altsep parts cover the different separators (if my memory of implementation from other projects serves).

From what I understood from the docs, it appears to be only available on Windows (the altsep on Windows would be the forward-slash)

https://docs.python.org/3/library/os.html#os.altsep

That is correct, hence the first test would detect \ on Windows and / on POSIX; the second test detects / on Windows (always false on POSIX).

You could add a simple Windows-only test like this, if you’re inclined to:

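import os

import pytest

# _looks_like_path is the helper added to src/pip/_internal/req/constructors.py in this PR.
from pip._internal.req.constructors import _looks_like_path


# Only run where the native separator is a backslash, i.e. on Windows.
@pytest.mark.skipif(os.path.sep != '\\', reason='Windows-only path separators')
@pytest.mark.parametrize('path', [
    '.\\path\\to\\installable',
    'relative\\path',
    'C:\\absolute\\path',
])
def test_looks_like_path_win(path):
    assert _looks_like_path(path)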

@uranusjr I could also skip this test if not sys.platform.startswith("win")
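The sep and altsep behaviour discussed above can be checked on any platform, because the standard library exposes both flavours as ntpath and posixpath. The sketch below passes the path module in explicitly; that parameter is an assumption made for illustration, since the check in the PR uses os.path directly.

```python
import ntpath
import posixpath


def looks_like_path(name, pathmod):
    """Same heuristic as discussed above, with the path flavour passed in."""
    if pathmod.sep in name:
        return True
    if pathmod.altsep is not None and pathmod.altsep in name:
        return True
    return name.startswith('.')


# Windows flavour: sep is a backslash and altsep is a forward slash.
print(ntpath.sep, ntpath.altsep)
# POSIX flavour: sep is a forward slash and altsep is None.
print(posixpath.sep, posixpath.altsep)

print(looks_like_path('relative\\path', ntpath))     # True
print(looks_like_path('relative\\path', posixpath))  # False
print(looks_like_path('relative/path', ntpath))      # True
```

A backslash-separated string is therefore only recognised as a path with the Windows flavour, which is why the test above is skipped elsewhere.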

xavfernandez · 2019-07-21T21:13:35Z

Thanks for sticking with it @vinicyusmacedo 👍

retpolanne · 2019-07-22T14:30:29Z

@xavfernandez @uranusjr just added some Windows-specific tests.

BrownTruck · 2019-07-27T06:00:03Z

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

retpolanne · 2019-07-27T14:19:08Z

@xavfernandez is it possible to merge this one?

desaintmartin · 2019-09-16T08:36:02Z

Gentle bump! How can we help to get this merged?

chrahunt

LGTM!

chrahunt · 2019-09-16T13:52:05Z

I went through the existing comments and I believe all of them are addressed, so I will merge this. If I missed anything we can always address it in a followup.

Thanks for sticking with it @vinicyusmacedo!

retpolanne · 2019-09-16T14:46:07Z

@chrahunt thank you and everyone who reviewed this PR :)

uranusjr requested changes Feb 14, 2019
cjerdonek reviewed Feb 14, 2019
cjerdonek reviewed Feb 18, 2019
cjerdonek reviewed Feb 23, 2019
cjerdonek reviewed Feb 24, 2019
cjerdonek reviewed Feb 24, 2019

This was referenced Mar 3, 2019
Lacking parity between requirements.txt and install_requires syntax #6097 (Closed)
pip's lack of support for PEP 508 URLs in requirements.txt / argument breaks pep517 #6306 (Closed)

cjerdonek reviewed Mar 3, 2019
cjerdonek reviewed Mar 3, 2019

pypa-bot removed the "needs rebase or merge" label (PR has conflicts with current master) May 9, 2019

retpolanne mentioned this pull request Jun 9, 2019
pip fails to install remote dependency when it is a .whl and follows PEP 440 #6202 (Closed)

uranusjr approved these changes Jul 21, 2019
uranusjr reviewed Jul 21, 2019

chrahunt mentioned this pull request Jul 21, 2019
pip tries to parse PEP 508 URLs in pyproject.toml's [build-system] requires as file path #6405 (Closed)

uranusjr approved these changes Jul 21, 2019
xavfernandez approved these changes Jul 21, 2019

xavfernandez mentioned this pull request Jul 22, 2019
Drop appveyor in favor of azure #6767 (Closed)

desaintmartin mentioned this pull request Jul 22, 2019
@ delimiter for url is not supported jazzband/pip-tools#854 (Closed)

BrownTruck added the "needs rebase or merge" label (PR has conflicts with current master) Jul 27, 2019
pypa-bot removed the "needs rebase or merge" label (PR has conflicts with current master) Jul 27, 2019

retpolanne added 2 commits August 12, 2019 14:32
Added test to fail pep508 (5b93c09)
Adding improvements to the _get_path_to_url function (16af35c)

cjerdonek mentioned this pull request Sep 15, 2019
Simplify input requirement parsing #7019 (Open)

chrahunt mentioned this pull request Sep 16, 2019
Clean up req.constructors.install_req_from_line #7025 (Merged)

chrahunt approved these changes Sep 16, 2019

chrahunt merged commit 82c2dd4 into pypa:master Sep 16, 2019

retpolanne deleted the fix-pep-508 branch September 16, 2019 22:35

atugushev mentioned this pull request Sep 23, 2019
pip-compile loses VCS egg name from setup.py jazzband/pip-tools#902 (Closed)

lock bot added the "auto-locked" label (Outdated issues that have been locked by automation) Oct 16, 2019
lock bot locked as resolved and limited conversation to collaborators Oct 16, 2019
