Create mousebender.resolve #105

Open · 10 of 11 tasks

brettcannon opened this issue Aug 21, 2023 · 21 comments
Comments

@brettcannon
Owner

brettcannon commented Aug 21, 2023

Use resolvelib to keep compatibility with other projects like pip and PDM.

Provide flexibility in how the resolution is handled:

  • Separate out how metadata is fetched (so you can swap out either the resolution or download logic; also makes testing easier)
  • Track which projects brought in which dependencies (to be able to render the dependency graph)
  • Newest or oldest version (to know when your lower bounds no longer work)
  • Pass in platform and marker details (so you can resolve for any platform on any machine)
  • Fastest or most compatible wheel (so you can see what the loosest wheel tag triple you need is)
  • Support external constraints (since it seems to be a common need)

Provider protocol

@brettcannon
Owner Author

brettcannon commented Aug 21, 2023

Code links:

@pradyunsg
Contributor

pradyunsg commented Sep 1, 2023

A couple of relevant sections from existing standards:

@brettcannon
Owner Author

The identify() method seems to be about generating a key for a requirement (typing suggests it can be a candidate or requirement, but the docstring only mentions requirements):

Given a requirement, return an identifier for it.

This is used to identify a requirement, e.g. whether two requirements
should have their specifier parts merged.

So the important thing here seems to be having requirements that are for the same thing to line up for merging.

Apparently everyone makes extras part of the key: pip, PDM, and the extras example from resolvelib all do. What's interesting is PDM and pip return a string while the resolvelib example returns a string or a tuple of name and extras.

Since the typing of identify() suggests nothing more than it must be hashable, probably the simplest solution here is to return a Tuple[NormalizedName, frozenset[NormalizedName]] for the distribution name and extras. For any requirement that has no extras, the empty frozenset can be used.

Now we just have to figure out the representation of a requirement in order to generate that tuple. 😅
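Assuming the representation ends up exposing .name and .extras the way packaging.requirements.Requirement does, a sketch could be:

from packaging.utils import canonicalize_name

def identify(self, requirement_or_candidate):
    # Same distribution + same extras -> same key, so overlapping
    # requirements line up for specifier merging.
    name = canonicalize_name(requirement_or_candidate.name)
    extras = frozenset(map(canonicalize_name, requirement_or_candidate.extras))
    return name, extras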

@brettcannon
Owner Author

The get_dependencies() method seems to be about taking a candidate (which is distribution and version) and getting back its dependencies in the form of a requirement:

Get dependencies of a candidate.

This should return a collection of requirements that candidate
specifies as its dependencies.

Depending on how candidate objects look, this could be very easy or it could require going back to the wheel to get the dependencies.

Since the resolver is the one having to figure out priorities, the dependencies probably don't have a priority order, thus returning something like a set should be okay.
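In sketch form (candidate.requires_dist is hypothetical: the wheel's Requires-Dist entries, however the candidate representation ends up storing them):

from packaging.requirements import Requirement

def get_dependencies(self, candidate):
    # Order doesn't matter to the resolver; prioritization happens in
    # get_preference(), so an unordered set is fine.
    return {Requirement(line) for line in candidate.requires_dist}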

@brettcannon
Owner Author

The is_satisfied_by() method seems to just check if a candidate satisfies a requirement.

Whether the given requirement can be satisfied by a candidate.

The candidate is guaranteed to have been generated from the
requirement.

A boolean should be returned to indicate whether candidate is a
viable solution to the requirement.

Now the interesting thing is "the candidate is guaranteed to have been generated from the requirement", and yet we still have to check if the candidate satisfies the requirement. 🤔 The best I can think of is multiple requirements got merged into a single requirement and this is to check if the candidate is still viable.
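If so, a minimal sketch (assuming a packaging-style requirement and a .version attribute on the candidate) just rechecks the specifier:

def is_satisfied_by(self, requirement, candidate):
    # find_matches() generated this candidate for the same identifier, but
    # requirements merged since then may have tightened the specifier.
    return candidate.version in requirement.specifier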

@brettcannon
Owner Author

The get_preference() method is all about creating a sort key for the DSU pattern.

The preference is defined as "I think this requirement should be
resolved first". The lower the return value is, the more preferred
this group of arguments is.

The PyPI wheel example from resolvelib and an open issue about a default implementation suggest just going with the number of candidates as that prioritizes the most restricted distributions first.

Pip's implementation is way more involved. From its docstring:

  • Prefer if any of the known requirements is "direct", e.g. points to an
    explicit URL.
  • If equal, prefer if any requirement is "pinned", i.e. contains
    operator === or ==.
  • If equal, calculate an approximate "depth" and resolve requirements
    closer to the user-specified requirements first. If the depth cannot
    by determined (eg: due to no matching parents), it is considered
    infinite.
  • Order user-specified requirements by the order they are specified.
  • If equal, prefers "non-free" requirements, i.e. contains at least one
    operator, such as >= or <.
  • If equal, order alphabetically for consistency (helps debuggability).

PDM's implementation is also more involved. The return value gives a hint as to what it prioritises:

return (
    not is_python,
    not is_top,
    not is_file_or_url,
    not is_pinned,
    not is_backtrack_cause,
    dep_depth,
    -constraints,
    identifier,
)

It might be best to go with the simple solution as suggested in the open issue and example and increase complexity as necessary.
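That simple version is essentially one line (signature per resolvelib's AbstractProvider.get_preference()):

def get_preference(self, identifier, resolutions, candidates, information, backtrack_causes):
    # Fewer candidates left == more constrained == resolve it first.
    return sum(1 for _ in candidates[identifier])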

@brettcannon
Owner Author

If get_dependencies() is candidate -> requirements, the find_matches() method seems to be requirement -> candidates.

This seems to be where communicating with e.g. PyPI would come in (as the PyPI wheel example suggests).

This does require returning things in a sorted order of preference.
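A sketch of that shape (signature per resolvelib's AbstractProvider.find_matches(); self._available_candidates() is a hypothetical index lookup):

def find_matches(self, identifier, requirements, incompatibilities):
    name, _extras = identifier  # the (name, extras) key from identify()
    merged = list(requirements[identifier])
    excluded = {c.version for c in incompatibilities[identifier]}
    candidates = (
        c
        for c in self._available_candidates(name)  # hypothetical index query
        if c.version not in excluded
        and all(self.is_satisfied_by(r, c) for r in merged)
    )
    # Most-preferred first: newest here; reverse the sort to resolve for
    # the oldest versions instead.
    return sorted(candidates, key=lambda c: c.version, reverse=True)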

@brettcannon
Owner Author

After all of those notes, it seems like the methods do the following (with a wheel-specific view):

  • identify(): Dict key for a requirement that is consistent for the distribution (and extras) regardless of any version constraints
  • get_preference(): Sort requirements based on how important they are to resolve for first
  • find_matches(): Find all the candidates that could satisfy a requirement
  • is_satisfied_by(): Does a candidate still work for a requirement?
  • get_dependencies(): Get the dependencies of a candidate

It seems all the magic in terms of what to customize happens in find_matches(). I think the tricky bit here is finding the right abstraction to let things be customized while also helping to minimize network traffic. Do you do subclassing? Composition? Do you ask for everything and let the common code do the filtering of the results, or do you let the thing you delegate to handle that to avoid needless network requests?

Another question is whether you return all potential wheels or just the best one (e.g., compiled wheel and pure Python wheels, or just the compiled one). If you return all wheels you have the greatest chance at success, but it will also increase resolution complexity. It then also becomes a question of whether you prefer newer versions with only pure Python wheels or older versions with compiled wheels.

It seems like requirements are pretty much what packaging.requirements.Requirement is. Candidates are details about a specific wheel.
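Which suggests a candidate could be little more than a record about a specific wheel; a sketch (field names illustrative, not a settled API):

from dataclasses import dataclass

from packaging.utils import NormalizedName
from packaging.version import Version

@dataclass(frozen=True)
class Candidate:
    # Just enough to identify(), sort, and fetch a specific wheel.
    name: NormalizedName
    version: Version
    extras: frozenset[NormalizedName]
    wheel_url: str  # where the wheel (and thus its metadata) lives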

@pradyunsg
Contributor

requirements that are for the same thing to line up for merging.

Same for the corresponding candidates -- for every candidate generated by a find_matches(requirement), it needs to have identify(candidate) == identify(requirement). Without that, the resolver cannot associate the two.

I don't remember if we added an explicit error check for this in resolvelib, but flagging this since I don't see this nuance mentioned here.
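Stated as a check one could put in tests (a sketch; signatures per resolvelib's AbstractProvider):

def check_find_matches_invariant(provider, identifier, requirements, incompatibilities):
    # Every candidate returned for an identifier must identify() back to it;
    # otherwise the resolver can't associate candidate and requirement.
    for candidate in provider.find_matches(identifier, requirements, incompatibilities):
        assert provider.identify(candidate) == identifier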

PS: We need to improve that docstring for identify. 😅

@brettcannon
Owner Author

Same for the corresponding candidates -- for every candidate generated by a find_matches(requirement), it needs to have identify(candidate) == identify(requirement). Without that, the resolver cannot associate the two.

How does that play into extras? The extras example says:

To model projects with extras, we define a candidate as being a project with a
specific set of dependencies. This introduces a problem, as the resolver could
produce a solution that demands version 1.0 of X[foo] and version 2.0 of
X[bar]. This is impossible, as there is actually only one project X to be
installed. To address this, we inject an additional dependency for every
candidate with an extra - X[foo] version v depends on X version v. By doing
this, we constrain the solution to require a unique version of X.

So does that mean you have the same candidate wheel under multiple identifiers to satisfy any and all extras that are possible for a distribution (and thus cache a lot to avoid the cost of looking up the same thing for a distribution just because you now come across a new set of extras)? Or do you add a dependency on the distribution itself to the distribution+extra, and have a virtual candidate that always satisfies the distribution+extra so the resolver effectively ignores that node of the dependency graph?

@brettcannon
Owner Author

PS: We need to improve that docstring for identify. 😅

sarugaku/resolvelib#138

@pradyunsg
Contributor

pradyunsg commented Sep 10, 2023

So does that mean you have the same candidate wheel under multiple identifiers to satisfy any and all extras that are possible for a distribution (and thus cache a lot to avoid the cost of looking up the same thing for a distribution just because you now come across a new set of extras)?

For pip, yes-ish (-ish because I don't recall the caching details). The extra candidate is based on another candidate, and this is handled by having the extra candidate depend on a requirement that only find_matches with the specific candidate for the underlying distribution file that the metadata was pulled from.


For something like the following...

[project]
name = "home"
version = "1.0.0"

[project.optional-dependencies]
base = ["base-dependency"]
dog = ["dog-food", "dog-bed"]
cat = ["cat-food", "cat-bed"]

And resolving home[base, dog] and home[base, cat] will give resolvelib a graph that looks something like:

graph LR
    subgraph legend
        requirement(["requirement"])
        candidate[["candidate"]]
    end
    subgraph cat-extra["identity = home[base,cat]"]
        home-cat-requirement(["home[base,cat]"])
        home-cat-version-one[["ExtrasCandidate(\n  LinkCandidate(home-1.0.0-*.whl),\n  [base,cat]\n)"]]
        style home-cat-version-one text-align:left
    end

    subgraph dog-extra["identity = home[base,dog]"]
        home-dog-requirement(["home[base,dog]"])
        home-dog-version-one[["ExtrasCandidate(\n  LinkCandidate(home-1.0.0-*.whl),\n  [base,dog]\n)"]]
        style home-dog-version-one text-align:left
    end

    subgraph cat-requirements ["layout grouping\ncat extra dependencies"]
        cat-requirement-food(["cat-food"])
        cat-requirement-bed(["cat-bed"])
    end

    subgraph base-requirements ["layout grouping\nbase extra dependencies"]
        base-requirement(["base-dependency"])
    end
    subgraph home-one ["identity = home"]
        home-requirement-one(["ExplicitRequirement(LinkCandidate(home-1.0.0-*.whl))"])
        home-version-one[["LinkCandidate(home-1.0.0-*.whl)"]]
    end
    subgraph dog-requirements ["layout grouping\ndog extra dependencies"]
        dog-requirement-food(["dog-food"])
        dog-requirement-bed(["dog-bed"])
    end


    home-dog-requirement --> home-dog-version-one;
    home-cat-requirement --> home-cat-version-one;

    home-dog-version-one --> home-requirement-one;
    home-dog-version-one --> base-requirement;
    home-cat-version-one --> home-requirement-one;
    home-cat-version-one --> base-requirement;

    home-requirement-one --> home-version-one;

    home-cat-version-one --> cat-requirement-food;
    home-cat-version-one --> cat-requirement-bed;

    home-dog-version-one --> dog-requirement-food;
    home-dog-version-one --> dog-requirement-bed;

And, yes, I've not expanded the requirements that came from the extras into their corresponding candidates because that isn't relevant to the extra handling question.

(A home 2.0.0 will add new nodes corresponding to those for 1.0.0, with basically the same connections if the dependencies are unchanged. Adding that in made mermaid render the graph in an unreadable format, so I've skipped it.)

Edit: Updated to explicitly reference the pip-side class object names.

@brettcannon
Owner Author

@pradyunsg thanks for this! So if I understand this correctly, there are pseudo-candidates per extras set and distro version, which end up with an implicit requirement on the distribution itself -- i.e. no extras -- with a == to the same version the pseudo-candidate specifies.

@pradyunsg
Contributor

So if I understand this correctly, there are pseudo-candidates per extras set and distro version, which end up with an implicit requirement on the distribution itself -- i.e. no extras -- with a == to the same version the pseudo-candidate specifies.

Yes, except that it's not done via == but as an explicit requirement that matches the candidate. This matters for things like:

>>> SpecifierSet("==1.0").contains("1.0+local_version_label") 
True

Where the 1.0 wheel can be different from the 1.0+local_version_label wheel.
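In provider terms that means matching on identity rather than on a specifier; a sketch (ExplicitRequirement is pip's class name, the rest is hypothetical):

class ExplicitRequirement:
    """A requirement satisfiable only by one specific candidate."""
    def __init__(self, candidate):
        self.candidate = candidate

def is_satisfied_by(self, requirement, candidate):
    if isinstance(requirement, ExplicitRequirement):
        # Identity, not "==": a ==1.0 specifier would also accept
        # 1.0+local_version_label, which may be a different wheel.
        return candidate is requirement.candidate
    return candidate.version in requirement.specifier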


I'm realising that this can (and does) lead to the resolver backtracking on the wrong requirement sometimes, since backtracking when an ExplicitRequirement is involved is going to result in suboptimal backtracking. The fix there is to prefer to backtrack on ExplicitRequirement whenever possible.

@brettcannon
Owner Author

backtracking when an ExplicitRequirement is involved is going to result in suboptimal backtracking. The fix there is to prefer to backtrack on ExplicitRequirement whenever possible.

That sounds contradictory. If backtracking when an explicit requirement is involved is suboptimal, why is the fix to prefer backtracking in that case? Are you saying explicit requirements should come earlier or later in the ordering provided by get_preference() (which is what I assume you're suggesting as the mechanism to signal what to backtrack on)?

@brettcannon
Owner Author

And now I'm seeing why sarugaku/resolvelib#14 exists. 😅

@pradyunsg
Contributor

Are you saying explicit requirements should come earlier or later in the ordering provided by get_preference() (which is what I assume you're suggesting as the mechanism to signal what to backtrack on)?

Yea -- the preference for backtracking on the ExplicitRequirement should be higher than for other kinds of requirements when it is implicated in a conflict.

@brettcannon
Owner Author

Started outlining the code in the 'resolve' branch.

@brettcannon
Owner Author

The resolver now works! I haven't done constraints, but it works well enough that I have a PoC resolver running against PyPI using metadata that can be downloaded directly.

If anyone has feedback on the high-level API I would appreciate it since that's the bit that can't be easily changed later.

@ofek

ofek commented Feb 11, 2024

Do you mind briefly explaining how the PoC differs from the resolver in pip?

@brettcannon
Owner Author

Do you mind briefly explaining how the PoC differs from the resolver in pip?

At minimum, the blatant differences are:

  • No support for sdists
  • No support for constraints

There's probably other subtleties that I'm not aware of.
