Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-80789: Get rid of the ensurepip infra for many wheels #109245

Merged
merged 15 commits into from
Jan 30, 2024

Conversation

webknjaz
Copy link
Contributor

@webknjaz webknjaz commented Sep 10, 2023

This is a refactoring change that aims to simplify ensurepip. Before it, this module had legacy infrastructure that made an assumption that ensurepip would be provisioning more then just a single wheel. That assumption is no longer true since [1][2][3].

In this change, the improvement is done around removing unnecessary loops and supporting structures to change the assumptions to expect only the bundled or replacement pip wheel.

This is a spin-off of #12791.

@webknjaz webknjaz force-pushed the maintenance/ensurepip-single-wheel branch 2 times, most recently from b9fe227 to 6b2ce22 Compare September 10, 2023 23:33
@webknjaz webknjaz changed the title Get rid of the ensurepip infra for many wheels GH-109245: Get rid of the ensurepip infra for many wheels Sep 10, 2023
@webknjaz webknjaz force-pushed the maintenance/ensurepip-single-wheel branch from 71911de to bf909c0 Compare September 10, 2023 23:36
@webknjaz
Copy link
Contributor Author

cc @AA-Turner @pfmoore @pradyunsg

@AA-Turner AA-Turner changed the title GH-109245: Get rid of the ensurepip infra for many wheels GH-80789: Get rid of the ensurepip infra for many wheels Sep 16, 2023
@AA-Turner
Copy link
Member

I've taken the liberty of pushing two changes to your branch; feel free to revert in whole or part.

Both look to simplify what ensurepip is doing internally -- e.g. with only pip, having a _Package type doesn't really make sense any-more -- similarly with caching the output, which is fast to compute.

I've also removed type annotations, as these are ignored and we shouldn't introduce more (tomllib and importlib.resources are special cases).

I think there are further simplifications we could make, but I thought this was a reasonable middle-ground -- let me know your thoughts!

A

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But I would prefer to get a second review on this one.

Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, but it doesn't block this - feel free to leave it if you think your version is better.

# Make the code deterministic if a directory contains multiple wheel files
# of the same package, but don't attempt to implement correct version
# comparison since this case should not happen.
filenames = sorted(filenames)
filenames = sorted(filenames, reverse=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reverse? I guess it gives better odds of getting the correct version, but as the comment above says, we're not doing version comparison because it should never be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted this in the commit message but not a PR comment; sorry. The implementation of this function now short-circuits and returns the first match, but that means we need to reverse the order as previously the final match would be returned, and we want to preserve that behaviour. We retain no guarantees about finding the correct version.

@webknjaz
Copy link
Contributor Author

I've taken the liberty of pushing two changes to your branch; feel free to revert in whole or part.

Both look to simplify what ensurepip is doing internally -- e.g. with only pip, having a _Package type doesn't really make sense any-more -- similarly with caching the output, which is fast to compute.

I've also removed type annotations, as these are ignored and we shouldn't introduce more (tomllib and importlib.resources are special cases).

I think annotations contribute to maintainability.

I think there are further simplifications we could make, but I thought this was a reasonable middle-ground -- let me know your thoughts!

I was specifically avoiding any refactoring in this PR, trying to make it as small as possible and keep the potential conflicts with the rest of related PRs at minimum. I don't like some of the changes but if this gets it merged faster so be it.

Lib/ensurepip/__init__.py Outdated Show resolved Hide resolved
@webknjaz
Copy link
Contributor Author

FreeBSD failing check reason:

⚠️ Failed to start an instance: FAILED_PRECONDITION: Monthly free compute limit exceeded!

@pfmoore
Copy link
Member

pfmoore commented Sep 16, 2023

I think annotations contribute to maintainability.

I'm pretty sure there's still a policy that we don't use type annotations in the stdlib.

@webknjaz
Copy link
Contributor Author

I'm pretty sure there's still a policy that we don't use type annotations in the stdlib.

Interesting.. https://devguide.python.org/search/?q=type+annotations&check_keywords=yes&area=default doesn't return anything related (nor does using Google search against that site).

@AA-Turner
Copy link
Member

doesn't return anything related

See https://discuss.python.org/t/21487; #108125 (comment). Quoting Brett: "It's codified based on discussions among the core developers that we have all agreed to over the years"

A

Lib/ensurepip/__init__.py Show resolved Hide resolved
Lib/ensurepip/__init__.py Show resolved Hide resolved
# Make the code deterministic if a directory contains multiple wheel files
# of the same package, but don't attempt to implement correct version
# comparison since this case should not happen.
filenames = sorted(filenames)
filenames = sorted(filenames, reverse=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted this in the commit message but not a PR comment; sorry. The implementation of this function now short-circuits and returns the first match, but that means we need to reverse the order as previously the final match would be returned, and we want to preserve that behaviour. We retain no guarantees about finding the correct version.

Lib/ensurepip/__init__.py Outdated Show resolved Hide resolved
Lib/ensurepip/__init__.py Outdated Show resolved Hide resolved
Lib/ensurepip/__init__.py Outdated Show resolved Hide resolved
@webknjaz
Copy link
Contributor Author

@AA-Turner so I tried something which resulted in some simplification. I used pathlib but also made functions to return paths instead of complex structures with data that is never used together by the callers. I made sure that the coverage of the changed parts is at 100%, while making it less branchy...

@webknjaz
Copy link
Contributor Author

I have made the requested changes; please review again

webknjaz and others added 14 commits January 25, 2024 14:33
This is a refactoring change that aims to simplify ``ensurepip``.
Before it, this module had legacy infrastructure that made an
assumption that ``ensurepip`` would be provisioning more then just a
single wheel. That assumption is no longer true since [[1]][[2]][[3]].

In this change, the improvement is done around removing unnecessary
loops and supporting structures to change the assumptions to expect
only the bundled or replacement ``pip`` wheel.

[1]: python@ece20db
[2]: python#101039
[3]: python#95299
This change is intended to make it clear that the helper now only
returns one package and no more.

Co-Authored-By: [email protected]
- Exit early if there are no files in the directory
- Return a match early by iterating in reverse order
- Merge filename checks
- Inline the path variable
- Remove type annotations
- Return a dict with consistent fields
- Remove caching
- Remove type annotations
- Leverage the known wheel package dir value to calculate full paths
It provides us with the ability to write simpler high-level logic that
is easier to understand. As a side effect, it became possible to make
the code less branchy.
It was returning bytes on FreeBSD which was incorrectly injected into
the Python code snippet executed in a subprocess and had a b-preffix.
Some code comments and test function names were still referring to the
removed function name. Not anymore!
This script is separate and is only used in CI as opposed to runtime.
It was tripping the linters and became unnecessary after hardcoding
the pip wheel filename prefix in the string.
@webknjaz webknjaz force-pushed the maintenance/ensurepip-single-wheel branch from 45f9944 to 2472d87 Compare January 25, 2024 13:33
@webknjaz
Copy link
Contributor Author

@pradyunsg there was a merge conflict that I solved with rebase. No other changes made.

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last change, but LGTM otherwise!

Lib/ensurepip/__init__.py Show resolved Hide resolved
Lib/ensurepip/__init__.py Show resolved Hide resolved
@webknjaz
Copy link
Contributor Author

@pradyunsg 🛎 the CI remains green after your branch sync FYI.

@pradyunsg pradyunsg merged commit 9639043 into python:main Jan 30, 2024
28 checks passed
@webknjaz
Copy link
Contributor Author

webknjaz commented Feb 1, 2024

@AA-Turner @pradyunsg @vstinner apply the backport labels?

@vstinner
Copy link
Member

vstinner commented Feb 1, 2024

I don't think that such refactoring change should be backported to stable branches. Only bugfixes.

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants