Calculate candidate string versions only once in get_applicable_candidates #12664


notatallshaw
Member

@notatallshaw notatallshaw commented Apr 30, 2024

This is a minor performance improvement; I measure it at 1% fairly consistently across the different resolutions I've tried.

I was looking at the "After call graph" in #12663 and noticed that get_applicable_candidates was taking 16% of run time; even though it was only called 921 times, it calls other methods hundreds of thousands of times.

The only obvious thing I spotted, though, is that it is effectively calculating [c.version for c in candidates] twice, which can be seen in this part of the call graph:

[Image: highlighted call graph]

I suspect, though, that this function can be optimized significantly further, so I will leave this as a draft for now, think on it, and take any suggestions before marking it ready for review.
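
A minimal sketch of the duplicated work described in this comment, not pip's actual code. `FakeCandidate`, `keep_allowed`, `applicable_twice`, and `applicable_once` are hypothetical names invented for illustration; `keep_allowed` stands in for the specifier filter and just keeps versions starting with "2.". The counter shows how many times each candidate's version is read under each approach.

```python
from dataclasses import dataclass


@dataclass
class FakeCandidate:
    """Hypothetical stand-in for a candidate; counts version accesses."""

    raw_version: str
    conversions: int = 0

    @property
    def version(self) -> str:
        self.conversions += 1
        return self.raw_version


def keep_allowed(versions):
    """Hypothetical stand-in for a specifier filter (assumption, not pip's API)."""
    return [v for v in versions if v.startswith("2.")]


def applicable_twice(candidates):
    # Version strings are computed once to run the filter...
    allowed = set(keep_allowed(str(c.version) for c in candidates))
    # ...and a second time to pick out the matching candidates.
    return [c for c in candidates if str(c.version) in allowed]


def applicable_once(candidates):
    # Pair each candidate with its version string a single time.
    pairs = [(c, str(c.version)) for c in candidates]
    allowed = set(keep_allowed(v for _, v in pairs))
    return [c for c, v in pairs if v in allowed]
```

Both functions return the same candidates; the second reads each candidate's version once instead of twice, at the cost of holding one extra list of pairs in memory.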

@ichard26 ichard26 added the type: performance Commands take too long to run label Apr 30, 2024
@notatallshaw notatallshaw force-pushed the Calculate-candidate-versions-once-in-`get_applicable_candidates` branch from eb7c1eb to 3ddec7f Compare May 3, 2024 22:57
@notatallshaw
Member Author

Found one further minor improvement: specifier.filter returns an Iterable of the same type as the Iterable you give it, so there's no need to stringify the output; it's already a string.

Other than that, this logic is quite sensitive due to the complicated way prereleases work in the version spec, so I wasn't able to find any other big improvements, except when allow_prereleases = True, and it didn't seem worth adding a special path for that use case.

Marking ready for review.

@notatallshaw notatallshaw marked this pull request as ready for review May 3, 2024 23:09
@notatallshaw
Member Author

notatallshaw commented May 4, 2024

Updated now that Python 3.13 is passing CI.

src/pip/_internal/index/package_finder.py (outdated, resolved)
news/12664.feature.rst (outdated, resolved)
@notatallshaw notatallshaw force-pushed the Calculate-candidate-versions-once-in-`get_applicable_candidates` branch from e0bcee2 to d845ad9 Compare June 2, 2024 20:58
# types. This way we'll use a str as a common data interchange
# format. If we stop using the pkg_resources provided specifier
# and start using our own, we can drop the cast to str().
candidates_and_versions = [(c, str(c.version)) for c in candidates]
Member


Do we know how much memory overhead this may induce in a large install? I agree this block can likely be further optimised since it is basically filtering on one list.

Member Author


> Do we know how much memory overhead this may induce in a large install?

I ran memray and there was no noticeable memory overhead: peak memory for a dry-run install of apache-airflow[all]==2.9.2 on Python 3.12 was 354 MB. Memory usage was dominated by making a list of pages of all candidates (I'm going to open a separate issue on that).

> I agree this block can likely be further optimised since it is basically filtering on one list.

I tried making it simpler, but found that the behavior of pre-releases made that problematic. You can't filter each version individually: a lone pre-release is allowed, but a pre-release alongside a final version is not allowed unless allow_prereleases=True.
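
A toy model of the pre-release rule described in this comment may make the problem concrete. It is a sketch under stated assumptions, not the real specifier logic: `is_prerelease` and `filter_versions` are hypothetical helpers, the pre-release check is a crude regex, and matching against an actual version specifier is omitted entirely.

```python
import re


def is_prerelease(version: str) -> bool:
    """Toy check (assumption): a/b/rc/dev markers mean pre-release."""
    return re.search(r"(a|b|rc|dev)\d*$", version) is not None


def filter_versions(versions, allow_prereleases=False):
    """Toy model of the rule above; specifier matching itself is omitted.

    Final releases are always kept. Pre-releases are kept when explicitly
    allowed, or when the input contains no final release at all.
    """
    versions = list(versions)
    if allow_prereleases:
        return versions
    finals = [v for v in versions if not is_prerelease(v)]
    return finals if finals else versions


all_versions = ["1.0", "2.0a1"]

# Filtering the whole list at once drops the pre-release.
whole_list = filter_versions(all_versions)

# Filtering one version at a time wrongly keeps it, because each
# pre-release looks like "the only version available" in isolation.
one_by_one = [v for v in all_versions if filter_versions([v])]
```

In this model the outcome for a pre-release depends on what else is in the list, which is why the filter has to see all versions together rather than being applied per candidate.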

@notatallshaw notatallshaw force-pushed the Calculate-candidate-versions-once-in-`get_applicable_candidates` branch from d845ad9 to 811ab0b Compare July 7, 2024 18:03
@notatallshaw notatallshaw added this to the 24.2 milestone Jul 7, 2024
@notatallshaw
Member Author

I'm putting this on the 24.2 milestone; let me know if that's an overreach and I will remove it.

@pradyunsg pradyunsg merged commit 888d2cc into pypa:main Jul 13, 2024
28 checks passed
@pradyunsg
Member

Thanks @notatallshaw!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 29, 2024
4 participants