Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update filecmp.py #27137

Closed
wants to merge 2 commits into from
Closed

Update filecmp.py #27137

wants to merge 2 commits into from

Conversation

busterbeam
Copy link

[3.7] File Comparisons (GH-????)

This is for filecmp.py as in 3.7 or 3.11a there has been no improvements in terms of using generators. I was annoyed by the fact that my backup script progress bar would stall on directories with a large number of files.

This is a different approach using generators instead of waiting for huge directories to be compared then returning. Probably minor speed improvements but nicer if there are progress bars as it allows a more continuous stream of data.

Script not completely finished as doc-strings could be improved and further testing is recommended, not sure but testing systems may need to be adjusted for generators. This module could be reworked so that only when the directory comparison object is given a key word argument does it change into a generator object

different approach using generators instead of waiting for huge directories to be compared then returning.  Probably minor speed improvements but nicer if there are progress bars as it allows a more continuous stream of data
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@busterbeam

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@busterbeam
Copy link
Author

This is a lot of work you just get this improvement!? Hopefully my paperwork has gone through

Copy link
Contributor

@ambv ambv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is too big for inclusion without an issue on the bug tracker, as well as a NEWS entry. Also please consider library usage of filecmp that you will be breaking with your change, by removing existing names, renaming function arguments. I also find the unnecessary formatting changes distracting, and the removal of __class_getitem__ a regression.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@busterbeam
Copy link
Author

This change is too big for inclusion without an issue on the bug tracker, as well as a NEWS entry. Also please consider library usage of filecmp that you will be breaking with your change, by removing existing names, renaming function arguments. I also find the unnecessary formatting changes distracting, and the removal of __class_getitem__ a regression.

"that you will be breaking with your change" - I have tried to make sure it doesn't break anything but if you think it does then I will rewrite it... do you think rewriting it so that there is a generator/stream argument so that it functions just as before just with the additional key word argument set to True it will act as a generator instead?

"by removing existing names, renaming function arguments" - I do believe all the necessary names are still there if you re-read the script, yes I have removed the phase functions but really the phases needed more "divisions" for speed improvements and phase is not a good variable name in my personal opinion

oh the __class_getitem__ should not of been removed I agree, mistake, when I was editing most of it I was referring to my local copy of python 3.7 version. I wondered what the types import was for.

@busterbeam
Copy link
Author

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

I'm going to be brutally honest with all the steps required to make suggested changes on one module I will probably leave it and maybe later make it as a package for PyPI. The current filecmp module is not very agile on larger file systems. I implemented a progress bar for the checks it was making and it would occasionally "stall". I see too many scripts that take the approach of making functions that create an internal list and return/pass it on the function, when it isn't necessary

@bedevere-bot
Copy link

Thanks for making the requested changes!

@ambv: please review the changes made to this pull request.

@ambv
Copy link
Contributor

ambv commented Jul 15, 2021

"that you will be breaking with your change" - I have tried to make sure it doesn't break anything but if you think it does then I will rewrite it... do you think rewriting it so that there is a generator/stream argument so that it functions just as before just with the additional key word argument set to True it will act as a generator instead?

The Python standard library works in a way where existing code is freely imported by users. Changing any importable API (unless it's underscore-named; and even then we tend to be careful) requires a deprecation period.

"by removing existing names, renaming function arguments" - I do believe all the necessary names are still there if you re-read the script, yes I have removed the phase functions but really the phases needed more "divisions" for speed improvements and phase is not a good variable name in my personal opinion

Changing argument names is an API change. If people used keyword arguments before, you're now breaking their usage. Yes, you also removed the phase methods as well as the _filter helper.

oh the __class_getitem__ should not of been removed I agree, mistake, when I was editing most of it I was referring to my local copy of python 3.7 version. I wondered what the types import was for.

The Python 3.7 version is now irrelevant since we're working on Python 3.11. You should make your changes based on the version that is in the main branch to avoid unnecessary changes. In general, providing a new kind of a dircmp class will be cleaner compared to gutting the existing one and replacing it with yours.

added generator key word argument so it should act slower if you desire, defaults to non generator, the report feature will still utilize the generators.   `__class_getitem__` has been added
@busterbeam
Copy link
Author

I guess this should be closed improvement not needed, I guess I can just rewrite my dircmp for my program anyway, dircmp provided by the python standard library is not as agile and uses more lines that it needs

@busterbeam busterbeam closed this Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants