Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-42369: Fix thread safety of zipfile._SharedFile.tell #26974

Merged
merged 2 commits into from
Mar 20, 2022

Conversation

kevinmehall
Copy link
Contributor

@kevinmehall kevinmehall commented Jun 30, 2021

The _SharedFile tracks its own virtual position into the file as self._pos and updates it after reading or seeking. tell() should return this position instead of calling into the underlying file object, since if multiple _SharedFile instances are being used concurrently on the same file, another one may have moved the real file position. Additionally, calling into the underlying tell may expose thread safety issues in the underlying file object because it was called without taking the lock.

Prior to this fix, the test case in https://bugs.python.org/issue42369#msg381212 reliably caused a zipfile.BadZipFile: Bad CRC-32 for file 'file1' in less than a second; with this fix I have not seen this error.

https://bugs.python.org/issue42369

The `_SharedFile` tracks its own virtual position into the file as
`self._pos` and updates it after reading or seeking. `tell()` should
return this position instead of calling into the underlying file object,
since if multiple `_SharedFile` instances are being used concurrently on
the same file, another one may have moved the real file position.
Additionally, calling into the underlying `tell` may expose thread
safety issues in the underlying file object because it was called
without taking the lock.

Prior to this fix, the test case in
https://bugs.python.org/issue42369#msg381212
reliably caused a `zipfile.BadZipFile: Bad CRC-32 for file 'file1'`
after a few dozen reads; with this fix I have not seen this error.
@ain-soph
Copy link

Nice! This could certainly help me solve the issue in machine learning when I want to load data from a zip file.
Hope its priority gets higher.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Aug 13, 2021
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please add a NEWS file.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@serhiy-storchaka serhiy-storchaka removed the stale Stale PR or inactive for long period of time. label Mar 19, 2022
@kevinmehall
Copy link
Contributor Author

I have made the requested changes; please review again

(hopefully got the rst formatting right)

@bedevere-bot
Copy link

Thanks for making the requested changes!

@serhiy-storchaka: please review the changes made to this pull request.

@serhiy-storchaka serhiy-storchaka merged commit e730ae7 into python:main Mar 20, 2022
@serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes labels Mar 20, 2022
@miss-islington
Copy link
Contributor

Thanks @kevinmehall for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @kevinmehall for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Mar 20, 2022
@bedevere-bot
Copy link

GH-32008 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Mar 20, 2022
@bedevere-bot
Copy link

GH-32009 is a backport of this pull request to the 3.10 branch.

@serhiy-storchaka
Copy link
Member

Thank you @kevinmehall.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 20, 2022
)

The `_SharedFile` tracks its own virtual position into the file as
`self._pos` and updates it after reading or seeking. `tell()` should
return this position instead of calling into the underlying file object,
since if multiple `_SharedFile` instances are being used concurrently on
the same file, another one may have moved the real file position.
Additionally, calling into the underlying `tell` may expose thread
safety issues in the underlying file object because it was called
without taking the lock.
(cherry picked from commit e730ae7)

Co-authored-by: Kevin Mehall <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 20, 2022
)

The `_SharedFile` tracks its own virtual position into the file as
`self._pos` and updates it after reading or seeking. `tell()` should
return this position instead of calling into the underlying file object,
since if multiple `_SharedFile` instances are being used concurrently on
the same file, another one may have moved the real file position.
Additionally, calling into the underlying `tell` may expose thread
safety issues in the underlying file object because it was called
without taking the lock.
(cherry picked from commit e730ae7)

Co-authored-by: Kevin Mehall <[email protected]>
miss-islington added a commit that referenced this pull request Mar 20, 2022
The `_SharedFile` tracks its own virtual position into the file as
`self._pos` and updates it after reading or seeking. `tell()` should
return this position instead of calling into the underlying file object,
since if multiple `_SharedFile` instances are being used concurrently on
the same file, another one may have moved the real file position.
Additionally, calling into the underlying `tell` may expose thread
safety issues in the underlying file object because it was called
without taking the lock.
(cherry picked from commit e730ae7)

Co-authored-by: Kevin Mehall <[email protected]>
miss-islington added a commit that referenced this pull request Mar 20, 2022
The `_SharedFile` tracks its own virtual position into the file as
`self._pos` and updates it after reading or seeking. `tell()` should
return this position instead of calling into the underlying file object,
since if multiple `_SharedFile` instances are being used concurrently on
the same file, another one may have moved the real file position.
Additionally, calling into the underlying `tell` may expose thread
safety issues in the underlying file object because it was called
without taking the lock.
(cherry picked from commit e730ae7)

Co-authored-by: Kevin Mehall <[email protected]>
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
)

The `_SharedFile` tracks its own virtual position into the file as
`self._pos` and updates it after reading or seeking. `tell()` should
return this position instead of calling into the underlying file object,
since if multiple `_SharedFile` instances are being used concurrently on
the same file, another one may have moved the real file position.
Additionally, calling into the underlying `tell` may expose thread
safety issues in the underlying file object because it was called
without taking the lock.
(cherry picked from commit e730ae7)

Co-authored-by: Kevin Mehall <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants