-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PLAY-2375] -- Sanitize Storage URIs #54
Conversation
…ize the URI in the manner that is most appropriate for the way it encodes secret data. [PLAY-2375]
[PLAY-2375]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things. Also, I only happened to remember the download_url_key
query parameter in the Swift/CloudFiles backends. There might be other query parameters or URI components that are sensitive in those or other implementations. We should read through the implementations and ensure we have all sensitive data (including optional things) removed in the sanitized URL for each one.
Also, when we finish this, we need to remember to back-port it to the 0.12 branch for legacy Python 2 projects.
storage/url_parser.py
Outdated
parsed_url = parsed_uri._replace( | ||
netloc=":".join((parsed_uri.hostname, str(parsed_uri.port)))) | ||
else: | ||
parsed_url = parsed_uri._replace(netloc=parsed_uri.hostname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be clearer, and less redundant, if it were rewritten along the lines of:
new_netloc = parsed_uri.hostname
if parsed_uri.port is not None:
new_netloc = ":".join((new_netloc, str(parsed_uri.port)))
# _replace, etc...
storage/storage.py
Outdated
@@ -129,6 +129,10 @@ def get_download_url(self, seconds: int = 60, key: Optional[str] = None) -> str: | |||
raise NotImplementedError( | |||
"{} does not implement 'get_download_url'".format(self._class_name())) | |||
|
|||
def get_sanitized_uri(self) -> str: | |||
raise NotImplementedError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure how I feel about this, but, since almost all of the implementations ended up being the same, should we put a default implementation for this in the base class?
One danger with doing that is that it, if a particular backend has sensitive data elsewhere in the URL (besides the netloc
) and the implementor forgets to override this implementation, it's possible that the default sanitized URL could return sensitive data.
One solution to that would be to have the default implementation be overly restrictive (say, only returning the scheme, hostname, port and path, no query parameters, etc), and then have implementations override it which want to include extra values in the URL.
Again, I'm on the fence about which of these options is best (including just leaving it as it's currently implemented). Perhaps @tjensen or @joshmarshall has an opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spiralman @shroudofthurin I think I'm okay with the default implementation being in the base class and even making it overly restrictive.
One thing we could do is have the base class take an optional whitelist of identifiers indicating URI parts that are safe to include in the sanitized URI. Then, subclasses that want to include the query parameters and username can tell the base to include them. e.g.
class DerivedStorage(Storage):
def __init__(self):
super().__init__(sanitize_whitelist=[USERNAME, QUERY])
...could result in storage URIs that look like: proto://username@netloc/path?query=parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tjensen @spiralman @shroudofthurin While I agree that the current implementations are very similar, this is explicit boilerplate I believe to be a Good Thing[tm], since it's such an important consideration for any project using storage URIs. Thus my preference would be that it should raise, as it does in the code now (although I might prefer renaming the "sanitized_uri" helper to "sanitize_basic_auth" or something more specific.) If we do use a default behavior instead of raising, I recommend that we at least log a warning when a storage class with the "default" sterilization behavior is imported / registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree with renaming the sanitize_uri
helper to something more explicit (basic_auth feels a little confusing, because we're not actually using it for HTTP Basic Auth). The spec calls it "user information", so maybe remove_user_info
?
Anyway, I'm 50/50 on whether there should be a default implementation or it should raise, so it seems like we have a hung jury. :-) I guess I'd recommend @shroudofthurin make the deciding vote, since it's her PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spiralman I think I will go with your original solution:
One solution to that would be to have the default implementation be overly restrictive (say, only returning the scheme, hostname, port and path, no query parameters, etc), and then have implementations override it which want to include extra values in the URL.
…sent, as per reviewer's request [PLAY-2375]
…equest [PLAY-2375]
…, port and path, no query parameters. [PLAY-2375]
…ctive, and then have implementations override it which want to include extra values in the URL, as per reviewer's request. [PLAY-2375]
[PLAY-2375]
@spiralman I think I've addressed all of your comments. Would you mind taking another look at this PR when you get the chance? Thanks 🍡 |
[PLAY-2375]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things.
storage/url_parser.py
Outdated
|
||
new_query = dict(parse_qsl(parsed_uri.query)) | ||
|
||
if "download_url_key" in new_query: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is specific to Swift/CloudFiles, and should be in those implementations, not here.
tests/test_local_storage.py
Outdated
@@ -267,3 +267,20 @@ def test_local_storage_get_download_url_returns_none_on_empty_base(self) -> None | |||
|
|||
def test_local_storage_rejects_multiple_query_values_for_download_url_key_setting(self) -> None: | |||
self.assert_rejects_multiple_query_values("/foo/bar/object.mp4", "download_url_base") | |||
|
|||
def test_local_storage_get_sanitized_uri(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name should indicate what the expected behavior is. In this case, there's nothing "secret" in a local file://
URI, so there is nothing to remove.
storage/url_parser.py
Outdated
return new_uri.geturl() | ||
|
||
|
||
def sanitize_filepath(parsed_uri: ParseResult) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the behvior of this function. It seems like it removes the username/password and all query parameters from the URL. However, that doesn't really match what the name says (I would expect it to remove sensitive information from the path part of the URI, based on the name).
…viewer's request. [PLAY-2375]
…ns, as per reviewer's request. [PLAY-2375]
[PLAY-2375]
@spiralman back to you |
[PLAY-2375]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@shroudofthurin I realized that this is missing a version number bump. Normally, we bump the minor version, for a new feature, since this is still pre-1.0. That'd make the next version 0.14.0, but if you and @joshmarshall prefer to bump the bugfix version, that's fine with me (SemVer is a little goofy before 1.0). I'm not making this a blocking comment so that you can get it merged during the day in your timezone. |
@spiralman If we bump the master version from 0.13.3 to 0.14, what should we bump the Python 2-only v0.12 branch version to? |
@tjensen Yeah, that's where it's kinda goofy. My feeling is only bump the bugfix number in the 0.12 branch, but bump the minor version in master: since Python 2 is no longer officially supported, I think we'll need to maintain the 0.12 branch on an "as needed" basis, which means SemVer won't work out very well. Given that usage of it should drop off, hopefully that's OK? I'm fine with other ideas for bumping the version. Since we haven't ever moved to 1.0, SemVer is somewhat hazy, so I suspect it mostly just matters that we bump it to a bigger number. |
[PLAY-2375]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shroudofthurin Just a few remaining documentation questions!
#### `get_sanitized_uri()` #### | ||
|
||
Removes the username/password, as well as all query paramters, form the URL. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"parameters"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add documentation for remove_user_info()
too, since other users can create and register their own storage providers?
[PLAY-2375]
…ble documentation, included the latest Python 2.7 release as well, as per reviewer's request. [PLAY-2375]
@joshustudio I've made the requested changes 🍡 |
…s request. [PLAY-2375]
@joshustudio Added documentation for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Add the ability for each backend to sanitize it's URI in such a way that is most appropriate for the way it encodes secret data.
@ustudio/reviewers for your consideration.