Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-12910: update and correct quote docstring #2568

Merged
merged 1 commit into from
Apr 10, 2019

Conversation

joernhees
Copy link
Contributor

@joernhees joernhees commented Jul 4, 2017

Fixes some mistakes and misleadings in the quote function docstring:

  • reserved chars are never actually used by quote code, unreserved chars are
  • reserved chars in comment were wrong and incomplete
  • mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting

see discussion in https://bugs.python.org/issue12910

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
@joernhees
Copy link
Contributor Author

@the-knights-who-say-ni re-check?

@Mariatta Mariatta added docs Documentation in the Doc dir and removed CLA not signed labels Jul 7, 2017
@Mariatta Mariatta removed the docs Documentation in the Doc dir label Jul 10, 2017
@openandclose
Copy link

This PR is being in 'awaiting review' state for a while.

Meanwhile the doc has been simply wrong about tilde and reserved characters.
Can someone notify someone who might be interested?

The problem seems old,
but closely related to the recent commit updating to RFC 3986.
#173

@openandclose
Copy link

openandclose commented Mar 2, 2019

@joernhees @brettcannon
What dou you think about just fixing the quotational parts first?

openandclose@8ee3beb
openandclose@4a6f592

@joernhees
Copy link
Contributor Author

the "quotational" parts the comments in the code referring to the RFCs?

From my point of view it would make sense to update them, but not without clearly pointing out that the code does not strictly follow them for backwards compatibility reasons

@openandclose
Copy link

openandclose commented Mar 5, 2019

I think I understand your point.
Please see the original commit message of the above.
openandclose@db4fd2f
(I omitted these since your commit should treat them.)

One problem is that to 'strictly follow',
they should state what is the status of these characters ("%/<>^`{}).
But I couldn't find out in RFC 3986.

@joernhees
Copy link
Contributor Author

Not sure if i fully understand your statement / question: the list of chars "%/<>^`{} you present covers different areas (in RFC 3986):

  • characters that aren't allowed in a URI at all ("<>^`{}, but also many more likeäöüß | (see RFC3987 for IRIs and fun like that))
  • "reserved characters" that can appear in a URI, but have special meaning (/)
  • % which can only be part of a URI to escape something (in other words, if % is not followed by 2 HEXDIGs then the URI is invalid)

Still i'm quite sure that discussing those and what should've been stated in the RFC in the end doesn't lead us anywhere ;)

This PR is about fixing the mistakes in the docs that might mislead people (e.g., myself). The quote function %-escapes all characters that are neither in the unreserved chars from RFC 3986 (_ALWAYS_SAFE) nor the additional chars set via the safe arg. I still think that's the best way to explain it, which is why i made this PR.

@openandclose
Copy link

openandclose commented Mar 6, 2019

Sorry for the confusion, I shouldn't have included '/' and '%'.
The actual characters I intended are: "<>^`{|}.

I still feel strange about the passing mention in RFC 3987,
and the fact URI spec has to refer to IRI spec for 'ascii' characters.
But let's not talk about it now.

I brought up the two commits above, since they are so simple
it's almost absurd to say 'awaiting review'.
(I'm not interested in the commits themselves, the contents are included in your commit anyway.)

@csabella csabella requested a review from orsenthil April 9, 2019 23:50
Copy link
Member

@orsenthil orsenthil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @joernhees and @csabella

@orsenthil
Copy link
Member

Give me a moment - I skipped the reference to the change in characters in the comment, and I only assumed it was grammar correction. I am looking at this again.

"$" | "," | "~"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the "/" character to "|" first. Even in the comments, if we preserved the previous doc, it won't intuitively break our reading. At other places, "|" is used for or condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was used as per the RFC example:
https://tools.ietf.org/html/rfc3986#appendix-A

This is okay!

@orsenthil
Copy link
Member

Thanks for this discussion. The documentation change makes sense.

  1. For a long time, we had a wrong information in the doc string.
  2. Now, this is corrected with the mention of this.
    The quote function %-escapes all characters that are neither in the
    unreserved chars ("always safe") nor the additional chars set via the
    safe arg.

So, this should cover for the usage scenarios, and I am going to ahead with +1 with this change.

Thank you!

@miss-islington
Copy link
Contributor

Thanks @joernhees for the PR, and @orsenthil for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@bedevere-bot
Copy link

@orsenthil: Please replace # with GH- in the commit message next time. Thanks!

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 10, 2019
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
(cherry picked from commit 750d74f)

Co-authored-by: Jörn Hees <[email protected]>
@bedevere-bot
Copy link

GH-12754 is a backport of this pull request to the 3.7 branch.

miss-islington added a commit that referenced this pull request Apr 10, 2019
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
(cherry picked from commit 750d74f)

Co-authored-by: Jörn Hees <[email protected]>
@csabella
Copy link
Contributor

Thank @joernhees for the PR and thank you @orsenthil for the review and merge!

@openandclose
Copy link

@orsenthil change also parse.rst.
'~' was wrongly refered as 'reserved' there.

https://github.com/python/cpython/blob/master/Doc/library/urllib.parse.rst

... versionchanged:: 3.7
Moved from :rfc:2396 to :rfc:3986 for quoting URL strings. "~" is now
included in the set of reserved characters.

@orsenthil
Copy link
Member

@openandclose - Sure. Thanks for this note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants