-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More correct string truncating in PyUnicode_FromFormat() #70278
Comments
The C code often uses %.<number><format> in PyUnicode_FromFormat(). %.200s protects from unlimited output when broken pointer points on random non-null-terminated data. %.200R is used to limit the size of human-readable messages. In all these case formatted string can look well-formed with short data, but mis-formed (not closed quote, truncated backslash escaping or � decoded from truncated UTF-8 sequence) with long data. I propose to make truncating in PyUnicode_FromFormat() more smart.
|
See my old issue bpo-10833 which proposed to *remove* the arbitrary limit |
Could we make this feature available at the Python level too? It sounds --Guido (mobile)
|
I think we can make this feature available with classic formatting '%.100r', but with new formatting '{0:.100!r}' (especially with f-strings) this can be not so easy. |
Well it seems a little odd to spend effort on a corner case of the C-level |
PyUnicode_FromFormat() no longer produces the ending \ufffd character for truncated C string when use precision with %s and %V. It now truncates the string before the start of truncated multibyte sequences.
#120365 solves item 4. It is the simplest of the mentioned problems. It only exists at the C level. |
I suggest adding |
…-120365) PyUnicode_FromFormat() no longer produces the ending \ufffd character for truncated C string when use precision with %s and %V. It now truncates the string before the start of truncated multibyte sequences.
…%V (pythonGH-120365) PyUnicode_FromFormat() no longer produces the ending \ufffd character for truncated C string when use precision with %s and %V. It now truncates the string before the start of truncated multibyte sequences.
…%V (pythonGH-120365) PyUnicode_FromFormat() no longer produces the ending \ufffd character for truncated C string when use precision with %s and %V. It now truncates the string before the start of truncated multibyte sequences.
…%V (pythonGH-120365) PyUnicode_FromFormat() no longer produces the ending \ufffd character for truncated C string when use precision with %s and %V. It now truncates the string before the start of truncated multibyte sequences.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: