More correct string truncating in PyUnicode_FromFormat() #70278

Open
serhiy-storchaka opened this issue Jan 12, 2016 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Jan 12, 2016

BPO 26090
Nosy @gvanrossum, @vstinner, @ezio-melotti, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2016-01-12.09:54:14.845>
labels = ['interpreter-core', 'type-feature']
title = 'More correct string truncating in PyUnicode_FromFormat()'
updated_at = <Date 2016-06-21.07:22:59.885>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2016-06-21.07:22:59.885>
actor = 'Drekin'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2016-01-12.09:54:14.845>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 26090
keywords = []
message_count = 5.0
messages = ['258092', '258095', '258108', '258111', '258118']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'vstinner', 'ezio.melotti', 'serhiy.storchaka', 'Drekin']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue26090'
versions = ['Python 3.6']

Linked PRs

@serhiy-storchaka
Member Author

The C code often uses %.<number><format> in PyUnicode_FromFormat(). %.200s protects against unlimited output when a broken pointer points to random data that is not NUL-terminated. %.200R is used to limit the size of human-readable messages.

In all these cases the formatted string looks well-formed for short data but can be malformed for long data: an unclosed quote, a truncated backslash escape, or a � decoded from a truncated UTF-8 sequence.

I propose making truncation in PyUnicode_FromFormat() smarter:

  1. Truncated %R output should keep at least the closing character (the quote or ">").
  2. Truncated output should include "..." or "[...]" as a truncation marker.
  3. \c, \OOO, \xXX, \uXXXX, and \UXXXXXXXX should not be truncated. It is better to omit these sequences entirely (cut the string before them) than to output them truncated.
  4. %s should not truncate a UTF-8 sequence in the middle of a character.
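Items 1 to 3 can be illustrated at the Python level with a small sketch for repr() output. `smart_repr()` and its `marker` parameter are hypothetical names, not an existing CPython API; the real change would live in the C implementation of PyUnicode_FromFormat(). Item 4 concerns C strings and is not modelled here.

```python
import re

# One token = one whole escape sequence (\c, \OOO, \xXX, \uXXXX, \UXXXXXXXX)
# or a single ordinary character.
_TOKEN = re.compile(
    r"\\(?:x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}|[0-7]{1,3}|.)|.",
    re.DOTALL,
)


def smart_repr(obj, limit, marker="..."):
    """Hypothetical helper: repr(obj) cut to `limit` characters, well-formed.

    If `limit` is smaller than the marker plus the closing character,
    the result can still be longer than `limit`.
    """
    text = repr(obj)
    if len(text) <= limit:
        return text
    # Item 1: keep the closing quote or ">" of the representation.
    closing = text[-1] if text[-1] in "'\">" else ""
    body = text[: len(text) - len(closing)]
    # Item 2: reserve room for the truncation marker.
    budget = limit - len(marker) - len(closing)
    kept, used = [], 0
    # Item 3: only cut between tokens, never inside an escape sequence.
    for match in _TOKEN.finditer(body):
        token = match.group()
        if used + len(token) > budget:
            break
        kept.append(token)
        used += len(token)
    return "".join(kept) + marker + closing


print(smart_repr("spam" * 50, 20))  # 'spamspamspamspa...'
print(smart_repr("\x00" * 10, 10))  # '\x00...'
```

Cutting at token boundaries means the output can be a few characters shorter than `limit`, which is the trade-off item 3 asks for.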

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 12, 2016
@vstinner
Member

See my old issue bpo-10833 which proposed to *remove* the arbitrary limit
on strings. It was rejected.

@gvanrossum
Member

Could we make this feature available at the Python level too? It sounds
really useful.

--Guido (mobile)
(Replying by email to STINNER Victor's comment of Jan 12, 2016, 2:01 AM.)


@serhiy-storchaka
Member Author

I think we can make this feature available with classic formatting ('%.100r'), but with new-style formatting ('{0!r:.100}'), especially in f-strings, it may not be so easy.
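At the Python level both formatting styles already accept a precision for repr() output, and both truncate blindly, dropping the closing quote that item 1 wants to keep. In new-style formatting the conversion comes before the format spec, so the working spelling is '{0!r:.10}'. The variable names below are illustrative only.

```python
s = "a" * 20  # repr(s) is 22 characters long, quotes included

old_style = "%.10r" % s            # printf-style: precision applied to repr()
new_style = "{0!r:.10}".format(s)  # str.format(): conversion first, then spec
f_string = f"{s!r:.10}"            # f-string spelling of the same thing

# All three keep the opening quote but drop the closing one.
print(old_style, new_style, f_string)  # 'aaaaaaaaa 'aaaaaaaaa 'aaaaaaaaa
```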

@gvanrossum
Member

Well, it seems a little odd to spend effort on a corner case of the C-level
error messages if we can't even replicate it in pure Python.

ezio-melotti transferred this issue from another repository Apr 10, 2022
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jun 11, 2024
PyUnicode_FromFormat() no longer produces a trailing \ufffd
character for a truncated C string when precision is used with %s and %V.
It now truncates the string before the start of a truncated multibyte sequence.
@serhiy-storchaka
Member Author

#120365 solves item 4, the simplest of the problems listed above; it only exists at the C level.
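The rule behind item 4, cutting before the start of a multibyte sequence that would otherwise be split, can be sketched in Python on UTF-8 bytes. `truncate_utf8()` is a hypothetical name; the actual fix is C code inside PyUnicode_FromFormat(), and this sketch assumes the input is valid UTF-8.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Hypothetical helper: cut UTF-8 `data` to at most `limit` bytes.

    If the cut would split a multibyte sequence, the whole sequence is
    dropped, so the result never ends in a partial character.
    """
    if len(data) <= limit:
        return data
    cut = limit
    # data[cut] is the first byte that gets dropped. While it is a
    # continuation byte (0b10xxxxxx), the sequence straddles the cut, so move
    # the cut back to that sequence's lead byte. A UTF-8 sequence is at most
    # 4 bytes long, so at most 3 steps are needed for valid input.
    while cut > 0 and limit - cut < 3 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


data = "h\u00e9llo".encode("utf-8")  # b'h\xc3\xa9llo'
print(repr(data[:2].decode("utf-8", "replace")))  # naive cut: 'h' then U+FFFD
print(repr(truncate_utf8(data, 2).decode("utf-8")))  # 'h'
```

The truncated result always decodes strictly, so no replacement character appears at the end.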

@vstinner
Member

> Truncated output should include "..." or "[...]" as truncating sign.

I suggest adding (...) to indicate that the string is truncated.

serhiy-storchaka added a commit that referenced this issue Jun 24, 2024
…-120365)
mrahtz pushed a commit to mrahtz/cpython that referenced this issue Jun 30, 2024
…%V (pythonGH-120365)
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…%V (pythonGH-120365)
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…%V (pythonGH-120365)
3 participants