PEP 701 – Syntactic formalization of f-strings #102856

pablogsal · 2023-03-20T22:29:07Z

Changes in the C tokenizer
Categorize failing tests
Fix failing tests or modify/remove them as needed
Changes in Python tokenizer

pablogsal · 2023-03-20T22:30:59Z

CC: @lysnikolaou @isidentical @mgmacias95

pablogsal · 2023-03-20T22:31:18Z

See this for the latest report on errors from @isidentical

pablogsal · 2023-03-20T22:31:36Z

Draft PR for the C tokenizer up: #102855

pablogsal · 2023-03-20T22:37:14Z

Things for the cleanup of #102855:

Cleaning up the grammar and the action helpers (the names are still ridiculous and there are multiple rules commented out).
Remove the old parsing code and check that we didn't break anything 😅
Clean/refactor the tokenizer struct (better names, factor stuff into its own structure as needed).
Consider factoring out tok_get_fstring_mode because it is a monster.

pablogsal · 2023-03-20T22:49:30Z

Ok with #102855 we have the following failing tests:

Most of these only need updated error messages, line numbers and similar details, but some may point to actual bugs, so we should check them. Please mention which ones you are working on so we don't clash with one another.

mgmacias95 · 2023-03-20T23:03:11Z

Working on test_tokenize

Eclips4 · 2023-03-21T08:46:22Z

Hello, Pablo!
Can I work on test_ast?
I recently sent some PRs for this file (for example, #102797), so I have some experience with it =)

ramvikrams · 2023-03-21T10:59:07Z

I can work with test_type_comments and test_unparse.

pablogsal · 2023-03-21T11:10:18Z

@Eclips4 @ramvikrams wonderful! Just make PRs against my fork!

Report here or ping any of us if you find something that could be a bug (don't just fix the tests blindly because there may be bugs lurking).

pablogsal · 2023-03-21T11:11:58Z

@lysnikolaou can you work on cleaning up the grammar + the actions?

@isidentical can you work on cleaning up some of the tokenizer layers? (This is quite a lot so we can probably work together here).

Eclips4 · 2023-03-21T16:10:53Z

@pablogsal
About test_ast.py
It seems that only one test fails there, and as far as I understand, it's a bug:

cpython/Lib/test/test_ast.py

Lines 779 to 780 in 7f760c2

    with self.assertRaises(SyntaxError):
        ast.parse('f"{x=}"', feature_version=(3, 7))

I think there are two solutions:

1. Remove this test, because support for Python 3.7 will end soon.
2. Errors are now raised by tokenizer.c instead of string_parser.c, so as I understand it, we should change python.gram, is that right? (We would need access to feature_version, which is inaccessible in the tokenizer.)
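
For readers who want to check how their own interpreter treats the case this test covers, here is a minimal sketch. The helper name debug_fstring_rejected is made up for illustration; only ast.parse and its feature_version parameter come from the test quoted above. The result for (3, 7) depends on the Python version running the snippet, so no output is claimed here.

```python
import ast


def debug_fstring_rejected(feature_version):
    """Return True if f"{x=}" raises SyntaxError for this feature_version.

    Hypothetical helper for illustration; it wraps the same ast.parse call
    that the quoted test makes.
    """
    try:
        ast.parse('f"{x=}"', feature_version=feature_version)
    except SyntaxError:
        return True
    return False


if __name__ == "__main__":
    # The "=" debug specifier was added in Python 3.8, so (3, 8) should parse.
    # Whether (3, 7) is rejected depends on whether the running interpreter
    # still performs the version check for f-strings.
    for version in [(3, 7), (3, 8)]:
        print(version, "rejected:", debug_fstring_rejected(version))
```

If the (3, 7) line prints False, the interpreter no longer does the version check for this construct, which is the situation described in this comment.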

pablogsal · 2023-03-21T16:59:02Z

2. Errors are now raised by tokenizer.c instead of string_parser.c, so as I understand it, we should change python.gram, is that right? (We would need access to feature_version, which is inaccessible in the tokenizer.)

Probably we can do this but on the other hand I would prefer to not overcomplicate this so I think (1) is better

lysnikolaou · 2023-03-21T19:20:25Z

@lysnikolaou can you work on cleaning up the grammar + the actions?

Will do!

Eclips4 · 2023-03-21T21:01:37Z

Also, I can take a look at test_cmd_line_script. Seems easy.

pablogsal · 2023-03-21T21:08:34Z

Also, I can take a look at test_cmd_line_script. Seems easy.

All yours!

CharlieZhao95 · 2023-03-22T03:27:28Z

I found that no one has claimed test_eof yet, so I did some work on it. :)
Failed test case: test_eof.test_eof_with_line_continuation

I looked at its commit history. This test case is a regression test for a crash, so it seems like a good choice to keep the case and update the error message directly.

cpython/Lib/test/test_eof.py

Lines 39 to 40 in 72186aa

    def test_eof_with_line_continuation(self):
        expect = "unexpected EOF while parsing (<string>, line 1)"

Proposed change: update the expected message from "unexpected EOF while parsing (<string>, line 1)" to "(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape (<string>, line 1)".
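
A small sketch for seeing which message a given interpreter produces. The source string below is an assumption (a string literal with a truncated \x escape followed by a trailing line continuation); the thread does not quote the exact input used by the test, and eof_error_message is a hypothetical helper name.

```python
def eof_error_message(source):
    """Compile source and return the SyntaxError text, or None if it compiles.

    Hypothetical helper for illustration only.
    """
    try:
        compile(source, "<string>", "exec", dont_inherit=True)
    except SyntaxError as exc:
        return str(exc)
    return None


# Assumed input: "\xhh" is a truncated \x escape, and the final backslash is
# a line continuation that runs into the end of the input.
SOURCE = '"\\xhh" \\'

if __name__ == "__main__":
    # The text printed here differs between the old and the new tokenizer,
    # which is exactly what the test's expected message has to track.
    print(eof_error_message(SOURCE))
```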

ramvikrams · 2023-03-23T05:50:10Z

@pablogsal in test_type_comments we have not used any f-strings.

pablogsal · 2023-03-23T13:36:33Z

@pablogsal in test_type_comments we have not used any f-strings.

The one failing there is this problem:

======================================================================
FAIL: test_fstring (test.test_type_comments.TypeCommentTests.test_fstring)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 275, in test_fstring
    for tree in self.parse_all(fstring, minver=6):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 239, in parse_all
    with self.assertRaisesRegex(SyntaxError, expected_regex,
AssertionError: SyntaxError not raised : feature_version=(3, 4)

----------------------------------------------------------------------

which I think is a feature version problem.

Eclips4 · 2023-03-23T17:55:03Z

cpython/Lib/test/test_type_comments.py

Line 223 in 4695709

lowest = 4 # Lowest minor version supported

We can just change this to 6, and this test will pass.
I haven't researched this problem, but this solution looks like the simplest.
However... supporting the syntax of Python 3.4 and 3.5 looks kinda strange.

pablogsal · 2023-03-23T18:24:29Z

We can just change this to 6, and this test will pass.
I haven't researched this problem, but this solution looks like the simplest.
However... supporting the syntax of Python 3.4 and 3.5 looks kinda strange.

I don't think that will work because we are not doing version checking anymore. See previous comments. The fix is probably to not pass feature_version.

sunmy2019 · 2023-03-24T08:55:05Z

Looks like no one is analyzing test_exceptions. I will look into it over the next two days.

4 platforms seem to have hit the same problem here.

pablogsal · 2023-03-24T10:54:31Z

@isidentical @lysnikolaou I have pushed some rules for error messages. Please take a look and add more if you have some spare cycles. With these, the failures in test_fstring have decreased notably.

isidentical · 2023-03-25T00:36:44Z

I can confirm that the total number of failures has decreased from 88 to 63. I'll try to find the highest-impact ones and submit a PR to clear them.

isidentical · 2023-03-25T02:46:33Z

If anyone intends to work on any of the remaining tasks in test_fstring, please double check with this PR (pablogsal#52) since it brings down the total failures to 30 with some explanations/required decisions for the rest.

sunmy2019 · 2023-03-25T04:12:14Z

I looked into the failure in test_exceptions:

check(b'Python = "\xcf\xb3\xf2\xee\xed" +', 1, 18)

The old parser and the new parser raise the same exception (UnicodeDecodeError), but with a different col_offset. This is because the old parser raised it from the wrong token.

I would consider it a bug in the old parser. As this comment mentions,

cpython/Parser/string_parser.c

Lines 38 to 48 in 64cb1a4

    /* This is needed, in order for the SyntaxError to point to the token t,
       since _PyPegen_raise_error uses p->tokens[p->fill - 1] for the
       error location, if p->known_err_token is not set. */
    p->known_err_token = t;
    if (octal) {
        RAISE_SYNTAX_ERROR("invalid octal escape sequence '\\%.3s'",
                           first_invalid_escape);
    }
    else {
        RAISE_SYNTAX_ERROR("invalid escape sequence '\\%c'", c);
    }

the error token was not correctly set in the old parser.

Maybe we should open an issue for the old parser? But a fix might be error-prone, since we might need to keep track of every possible code path.

As for the new parser, I think a change in the test case would be fine.
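
To see which position a given interpreter reports for this input, a sketch along the following lines can help. The helper name error_location is made up for illustration; the byte string is the one from the check() call above, and the "<fragment>" filename is an arbitrary label. The reported offset differs between parser versions, so no value is claimed here.

```python
def error_location(source):
    """Compile source and return (lineno, offset) of the SyntaxError, if any.

    Hypothetical helper for illustration only.
    """
    try:
        compile(source, "<fragment>", "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.offset
    return None


# Bytes taken from the check() call in test_exceptions quoted above; the
# string literal contains bytes that are not valid UTF-8.
SOURCE = b'Python = "\xcf\xb3\xf2\xee\xed" +'

if __name__ == "__main__":
    # The offset is the column that the failing test compares against.
    print(error_location(SOURCE))
```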

sunmy2019 · 2023-03-31T10:44:01Z

I am working on a PR to fix test_unparse this weekend.

_PyPegen_concatenate_strings did not implement concatenating an empty Constant with a FormattedValue, resulting in the unparse failure.
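
The kind of round trip that test_unparse relies on can be sketched as follows. This is a simplified illustration, not the test suite's actual helper: the function name roundtrips is hypothetical, and the example inputs are assumptions chosen to combine an empty string part with a formatted value.

```python
import ast


def roundtrips(source):
    """Return True if parse -> unparse -> parse yields an identical AST.

    Hypothetical helper; ast.dump omits line and column attributes by
    default, so only the tree structure is compared.
    """
    tree = ast.parse(source)
    regenerated = ast.parse(ast.unparse(tree))
    return ast.dump(tree) == ast.dump(regenerated)


if __name__ == "__main__":
    # An empty f-string implicitly concatenated with a formatted value, plus
    # a plain mixed f-string for comparison.
    for src in ['f"" f"{x}"', 'f"a{x}b"']:
        print(repr(src), roundtrips(src))
```

If string concatenation leaves a stray empty Constant in the tree, the regenerated source parses to a different AST and the comparison fails.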

sunmy2019 · 2023-03-31T20:04:19Z

Hi, I got some bad news.

I have been testing for memory leaks with ./python -m test -j $(nproc) -R :
About 30% of the tests failed on the current head, 270b661.

For example,

0:01:01 load avg: 13.77 [157/433/42] test_unittest failed (reference leak)
beginning 9 repetitions
test_unittest leaked [89, 89, 89, 89] references, sum=356
test_unittest leaked [90, 89, 89, 89] memory blocks, sum=357
.......
0:01:16 load avg: 15.00 [185/433/49] test_inspect failed (reference leak)
beginning 9 repetitions
test_inspect leaked [429, 429, 429, 429] references, sum=1716
test_inspect leaked [318, 318, 318, 317] memory blocks, sum=1271

These references are most likely to leak during the compilation (such as using import, using compile/exec/eval, or using ast.parse)

We might need to look into that.

Update: Memory leak fixed by commit pablogsal@d8b12e2

The root cause is that _PyArena_AddPyObject calls were missing in 3 places.

This is very tricky because the _PyArena_AddPyObject calls are scattered across many subroutines. Sometimes you should call _PyArena_AddPyObject, but sometimes you should not (an extra call causes a negative ref count).

As the old saying goes, managing memory by hand is painful and error-prone. This could be checked by analyzing the PyObject*s registered to the arena by the time the AST is created, but that is a totally different story.
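
The idea behind the -R check (run the same code several times and see whether something keeps growing) can be approximated in pure Python. The sketch below is not how regrtest works internally: it counts GC-tracked objects as a stand-in for the reference and memory block counts in the log above, and every name in it is hypothetical.

```python
import gc


def tracked_object_deltas(func, warmups=2, repetitions=4):
    """Run func repeatedly; return the growth in GC-tracked objects per run.

    Rough stand-in for a refleak hunt: a run that keeps allocating objects
    which are never released shows up as a steady positive delta.
    """
    for _ in range(warmups):
        func()  # let caches and lazily created objects settle first
    deltas = []
    for _ in range(repetitions):
        gc.collect()
        before = len(gc.get_objects())
        func()
        gc.collect()
        deltas.append(len(gc.get_objects()) - before)
    return deltas


_sink = []


def leaky():
    # Deliberately keeps ten new lists alive on every call.
    _sink.extend([] for _ in range(10))


def compile_snippet():
    # Compilation is where the leak in this thread was suspected.
    compile('f"{x!r:>{width}}"', "<string>", "eval")


if __name__ == "__main__":
    print("leaky:  ", tracked_object_deltas(leaky))
    print("compile:", tracked_object_deltas(compile_snippet))
```

A leak inside the C parser would not necessarily show up in this count, which is why the real check needs a debug build and -R.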

pablogsal · 2023-04-06T12:14:09Z

Only 12 tests left in test_fstring and we are ready to go!


pablogsal · 2023-05-24T10:08:58Z

Closing this as we already have a What's New entry, C tokenizer, and Python tokenizer. Let's tackle any small remaining items in separate issues from now on.

Probably we need to alter https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals. @lysnikolaou can you take a go at that?


lysnikolaou · 2023-05-24T10:21:29Z

Probably we need to alter https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals. @lysnikolaou can you take a go at that?

Sure!


flying-sheep · 2023-10-02T23:27:03Z

Ah great! I didn't know someone would continue PEP 536. Happy that it's happening!

Shouldn't PEP 536 be mentioned in this one?

pablogsal · 2023-10-03T00:00:43Z

Ah great! I didn't know someone would continue PEP 536. Happy that it's happening!

Shouldn't PEP 536 be mentioned in this one?

It's mentioned in the PEP:

https://peps.python.org/pep-0701/

Btw, as a heads up: we don't normally monitor closed issues, so it is very likely that people won't reply to comments once the issue is closed :)

flying-sheep · 2023-10-03T10:29:23Z

Thanks! Seems like I missed the mention then. Perfect!

bedevere-bot mentioned this issue Mar 20, 2023

gh-102856: Initial implementation of PEP 701 #102855

Merged

sunmy2019 mentioned this issue Mar 25, 2023

Handle invalid expressions pablogsal/cpython#54

Merged

bedevere-bot mentioned this issue May 21, 2023

gh-102856: Tokenize performance improvement #104731

Merged

pablogsal pushed a commit that referenced this issue May 22, 2023

gh-102856: Tokenize performance improvement (#104731)

8817886

lysnikolaou pushed a commit that referenced this issue May 22, 2023

gh-102856: Allow comments inside multi-line f-string expressions (#104006)

0a77960

bedevere-bot mentioned this issue May 23, 2023

gh-102856: Add changes related to PEP 701 in 3.12 What's New docs #104824

Merged

pablogsal added a commit that referenced this issue May 24, 2023

gh-102856: Add changes related to PEP 701 in 3.12 What's New docs (#104824)

c45701e

Co-authored-by: Pablo Galindo Salgado, Jelle Zijlstra

bedevere-bot mentioned this issue May 24, 2023

[3.12] gh-102856: Add changes related to PEP 701 in 3.12 What's New docs (GH-104824) #104847

Merged

pablogsal closed this as completed May 24, 2023

bedevere-bot mentioned this issue May 24, 2023

gh-102856: Add missing quote to fix doctest #104852

Merged

miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 24, 2023

pythongh-102856: Add missing quote to fix doctest (pythonGH-104852)

ec62b10

(cherry picked from commit 3e97c00) Co-authored-by: Hugo van Kemenade

bedevere-bot mentioned this issue May 24, 2023

[3.12] gh-102856: Add missing quote to fix doctest (GH-104852) #104854

Merged

hugovk added a commit that referenced this issue May 24, 2023

gh-102856: Add missing quote to fix doctest (#104852)

3e97c00

hugovk added a commit that referenced this issue May 24, 2023

[3.12] gh-102856: Add missing quote to fix doctest (GH-104852) (#104854)

2d685ec

Co-authored-by: Hugo van Kemenade

lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue May 24, 2023

pythongh-102856: Update Formatted string literals docs section after PEP701

e437849

lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue May 24, 2023

pythongh-102856: Update "Formatted string literals" docs section after PEP701

298a693

bedevere-bot mentioned this issue May 24, 2023

gh-102856: Update "Formatted string literals" docs section after PEP701 #104861

Merged

lysnikolaou added a commit that referenced this issue May 24, 2023

gh-102856: Update "Formatted string literals" docs section after PEP701 (#104861)

8e5b3b9

bedevere-bot mentioned this issue May 24, 2023

[3.12] gh-102856: Update "Formatted string literals" docs section after PEP701 (GH-104861) #104865

Merged

lysnikolaou added a commit that referenced this issue May 24, 2023

[3.12] gh-102856: Update "Formatted string literals" docs section after PEP701 (GH-104861) (#104865)

25890eb

(cherry picked from commit 8e5b3b9) Co-authored-by: Lysandros Nikolaou

erlend-aasland mentioned this issue Jun 6, 2023

3.12 backport gh 105236 #105358

Closed

graingert mentioned this issue Aug 16, 2023

#11857 test on py 3.12rc twisted/twisted#11910

Merged

Erotemic mentioned this issue Oct 23, 2023

Tokenize generate_tokens regression in CPython 3.12 #111224

Closed

hugovk mentioned this issue Dec 10, 2023

Change in tokenize.generate_tokens behaviour with non-ASCII #112943

Closed
