Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-41076: Pre-feed the parser with the f-string expression location #21054

Merged
merged 6 commits into from
Jun 27, 2020

Conversation

lysnikolaou
Copy link
Contributor

@lysnikolaou lysnikolaou commented Jun 22, 2020

This PR changes the parsing of f-string expressions with the new
parser. The parser gets pre-fed with the location of the expression
itself (not the f-string, which was what we were doing before). This
allows us to completely skip the shifting of the AST nodes after
the parsing is completed.

https://bugs.python.org/issue41076

This PR changes the parsing of f-string expressions with the new
parser. The parser gets pre-fed with the location of the expression
itself (not the f-string, which was what we were doing before). This
allows us to completely skip the shifting of the AST nodes after
the parsing is completed.
@pablogsal
Copy link
Member

Wow, this is an excellent insight and looks fantastic! I can try to review it later tomorrow but I have an initial question:

Since we are storing the offset in fstring_col_offset, what happens with nested f-strings? Every parser instance has its own 'fstring_col_offset' and therefore they will not collide or there is something to consider on the propagation of these values?

@lysnikolaou
Copy link
Contributor Author

lysnikolaou commented Jun 22, 2020

Since we are storing the offset in fstring_col_offset, what happens with nested f-strings? Every parser instance has its own 'fstring_col_offset' and therefore they will not collide or there is something to consider on the propagation of these values?

Nested f-strings should be handled correctly already. I checked test_fstring.py for tests the test the node locations of nested f-strings before pushing and there is one, so I guess it's ok. Hand-testing with some more examples verified this.

fstring_col_offset actually only gets used for the caret in the case of an error. Because only the expression part of the f-string is shown on an error, we need to shift the caret accordingly.

Nested f-strings are no problem because t->col_offset points to the correct column offset of the nested f-string (this is due to the pre-feeding of the parser with a beginning offset and due to correctly computing of the offset in _PyPegen_fill_token when it first gets fetched from the tokenizer). So, if the column offset of the nested fstring is correct, it only gets down to fstring_find_expr_location computing the correct amount of characters between the start of the nested f-string and the start of its expression. This function has been battle-tested by previous Python versions and in the old version of the new parser (the one before this PR), so I guess we're okay.

Does this make sense?

@pablogsal
Copy link
Member

pablogsal commented Jun 22, 2020

Although is not caused by this PR, I noticed that we have some inconsistency with the old parser regarding f-strings. For instance:

❯ ./python -X oldparser
Python 3.9.0b3+ (heads/deprecation_warnings-dirty:21b84dd204, Jun 21 2020, 19:23:13)
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x =23; f'\n{1+2} - {lel $ 2}'
  File "<fstring>", line 1
    (lel $ 2)
         ^
SyntaxError: invalid syntax
>>>
❯ ./python
>>> x =23; f'\n{1+2} - {lel $ 2}'
  File "<stdin>", line 1
    (lel $ 2)
             ^
SyntaxError: invalid syntax
>>>

@lysnikolaou
Copy link
Contributor Author

Oh, this is a bug that needs to be fixed. I think I know what's causing it.

@lysnikolaou
Copy link
Contributor Author

I realised that there was a bug in my implementation, in case the fstring wasn't starting at offset 0 (when we compute the offset of the caret in the error text, we also need to account for all the characters before the f-string, in case it's not at offset 0). The fix also fixes the bug @pablogsal mentioned above, so that's a win-win.

@pablogsal
Copy link
Member

Closing and reopening to trigger the CI again

@pablogsal pablogsal closed this Jun 27, 2020
@pablogsal pablogsal reopened this Jun 27, 2020
Parser/string_parser.c Outdated Show resolved Hide resolved
@pablogsal pablogsal merged commit 1f0f4ab into python:master Jun 27, 2020
@miss-islington
Copy link
Contributor

Thanks @lysnikolaou for the PR, and @pablogsal for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@miss-islington
Copy link
Contributor

Sorry, @lysnikolaou and @pablogsal, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 1f0f4abb110b9fbade6175842b6a26ab0b8df6dd 3.9

pablogsal pushed a commit to pablogsal/cpython that referenced this pull request Jun 27, 2020
…ation (pythonGH-21054)

This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed..
(cherry picked from commit 1f0f4ab)

Co-authored-by: Lysandros Nikolaou <[email protected]>
@bedevere-bot
Copy link

GH-21190 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Jun 27, 2020
pablogsal added a commit that referenced this pull request Jun 28, 2020
…ation (GH-21054) (GH-21190)

This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed..
(cherry picked from commit 1f0f4ab)
@lysnikolaou
Copy link
Contributor Author

🎉 🎉

I feel like this is one of the most major improvements we've had lately. Thanks for the review @pablogsal!

@pablogsal
Copy link
Member

I feel like this is one of the most major improvements we've had lately. Thanks for the review @pablogsal!

Yeah, me too! This was a very good insight and a great job. Well done!

fasih pushed a commit to fasih/cpython that referenced this pull request Jun 29, 2020
…ythonGH-21054)

This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.
arun-mani-j pushed a commit to arun-mani-j/cpython that referenced this pull request Jul 21, 2020
…ythonGH-21054)

This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants