Bug in Tokenizer/Automatic Parenthesization for Python 3.12 #14455

Open · zacharyrs opened this issue Jun 5, 2024 · 1 comment
zacharyrs commented Jun 5, 2024

Hey there!

I've discovered a bug with the tokenizer and automatic forward-slash-parenthesization.

Specifically, the following will result in an error when run in IPython 8.25.0 on Python 3.12.3:

1| from pathlib import Path
2| 
3| (
4|     Path(".")
5|     / f")"
6|     / "a a a a a a a a a"
7| )

Interestingly, the issue is mitigated if the f-string on line 5 (a tokenization sketch comparing these variants follows this list):

  • is removed or replaced by a plain string
  • has any character after the closing parenthesis
  • starts with an opening parenthesis (note that if anything comes before it, even a space, it still fails)
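
A quick way to compare these variants is to look at how each candidate literal for line 5 is tokenized on Python 3.12. This is just a minimal diagnostic sketch using only the standard tokenize module; the literals are my reading of the list above:

import io
import tokenize

# Print how each candidate for line 5's string literal is tokenized.
for src in ['f")"', '")"', 'f")x"', 'f"()"', 'f" ()"']:
    tokens = tokenize.generate_tokens(io.StringIO(src + "\n").readline)
    parts = [f"{tokenize.tok_name[tok.type]}:{tok.string!r}" for tok in tokens]
    print(src, "->", " ".join(parts))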

From a little digging, the tokenizer starts a new line (in tokens_by_line) when it encounters the newline at the end of the f-string's line.
It looks like the closing parenthesis inside the f-string becomes an FSTRING_MIDDLE token whose string is ')', which wrongly decrements parenlev.
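
A minimal sketch of this (assuming IPython 8.x on Python 3.12, and that make_tokens_by_line is importable from IPython.core.inputtransformer2, where the code below lives) that feeds the repro directly to the grouping function:

from IPython.core.inputtransformer2 import make_tokens_by_line

# The repro, split with line endings kept, as make_tokens_by_line expects.
lines = [
    "(\n",
    '    Path(".")\n',
    '    / f")"\n',
    '    / "a a a a a a a a a"\n',
    ")\n",
]

groups = make_tokens_by_line(lines)
# The whole parenthesized expression should come back as a single group;
# per the report, on Python 3.12 it gets split after the f-string's line.
print(len(groups))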

See here

def make_tokens_by_line(lines: List[str]):
    """Tokenize a series of lines and group tokens by line.

    The tokens for a multiline Python string or expression are grouped as one
    line. All lines except the last line should keep their line ending ('\\n',
    '\\r\\n') for this to work properly. Use `.splitlines(keepends=True)`
    for example when passing a block of text to this function.
    """
    # NL tokens are used inside multiline expressions, but also after blank
    # lines or comments. This is intentional - see https://bugs.python.org/issue17061
    # We want to group the former case together but split the latter, so we
    # track parentheses level, similar to the internals of tokenize.

    # reexported from token on 3.7+
    NEWLINE, NL = tokenize.NEWLINE, tokenize.NL  # type: ignore

    tokens_by_line: List[List[Any]] = [[]]
    if len(lines) > 1 and not lines[0].endswith(("\n", "\r", "\r\n", "\x0b", "\x0c")):
        warnings.warn(
            "`make_tokens_by_line` received a list of lines which do not have lineending markers ('\\n', '\\r', '\\r\\n', '\\x0b', '\\x0c'), behavior will be unspecified",
            stacklevel=2,
        )

    parenlev = 0
    try:
        for token in tokenutil.generate_tokens_catch_errors(
            iter(lines).__next__, extra_errors_to_catch=["expected EOF"]
        ):
            tokens_by_line[-1].append(token)
            if (token.type == NEWLINE) \
                    or ((token.type == NL) and (parenlev <= 0)):
                tokens_by_line.append([])
            elif token.string in {'(', '[', '{'}:
                parenlev += 1
            elif token.string in {')', ']', '}'}:
                if parenlev > 0:
                    parenlev -= 1
    except tokenize.TokenError:
        # Input ended in a multiline string or expression. That's OK for us.
        pass

    if not tokens_by_line[-1]:
        tokens_by_line.pop()

    return tokens_by_line
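
One possible direction (just a sketch under the assumption above, not IPython's actual fix) would be to let only real operator tokens affect the bracket depth, so that FSTRING_MIDDLE text such as ')' is ignored by the counter:

import tokenize

def adjust_parenlev(token, parenlev):
    """Return the updated bracket depth after seeing one token.

    Only OP tokens can open or close brackets; FSTRING_MIDDLE (and STRING)
    tokens whose text happens to be ')' are left out of the count.
    """
    if token.type != tokenize.OP:
        return parenlev
    if token.string in {"(", "[", "{"}:
        return parenlev + 1
    if token.string in {")", "]", "}"} and parenlev > 0:
        return parenlev - 1
    return parenlev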

This issue does not occur on an older version of Python (e.g., 3.11.x), even when running the latest version of IPython.

zacharyrs changed the title from "Bug in Tokenizer for 3.12" to "Bug in Tokenizer/Automatic Parenthesization for Python 3.12" on Jun 5, 2024
Carreau (Member) commented Jun 19, 2024

Thanks for the report, I'll see what I can do.
