Bug in Tokenizer/Automatic Parenthesization for Python 3.12 #14455

Open · zacharyrs opened this issue Jun 5, 2024 · 1 comment
zacharyrs commented Jun 5, 2024

Hey there!

I've discovered a bug with the tokenizer and automatic forward-slash-parenthesization.

Specifically, the following will result in an error when run in IPython 8.25.0 on Python 3.12.3:

1| from pathlib import Path
2| 
3| (
4|     Path(".")
5|     / f")"
6|     / "a a a a a a a a a"
7| )

Interestingly, the issue is mitigated if the f-string on line 5 (a tokenization sketch comparing these variants follows this list):

  • is removed or replaced by a plain string
  • has any character after the closing parenthesis
  • starts with an opening parenthesis (note that if anything comes before it, even a space, it still fails)
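
A quick way to compare these variants is to look at how each candidate literal for line 5 is tokenized on Python 3.12. This is just a minimal diagnostic sketch using only the standard tokenize module; the literals are my reading of the list above:

import io
import tokenize

# Print how each candidate for line 5's string literal is tokenized.
for src in ['f")"', '")"', 'f")x"', 'f"()"', 'f" ()"']:
    tokens = tokenize.generate_tokens(io.StringIO(src + "\n").readline)
    parts = [f"{tokenize.tok_name[tok.type]}:{tok.string!r}" for tok in tokens]
    print(src, "->", " ".join(parts))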

From a little digging, the tokenizer starts a new line (in tokens_by_line) when it encounters the newline at the end of the f-string's line.
It looks like the closing parenthesis inside the f-string becomes an FSTRING_MIDDLE token whose string is ')', which wrongly decrements parenlev.
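
A minimal sketch of this (assuming IPython 8.x on Python 3.12, and that make_tokens_by_line is importable from IPython.core.inputtransformer2, where the code below lives) that feeds the repro directly to the grouping function:

from IPython.core.inputtransformer2 import make_tokens_by_line

# The repro, split with line endings kept, as make_tokens_by_line expects.
lines = [
    "(\n",
    '    Path(".")\n',
    '    / f")"\n',
    '    / "a a a a a a a a a"\n',
    ")\n",
]

groups = make_tokens_by_line(lines)
# The whole parenthesized expression should come back as a single group;
# per the report, on Python 3.12 it gets split after the f-string's line.
print(len(groups))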

See here

def make_tokens_by_line(lines: List[str]):
    """Tokenize a series of lines and group tokens by line.

    The tokens for a multiline Python string or expression are grouped as one
    line. All lines except the last line should keep their line ending ('\\n',
    '\\r\\n') for this to work properly. Use `.splitlines(keepends=True)`
    for example when passing a block of text to this function.
    """
    # NL tokens are used inside multiline expressions, but also after blank
    # lines or comments. This is intentional - see https://bugs.python.org/issue17061
    # We want to group the former case together but split the latter, so we
    # track parentheses level, similar to the internals of tokenize.

    # reexported from token on 3.7+
    NEWLINE, NL = tokenize.NEWLINE, tokenize.NL  # type: ignore

    tokens_by_line: List[List[Any]] = [[]]
    if len(lines) > 1 and not lines[0].endswith(("\n", "\r", "\r\n", "\x0b", "\x0c")):
        warnings.warn(
            "`make_tokens_by_line` received a list of lines which do not have lineending markers ('\\n', '\\r', '\\r\\n', '\\x0b', '\\x0c'), behavior will be unspecified",
            stacklevel=2,
        )

    parenlev = 0
    try:
        for token in tokenutil.generate_tokens_catch_errors(
            iter(lines).__next__, extra_errors_to_catch=["expected EOF"]
        ):
            tokens_by_line[-1].append(token)
            if (token.type == NEWLINE) \
                    or ((token.type == NL) and (parenlev <= 0)):
                tokens_by_line.append([])
            elif token.string in {'(', '[', '{'}:
                parenlev += 1
            elif token.string in {')', ']', '}'}:
                if parenlev > 0:
                    parenlev -= 1
    except tokenize.TokenError:
        # Input ended in a multiline string or expression. That's OK for us.
        pass

    if not tokens_by_line[-1]:
        tokens_by_line.pop()

    return tokens_by_line
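
One possible direction (just a sketch under the assumption above, not IPython's actual fix) would be to let only real operator tokens affect the bracket depth, so that FSTRING_MIDDLE text such as ')' is ignored by the counter:

import tokenize

def adjust_parenlev(token, parenlev):
    """Return the updated bracket depth after seeing one token.

    Only OP tokens can open or close brackets; FSTRING_MIDDLE (and STRING)
    tokens whose text happens to be ')' are left out of the count.
    """
    if token.type != tokenize.OP:
        return parenlev
    if token.string in {"(", "[", "{"}:
        return parenlev + 1
    if token.string in {")", "]", "}"} and parenlev > 0:
        return parenlev - 1
    return parenlev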

This issue does not occur on an older version of Python (e.g., 3.11.x), even when running the latest version of IPython.

zacharyrs changed the title from "Bug in Tokenizer for 3.12" to "Bug in Tokenizer/Automatic Parenthesization for Python 3.12" on Jun 5, 2024
Carreau (Member) commented Jun 19, 2024

Thanks for the report, I'll see what I can do.
