UP031 may be undesirable in raw strings and regular expressions #14555

neutrinoceros · 2024-11-23T08:30:23Z

Here's a real life example taken from astropy

# t.py
import re
KEYWORD_LENGTH = 8
re.compile(r"^[A-Z0-9_-]{0,%d}$" % KEYWORD_LENGTH)

This code isn't flagged for rule UP031 under ruff 0.7

❯ uvx --with 'ruff<0.8.0' ruff check t.py --select UP031
All checks passed!

but running the latest ruff 0.8.0:

❯ uvx --with 'ruff==0.8.0' ruff check t.py --select UP031
t.py:3:12: UP031 Use format specifiers instead of percent format
  |
1 | import re
2 | KEYWORD_LENGTH = 8
3 | re.compile(r"^[A-Z0-9_-]{0,%d}$" % KEYWORD_LENGTH)
  |            ^^^^^^^^^^^^^^^^^^^^^ UP031
  |
  = help: Replace with format specifiers

Found 1 error.

On the other hand, this is the form of that string that I need to write in order to satisfy the newly expended rule UP031

rf"^[A-Z0-9_-]{{0,{KEYWORD_LENGTH}}}$"

note that I needed to escape raw { and } to {{ and }} respectively. I don't think this makes the code more readable, and may actually be counterproductive. Admittedly this is a small example were it doesn't hurt too much but there could legitimately be much more involved cases in the wild.

Would it be possible to change this rule so that it ignores raw strings altogether ?

The text was updated successfully, but these errors were encountered:

MichaReiser · 2024-11-25T08:53:13Z

Thanks for bringing this up. This is an interesting case.

I agree that the need for escaping the { adds some cognitive overhead, but I personally don't think it's too bad and I prefer trading the % and %d with two extra escapes.

Would it be possible to change this rule so that it ignores raw strings altogether ?

To me it's not clear how escaping { with {{ is much different from having to write \{. Both cases require escaping. That's why this is more about whether UP031 should be raised on strings containing any curly braces that require escaping. I would find it surprising if I opted into UP031, and some strings wouldn't be flagged (for not very apparent reasons of why).

I'm interested in hearing more opinions.

eerovaher · 2024-11-25T14:55:34Z

I'm not convinced the f-string is better than printf-style formatting in this particular case, but there is also the option of using template strings. The example snippet could be rewritten as

import re
from string import Template
KEYWORD_LENGTH = 8
re.compile(Template("^[A-Z0-9_-]{0,$N}$$").substitute(N=KEYWORD_LENGTH))

That being said, template strings are more convenient if the string contains many literal {} pairs and the example string contains only one, but they are a valid way of replacing printf-style formatting, which strengthens the case for UP031 not making any exceptions.

eli-schwartz · 2024-11-25T23:33:04Z

To me it's not clear how escaping { with {{ is much different from having to write \{. Both cases require escaping.

No, the latter case doesn't require escaping at all, per the original description where r'' strings are successfully used for precisely that reason.

I'd argue that users of raw strings are making a pretty clear statement that they would prefer to avoid having to use escapes. Although perhaps I'd also argue that f-strings should only be used when they subjectively look prettier than the alternative (and that all use cases for PEP 701 look bad) and that therefore UP031 is too unreliable due to the inability to control the heuristics used and should simply be disabled in project configs. So maybe it doesn't matter.

MichaReiser · 2024-11-26T06:42:57Z

No, the latter case doesn't require escaping at all, per the original description where r'' strings are successfully used for precisely that reason.

My point here was mainly that I don't see a difference between converting a normal string containing curly braces to an f-string to converting a raw string containing curly braces to an f-string.

"a { len: %d }" % LENGTH
r"a { len: %d }" % LENGTH

For both cases, it's necessary to escape { when rewriting the format string to an f-string.

"a { len: {{LENGTH}} }"
r"a { len: {{LENGTH}} }"

I'd argue that users of raw strings are making a pretty clear statement that they would prefer to avoid having to use escapes. Although perhaps I'd also argue that f-strings should only be used when they subjectively look prettier

Yeah, that's a hard metric to get right for all users, but I'm all ears if someone has suggestions. I'd suggest that users feel free to use noqa comments if they feel that it arguably looks worse in the few one-off cases where this happens.

bashonly · 2024-11-26T14:53:34Z

My point here was mainly that I don't see a difference between converting a normal string containing curly braces to an f-string to converting a raw string containing curly braces to an f-string.

IME, complex regex patterns with literal curly braces are where UP031 can cause the most trouble, e.g. this code snippet...

        fields_m = re.finditer(
            r'''(?x)
                (?P<key>%s)\s*:\s*function\s*\((?P<args>(?:%s|,)*)\){(?P<code>[^}]+)}
            ''' % (_FUNC_NAME_RE, _NAME_RE),
            fields)

...would become...

        fields_m = re.finditer(
            rf'''(?x)
                (?P<key>{_FUNC_NAME_RE})\s*:\s*function\s*\((?P<args>(?:{_NAME_RE}|,)*)\){{(?P<code>[^}}]+)}}
            ''',
            fields)

...making it significantly harder to read and maintain (IMO)

neutrinoceros mentioned this issue Nov 23, 2024

MNT: migrating to ruff 0.8.0 astropy/astropy#17430

Open

MichaReiser added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Nov 25, 2024

neutrinoceros mentioned this issue Nov 26, 2024

STY: ignore rule UP031 in a specific case where an fstring would be less readable astropy/astropy#17452

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UP031 may be undesirable in raw strings and regular expressions #14555

UP031 may be undesirable in raw strings and regular expressions #14555

neutrinoceros commented Nov 23, 2024

MichaReiser commented Nov 25, 2024

eerovaher commented Nov 25, 2024

eli-schwartz commented Nov 25, 2024

MichaReiser commented Nov 26, 2024

bashonly commented Nov 26, 2024

UP031 may be undesirable in raw strings and regular expressions #14555

UP031 may be undesirable in raw strings and regular expressions #14555

Comments

neutrinoceros commented Nov 23, 2024

MichaReiser commented Nov 25, 2024

eerovaher commented Nov 25, 2024

eli-schwartz commented Nov 25, 2024

MichaReiser commented Nov 26, 2024

bashonly commented Nov 26, 2024