Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UP031 may be undesirable in raw strings and regular expressions #14555

Open
neutrinoceros opened this issue Nov 23, 2024 · 5 comments
Open

UP031 may be undesirable in raw strings and regular expressions #14555

neutrinoceros opened this issue Nov 23, 2024 · 5 comments
Labels
needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule

Comments

@neutrinoceros
Copy link

Here's a real life example taken from astropy

# t.py
import re
KEYWORD_LENGTH = 8
re.compile(r"^[A-Z0-9_-]{0,%d}$" % KEYWORD_LENGTH)

This code isn't flagged for rule UP031 under ruff 0.7

❯ uvx --with 'ruff<0.8.0' ruff check t.py --select UP031
All checks passed!

but running the latest ruff 0.8.0:

❯ uvx --with 'ruff==0.8.0' ruff check t.py --select UP031
t.py:3:12: UP031 Use format specifiers instead of percent format
  |
1 | import re
2 | KEYWORD_LENGTH = 8
3 | re.compile(r"^[A-Z0-9_-]{0,%d}$" % KEYWORD_LENGTH)
  |            ^^^^^^^^^^^^^^^^^^^^^ UP031
  |
  = help: Replace with format specifiers

Found 1 error.

On the other hand, this is the form of that string that I need to write in order to satisfy the newly expended rule UP031

rf"^[A-Z0-9_-]{{0,{KEYWORD_LENGTH}}}$"

note that I needed to escape raw { and } to {{ and }} respectively. I don't think this makes the code more readable, and may actually be counterproductive. Admittedly this is a small example were it doesn't hurt too much but there could legitimately be much more involved cases in the wild.

Would it be possible to change this rule so that it ignores raw strings altogether ?

@MichaReiser MichaReiser added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Nov 25, 2024
@MichaReiser
Copy link
Member

Thanks for bringing this up. This is an interesting case.

I agree that the need for escaping the { adds some cognitive overhead, but I personally don't think it's too bad and I prefer trading the % and %d with two extra escapes.

Would it be possible to change this rule so that it ignores raw strings altogether ?

To me it's not clear how escaping { with {{ is much different from having to write \{. Both cases require escaping. That's why this is more about whether UP031 should be raised on strings containing any curly braces that require escaping. I would find it surprising if I opted into UP031, and some strings wouldn't be flagged (for not very apparent reasons of why).

I'm interested in hearing more opinions.

@eerovaher
Copy link
Contributor

I'm not convinced the f-string is better than printf-style formatting in this particular case, but there is also the option of using template strings. The example snippet could be rewritten as

import re
from string import Template
KEYWORD_LENGTH = 8
re.compile(Template("^[A-Z0-9_-]{0,$N}$$").substitute(N=KEYWORD_LENGTH))

That being said, template strings are more convenient if the string contains many literal {} pairs and the example string contains only one, but they are a valid way of replacing printf-style formatting, which strengthens the case for UP031 not making any exceptions.

@eli-schwartz
Copy link
Contributor

To me it's not clear how escaping { with {{ is much different from having to write \{. Both cases require escaping.

No, the latter case doesn't require escaping at all, per the original description where r'' strings are successfully used for precisely that reason.

I'd argue that users of raw strings are making a pretty clear statement that they would prefer to avoid having to use escapes. Although perhaps I'd also argue that f-strings should only be used when they subjectively look prettier than the alternative (and that all use cases for PEP 701 look bad) and that therefore UP031 is too unreliable due to the inability to control the heuristics used and should simply be disabled in project configs. So maybe it doesn't matter.

@MichaReiser
Copy link
Member

No, the latter case doesn't require escaping at all, per the original description where r'' strings are successfully used for precisely that reason.

My point here was mainly that I don't see a difference between converting a normal string containing curly braces to an f-string to converting a raw string containing curly braces to an f-string.

"a { len: %d }" % LENGTH
r"a { len: %d }" % LENGTH

For both cases, it's necessary to escape { when rewriting the format string to an f-string.

"a { len: {{LENGTH}} }"
r"a { len: {{LENGTH}} }"

I'd argue that users of raw strings are making a pretty clear statement that they would prefer to avoid having to use escapes. Although perhaps I'd also argue that f-strings should only be used when they subjectively look prettier

Yeah, that's a hard metric to get right for all users, but I'm all ears if someone has suggestions. I'd suggest that users feel free to use noqa comments if they feel that it arguably looks worse in the few one-off cases where this happens.

@bashonly
Copy link

My point here was mainly that I don't see a difference between converting a normal string containing curly braces to an f-string to converting a raw string containing curly braces to an f-string.

IME, complex regex patterns with literal curly braces are where UP031 can cause the most trouble, e.g. this code snippet...

        fields_m = re.finditer(
            r'''(?x)
                (?P<key>%s)\s*:\s*function\s*\((?P<args>(?:%s|,)*)\){(?P<code>[^}]+)}
            ''' % (_FUNC_NAME_RE, _NAME_RE),
            fields)

...would become...

        fields_m = re.finditer(
            rf'''(?x)
                (?P<key>{_FUNC_NAME_RE})\s*:\s*function\s*\((?P<args>(?:{_NAME_RE}|,)*)\){{(?P<code>[^}}]+)}}
            ''',
            fields)

...making it significantly harder to read and maintain (IMO)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule
Projects
None yet
Development

No branches or pull requests

5 participants