
smart-case sensitivity is fooled by brackets in search pattern #229

Closed
ngirard opened this issue Nov 9, 2016 · 2 comments
Labels
bug A bug. question An issue that is lacking clarity on one or more points.

Comments

@ngirard

ngirard commented Nov 9, 2016

As the title says, case sensitivity is no longer respected when the search pattern contains capitals within brackets.
As an example,
rg --smart-case '[EÉ]conomie'
also returns lowercase matches ("economie").

@BurntSushi
Owner

I see. Under what circumstances should an uppercase literal be detected? Would \p{Lu} qualify? What about [[:upper:]]? Or what about [@-a], which contains A-Z?
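A quick check of the last example: `@` is U+0040 and `a` is U+0061, so `[@-a]` spans U+0040 through U+0061 and includes every ASCII capital even though neither endpoint is uppercase. The small sketch below counts them; the helper name `uppercase_in_range` is made up for illustration.

```rust
// Which characters in the range [@-a] are ASCII uppercase letters?
// '@' is U+0040 and 'a' is U+0061, so the range covers U+0040..=U+0061.
fn uppercase_in_range(lo: char, hi: char) -> Vec<char> {
    (lo..=hi).filter(|c| c.is_ascii_uppercase()).collect()
}

fn main() {
    let upper = uppercase_in_range('@', 'a');
    // Neither endpoint is uppercase, yet all 26 capitals A-Z are inside.
    println!("{} uppercase letters: {:?}", upper.len(), upper);
}
```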

@BurntSushi added the bug and question labels Nov 9, 2016
@BurntSushi
Owner

Some more thoughts on this. Smart case could fall back to a case sensitive search in either of these situations:

  1. There is an uppercase literal in the regex pattern itself. For example, [A-Z]foo contains an uppercase literal, and therefore the search would be case sensitive. As another example, foo\p{Lu} (where \p{Lu} is the set of all uppercase Unicode letters) contains no uppercase literals and therefore would be a case-insensitive search.
  2. There are no literal characters at all. For example, \p{Lu} contains no literals and therefore should be a case-sensitive search.
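Rules (1) and (2) can be sketched over a token stream that still remembers which characters the user typed. The `Token` enum and the function names below are hypothetical and do not correspond to ripgrep's or regex-syntax's actual types; this only illustrates the decision logic, assuming such a representation is available.

```rust
// Hypothetical token model: a real parser (e.g. regex-syntax) uses its own
// types. The point is to keep typed literals apart from named classes.
#[allow(dead_code)] // the NamedClass payload is only informational here
#[derive(Debug)]
enum Token {
    /// A character typed literally, inside or outside brackets.
    Literal(char),
    /// A bracketed range such as `A-Z`; both endpoints count as literals.
    Range(char, char),
    /// A named class such as `\p{Lu}` or `[:upper:]`; contributes no literals.
    NamedClass(&'static str),
}

/// Characters that the user actually typed as literals in this token.
fn literal_chars(tok: &Token) -> Vec<char> {
    match tok {
        Token::Literal(c) => vec![*c],
        Token::Range(lo, hi) => vec![*lo, *hi],
        Token::NamedClass(_) => Vec::new(),
    }
}

/// Returns true when smart case should make the search case-insensitive.
fn smart_case_insensitive(tokens: &[Token]) -> bool {
    let literals: Vec<char> = tokens.iter().flat_map(literal_chars).collect();
    // Rule (2): no literal characters at all -> case-sensitive.
    if literals.is_empty() {
        return false;
    }
    // Rule (1): any uppercase literal -> case-sensitive.
    !literals.iter().any(|c| c.is_uppercase())
}

/// Convenience: one Literal token per character of `s`.
fn lits(s: &str) -> Vec<Token> {
    s.chars().map(Token::Literal).collect()
}

fn main() {
    // [EÉ]conomie from the report: contains uppercase literals.
    println!("{}", smart_case_insensitive(&lits("EÉconomie"))); // false
    // foo\p{Lu}: has literals, none of them uppercase.
    let mut foo_lu = lits("foo");
    foo_lu.push(Token::NamedClass(r"\p{Lu}"));
    println!("{}", smart_case_insensitive(&foo_lu)); // true
    // \p{Lu} alone: no literals at all.
    println!("{}", smart_case_insensitive(&[Token::NamedClass(r"\p{Lu}")])); // false
    // [A-Z]foo: the range endpoints are uppercase literals.
    let mut az_foo = vec![Token::Range('A', 'Z')];
    az_foo.extend(lits("foo"));
    println!("{}", smart_case_insensitive(&az_foo)); // false
}
```

One consequence of rule (1) as stated: `[@-a]` has no uppercase literal, so it would stay case-insensitive even though the class covers A-Z.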

(2) is actually slightly tricky to implement because the regex parser is a bit too ambitious. For example, after parsing, there's no way to distinguish \p{Lu} from its corresponding character class written out by hand.

This is kind of why I hate the smart case feature and why it's not the default. In the simple case, it has nice behavior, but when your regex grows beyond literals, its behavior becomes unclear and less intuitive.

In the absence of input from others, I think I'd like to just implement rule (1) and call it a day. It would, for example, work in @ngirard's case.
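Rule (1) on its own can be approximated without a parser by scanning the raw pattern text and skipping escape sequences, which sidesteps the problem of \p{Lu} being expanded. The function below is a hypothetical sketch, not ripgrep's implementation; it does not understand group names such as `(?P<Name>...)`, whose capitals it would wrongly count.

```rust
/// Hypothetical sketch of rule (1) over the raw pattern text: returns true
/// if any character typed literally is uppercase. Escapes such as \W, \S,
/// \pL, \p{Lu}, \x4A and \x{41} are skipped, since they are not literals
/// the user typed. Known gap: capitals in group names like (?P<Name>...)
/// would be counted.
fn has_uppercase_literal(pattern: &str) -> bool {
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some(e) if e == 'p' || e == 'P' || e == 'x' => {
                    if chars.peek() == Some(&'{') {
                        // Skip a braced argument such as {Lu} or {41}.
                        for inner in chars.by_ref() {
                            if inner == '}' {
                                break;
                            }
                        }
                    } else {
                        // \pL takes one letter; \x4A takes two hex digits.
                        let n = if e == 'x' { 2 } else { 1 };
                        for _ in 0..n {
                            chars.next();
                        }
                    }
                }
                // Any other escape (\W, \d, \., ...) is skipped as a unit.
                _ => {}
            }
        } else if c.is_uppercase() {
            return true;
        }
    }
    false
}

fn main() {
    for pattern in ["[EÉ]conomie", "economie", r"foo\p{Lu}", r"\W+\pL", "[A-Z]foo"] {
        println!("{:<12} uppercase literal: {}", pattern, has_uppercase_literal(pattern));
    }
}
```

In the reported pattern, `E` and `É` are typed literally inside the brackets, so this check would make the search case sensitive.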
