ripgrep fails to match pattern including digit character class #1203
This commit fixes a bug where the reverse suffix literal optimization wasn't quite right. It was too eagerly skipping past parts of the input without verifying that there was no match. We fix this by being a bit more careful with what we're searching by keeping track of the starting position of the last literal matched. Subsequent literal searches then start immediately after the last one. This is necessary in particular when the suffix literal can have overlapping matches. e.g., searching `000` in `0000` can match at either position 0 or 1, but searching `abc` in `abcd` can only match at position 0. This was initially reported as a bug against ripgrep: BurntSushi/ripgrep#1203
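The overlap problem described above can be sketched in a few lines of Rust. This is a simplified illustration only, not the regex crate's actual code: the functions `find_from` and `search` and the `resume_after_start` flag are hypothetical names, the pattern is hard-coded as three ASCII digits followed by a suffix literal, and the "eager" branch is an assumed stand-in for the kind of skipping the commit message describes.

```rust
/// Find `needle` in `haystack` at or after byte offset `from`.
/// (Hypothetical helper; a real engine would use a faster literal searcher.)
fn find_from(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if needle.is_empty() || from > haystack.len() {
        return None;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| from + i)
}

/// Search for three ASCII digits followed by `suffix`, by locating the
/// suffix literal first and then verifying the bytes in front of it.
/// Returns the (start, end) byte range of the first match found.
///
/// `resume_after_start` selects where the next literal search begins
/// after a failed verification:
///   false: after the END of the last literal (skips overlapping candidates)
///   true:  one byte after the START of the last literal
fn search(haystack: &[u8], suffix: &[u8], resume_after_start: bool) -> Option<(usize, usize)> {
    let mut from = 0;
    while let Some(lit) = find_from(haystack, suffix, from) {
        // Verify the part of the pattern that precedes the literal.
        if lit >= 3 && haystack[lit - 3..lit].iter().all(|b| b.is_ascii_digit()) {
            return Some((lit - 3, lit + suffix.len()));
        }
        from = if resume_after_start { lit + 1 } else { lit + suffix.len() };
    }
    None
}

fn main() {
    let hay = b"153.230000";
    // Eager skipping: the literal at offset 6 fails verification ('.' is
    // not a digit), and the overlapping candidate at offset 7 is never tried.
    println!("{:?}", search(hay, b"000", false)); // None
    // Resuming just after the last literal's start finds "230000".
    println!("{:?}", search(hay, b"000", true)); // Some((4, 10))
}
```

In this sketch the only difference between the two calls is where the literal search resumes, which is why a literal such as `000` (which can overlap itself) is affected while one such as `abc` is not.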
Thanks so much for reporting this! It is indeed a bug in the regex engine. You can see this and this for the specific changes to the regex engine to fix this. The short story is that you were tripping over a reverse suffix literal optimization that wasn't quite correct. The reason why it seemed sensitive to different regexes and inputs is because it is! :-) The reverse suffix literal optimization only runs in very specific circumstances related to the size and structure of the regex, and this particular bug is only tripped when a suffix literal (such as
This should now be fixed on master, since I've bumped ripgrep's regex dependency to
No kidding. Thanks for the explanation and the fixes!
What version of ripgrep are you using?
How did you install ripgrep?
What operating system are you using ripgrep on?
macOS 10.14.3 (18D109)
Describe your question, feature request, or bug.
`rg` appears to fail to find a certain pattern in a one-line file that definitely contains that pattern. I must be missing something (this seems very unlikely to be a legitimate bug), but I can't figure out what.
If this is a bug, what are the steps to reproduce the behavior?
`echo 153.230000 >| test.txt`
`rg '\d\d\d00' test.txt`. This successfully finds a match of `23000`.
`rg '\d\d\d000' test.txt`. This fails to find any match, when it should match `230000`.

Note that `grep '\d\d\d000' test.txt` correctly matches `230000`. (`grep --version` reports `grep (BSD grep) 2.5.1-FreeBSD`.)

If this is a bug, what is the actual behavior?
If this is a bug, what is the expected behavior?
`rg '\d\d\d000' test.txt` should identify the single match in the file, as `grep` does. Specifically:

Other
Note that changing the corpus in seemingly irrelevant ways can cause the bug to change or disappear. For example, the `\d\d\d000` pattern matches if three `0` characters are prepended to the contents of the file (that is, the file contains `000153.230000`).