Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rg -Ff fails to find a match in a case where it should #1334

Closed
elindsey opened this issue Jul 30, 2019 · 2 comments
Closed

rg -Ff fails to find a match in a case where it should #1334

elindsey opened this issue Jul 30, 2019 · 2 comments
Labels
bug A bug.

Comments

@elindsey
Copy link

elindsey commented Jul 30, 2019

What version of ripgrep are you using?

$ rg --version
ripgrep 11.0.1
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

brew install ripgrep

What operating system are you using ripgrep on?

macos 10.14.6

Describe your question, feature request, or bug.

With specific types of input files, I'm seeing rg -Ff return no matches in cases where it should return a match.

If this is a bug, what are the steps to reproduce the behavior?

patterns contains the same ip prefix 40 times:

$ cat patterns 
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12
1.208.0.0/12

corpus is a single line containing the prefix from the patterns file:

$ cat corpus 
1.208.0.0/12

this is expected to produce a match, but does not:

$ rg -Ff patterns corpus 
(no matches)

treating patterns as non fixed string finds a match:

$ rg -f patterns corpus 
1:1.208.0.0/12

and strangely, removing a single line from patterns will also find a match:

$ head -n 39 patterns > fewerpatterns
$ rg -Ff fewerpatterns corpus 
1:1.208.0.0/12

I haven't dug in enough to see if this purely a result of 39 vs 40 lines, or if it's 39 vs 40 lines and the size of the '1.208.0.0/12' string, but I do seem to be crossing some boundary into buggy behavior with this specific test.

corpus.txt
fewerpatterns.txt
patterns.txt

@BurntSushi BurntSushi added the bug A bug. label Aug 1, 2019
@BurntSushi
Copy link
Owner

Ah thanks for this bug report! Literal optimizations are going to be the death me. I introduced this regression when I added an optimization for the -F flag to bypass the regex engine entirely and use Aho-Corasick. There are a number of cases where Aho-Corasick can't be used directly, which I thought I had handled, but one slipped by: ripgrep escapes patterns when the -F flag is set before passing them down to the matcher, and those same escaped patterns were being handed to the Aho-Corasick builder directly. So you were actually searching for the literal 1\.208\.0\.0/12, which is obviously wrong.

The offending line of code is here, which also explains why 40 was a magic number:

if !self.config.can_plain_aho_corasick() || literals.len() < 40 {

(Specifically, the regex engine, currently, can often search for a small number of literals more quickly than Aho-Corasick, so we were diverting to the regex engine for a smaller number of literals, in which case, using escaped literals is correct.)

This is basically one giant mess because all of this should be pushed down into the regex engine, but there's a lot of refactoring work that needs to be done before that can happen. So I tried to paper over it at a higher level and got bit here.

Anyway, this should be fixed on master. I'm going to see about getting a patch release out soon.

@elindsey
Copy link
Author

elindsey commented Aug 1, 2019

Thanks Andrew! I really appreciate both the fix and the explanation - interesting edge case. 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

No branches or pull requests

2 participants