Unexpected --count-matches result #1573

Closed
knutwannheden opened this issue May 8, 2020 · 0 comments
Labels
bug A bug.

Comments

@knutwannheden

What version of ripgrep are you using?

ripgrep 12.0.1 (rev 1d5b1011e5)
-SIMD -AVX (compiled)

How did you install ripgrep?

Choco

What operating system are you using ripgrep on?

Windows 10

Describe your bug.

I have a pattern (with look-around) that reports 2 with `--count` but 0 with `--count-matches`. This doesn't appear to make sense: I would expect `--count-matches` to report a number at least as high as `--count`.

What are the steps to reproduce the behavior?

File test.txt with contents:

```
def A;
def B;
use A;
use B;
```

Ripgrep usage:

```
rg --pcre2 -U '(?s)def (\w+);(?=.*use \w+)' test.txt --count-matches
```

What is the actual behavior?

The output is 0, whereas the same command with `--count` instead of `--count-matches` outputs 2.

What is the expected behavior?

I would expect an output of 2 and, more generally, the result of `--count-matches` to be equal to or greater than that of `--count`.
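To make the expectation concrete, here is a minimal sketch using Python's `re` module (not ripgrep) that evaluates the same pattern against the whole contents of `test.txt`. It counts individual matches (roughly what `--count-matches` reports) and the distinct lines those matches touch (roughly what `--count` reports). The helper `line_of` is hypothetical and only serves this illustration.

```python
import re

# Contents of test.txt from the reproduction steps.
TEXT = "def A;\ndef B;\nuse A;\nuse B;\n"

# The pattern from the report; Python's re accepts the same inline (?s)
# flag and look-ahead syntax.
PATTERN = re.compile(r"(?s)def (\w+);(?=.*use \w+)")


def line_of(text, offset):
    """Return the 0-based line number containing character `offset`."""
    return text.count("\n", 0, offset)


matches = list(PATTERN.finditer(TEXT))

# Analogue of --count-matches: every individual match.
match_count = len(matches)

# Analogue of --count: distinct lines touched by any match.
matching_lines = {
    n
    for m in matches
    for n in range(line_of(TEXT, m.start()), line_of(TEXT, m.end() - 1) + 1)
}
line_count = len(matching_lines)

print(match_count, line_count)  # prints "2 2"
```

Both matches (`def A;` and `def B;`) sit on their own lines, so counting matches can never yield fewer than the number of matching lines.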

@BurntSushi added the bug label May 8, 2020
BurntSushi added a commit that referenced this issue May 8, 2020
In order to implement --count-matches, we simply re-execute the regex on
the spans reported by the searcher. The spans always correspond to the
lines that participated in the match. This is the correct thing to do,
except when the regex contains look-ahead (or look-behind).

In particular, the look-around permits the regex's match success to
depend on an arbitrary point before or after the lines actually
reported as participating in the match. Since only the matched lines are
reported to the printer, it is possible for subsequent searching on
those lines to fail.
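The failure mode described above can be reproduced in a minimal Python sketch (Python's `re`, not ripgrep's code). The first search sees the whole file, so the look-ahead finds the `use` lines. Re-running the regex on only the matched lines then finds nothing, because the text the look-ahead needs lies outside those lines. `line_span` is a hypothetical stand-in for the span the searcher hands to the printer.

```python
import re

TEXT = "def A;\ndef B;\nuse A;\nuse B;\n"
PATTERN = re.compile(r"(?s)def (\w+);(?=.*use \w+)")


def line_span(text, start, end):
    """Hypothetical: widen the match [start, end) to the full lines it
    touches, including the trailing newline."""
    line_start = text.rfind("\n", 0, start) + 1
    nl = text.find("\n", max(start, end - 1))
    line_end = len(text) if nl == -1 else nl + 1
    return line_start, line_end


# Step 1: search the whole haystack; look-ahead can see later lines.
spans = [line_span(TEXT, m.start(), m.end()) for m in PATTERN.finditer(TEXT)]

# Step 2: re-run the regex on each reported span in isolation, which is
# how the commit message describes --count-matches being computed.
recount = sum(len(PATTERN.findall(TEXT[s:e])) for s, e in spans)

print(len(spans), recount)  # prints "2 0"
```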

A true fix for this would somehow make the total span available to the
printer. But that seems tricky since it isn't always available. For
PCRE2's case in multiline mode, it is available because we force it to
be so for correctness.
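One way to picture "making the total span available" is a sketch in which the recount still runs over the whole haystack but only counts matches that begin inside the reported lines. This is an illustrative assumption in Python, not the fix ripgrep adopted. It relies on `Pattern.finditer(string, pos)`, which starts searching at `pos` while still letting look-ahead read past the end of the reported span. `count_in_span` is a hypothetical helper.

```python
import re

TEXT = "def A;\ndef B;\nuse A;\nuse B;\n"
PATTERN = re.compile(r"(?s)def (\w+);(?=.*use \w+)")

# Hypothetical reported line spans: "def A;\n" and "def B;\n".
SPANS = [(0, 7), (7, 14)]


def count_in_span(pattern, haystack, start, end):
    """Count matches beginning in [start, end) while the regex can still
    see the rest of `haystack`, so look-ahead beyond `end` succeeds."""
    count = 0
    for m in pattern.finditer(haystack, start):
        if m.start() >= end:
            break  # this match belongs to a later span
        count += 1
    return count


total = sum(count_in_span(PATTERN, TEXT, s, e) for s, e in SPANS)
print(total)  # prints "2"
```

As the commit message notes, the difficulty is that the full haystack is not always available to the printer; the sketch simply assumes it is.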

For now, we simply detect this corner case heuristically. If the match
count is zero, then it necessarily means there is some kind of
look-around that isn't matching. So we set the match count to 1. This is
probably incorrect in some cases, although my brain can't quite come up
with a concrete example. Nevertheless, this is strictly better than the
status quo.
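The heuristic can be sketched in Python as follows. The sketch assumes the zero check is applied to each span the searcher reports. A span is reported only because the full search matched, so a recount of zero is taken to mean that look-around failed on the truncated text, and the span is counted as 1. `count_with_fallback` is a hypothetical name, not a ripgrep function.

```python
import re

TEXT = "def A;\ndef B;\nuse A;\nuse B;\n"
PATTERN = re.compile(r"(?s)def (\w+);(?=.*use \w+)")

# Line spans reported for the two matches ("def A;\n" and "def B;\n").
SPANS = [(0, 7), (7, 14)]


def count_with_fallback(pattern, span_text):
    """Re-run `pattern` on one reported span; if it finds nothing,
    assume look-around outside the span caused the miss and return 1."""
    n = sum(1 for _ in pattern.finditer(span_text))
    return n if n > 0 else 1


total = sum(count_with_fallback(PATTERN, TEXT[s:e]) for s, e in SPANS)
print(total)  # prints "2"
```

In this sketch, a single span holding two matches that both depend on text outside the span would still contribute only 1. That is one way such a fallback can undercount.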

Fixes #1573