Add support for stop after num matches #159

mengelbrecht · 2016-10-09T08:41:59Z

Add an option to stop reading a file after num matches have been found. This is equivalent to grep's -m/--max-count option.


BurntSushi · 2016-10-09T13:01:00Z

I believe GNU grep has this, right?

I'm inclined to add it, but could you provide a use case for it? One slight
concern I have is its use in recursive directory search, since on a
multi-core system, the order of search results isn't deterministic. However,
I can see it being useful as a way to avoid flooding your terminal.

For single file search, its usefulness is a bit clearer I think.

mengelbrecht · 2016-10-09T14:00:15Z

Yes, GNU grep has this.

The use case is repeatedly filtering stdin (or a single file) with a match limit while looking for a specific match. If the match is not found, the user can refine the pattern until it is.
This technique can be used to filter a list of files for a certain entry and pick it (e.g. CtrlP fuzzy file finding).
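A minimal sketch of the workflow described above, using GNU grep's existing -m option as a stand-in for the requested flag. The helper name first_matches and the file list are made up for illustration; they do not come from any real project.

```shell
# Hypothetical helper: print at most "limit" matching lines read from stdin.
# Uses grep's -m/--max-count, the option this issue asks ripgrep to mirror.
first_matches() {
  limit=$1
  pattern=$2
  grep -m "$limit" -e "$pattern"
}

# A made-up list of file names standing in for the real input.
files='src/main.rs
src/search.rs
src/printer.rs
tests/search.rs'

# First attempt: a broad pattern, capped at two output lines.
printf '%s\n' "$files" | first_matches 2 'rs$'

# Refined attempt: narrow the pattern until the wanted entry shows up.
printf '%s\n' "$files" | first_matches 2 'tests/.*search'
```

Because the input is a single stream, the capped output is the same on every run, which is what makes the limit usable in this scenario.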

BurntSushi · 2016-10-09T14:08:47Z

@mgee But even if you give a list of files, ripgrep will process them in parallel, which means the output won't be deterministic. That means the option only becomes useful in your use case if you pass -j1 (which disables parallelism and therefore gets deterministic output).

I'm not necessarily saying that this is a good reason to not have this flag, but I am saying that it seems like bad UX.

mengelbrecht · 2016-10-09T14:31:07Z

@BurntSushi true, this option only has good UX when used with stdin or a single file (my use case). Maybe a note in the documentation about non-determinism when ripgrep is given multiple files (or a path) would be sufficient?

mengelbrecht · 2016-10-09T15:28:36Z

@BurntSushi I just realized that what I previously wrote could be misunderstood. By "a list of files" I meant a single file that contains a list of filenames, which I filter successively with a pattern. In each step the pattern can be refined to reduce the list of matched filenames until the desired filename is found.

BurntSushi · 2016-10-09T15:48:35Z

@mgee Ah, I did misunderstand that. Thanks for clarifying.

mengelbrecht · 2016-11-06T19:04:43Z

Thanks!

sergeevabc · 2019-03-25T12:46:56Z

GNU grep's -m is quite an ambiguous switch. The manual says to stop after NUM matches, but since grep is a line-based tool (a fact that slips the mind every now and then), it counts matching lines, not actual pattern matches. Imagine this:

$ grep -m 4 -n -oP pattern
2:8  
2:11 
3:16           
4:28                      <--- you need 4 results only, stop after that
4:29 
4:30 
4:36

Instead, you have to grep ... | head -4 to get 4 results.

Now ripgrep provides a clearer description (limit the number of matching lines per file searched to NUM), thanks for that! But it turns out that it cannot stop after a given number of actual pattern matches either. So frustrating…
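To make the lines-versus-matches distinction concrete, here is a small sketch with GNU grep. The sample text is invented for illustration and uses -E rather than -P so it does not depend on PCRE support.

```shell
# Made-up sample: line 1 contains three matches, line 2 contains two.
sample='ab12cd34ef56
gh78ij90'

# -m 2 stops after two matching LINES, so -o still emits all five matches.
printf '%s\n' "$sample" | grep -m 2 -o -E '[0-9]+'

# Capping the output with head yields exactly four matches.
printf '%s\n' "$sample" | grep -o -E '[0-9]+' | head -n 4
```

The first command prints five results even though the limit is 2, because the limit applies to input lines. The second command caps the results themselves.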

AlphaJack · 2023-07-26T21:33:53Z

ack -1 ... is useful for recursive search, as it stops searching other files after a match is found. grep cannot reproduce this behavior, as -m 1 does not prevent searching other files after a match is found.
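A rough sketch of the per-file behavior described here, using GNU grep and a throwaway directory. The file names and contents are invented. Piping through head is only an approximation of ack -1: grep is not told to stop, it is simply terminated when it next writes to the closed pipe, so on small inputs it may finish scanning anyway.

```shell
# Throwaway directory with two files that both contain the pattern.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'needle\nneedle\n' > "$dir/a.txt"
printf 'needle\n' > "$dir/b.txt"

# -m 1 applies per file: one line is reported for EACH file.
grep -m 1 -H needle "$dir/a.txt" "$dir/b.txt"

# Keeping only the first output line across all files.
grep -H needle "$dir/a.txt" "$dir/b.txt" | head -n 1
```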

Podbrushkin · 2023-12-27T11:50:41Z

Not sure why this is closed as Completed, since RipGrep doesn't have an option "to stop after num matches".
I have a 5 GB text file without line breaks. I want to find a substring in it and get byte offset of its first occurrence. Both Grep and RipGrep can't help me with that.

rg -b -o --no-line-number --line-buffered --fixed-strings -m 1 "000000" .\5gbunicode.txt - This takes 20 seconds and outputs 3876278 lines, while I need only one. --max-matches would be very helpful. Or if there would be alternative --line-buffered which will buffer relative to actual output, not to file structure. I think to buffer lines relative to output is more reasonable, since buffering is about output, not input. This way at least it will be possible to limit output by pressing Ctrl+C once you see in terminal first matches occurring.

BurntSushi · 2023-12-27T14:11:37Z

> Not sure why this is closed as Completed

It was closed as completed because a -m/--max-count flag was added. There was even a commit referenced that you can click on above. I'm not sure how else to explain it.

> rg -b -o --no-line-number --line-buffered --fixed-strings -m 1 "000000" .\5gbunicode.txt - This takes 20 seconds and outputs 3876278 lines, while I need only one.

When reporting behavior that doesn't work like you expect, please include a reproduction, and please take a moment to minimize it. And please don't bump issues that are 7 years old. Here, watch:

$ cat haystack
fooabcfooabc
fooabcfooabc
$ rg abc haystack
1:fooabcfooabc
2:fooabcfooabc
$ rg abc haystack -m1
1:fooabcfooabc
$ rg abc haystack -m1 -o
1:abc
1:abc

The above output should make it clear that the -m1 flag is actually doing something. In the second example, only one line is printed, although that line does contain two matches. In the third example, two lines are printed, but both come from the first matching line. Indeed, this is consistent with how the -m/--max-count flag is documented:

    -m NUM, --max-count=NUM
        Limit the number of matching lines per file searched to NUM.

        Note that 0 is a legal value but not likely to be useful. When used,
        ripgrep won't search anything.

That is, it is a limit on the number of matching lines and not the number of matches.

To me, this suggests your 5gbunicode.txt file is one giant line.

> I want to find a substring in it and get byte offset of its first occurrence. Both Grep and RipGrep can't help me with that.

Of course they can. Watch:

$ rg abc haystack -m1 -o | head -n1
abc

Both ripgrep and grep will stop searching after printing the first match.
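For the byte-offset part of the question, the same pipeline shape can be sketched with GNU grep, whose -b flag combined with -o prints the 0-based offset of each match. The temporary file and its contents are made up to mimic a file with no line breaks. The rg command quoted earlier uses the same -b and -o flags, so piping it through head should behave similarly, though that is an assumption not verified here.

```shell
# Temporary stand-in for a file that is one giant line (no newline at all).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'xx000000yy000000' > "$f"

# -b with -o prints "offset:match" for every match; head keeps the first.
grep -b -o -F -- '000000' "$f" | head -n 1

# Only the numeric offset, if that is all that is needed.
grep -b -o -F -- '000000' "$f" | head -n 1 | cut -d: -f1
```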

> Or if there would be alternative --line-buffered which will buffer relative to actual output, not to file structure. I think to buffer lines relative to output is more reasonable, since buffering is about output, not input. This way at least it will be possible to limit output by pressing Ctrl+C once you see in terminal first matches occurring.

I can't make heads or tails of what you're saying here. The default behavior is to choose the buffering strategy based on where ripgrep is printing (line buffering for tty and block buffering for a file). By passing --line-buffered, you are forcing line buffering and disabling ripgrep's automatic heuristic. It is unclear why you're doing that here and I don't understand what you're asking for. Please open a Discussion question about this if you're still confused.

BurntSushi added the "enhancement" label (An enhancement to the functionality of the software.) on Oct 9, 2016

BurntSushi closed this as completed in 58aca2e on Nov 6, 2016
