Output format with parseable match indices? #244

birkenfeld · 2016-11-21T19:17:11Z

For integration in other tools (editors...) it would be great to have a parseable output format that includes the start and end index for each match in a line. You can do it with colors (but see #242) but it's not really what colors are meant for.

ag has --ackmate, which outputs something like this:

$ ag --ackmate host
:src/util.rs
85;43 4,55 4:                                       ext_host: "localhost".into(),

BurntSushi · 2016-11-21T19:22:52Z

Is ackmate a sufficiently popular format that we should just provide that and call it done? Is it convenient to use? Is there a specification for it?

BurntSushi · 2016-11-21T19:23:34Z

Note that --vimgrep exists, but I think it's missing the end position of a match (it gives you the line number and starting column number).

birkenfeld · 2016-11-21T19:28:51Z

Is ackmate a sufficiently popular format that we should just provide that and call it done? Is it convenient to use? Is there a specification for it?

It seems to be what ack clones have. Convenience is probably not that important here - although I like that it avoids text volume because file names are not repeated. There is no specification that I know of.

BurntSushi · 2016-11-22T01:43:22Z

The format for --ackmate appears to be:

:{file-path}
{line};{column} {length}:{match}

Note that the --vimgrep --heading format is:

{file-path}
{line}:{column}:{match}

They are pretty close, including the reduction in volume by avoiding repeated file paths. Alas, --vimgrep is missing the length of the match. I don't think changing --vimgrep is an option, and supporting --ackmate seems nice if only because it's probably a format that people have written ad hoc parsers for.

BurntSushi · 2016-11-22T01:46:15Z

One thing that grates on me a little bit is that I managed to implement the --vimgrep option in the Printer without actually mentioning or relying on the concept "vimgrep." (Instead, it's a combination of line_per_match, column and line_number.) With --ackmate, the format is so weird that I'd probably need to add an option just for it. Blech.

BurntSushi · 2016-11-22T01:47:33Z

@birkenfeld Could you say more about why you specifically need this? I've never done an editor integration, so I don't really know. However, I do know that lots of folks have already integrated ripgrep into emacs and vim, and this is the first I'm hearing that --vimgrep is insufficient.

BurntSushi · 2016-11-22T01:50:17Z

The "orthogonal" way to do this I guess is to add a --length option that includes the length of the match (in bytes). And if we're going to make --column imply --line-number, then --length probably needs to in turn imply --column. That way, you could use rg --length --heading foo to get:

{file-path}
{line}:{column}:{length}:{match}

I think I'd prefer this over hacking --ackmate into the printer. (Which I have plans of factoring out into a library.)

birkenfeld · 2016-11-22T07:34:01Z

The goal is highlighting matched parts of a line in an editor-specific way (e.g. faces in Emacs). The ripgrep Emacs package once had this feature, but relied on parsing color escapes, which changed in 0.3.

A vimgrep-like format including match length is possible, but harder to handle, because usually you'd want to display a matching line only once but highlight multiple matches if appropriate. An ackmate-like format, which specifies all matches of a line in a single record (see initial post), requires much less postprocessing of the output.
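To make the postprocessing concrete, here is a minimal sketch of what a consumer would have to do with one-record-per-match output. It assumes the `{line}:{column}:{length}:{match}` layout floated earlier in the thread for a `--length` option, which is only a proposal; `group_matches` and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple


def group_matches(records: List[str]) -> Dict[int, Tuple[str, List[Tuple[int, int]]]]:
    """Collapse one-record-per-match output into one entry per line."""
    grouped: Dict[int, Tuple[str, List[Tuple[int, int]]]] = {}
    for record in records:
        # Split only three times: the line text itself may contain ':'.
        line, column, length, text = record.split(":", 3)
        _, spans = grouped.setdefault(int(line), (text, []))
        spans.append((int(column), int(length)))
    return grouped


# Hypothetical records: the same source line reported once per match.
records = [
    '85:5:4:ext_host: "localhost".into(),',
    '85:17:4:ext_host: "localhost".into(),',
]
grouped = group_matches(records)
```

The line text is repeated in every record and then thrown away again, which is the extra volume and work an ackmate-style record avoids.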

cc @nlamirault (the ripgrep.el author)

BurntSushi · 2016-11-22T12:10:37Z

but relied on parsing color escapes

Wow. I didn't even consider that as a breaking change. And probably still don't. Relying on color escapes sounds like really bad juju! But I understand why someone might do it if there's no other way!

A vimgrep-like format including match length is possible, but harder to handle because usually you'd want to display a matching line only once, but highlight multiple matches if appropriate.

OK, this is a key thing I was missing. I guess in vim, the opposite is true. This also means that my description of the ackmate format above was wrong. It looks like this instead:

:{file-path}
({line};{column} {length}, ...):{match}

Blech. I guess I begrudgingly accept that we should do this. I will attempt to put this into the current printer, but it's not going to be nice, and hopefully I can rethink it for libripgrep. The key problem with adding stuff to the printer is that you need to consider its interaction with every other output flag. For example, does --no-heading/--heading have an effect with --ackmate? (Yes?) What about --no-filename? (Yes?) Some others: --vimgrep, --null, --context/--before-context/--after-context.
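For reference, a rough sketch of an ad hoc parser for one match line, modeled on the `ag --ackmate` sample in the first post. It assumes the numbers are a line number followed by comma-separated `column length` pairs, and that columns are 0-based offsets into the line (which is what the sample suggests). The 39 spaces of indentation in the sample and the names `AckmateLine` and `parse_ackmate_line` are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class AckmateLine(NamedTuple):
    line_number: int
    spans: List[Tuple[int, int]]  # (column, length) pairs as printed
    text: str


def parse_ackmate_line(raw: str) -> AckmateLine:
    """Parse one match line such as '85;43 4,55 4:<text>'."""
    # The first ':' ends the header; the matched line may contain more colons.
    header, text = raw.split(":", 1)
    line_part, spans_part = header.split(";", 1)
    spans = []
    for item in spans_part.split(","):
        column, length = item.split()
        spans.append((int(column), int(length)))
    return AckmateLine(int(line_part), spans, text)


# The sample from the first post, with its indentation assumed to be 39 spaces.
sample = "85;43 4,55 4:" + " " * 39 + 'ext_host: "localhost".into(),'
parsed = parse_ackmate_line(sample)
highlighted = [parsed.text[c:c + n] for c, n in parsed.spans]
```

Each span can be turned into a highlight directly, without grouping records first.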

BurntSushi · 2018-05-28T16:11:22Z

The printer isn't that big; it's just subtle. You can skim it and get an idea of the type of case analysis involved here: https://github.com/BurntSushi/ripgrep/blob/master/src/printer.rs

It is used directly from the searcher. Do a search for self.printer: https://github.com/BurntSushi/ripgrep/blob/master/src/search_stream.rs

BurntSushi · 2018-06-05T11:37:58Z

Another thing to consider with respect to JSON is that I was thinking about a "JSON lines" format. But I see the proposal from @garygreen above is not. The benefit of a JSON lines approach is that each match can be emitted as it appears. If we instead emit one big JSON blob, then all of the matches must be buffered into memory before we get any output at all.
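A small sketch of why the line-oriented variant streams well: the consumer can decode and act on each message as soon as its line arrives, without waiting for a closing bracket. The message fields here are invented for the example, and `iter_messages` is a hypothetical helper, not part of any ripgrep tooling.

```python
import io
import json
from typing import Iterator, TextIO


def iter_messages(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded message per line, as soon as that line is read."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between messages
            yield json.loads(line)


# Stand-in for a process's stdout; the field names are made up for this sketch.
fake_stdout = io.StringIO(
    '{"type": "match", "line": 34}\n'
    '{"type": "match", "line": 57}\n'
)
seen = [message["line"] for message in iter_messages(fake_stdout)]
```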

garygreen · 2018-06-05T12:28:16Z

I was having the same thoughts. I think a "JSON lines" format is an excellent idea and would allow you to stream the results in a consistent format.

Maybe this could be like:

Start of query:

{
    "type": "begin_search",
    "token": "<some unique token string>",
    "query": "ipaddress",
    "paths": ["path1", "path2"]
}

A line match:

{
    "type": "line_match",
    "token": "<token>",
    "file": "E:\\Sites\\cc-new\\app\\Member\\BannedMember.php",
    "line": 34,
    "content": "'ipaddress' => Request::getClientIp(),",
    "ranges": [
        [1, 9]
    ]
}

End of query:

{
    "type": "end_search",
    "token": "<token>",
    "stats": {
        "total_matches": 2,
        "files_searched": 102
    }
}

The token is just an idea to make sure you are consuming results from the same stream, in case you have parallel things running? Not sure if that would be a concern or not but maybe something worth considering.

garygreen · 2018-06-05T12:37:15Z

Interesting wiki on JSON streaming: https://en.wikipedia.org/wiki/JSON_streaming

alphapapa · 2018-06-22T01:59:18Z

JSON output would be very useful indeed for editor integration. e.g. in Emacs one could use the built-in JSON-parsing functions, which are written in C, rather than doing regexp-parsing of ripgrep's output.

Does JSON output deserve a separate issue, or should this one be commandeered for it? :)

BurntSushi · 2018-07-24T11:11:24Z

For the people who desire JSON output, how would you handle the fact that ripgrep's output is not necessarily guaranteed to be valid UTF-8? This is a problem for JSON because JSON requires its strings to be valid UTF-8 (or UTF-16 or UTF-32, but we can ignore those for the purposes of ripgrep and this problem). At a high level, I think there are three approaches ripgrep could take:

1. Silently drop any matches containing invalid UTF-8.
2. Lossily encode any invalid UTF-8 using the Unicode replacement codepoint.
3. Come up with a tagged union representation that represents matches that are valid UTF-8 as standard JSON strings and matches that are invalid UTF-8 as base64 encoded JSON strings.

(1) kind of stinks, for obvious reasons.

(2) sounds nice on the surface, but I also envision the JSON output containing match offsets for individual matches found within the line. It seems doable to update those offsets such that they are correct for the lossily encoded match string (e.g., a single \xFF byte would be replaced by \xEF\xBF\xBD, which is the UTF-8 encoding of U+FFFD), but this seems annoying to me. It also seems less than ideal, since I could imagine that consumers might want to use those match offsets to go find something in the original file, which wouldn't work unless we yielded two sets of match offsets: one for the string provided in the JSON and another for the original string found in the file. Again, that's annoying.
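To illustrate the offset problem with option (2), here is a short sketch using Python's own lossy decoder as a stand-in for whatever ripgrep would do. The sample bytes and offsets are made up for the example.

```python
raw = b"the\xFFmatch"      # one invalid byte in the middle
start, end = 4, 9          # byte offsets of b"match" in the original

lossy_text = raw.decode("utf-8", errors="replace")  # \xFF becomes U+FFFD
lossy_bytes = lossy_text.encode("utf-8")            # U+FFFD becomes EF BF BD

# The original offsets no longer point at the match in the re-encoded bytes.
shifted_start = lossy_bytes.index(b"match")
```

Every replaced byte grows the string by two bytes, so all offsets after it shift unless they are recomputed.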

(3) feels like it might be the best solution. It preserves, byte-for-byte, the original match and also makes it easy for callers to choose their own behavior. For example, they could ignore matches that aren't valid UTF-8. Here's a straw man JSON representation for matches that are valid UTF-8:

{
    "text": "the match",
    "bytes": null
}

and now for matches that are not valid UTF-8:

{
    "text": null,
    "bytes": "dGhl/21hdGNo"
}

where dGhl/21hdGNo is base64 for the\xFFmatch.
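A consumer-side sketch of handling this straw man representation. `data_to_bytes` is a hypothetical helper name; the two objects are the ones shown above.

```python
import base64


def data_to_bytes(obj: dict) -> bytes:
    """Return the original bytes for either variant of the straw man object."""
    if obj.get("text") is not None:
        return obj["text"].encode("utf-8")
    return base64.b64decode(obj["bytes"])


valid = {"text": "the match", "bytes": None}
invalid = {"text": None, "bytes": "dGhl/21hdGNo"}

# A consumer that prefers a lossy display string can opt into one itself.
display = data_to_bytes(invalid).decode("utf-8", errors="replace")
```

Because the bytes come back unchanged, any offsets reported alongside them stay valid against the original file.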

fluffysquirrels · 2018-07-24T13:08:32Z

I like (2) and (3). Consumers writing a simple script may want to display _something_ useful in all cases (this is my typical use case). More sophisticated consumers may want something completely correct and accurate (e.g. IDE integration or editing the file). How about providing both UTF-8 encoded and raw bytes, one for each consumer? Or a command line flag for one or the other.

BurntSushi · 2018-07-24T13:11:28Z

I doubt I'll do both. Doing both would require computing match indices for both.

In general, "just do everything" isn't a tenable strategy moving forward because of the maintenance costs it implies. A benefit of option (3) is that it is both very simple to implement and very simple to get correct and very hard to use incorrectly. I'm inclined to say that "writing a simple script that handles all cases" ceases to be simple, and that you should just base64 decode the raw bytes and do your own lossy decode if that's what you want.

fluffysquirrels · 2018-07-24T13:16:40Z

Fair points. Then (3) is easy to implement and imposes only a small cost for the simple script consumer; sounds like a great option.

BurntSushi · 2018-07-29T20:11:41Z

For folks interested in extracting machine-readable descriptions of search results from ripgrep, would you want to take a quick look at my proposed JSON wire format and see whether it looks reasonable to you?

cc @roblourens I'm especially interested in your opinion, since I think you're my biggest machine consumer, and I suspect you'd be happy to stop parsing ANSI escape sequences. :-)

garygreen · 2018-07-29T21:12:37Z

@BurntSushi

That is some awesome work, Andrew. Thank you so much for all your time and effort on this; it's much appreciated.

I'm also interested in seeing what Rob makes of it as I'm sure vscode would love to consume the JSON results rather than doing some manual parsing.

Anyway, I've had a look over the proposed spec, if you're interested in hearing my thoughts, and it looks perfect, though I have made a few observations below.

Any reason why submatches need to be keyed by "match"?

"submatches": [
  {"match": {"text": "Watson"}, "start": 15, "end": 21}
]

Would this be simpler?

"submatches": [
  {"text": "Watson", "start": 15, "end": 21}
]

Will the submatches always be in order, e.g. start of 15 would be always before start 30? I'm sure it wouldn't make much difference to consumers but might be worth noting.
Would it make sense rather than having path in every JSON-line to instead have some kind of code that uniquely represents the search being performed?

For example:

{
  "type": "begin",
  "token": "nBeqhQgeljaMnkq2J1h6",
  "data": {
    "path": {"text": "/home/andrew/sherlock"}
  }
}
{
  "type": "context",
  "token": "nBeqhQgeljaMnkq2J1h6",
  "data": {
    "lines": {"text": "can extract a clew from a wisp of straw or a flake of cigar ash;\n"},
    "line_number": 4,
    "absolute_offset": 193,
    "submatches": []
  }
}

This would ensure that you are consuming results for the right search. For example, if you're running searches in parallel on the same terminal but for different queries, simply saying "path" won't allow you to group the results easily.

With a unique token approach (which can just be a random string of characters to make collisions extremely unlikely), you can group those results and know exactly what search they are for. I'm sure this would also be useful for vscode and other consumers as well that handle a Cancellation Token of some kind - they could use the token for that search.

BurntSushi · 2018-07-29T21:27:39Z

@garygreen Awesome, thanks for taking a look and giving feedback!

Any reason why submatches need to be keyed by "match"?

Yes. A match is an arbitrary data object, which means it is itself an object that contains either a text or bytes field. The text/bytes fields could be "inlined" into the submatch itself, but I'd rather not do that and keep things a bit more explicit. This is also consistent with how file paths are handled.

Will the submatches always be in order, e.g. start of 15 would be always before start 30? I'm sure it wouldn't make much difference to consumers but might be worth noting.

Hmm, good question. It seems reasonable to guarantee an ordering here. I will add that to the docs.

Would it make sense rather than having path in every JSON-line to instead have some kind of code that uniquely represents the search being performed?

I think I at least want to have a path in every message, which should be much more convenient for consumers. If you don't include a path, then it makes it necessary for even simple use cases like "do something with a match and its file path" to track state.

I've thought about your identifier idea. I am not opposed to it. But I would like to see how far we can get without it and just use file paths, which are something I think should be in every message anyway.

If you're worried about file paths adding too much redundancy to the output, then it is plausible that we could make ripgrep's --no-filename suppress path in match/context messages (which is consistent with its current documented behavior). In which case, we'd probably want to add an identifier to each message. But this is a backwards compatible addition and is more complex, so I'd like to see how far we can get with a simpler approach.

This would also ensure that you are consuming results for the same search - for example, if your running search in parallel on the same terminal but for different queries, simply saying "path" won't allow you to group the results easily. With a unique token approach, you can group those results and know exactly what search they are for. I'm sure this would also be useful for vscode and other consumers as well that handle a Cancellation Token of some kind - they could use the token for that search.

I'm not sure I'm convinced of the prevalence of this use case. Generally speaking, you don't exploit parallelism by running multiple ripgrep processes. Instead, you should let ripgrep handle parallelism for you.

Also, how is this token generated? Is the onus on ripgrep to ensure that every token it generates is unique? Probably a uuidv4 is good enough for that.

roblourens · 2018-07-30T21:13:07Z

I think this looks great, and I am very excited to delete the current parsing code 😁

What's the purpose of the begin message? I'm not sure what that would be used for. Will begin/end be sent even when a file doesn't contain any matches?

In developing vscode's search provider API, I've thought about matches in very long lines, or very long matches. The current version of the API has result objects that include a "preview" of the match, not necessarily the whole line. Then the client will be able to request a preview of a certain size in some TBD format, such as number of characters before and after the match, or just total number of characters.

I wonder whether something like that would be useful for ripgrep too. If you have a short match in a very long line, the consumer doesn't necessarily need the full text of the line.

On the other hand, if you have several matches in a long line, you won't be duplicating the full line text, which isn't true in vscode's model.

If the full text of the line is present, then technically you don't need to include the match text in 'submatches', since it can be easily derived from lines, right? I suppose it's convenient to have it already there.

I like the approach for non-utf8 encodings, I think that's a good idea.

BurntSushi · 2018-07-30T21:24:04Z

@roblourens That's great feedback, thank you!

What's the purpose of the begin message? I'm not sure what that would be used for. Will begin/end be sent even when a file doesn't contain any matches?

Yeah, the intent is that both begin and end would be sent even if the file didn't contain any matches. Writing that out though, I wonder if that will actually adversely impact performance. It probably will. I could change it such that begin/end are only emitted when there is at least one match.

The purpose of begin isn't entirely clear, other than to serve as an indicator that a new set of matches is being shown. If we adopt a token approach as described above, then begin would also serve to introduce a token for each new search. I could also imagine begin potentially growing new fields as time progresses. If we make it so begin/end are only shown when there's a match, then begin itself probably has very little downside.

In developing vscode's search provider API, I've thought about matches in very long lines, or very long matches. The current version of the API has result objects that include a "preview" of the match, not necessarily the whole line. Then the client will be able to request a preview of a certain size in some TBD format, such as number of characters before and after the match, or just total number of characters.

I wonder whether something like that would be useful for ripgrep too. If you have a short match in a very long line, the consumer doesn't necessarily need the full text of the line.

On the other hand, if you have several matches in a long line, you won't be duplicating the full line text, which isn't true in vscode's model.

Ah this is interesting! My current thinking is to just let consumers such as yourself sort this out, but if the volume of data turns out to be a performance bottleneck then we can certainly add some backwards compatible knobs for this going forward.

If the full text of the line is present, then technically you don't need to include the match text in 'submatches', since it can be easily derived from lines, right? I suppose it's convenient to have it already there.

That's correct, yeah. The text in each submatch is strictly superfluous. It's mostly there as a convenience. We can expose knobs for this too if performance is a problem.
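A quick sketch of that derivation. It assumes `start` and `end` are half-open byte offsets into the line, which the thread does not spell out; the line text is an illustrative sample, and only the submatch object is taken from the example earlier in the thread.

```python
message_data = {
    # Illustrative line; only the submatch below comes from the thread.
    "lines": {"text": "For the Doctor Watsons of this world\n"},
    "submatches": [
        {"match": {"text": "Watson"}, "start": 15, "end": 21},
    ],
}

# Slice bytes, not characters, since the offsets are assumed to be in bytes.
line_bytes = message_data["lines"]["text"].encode("utf-8")
derived = [
    line_bytes[sub["start"]:sub["end"]].decode("utf-8")
    for sub in message_data["submatches"]
]
```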

@roblourens @garygreen If you'll allow me to summarize to make sure we're on the same page:

Short of actually using the JSON wire format (which might reveal problems, as such things always do), it appears reasonable enough to emit search results in a structured format.
There may be a desire for adding tokens to the output so that the consumer can properly associate inputs with outputs. But this can be done in a backwards compatible way.
The current JSON wire format is perhaps convenient, but may print so much data that it becomes a performance bottleneck. We can expose new knobs (as ripgrep flags) that reduce the data volume in various ways if that would help performance, but we can cross that bridge when we get there and tune as needed.

garygreen · 2018-07-30T22:24:29Z

Interesting observation about matching against a huge line. I can see it being a concern for people who match against minified files (god knows why though). I'm unfamiliar with how ripgrep currently handles those matches; I would assume it just outputs a massive line showing the match.

It's not really a problem specifically related to the JSON formatter, though. If it does become an issue, then I'm sure we could consider adding some kind of "--match-bytes-around=100" type option which would give you 100 bytes around the matches. Again, not sure how ripgrep handles that currently so it might already be available.

`begin` and `end` when no matches

I would agree that it's probably not worth sending these for files with no matching results. Just skip sending both; that would also prevent sending JSON lines for all files that don't have any matches (those files the consumer probably isn't interested in at all). If consumers did want it, then it could probably be added as an option to always send them, even when there are no matches, with a caveat that it may impact performance.

`progress` idea

Following on from the above, in order to avoid sending lots of redundant data like stats for end matches for each file - instead have a new progress type which periodically outputs the overall progress of the search.

How often this is output could be configured --json-progress-timer=1000 which would output progress every second. Alternatively, it could output progress after searching every X files.

This would also serve as a way of periodically providing stats for what's been searched so far, especially in cases where there are no matches. Say you're searching 100,000 files, which takes a few minutes, and you expect no matches: if begin and end were not sent, you would have no progress updates and no way of displaying how many files have been searched so far until the search completes.

`finish` idea

At the moment there is no way of knowing when the search is totally finished, so you may be left in limbo and can't, for instance, have a UI that says "Search complete". You could possibly listen for when the ripgrep process exits, but maybe it would be easier to be explicit and have a finish type that would let you know when the search has finished in its entirety. It would contain a stats object detailing the information of the overall search.

Order of `context` and `match` guaranteed?

I'm just thinking, from a consumer's point of view, about how easy it would be to match the contexts back up against the matches to show them in a UI in the correct order.

If the order is guaranteed, it would be easier. For example, for a match on line 12 (with 2 context lines enabled):

You would first get JSON lines for context lines 10 and 11
Followed by the match on line 12
Two more context for lines 13 and 14.
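Assuming the order is guaranteed, a consumer could rebuild display blocks in a single pass, roughly as sketched below. The `type` and `data.line_number` fields follow the example messages earlier in the thread; `group_into_blocks` and `msg` are hypothetical helpers, and the sketch covers the messages of one file at a time.

```python
from typing import Iterable, List, Optional


def group_into_blocks(messages: Iterable[dict]) -> List[List[dict]]:
    """Split an ordered stream of match/context messages into contiguous blocks."""
    blocks: List[List[dict]] = []
    previous_line: Optional[int] = None
    for message in messages:
        if message["type"] == "begin":
            previous_line = None  # a new file always starts a new block
            continue
        if message["type"] not in ("match", "context"):
            continue
        number = message["data"]["line_number"]
        # A gap in line numbers starts a new block.
        if previous_line is None or number != previous_line + 1:
            blocks.append([])
        blocks[-1].append(message)
        previous_line = number
    return blocks


def msg(kind: str, number: int) -> dict:
    """Build a pared-down message for the example."""
    return {"type": kind, "data": {"line_number": number}}


stream = [msg("context", 10), msg("context", 11), msg("match", 12),
          msg("context", 13), msg("context", 14), msg("match", 40)]
blocks = group_into_blocks(stream)
```

Without an ordering guarantee, the consumer would instead have to buffer and sort by line number before grouping.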

BurntSushi · 2018-07-30T22:33:56Z

@garygreen

I would agree that it's probably not worth sending these for files with no matching results. Just skip sending both; that would also prevent sending JSON lines for all files that don't have any matches (those files the consumer probably isn't interested in at all). If consumers did want it, then it could probably be added as an option to always send them, even when there are no matches, with a caveat that it may impact performance.

Yup, this is done. begin/end won't be shown by default in the non-match case, but I have an option now that enables this.

progress idea

Definitely going to punt on this until a specific use case for it can be evaluated. I'm trying to focus on shipping the (nearly) minimal subset of useful things here that will let folks migrate off of parsing ANSI escapes without losing or regressing on any features.

At the moment there is no way of knowing when the search is totally finished, so you may be left in limbo and can't, for instance, have a UI that says "Search complete". You could possibly listen for when the ripgrep process exits, but maybe it would be easier to be explicit and have a finish type that would let you know when the search has finished in its entirety. It would contain a stats object detailing the information of the overall search.

You generally know it's done because the process exits and the stdout pipe closes, at which point the consumer reads EOF. But yes, the wire format presented thus far is the library-ized format for a single search. ripgrep will add at least one message, probably called summary, that is emitted once the search is complete. And yes, it will have a stats object representing the accumulation of statistics.
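A sketch of the "process exits, consumer reads EOF" approach from the consumer's side. The `--json` flag is the one mentioned later in this thread; the `summary` message type is only the probable name given above, and `read_until_eof` and `search` are hypothetical helpers. The fake stream at the end stands in for a real process so the sketch can be tried without ripgrep installed.

```python
import io
import json
import subprocess
from typing import Iterator, List, TextIO


def read_until_eof(stdout: TextIO) -> Iterator[dict]:
    """Yield messages until the pipe closes, which is when iteration ends."""
    for line in stdout:
        if line.strip():
            yield json.loads(line)


def search(pattern: str, path: str) -> List[dict]:
    """Run `rg --json` and collect every message (requires rg on PATH)."""
    with subprocess.Popen(
        ["rg", "--json", pattern, path],
        stdout=subprocess.PIPE,
        text=True,
        encoding="utf-8",
    ) as proc:
        # Leaving the `with` block waits for the process to exit.
        return list(read_until_eof(proc.stdout))


# Stand-in for a finished process; the message bodies are pared down.
fake = io.StringIO('{"type": "match"}\n{"type": "summary"}\n')
messages = list(read_until_eof(fake))
finished = bool(messages) and messages[-1]["type"] == "summary"
```

EOF alone already signals completion; a trailing summary message would additionally let the UI distinguish a clean finish from a killed process.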

Order of context and match guaranteed?

Oh yes, absolutely. Matches are printed in the order in which they appear. To do otherwise would be crazytown. :-)

roblourens · 2018-07-30T23:59:14Z

If we make it so begin/end are only shown when there's a match, then begin itself probably has very little downside.

Sounds good, I think begin could get very noisy when searching in large directories.

And I agree with what you say about large lines - in vscode's case at least, it's the same as what we're doing already, and getting large lines from ripgrep definitely isn't the bottleneck.

jessegrosjean · 2018-07-31T16:13:36Z

On the other hand, if you have several matches in a long line, you won't be duplicating the full line text, which isn't true in vscode's model.

Ah this is interesting! My current thinking is to just let consumers such as yourself sort this out, but if the volume of data turns out to be a performance bottleneck then we can certainly add some backwards compatible knobs for this going forward.

I'm just starting with integrating this project into my own editor, and this is the first problem I ran into. Maybe just change the behavior of --max-columns flag? Instead of reporting that the line was over max column, just return the portion of the line around the match that fits into max columns?

Great project, thanks for making it. I was 2 days into reinventing the wheel badly when I came across it.

BurntSushi · 2018-07-31T17:02:54Z

@jessegrosjean Could you elaborate more on the problem? I'm having trouble understanding it. You're saying that you ran into performance problems? Is it possible to reproduce them outside the editor?

What you're asking for really sounds like a new "preview" type feature, which is similar to, but different from, what --max-columns achieves. --max-columns is really about limiting the length of lines reported, whereas a hypothetical preview feature is quite a bit more complex than that and will need to account for windowing every match, of which there may be multiple.
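To give a feel for the windowing involved, here is a sketch of the core computation such a preview feature would need: pad every match by some radius, clamp to the line, and merge windows that overlap. This is not something ripgrep does; `preview_windows`, `radius`, and the sample offsets are all hypothetical.

```python
from typing import List, Tuple


def preview_windows(
    matches: List[Tuple[int, int]], line_length: int, radius: int
) -> List[Tuple[int, int]]:
    """Return merged [start, end) windows covering each match plus `radius` bytes."""
    windows: List[Tuple[int, int]] = []
    for start, end in sorted(matches):
        lo = max(0, start - radius)
        hi = min(line_length, end + radius)
        if windows and lo <= windows[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], hi))
        else:
            windows.append((lo, hi))
    return windows


windows = preview_windows([(5, 8), (12, 15), (900, 903)], line_length=1000, radius=5)
```

A real implementation would also have to avoid cutting a window in the middle of a multi-byte character and remap match offsets into each window, which is part of why it is more involved than --max-columns.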

It might be smart to create a new issue for this. I doubt it will wind up in the initial support for JSON.

jessegrosjean · 2018-07-31T20:13:23Z

@BurntSushi I'm blown away at the speed of ripgrep in the terminal. I'm trying not to lose that speed when running from my app. I think the JSON format (as I understand it) makes large searches that rg handles easily from the terminal near impossible to perform using the JSON API.

I think including a full line of context with each match is just too much bandwidth for some cases.

For example:

My test case is pathological: search for e in my home directory. It generates 4,655,585 results. Crazy, but this is just the kind of thing that a user might try to see if an app works and isn't buggy. And in fact it's what made me so impressed with ripgrep. When I run ripgrep on my home directory I see:

time rg e > NULL

real	0m2.381s
user	0m2.226s
sys	0m1.615s

Wow! Fast. But then if I do:

 time rg --vimgrep e > NULL

My computer starts to die. I killed the process after 20 seconds, when it was starting to run out of memory. I "think" the problem is that, unlike the default command, --vimgrep returns a full line for each result. It's just too much bandwidth when you have many results.

What you're asking for really sounds like a new "preview" type feature, which is similar to, but different from, what --max-columns achieves. --max-columns is really about limiting the length of lines reported, whereas a hypothetical preview feature is quite a bit more complex than that and will need to account for windowing every match, of which there may be multiple.

I was asking for this, but I think you are correct: it's too complicated, and would still require too much overlapping bandwidth in some cases. For example, imagine the case where the user searches for e in a giant minified.js file.

Better I think is to just provide some options for what data is included in “match” (from the JSON API). What about an option to just omit the “match.lines” value?

My app could then generate the initial list of results quickly (by omitting the “lines” values). And then lazily (as matches are scrolled through the view) load and highlight the actually matched text.
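A sketch of the lazy-loading half of that idea, under two assumptions: that each match message still carries an `absolute_offset` like the one in the example messages earlier in the thread, and that this value is the byte offset at which the matching line starts in the file. `load_line` is a hypothetical helper, and the temporary file only exists to make the sketch self-contained.

```python
import os
import tempfile


def load_line(path: str, absolute_offset: int) -> bytes:
    """Read the line that starts at `absolute_offset` bytes into the file."""
    with open(path, "rb") as handle:
        handle.seek(absolute_offset)
        return handle.readline()


# Throwaway file standing in for a searched file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\nsecond line with e\nthird\n")
try:
    # b"first line\n" is 11 bytes, so the second line starts at offset 11.
    loaded = load_line(tmp.name, absolute_offset=11)
finally:
    os.unlink(tmp.name)
```

The trade-off is that the file may have changed between the search and the lazy read, so the loaded text is not guaranteed to still contain the match.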

BurntSushi · 2018-07-31T22:55:02Z

@jessegrosjean Like I said, I think you should open a new ticket, since we're getting way off the topic of JSON. I did this for you in #999 and gave you a response. TL;DR - I'm not convinced any specific action should be taken on the JSON API at this time. Let's tackle performance problems---if they exist---as they arise.

BurntSushi · 2018-08-07T22:46:31Z

@roblourens The ag/libripgrep-freeze-2 branch has JSON Lines support. You can enable JSON with the --json flag.

roblourens · 2018-08-07T23:30:26Z

The output looks great, I am very excited for this.

This commit updates the CHANGELOG to reflect all the work done to make libripgrep a reality.

* Closes #162 (libripgrep)
* Closes #176 (multiline search)
* Closes #188 (opt-in PCRE2 support)
* Closes #244 (JSON output)
* Closes #416 (Windows CRLF support)
* Closes #917 (trim prefix whitespace)
* Closes #993 (add --null-data flag)
* Closes #997 (--passthru works with --replace)
* Fixes #2 (memory maps and context handling work)
* Fixes #200 (ripgrep stops when pipe is closed)
* Fixes #389 (more intuitive `-w/--word-regexp`)
* Fixes #643 (detection of stdin on Windows is better)
* Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird)
* Fixes #764 (coalesce color escapes)
* Fixes #922 (memory maps failing is no big deal)
* Fixes #937 (color escapes no longer used for empty matches)
* Fixes #940 (--passthru does not impact exit status)
* Fixes #1013 (show runtime CPU features in --version output)

BurntSushi added the enhancement and question labels Nov 21, 2016

BurntSushi removed the question label Nov 22, 2016

BurntSushi self-assigned this Nov 22, 2016

BurntSushi added the question label Nov 28, 2016

BurntSushi mentioned this issue Dec 26, 2016

Using style "bold" does not use the bold ansi escape sequence in the output #293

Closed

BurntSushi mentioned this issue Jan 6, 2017

Create alias to fast-edit a result #304

Closed

BurntSushi modified the milestone: libripgrep Jan 10, 2017

BurntSushi added the libripgrep label Jan 11, 2017

This was referenced Feb 12, 2017

json output option to allow scripting #359

Closed

multiline search for simple cases #360

Closed

vlevit mentioned this issue Apr 15, 2017

Highlighting doesn't work with ripgrep 0.5.1 nlamirault/ripgrep.el#20

Open

tiehuis mentioned this issue Apr 27, 2017

machine interface #462

Closed

okdana mentioned this issue Jul 30, 2017

ripgrep match styling not detected by emacs #570

Closed

roblourens mentioned this issue Sep 6, 2017

Incorrect result display with empty line and colors #599

Closed

rmccue mentioned this issue Dec 10, 2017

Publish RipgrepParser as npm package? microsoft/vscode#39975

Closed

This was referenced Feb 14, 2018

--line-number-width doesn't work well with --no-heading #795

Closed

fix issue #359 --machine-readable #802

Closed

BurntSushi mentioned this issue Mar 9, 2018

Feature request: structured output for tooling #848

Closed

garygreen mentioned this issue May 28, 2018

Add json output format #930

Closed

BurntSushi mentioned this issue Jul 31, 2018

--vimgrep is much slower than without #999

Closed

BurntSushi mentioned this issue Aug 19, 2018

libripgrep: PCRE2 support, multiline search, JSON output and more #1017

Merged

BurntSushi closed this as completed in #1017 Aug 20, 2018

garygreen mentioned this issue Feb 23, 2019

Add support for outputting to stdout PHP-CS-Fixer/PHP-CS-Fixer#4320

Closed
