11.0.0 regression: Seemingly infinite loop on non-Unicode files #1247

Deewiant · 2019-04-16T07:56:31Z

What version of ripgrep are you using?

ripgrep 11.0.0 (rev d7f57d9aab)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

And I'm comparing it to:

ripgrep 0.10.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

From the binary releases for x86_64-unknown-linux-musl.

What operating system are you using ripgrep on?

Arch Linux

Describe your question, feature request, or bug.

I've run into a crippling performance regression on certain types of queries and non-UTF-8 files between 0.10.0 and 11.0.0, which looks like it might even be an infinite loop.

If this is a bug, what are the steps to reproduce the behavior?

A very simple way is to create a file containing only two bytes, "sä" encoded as ISO 8859-1, and search for a pattern with a short prefix that matches the "s" but not the rest, like '\bs(?:thiswillnotmatch|norwillthis)':

printf "s\xe4" > test.txt
rg '\bs(?:thiswillnotmatch|norwillthis)' test.txt

The \b does seem to be required at least in this case.

Another example file that reproduces this is sherlock.br in ripgrep's own source code, using the exact same pattern.
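
For anyone checking whether their build is affected, the two-byte reproduction above can be wrapped so that a hang cannot block the terminal. This is only a sketch under some assumptions: `timeout` is the GNU coreutils utility, the 5-second limit is arbitrary, and the octal escape `\344` is used in place of `\xe4` (same byte) because not every `printf` understands `\x`. The search is skipped if `rg` or `timeout` is not on the PATH.

```shell
# Recreate the two-byte file: "s" followed by 0xE4 ("ä" in ISO 8859-1).
printf 's\344' > test.txt

# Dump the bytes as hex; this should show 73 and e4.
# A lone 0xE4 is a UTF-8 lead byte with no continuation bytes,
# so the file is not valid UTF-8.
od -An -tx1 test.txt

# Run the search under a time limit (5 seconds is an arbitrary choice).
if command -v rg >/dev/null 2>&1 && command -v timeout >/dev/null 2>&1; then
    timeout 5 rg '\bs(?:thiswillnotmatch|norwillthis)' test.txt
    status=$?
    case "$status" in
        124) echo "timed out: this build looks affected" ;;
        1)   echo "no match, finished normally" ;;
        *)   echo "unexpected exit status: $status" ;;
    esac
fi
```

GNU `timeout` exits with status 124 when it has to kill the command, and ripgrep exits with status 1 when a search finishes without a match, which is what the `case` relies on.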

If this is a bug, what is the actual behavior?

11.0.0 seems to spin forever:

$ time rg-11.0 --debug '\bs(?:thiswillnotmatch|norwillthis)' test.txt >/dev/null
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:115: required literal found: "s"
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 3 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

<it's been 10 minutes and it's still spinning at 100% CPU>

If this is a bug, what is the expected behavior?

0.10.0 has no problems and gives a result in a few milliseconds:

$ time rg-0.10 --debug '\bs(?:thiswillnotmatch|norwillthis)' test.txt >/dev/null
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "s"
DEBUG|globset|globset/src/lib.rs:429: built glob set; 0 literals, 0 basenames, 8 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:429: built glob set; 3 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

0.00user 0.00kernel 0.003elapsed

This fixes a bug introduced by a bug fix for #557. In particular, the termination condition wasn't exactly right, and this appears to have slipped through the test suite. This probably reveals a hole in our test suite, which is specifically the testing of Unicode regexes with bytes::Regex on invalid UTF-8. This bug was originally reported against ripgrep: BurntSushi/ripgrep#1247

BurntSushi · 2019-04-16T12:35:59Z

Thanks for reporting this bug! This was actually a regression introduced in the underlying regex engine (as a result of fixing an unrelated bug). I've published a fix for the regex engine and brought in the updated version on ripgrep master. I'll put out a new point release of ripgrep with this fix soon.

BurntSushi · 2019-04-16T18:37:03Z

ripgrep 11.0.1 is out with this fix in it. Sorry about the regression!

Deewiant · 2019-04-16T19:03:23Z

No problem, thanks for the quick response and fix!

BurntSushi added the "bug" label Apr 16, 2019

BurntSushi closed this as completed in fdde2bc Apr 16, 2019
