11.0.0 regression: Seemingly infinite loop on non-Unicode files #1247
Labels: bug
Comments
BurntSushi added a commit to rust-lang/regex that referenced this issue on Apr 16, 2019:
This fixes a bug introduced by a bug fix for #557. In particular, the termination condition wasn't exactly right, and this appears to have slipped through the test suite. This probably reveals a hole in our test suite, which is specifically the testing of Unicode regexes with bytes::Regex on invalid UTF-8. This bug was originally reported against ripgrep: BurntSushi/ripgrep#1247
Thanks for reporting this bug! This was actually a regression introduced in the underlying regex engine (as a result of fixing an unrelated bug). I've published a fix for the regex engine and brought in the updated version on ripgrep master. I'll put out a new point release of ripgrep with this fix soon.

No problem, thanks for the quick response and fix!
What version of ripgrep are you using?
And I'm comparing it to:
How did you install ripgrep?
From the binary releases for x86_64-unknown-linux-musl.
What operating system are you using ripgrep on?
Arch Linux
Describe your question, feature request, or bug.
I've run into a crippling performance regression on certain types of queries and non-UTF-8 files between 0.10.0 and 11.0.0, which looks like it might even be an infinite loop.
If this is a bug, what are the steps to reproduce the behavior?
A very simple way is to create a file containing only two bytes, "sä" encoded with ISO 8859-1, and search for a pattern with a short prefix that matches the "s" but not the rest, like '\bs(?:thiswillnotmatch|norwillthis)'.
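For anyone who wants to recreate the input, here is a minimal Python sketch of the two-byte file described above. The file name `latin1.txt` and the helper `is_valid_utf8` are hypothetical, chosen only for illustration. Python's `re` module is a different engine from the Rust regex crate that ripgrep uses, so the search at the end only illustrates the expected "no match" result and does not reproduce the hang.

```python
import re
import tempfile
from pathlib import Path

# "sä" in ISO 8859-1 (Latin-1): 's' is byte 0x73, 'ä' is byte 0xE4.
data = "s\u00e4".encode("iso-8859-1")


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: report whether raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Write the bytes to a throwaway file (the name is an assumption) and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin1.txt"
    path.write_bytes(data)
    contents = path.read_bytes()

# The pattern from the report, compiled as a bytes pattern so that it can be
# applied to input that is not valid UTF-8.
pattern = re.compile(rb"\bs(?:thiswillnotmatch|norwillthis)")
match = pattern.search(contents)

print(len(contents), is_valid_utf8(contents), match)
```

Writing the same two bytes to a file on disk and passing the pattern and the file path to rg is one way to compare the two ripgrep versions.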
The \b does seem to be required, at least in this case.
Another example file that reproduces this is sherlock.br in ripgrep's own source code, using the exact same pattern.
If this is a bug, what is the actual behavior?
11.0.0 seems to spin forever:
If this is a bug, what is the expected behavior?
0.10.0 has no problems and gives a result in a few milliseconds: