Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small performance regression in ignore-0.4.x #835

Closed
sharkdp opened this issue Feb 25, 2018 · 3 comments
Closed

Small performance regression in ignore-0.4.x #835

sharkdp opened this issue Feb 25, 2018 · 3 comments
Labels
bug A bug. question An issue that is lacking clarity on one or more points.

Comments

@sharkdp
Copy link
Sponsor Contributor

sharkdp commented Feb 25, 2018

What operating system are you using ripgrep on?

Arch Linux, x86_64

Describe your question, feature request, or bug.

I was running some benchmarks for fd and noticed that there was a small (~5%) performance regression when I updated from ignore-0.2.2 to ignore-0.4.x. I investigated some more and ran the same benchmark (fd -uu '[0-9]\.jpg$' in my home folder) multiple times while changing the version of the ignore crate:

image

There are two things to notice:

  1. The timing for ignore-0.3.0 is not visisble because it's so fast (137 ms). This is due to a bug where not all results are shown (I didn't investigate further, but the git log seems to indicate that something was wrong with ignore-0.3.0).
  2. There is a small (but statistically significant) performance regression that seems to have been introduced with commit 3cb4d13 (Custom ignore files). Interestingly, I'm running a search that doesn't even use ignore files (-uu is the same as in ripgrep).

What are the steps to reproduce the behavior?

I can reproduce this behavior with ripgrep itself:

▶ git checkout 51864c1; cargo install -f

...

▶ hyperfine --warmup 10 'rg -uu --files --iglob "*[0-9].jpg" ~'
Benchmark #1: rg -uu --files --iglob "*[0-9].jpg" ~

  Time (mean ± σ):     700.4 ms ±   5.3 ms    [User: 3.157 s, System: 2.310 s]
 
  Range (min … max):   691.4 ms … 709.0 ms

▶ git checkout 3cb4d13; cargo install -f

...

▶ hyperfine --warmup 10 'rg -uu --files --iglob "*[0-9].jpg" ~'
Benchmark #1: rg -uu --files --iglob "*[0-9].jpg" ~

  Time (mean ± σ):     725.4 ms ±   7.9 ms    [User: 3.318 s, System: 2.338 s]
 
  Range (min … max):   715.5 ms … 743.9 ms

(benchmarks performed with hyperfine on a warm disk cache)

Again, the difference is small (~4%) but significant.

@sharkdp
Copy link
Sponsor Contributor Author

sharkdp commented Feb 25, 2018

This seems to be caused by the additional Gitignore instance custom_ignore_matcher that has to be constructed in Ignore::add_child_path.

(apparently, the gitignore matchers are fully constructed even though no ignore-matching is performed)

@BurntSushi BurntSushi added bug A bug. question An issue that is lacking clarity on one or more points. labels Mar 13, 2018
@BurntSushi
Copy link
Owner

@sharkdp What corpus were your tests above on? I finally had a chance to look at this, and I can't really reproduce your results.

I'm testing on the chromium repo:

$ git remote -v
origin  git:https://github.com/nwjs/chromium.src (fetch)
origin  git:https://github.com/nwjs/chromium.src (push)
$ git rev-parse --short=10 HEAD
3682723171

Testing ripgrep on current master:

$ rg-baseline --version
ripgrep 0.8.1 (rev 507801c1f2)
+SIMD +AVX
$ hyperfine -s basic --warmup 10 'rg-baseline -uu --files'
Benchmark #1: rg-baseline -uu --files
  Time (mean ± σ):     230.6 ms ±  10.2 ms    [User: 1.004 s, System: 0.349 s]
  Range (min … max):   211.7 ms … 243.1 ms

Testing ripgrep with this PR, rebased on current master:

$ rg --version
ripgrep 0.8.1 (rev 2c2162f7bc)
+SIMD +AVX
$ hyperfine -s basic --warmup 10 'rg -uu --files'
Benchmark #1: rg -uu --files
  Time (mean ± σ):     230.2 ms ±   9.6 ms    [User: 927.0 ms, System: 383.5 ms]
  Range (min … max):   210.4 ms … 246.0 ms

@sharkdp
Copy link
Sponsor Contributor Author

sharkdp commented Apr 24, 2018

@sharkdp What corpus were your tests above on? I finally had a chance to look at this, and I can't really reproduce your results.

I was running this on my home folder - not very helpful, I'm sorry. Two things seem to be different in your case:

  • I believe the associated PR will show a more significant speedup in cases where there is a deep directory tree with lots of .gitignore files.

  • Could you please try to add the --iglob option (with some file pattern that occurs only rarely) to make sure that the execution time is not dominated by writing to standard out?

I know it's not a very scientific method, but could you try to run the following commands in your home folder (or some other large directory)? I was just re-running the benchmarks with the rebased branch (which is now included in #836), and I still see a 10% speedup:

> rg-baseline --version
ripgrep 0.8.1 (rev bf51058eb2)
-SIMD -AVX

> rg-pr --version                                               
ripgrep 0.8.1 (rev 92dcb44d10)
-SIMD -AVX

> hyperfine --warmup 10 'rg-baseline -uu --files --iglob "README*"' \
                        'rg-pr       -uu --files --iglob "README*"'
Benchmark #1: rg-baseline -uu --files --iglob "README*"

  Time (mean ± σ):     760.8 ms ±   3.5 ms    [User: 3.395 s, System: 2.511 s]
 
  Range (min … max):   756.1 ms … 766.5 ms
 
Benchmark #2: rg-pr -uu --files --iglob "README*"

  Time (mean ± σ):     692.0 ms ±  15.5 ms    [User: 2.844 s, System: 2.493 s]
 
  Range (min … max):   679.6 ms … 731.3 ms
 
Summary

'rg-pr -uu --files --iglob "README*"' ran
    1.10x faster than 'rg-baseline -uu --files --iglob "README*"'

BurntSushi added a commit that referenced this issue Apr 24, 2018
This commit makes Gitignore::empty a bit faster by avoiding allocation
and manually specializing the implementation instead of routing it through
the GitignoreBuilder.

This helps improve uses of ripgrep that traverse *many* directories, and
in particular, when the use of ignores is disabled via command line
switches.

Fixes #835, Closes #836
BurntSushi added a commit that referenced this issue Apr 24, 2018
This commit makes Gitignore::empty a bit faster by avoiding allocation
and manually specializing the implementation instead of routing it through
the GitignoreBuilder.

This helps improve uses of ripgrep that traverse *many* directories, and
in particular, when the use of ignores is disabled via command line
switches.

Fixes #835, Closes #836
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug. question An issue that is lacking clarity on one or more points.
Projects
None yet
Development

No branches or pull requests

2 participants