Add support for incremental statistics #22

GoogleCodeExporter · 2015-08-23T17:41:26Z

Support incremental statistics on any statistics that can be incrementally
collected. As a result, not all of the statistical information would have to be
re-fetched each time gitinspector is executed.
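Commit counts and inserted/deleted line counts are additive, which makes them natural candidates for incremental collection: the totals from a previous run can be kept and only the newer commits folded in. A minimal sketch, assuming a hypothetical `AuthorStats` record and a list of `(author, insertions, deletions)` tuples for the new commits (neither is part of gitinspector):

```python
from dataclasses import dataclass


@dataclass
class AuthorStats:
    # Hypothetical per-author record; the field names are illustrative.
    commits: int = 0
    insertions: int = 0
    deletions: int = 0


def merge_stats(cached, new_commits):
    """Fold the commits made since the last run into previously cached totals.

    cached: dict of author -> AuthorStats saved by an earlier run.
    new_commits: iterable of (author, insertions, deletions) tuples.
    Returns a new dict; the cached totals are left untouched.
    """
    merged = {
        author: AuthorStats(s.commits, s.insertions, s.deletions)
        for author, s in cached.items()
    }
    for author, insertions, deletions in new_commits:
        stats = merged.setdefault(author, AuthorStats())
        stats.commits += 1
        stats.insertions += insertions
        stats.deletions += deletions
    return merged
```

Blame output does not merge this simply, because newer commits can overwrite lines that were attributed to older ones.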

Storing a hash of the given options together with the calculated statistics
should also make it possible to detect whether the git history has changed up
to a certain point, in which case the statistics need to be re-fetched anyway.
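One way to implement the hash check is to key the cache on a digest of the options plus a digest of the commit list that was processed, then compare the latter against the same-length prefix of the current history (for example, the output of `git rev-list --reverse HEAD`). A sketch under those assumptions; the cache layout and all function names are hypothetical:

```python
import hashlib
import json


def options_hash(options):
    """Stable digest of the options that affect the statistics."""
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(options, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def history_hash(commit_ids):
    """Digest of an ordered list of commit ids (oldest first)."""
    h = hashlib.sha256()
    for cid in commit_ids:
        h.update(cid.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def make_cache(options, history):
    """Record what a run was based on (hypothetical cache layout)."""
    return {
        "options_hash": options_hash(options),
        "history_len": len(history),
        "history_hash": history_hash(history),
    }


def can_reuse(cache, options, current_history):
    """Return True if cached statistics are still valid for this run."""
    if cache.get("options_hash") != options_hash(options):
        return False  # different options -> different statistics
    n = cache.get("history_len", 0)
    if n > len(current_history):
        return False  # history got shorter, e.g. after a reset
    # Same digest for the first n commits means history is unchanged up to the
    # cached point, so only commits after index n would need processing.
    return history_hash(current_history[:n]) == cache.get("history_hash")
```

This prefix check suits a linear history. With merges, the commit order can shift and cause a full re-fetch, which is the safe way to fail.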

/Adam Waldenberg

Original issue reported on code.google.com by [email protected] on 15 Nov 2013 at 12:50


GoogleCodeExporter · 2015-08-23T17:41:26Z

Original comment by [email protected] on 15 Nov 2013 at 12:51

Changed state: Accepted

imposeren · 2015-12-16T09:23:10Z

Could this be related to the following problem?

After some massive changes in the repo, gitinspector consumes all of the CPU with a lot of git blame processes and runs for at least 4 hours without finishing the repo.

Can anything be tuned to speed up processing, or at least to reduce CPU usage?

adam-waldenberg · 2015-12-16T13:26:45Z

Hi @imposeren.

Gitinspector is quite slow on very large repos with a big history, as it blames every single file.

This would partly solve it, yes. When this feature is implemented (assuming it's possible) it would mean gitinspector would not have to re-blame every single file each time you run it. Instead, it would only process the files that changed since last time, making it substantially less painful.
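Deciding which files need re-blaming could be done by comparing each file's blob id between runs, since an unchanged blob id means unchanged content. A sketch assuming the previous run saved the output of `git ls-tree -r <commit>`; the helper names are hypothetical, treating an unchanged blob as needing no new blame is a simplifying assumption, and paths with unusual characters (which git quotes unless `-z` is passed) are not handled:

```python
def parse_ls_tree(output):
    """Map path -> blob id from `git ls-tree -r <commit>` output.

    Each line has the form "<mode> <type> <object><TAB><path>".
    """
    blobs = {}
    for line in output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        _mode, obj_type, obj_id = meta.split()
        if obj_type == "blob":
            blobs[path] = obj_id
    return blobs


def files_to_reblame(previous, current):
    """Return (changed_or_new, removed) paths given two path -> blob id maps."""
    changed = sorted(p for p, blob in current.items() if previous.get(p) != blob)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed
```

Only the `changed_or_new` files would be blamed again; results for `removed` files would simply be dropped from the cache.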

The only thing you can really do to speed up processing is to not use the "-H" (hard) option. If you were not using it, you are out of luck. The only option would be to optimize git itself :).

imposeren · 2015-12-18T13:25:20Z

Thanks for the reply. Maybe there are some options for optimizing the history?

Something like this:
https://stevelorek.com/how-to-shrink-a-git-repository.html

But I do not know whether removing unused files from the history will affect gitinspector, as it seems to operate only on existing files (is this correct?).

adam-waldenberg · 2015-12-18T13:47:35Z

@imposeren

Yes. Just running "git gc" will speed things up, sometimes quite significantly. If you have never done it before, passing the --aggressive switch might be a good idea. The following is from the git docs:

--aggressive
           Usually git gc runs very quickly while providing good disk space utilization and performance. This option will cause git gc to more
           aggressively optimize the repository at the expense of taking much more time. The effects of this optimization are persistent, so this option
           only needs to be used occasionally; every few hundred changesets or so.

I'm not sure how much the other stuff in that article will affect processing speed, but I guess it's always worth a try.

The blame section of gitinspector only operates on existing files, yes. However, with the -H flag, git still scans the whole history in order to correctly attribute each row to its author. So I would guess that even "git blame" should run faster. A blamed row can also, for example, come from one of those big files, so git still needs to take them into account to some extent (even without -H passed to gitinspector).

Hard to say without a deeper investigation into the inner workings of git itself.

imposeren · 2015-12-18T13:50:45Z

@adam-waldenberg
And one more question: does gitinspector blame files excluded by the '-x' option?

adam-waldenberg · 2015-12-18T13:54:03Z

@imposeren

No. It does not.

adam-waldenberg · 2015-12-18T13:55:47Z

@imposeren

Neither does it blame any files that have an invalid extension. Binary files are also skipped.
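The file selection described in the last few replies can be sketched as a single predicate. The default extension set, the regex-style exclusion patterns and the NUL-byte binary check below are illustrative assumptions, not gitinspector's exact rules (git itself uses a similar NUL-byte heuristic to detect binary content):

```python
import posixpath
import re

# Illustrative defaults only; gitinspector's real extension list may differ.
DEFAULT_EXTENSIONS = frozenset({"c", "cpp", "h", "java", "js", "py", "rb"})


def is_binary(data, sniff_bytes=8000):
    """Treat content as binary if a NUL byte appears in its first bytes."""
    return b"\0" in data[:sniff_bytes]


def should_blame(path, data, exclude_patterns, extensions=DEFAULT_EXTENSIONS):
    """Return True if a file passes the exclusion, extension and binary checks.

    path: repository-relative path with "/" separators, as git prints it.
    data: the file's content as bytes.
    exclude_patterns: regular expressions standing in for -x arguments.
    """
    if any(re.search(pattern, path) for pattern in exclude_patterns):
        return False  # excluded explicitly
    extension = posixpath.splitext(path)[1].lstrip(".")
    if extension not in extensions:
        return False  # extension not in the accepted set
    return not is_binary(data)  # binary files are skipped
```

Because every rejected file is skipped before any blame process starts, excluding large generated or vendored directories with -x is one practical way to cut the blame workload.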

imposeren · 2015-12-18T14:04:01Z

Is there any way to reduce the concurrency of git blame? I can see up to 8 git blame processes when gitinspector runs, and each consumes 40-99% of a processor core.

imposeren · 2015-12-18T14:18:22Z

I can already see that there are no such options:
https://github.com/ejwa/gitinspector/blob/master/gitinspector/blame.py#L31

I'll create a separate issue for this and may make a pull request later.

adam-waldenberg · 2015-12-18T14:18:31Z

@imposeren

Gitinspector starts as many processes as there are threads/cores. There is no configuration option for it, and never will be. However, there is a constant at the top of changes.py and blame.py that controls the number of threads.
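Capping the number of concurrent blame workers with a module-level constant can be sketched with a thread pool. `NUM_THREADS` and `blame_all` are hypothetical names, not necessarily those used in changes.py or blame.py, and `blame_one` is passed in so the sketch runs without a repository:

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level constant: one worker per CPU core by default.
NUM_THREADS = multiprocessing.cpu_count()


def blame_all(paths, blame_one, num_threads=NUM_THREADS):
    """Call blame_one(path) for every path, at most num_threads at a time.

    blame_one would normally spawn `git blame` through subprocess.
    Returns a dict of path -> result.
    """
    paths = list(paths)  # iterated twice below
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in the same order as paths.
        return dict(zip(paths, pool.map(blame_one, paths)))
```

Lowering the constant (for example to 2) would reduce CPU load at the cost of a longer run.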

GoogleCodeExporter added Enhancement Auto Migrated Priority : Low Milestone-Release0.6.0 labels Aug 23, 2015

adam-waldenberg removed the Milestone-Release0.6.0 label Aug 24, 2015

adam-waldenberg added this to the 0.6.0 milestone Aug 24, 2015

adam-waldenberg changed the title ~~Add support for incremental statistics.~~ Add support for incremental statistics Nov 25, 2015

imposeren mentioned this issue Dec 18, 2015

Configurable concurrency #96

Closed

adam-waldenberg mentioned this issue Dec 6, 2016

[feature] Performances #136

Closed

adam-waldenberg mentioned this issue Aug 15, 2017

Advanced Statistics and Ranking #129

Open
