Add support for incremental statistics #22

Open
GoogleCodeExporter opened this issue Aug 23, 2015 · 11 comments

Comments

@GoogleCodeExporter

Support incremental statistics on any statistics that can be incrementally 
collected. The consequence would be that all the statistical information would 
not have to be re-fetched each time gitinspector was executed.

Storing a hash from the options given and the calculated statistics should make 
it possible to also distinguish if the git history is changed up to a certain 
point and statistics need to be re-fetched anyway.
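A minimal sketch of how such a check could look, assuming the cache is a plain dictionary. The names `options_key`, `build_cache` and `is_cache_valid` are hypothetical and not part of gitinspector. In a real repository, the "is the cached commit still in the history" test could be done with something like `git merge-base --is-ancestor <cached-commit> HEAD`; here the history is simply passed in as a set of commit hashes.

```python
import hashlib
import json


def options_key(options):
    """Return a stable hash of the options (hypothetical dict of settings)."""
    payload = json.dumps(options, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_cache(options, last_commit, statistics):
    """Bundle the statistics with the options hash and the commit they were built at."""
    return {
        "options_key": options_key(options),
        "last_commit": last_commit,
        "statistics": statistics,
    }


def is_cache_valid(cache, options, history):
    """The cache is reusable only if the options match and the commit it was
    built at is still part of the current history (a set of commit hashes)."""
    return (cache.get("options_key") == options_key(options)
            and cache.get("last_commit") in history)
```

If either check fails (different options, or the cached commit has disappeared after a rebase or similar), everything would be re-fetched from scratch.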

/Adam Waldenberg

Original issue reported on code.google.com by [email protected] on 15 Nov 2013 at 12:50

@GoogleCodeExporter
Author

Original comment by [email protected] on 15 Nov 2013 at 12:51

  • Changed state: Accepted

@adam-waldenberg adam-waldenberg added this to the 0.6.0 milestone Aug 24, 2015
@adam-waldenberg adam-waldenberg changed the title Add support for incremental statistics. Add support for incremental statistics Nov 25, 2015
@imposeren

Could this be connected to the following problem?

After some massive changes in the repo, gitinspector consumes all the CPU with a lot of git blame ... processes and has not finished processing the repo after at least 4 hours.

Can anything be tuned to speed up processing, or at least to reduce CPU usage?

@adam-waldenberg
Member

Hi @imposeren.

Gitinspector is quite slow on very large repos with a big history, as it blames every single file.

This would partly solve it, yes. When this feature is implemented (assuming it's possible) it would mean gitinspector would not have to re-blame every single file each time you run it. Instead, it would only process the files that changed since last time, making it substantially less painful.
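As a rough illustration of "only process the files that changed", here is a sketch that compares per-file blob hashes from the previous run with the current ones (the kind of path-to-hash mapping that `git ls-tree -r HEAD` lists). The function name and the cache layout are hypothetical. It also assumes that an unchanged blob hash means unchanged blame output, which is a simplification: it ignores cases such as a change that was later reverted, or rewritten history.

```python
def files_to_reblame(previous, current):
    """Return the paths whose blob hash is new or different since the last run.

    Both arguments map file path -> blob hash (hypothetical cache layout).
    Files that were deleted since the last run are simply not returned.
    """
    return sorted(path for path, blob in current.items()
                  if previous.get(path) != blob)


previous = {"a.py": "111", "b.py": "222"}
current = {"a.py": "111", "b.py": "333", "c.py": "444"}
print(files_to_reblame(previous, current))  # ['b.py', 'c.py']
```

Only the returned files would need a fresh `git blame`; results for everything else could be taken from the stored statistics.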

The only thing you can really do to speed up processing is to not use the "-H" (hard) option. If you were not using it, you are out of luck. The only option would be to optimize git itself :).

@imposeren

Thanks for the reply. Maybe there are some options for optimizing history?

Something like this:
https://stevelorek.com/how-to-shrink-a-git-repository.html

But I do not know if removing unused files from history will affect gitinspector, as it seems to operate only on existing files (is this correct?)

@adam-waldenberg
Member

@imposeren

Yes. Just running "git gc" will speed things up. Sometimes quite significantly. If you have never done it before, passing the --aggressive switch might be a good idea. The following is from the git docs:

--aggressive
           Usually git gc runs very quickly while providing good disk space utilization and performance. This option will cause git gc to more
           aggressively optimize the repository at the expense of taking much more time. The effects of this optimization are persistent, so this option
           only needs to be used occasionally; every few hundred changesets or so.
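If you want to script this, a small sketch could look like the following. `gc_command` and `run_gc` are hypothetical helper names, and `repo_path` is assumed to point at a local clone. The only git options used are `git gc` and its `--aggressive` switch, as quoted above.

```python
import subprocess


def gc_command(aggressive=False):
    """Build the git gc command line, optionally with --aggressive."""
    cmd = ["git", "gc"]
    if aggressive:
        cmd.append("--aggressive")
    return cmd


def run_gc(repo_path, aggressive=False):
    """Run git gc inside repo_path; raises CalledProcessError if git fails."""
    subprocess.run(gc_command(aggressive), cwd=repo_path, check=True)
```

For example, `run_gc("/path/to/repo", aggressive=True)` would be the one-off slow pass, and a plain `run_gc("/path/to/repo")` the regular one.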

I'm not sure how much the other stuff in that article will affect processing speed, but I guess it's always worth a try.

The blame section of gitinspector only operates on existing files, yes. However, with the -H flag, git still scans the whole history in order to correctly attribute each row to its author. So I guess even "git blame" should run faster. A blamed row can also, for example, come from one of those big files, so git still needs to take them into account to some extent (even without -H passed to gitinspector).

Hard to say without a deeper investigation into the inner workings of git itself.

@imposeren

@adam-waldenberg
And one more question: does gitinspector blame files excluded by the '-x' option?

@adam-waldenberg
Member

@imposeren

No. It does not.

@adam-waldenberg
Member

@imposeren

Neither does it blame any files that have an invalid extension. Binary files are also skipped.

@imposeren

Is there any way to reduce the concurrency of git blame? I can see up to 8 git blame processes when gitinspector runs, and each consumes 40-99% of a processor core.

@imposeren

I can already see that there are no such options:
https://github.com/ejwa/gitinspector/blob/master/gitinspector/blame.py#L31

I'll create a separate issue for this and maybe make a pull request later.

@adam-waldenberg
Member

@imposeren

Gitinspector starts as many processes as there are threads/cores. There is no configuration option for it, and never will be. However, there is a constant at the top of changes.py and blame.py that controls the number of threads.
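For illustration only, this is roughly how a single constant can cap the number of concurrent workers. It is a generic sketch using Python's `concurrent.futures`, not gitinspector's actual code; `MAX_WORKERS` and `run_limited` are hypothetical names, and the real constant in changes.py and blame.py may be named and used differently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap on concurrent workers; lower it to reduce CPU load.
MAX_WORKERS = 2


def run_limited(task, items, max_workers=MAX_WORKERS):
    """Run task(item) for every item with at most max_workers running at once.

    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```

With `task` being a function that blames one file, setting the cap to 2 would mean at most two blame jobs run at the same time, at the cost of a longer total run.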
