The repo is way too heavy! #258

Open
natct10 opened this issue Oct 26, 2020 · 8 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed), optimization (An optimization of something that works)

Comments

@natct10
Member

natct10 commented Oct 26, 2020

The repo is now 1.1 GB, which is far too heavy. A dataset has probably been committed somewhere. I will investigate, but any help is welcome.
See for yourself (the size key, which the GitHub API reports in kilobytes): curl https://api.github.com/repos/SubstraFoundation/distributed-learning-contributivity

natct10 added the bug, help wanted, and optimization labels on Oct 26, 2020
@natct10
Member Author

natct10 commented Oct 26, 2020

You can even use: curl https://api.github.com/repos/SubstraFoundation/distributed-learning-contributivity 2> /dev/null | grep size | tr -dc '[:digit:]'
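The one-liner above can be wrapped into two small helpers. This is only a sketch: the function names `repo_size_kb` and `kb_to_mib` are made up for illustration, and it assumes the `size` field is expressed in kilobytes, as the GitHub REST API documents it.

```shell
# Hypothetical helper: read a GitHub "repos" API response on stdin and
# print the value of its "size" field (assumed to be in kilobytes).
repo_size_kb() {
  grep -o '"size":[[:space:]]*[0-9][0-9]*' | head -n 1 | tr -dc '[:digit:]'
}

# Hypothetical helper: convert kilobytes to whole mebibytes (integer division).
kb_to_mib() {
  echo $(( $1 / 1024 ))
}
```

It could then be used as `kb_to_mib "$(curl -s https://api.github.com/repos/SubstraFoundation/distributed-learning-contributivity | repo_size_kb)"`. Matching on the quoted key `"size"` rather than a bare `size` avoids picking up digits from any other line that happens to contain that word.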

@arthurPignet
Collaborator

I deleted the PVRL, Moving-functions, Add-Imdb-dataset[...] and dvrl branches. All of them had either been dropped or rebased onto another branch that has since been merged.

@natct10
Member Author

natct10 commented Oct 26, 2020

Great, thank you @arthurPignet! But the repo size seems to remain unchanged :/

@celinejacques
Collaborator

celinejacques commented Nov 2, 2020

Hello!
I investigated this problem a little. The size seems to come from the .git/objects/pack/ folder, which is where Git stores its packed (compressed) objects.

This command lists the largest objects in the pack files and shows that some of them are very heavy:
git verify-pack -v .git/objects/pack/pack-*.pack | grep -v chain | sort -k3nr | head

So I tried to identify which files these heavy objects correspond to, with this command:
git rev-list --objects --all | grep "$(git verify-pack -v .git/objects/pack/*.pack | sort -k 3 -n | tail -10 | awk '{print$1}')"

Here are the results:
[image]

So it seems that we committed saved models in folders that were not ignored by Git. I hope that helps :)
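The same investigation can be done in a single pass with `git cat-file --batch-check`, which avoids the nested `verify-pack` call. This is a sketch, not the command that was actually run here: the function name `largest_blobs` is hypothetical, and the sizes it prints are uncompressed blob sizes, so they will differ from the packed sizes reported by `git verify-pack`.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line, as "<uncompressed size in bytes> <path>".
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk '$1 == "blob"' \
    | sort -k3,3 -n -r \
    | head -n "$n" \
    | cut -d' ' -f3-
}
```

Run from the root of a clone, `largest_blobs 10` lists candidate files to remove from history or to add to `.gitignore`. A blob that was committed under several paths appears once per path.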

@natct10
Member Author

natct10 commented Nov 5, 2020

By the way, we really should separate the code from its outputs (reports), which could be hosted on the open-science platform https://osf.io/. This would also fit well with a publication project (DOIs for assets, etc.)!

@natct10
Member Author

natct10 commented Nov 5, 2020

So, the target is: patience_sept_2020-09-07_17h37 from catastrophic forgetting, in the "dossier resultats" (results folder) commit.
Thank you @celinejacques for the check!
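Once the offending path is known, one way to drop it from every commit is `git filter-branch` followed by a garbage collection. The sketch below is an assumption about how this could be done, not what was actually run on this repository. The function name `purge_path_from_history` is hypothetical, the path must not contain a single quote, and Git's own documentation recommends the external `git filter-repo` tool over `filter-branch` for real cleanups.

```shell
# Hypothetical helper: rewrite all refs so that the given path never
# existed, then delete the objects that became unreachable.
purge_path_from_history() {
  target="$1"
  # Remove the path from the index of every commit; drop commits left empty.
  # The environment variable silences the advisory banner of recent Git versions.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -r --cached --ignore-unmatch --quiet -- '$target'" \
    --prune-empty -- --all
  # filter-branch keeps backups under refs/original/; remove them so the
  # old commits are no longer referenced.
  git for-each-ref --format='%(refname)' refs/original/ \
    | while read -r ref; do git update-ref -d "$ref"; done
  # Expire reflog entries and prune the unreachable objects immediately.
  git reflog expire --expire=now --all
  git gc --prune=now --quiet
}
```

This rewrites commit hashes, so the result has to be force-pushed and every collaborator needs a fresh clone. The size reported by the GitHub API may also stay unchanged until the server runs its own garbage collection.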

@arthurPignet
Collaborator

Great to see that the target has been found! @natct10, did you have time to remove the commit? Can we close this issue?

@arthurPignet
Collaborator

By the way, we really should separate the code from its outputs (reports), which could be hosted on the open-science platform https://osf.io/. This would also fit well with a publication project (DOIs for assets, etc.)!

I suggest opening a new issue to discuss that; I think it's a good idea.
