
Multiple training runs of same model with different random seed for weight initialisation #110

Closed
KarolisRam opened this issue Jun 21, 2023 · 1 comment

Comments


KarolisRam commented Jun 21, 2023

Model internals can vary substantially when a model is retrained with identical parameters and procedures except for the random seed used for weight initialisation. This is due to underspecification, as shown in https://arxiv.org/abs/2011.03395 (which includes NLP tasks). Should at least one of the Pythia models have weights released for, say, 5-10 training runs that are identical except for the weight-initialisation seed?
Such runs would show how much variance there already is on some out-of-distribution (OOD) tasks between these nearly identical Pythia models, compared to the variance between different models. The paper above shows that, for BERT, the variance across random seeds of the same model can be as large as the variance between different models.
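For illustration, a minimal sketch of what "identical runs except for the weight-initialisation seed" means, assuming the Hugging Face `GPTNeoXConfig`/`GPTNeoXForCausalLM` classes (the architecture family Pythia uses); this is not the actual Pythia training setup, and the config values are hypothetical:

```python
# Sketch only: instantiate the same small GPT-NeoX architecture several times,
# varying only the seed that controls weight initialisation.
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

config = GPTNeoXConfig(
    hidden_size=128,          # toy sizes for illustration, not Pythia's
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=512,
)

models = []
for seed in range(5):         # e.g. 5-10 otherwise identical runs
    torch.manual_seed(seed)   # only the weight-init seed differs between runs
    models.append(GPTNeoXForCausalLM(config))

# The training data, data order, and hyperparameters would then be kept
# identical across runs; only the initial weights differ.
```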

@KarolisRam
Author

Already addressed in the appendix, my bad.
[screenshot of the relevant appendix section]
