
Multiple training runs of same model with different random seed for weight initialisation #110

Closed
KarolisRam opened this issue Jun 21, 2023 · 1 comment

Comments


KarolisRam commented Jun 21, 2023

Model internals can vary substantially when a model is retrained with identical parameters and procedures except for the random seed used for weight initialisation. This is due to underspecification, as shown in https://arxiv.org/abs/2011.03395 (which includes NLP tasks). Should at least one of the Pythia models have weights released for, say, 5-10 training runs that are identical except for the weight-initialisation seed?
Such runs would show how much variance there already is on some out-of-distribution (OOD) tasks between these nearly identical Pythia models, compared to the variance between different models. The paper above shows that, for BERT, the variance across random seeds of the same model can be as large as the variance between different models.
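For illustration, a minimal sketch of what "identical runs except for the weight-initialisation seed" means, assuming the Hugging Face `GPTNeoXConfig`/`GPTNeoXForCausalLM` classes (the architecture family Pythia uses); this is not the actual Pythia training setup, and the config values are hypothetical:

```python
# Sketch only: instantiate the same small GPT-NeoX architecture several times,
# varying only the seed that controls weight initialisation.
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

config = GPTNeoXConfig(
    hidden_size=128,          # toy sizes for illustration, not Pythia's
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=512,
)

models = []
for seed in range(5):         # e.g. 5-10 otherwise identical runs
    torch.manual_seed(seed)   # only the weight-init seed differs between runs
    models.append(GPTNeoXForCausalLM(config))

# The training data, data order, and hyperparameters would then be kept
# identical across runs; only the initial weights differ.
```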

@KarolisRam
Author

Already addressed in the appendix, my bad.
[screenshot of the relevant appendix section]
