Details about "EleutherAI/pythia-160m-seed*" models #142
Hi! These vary both in training data shuffle and weight initialization. We did indeed train them after the paper: a couple of 160m models a while ago, and quite a few new seeds more recently for some work-in-progress research. (Maybe @oskarvanderwal can confirm re: the recent ones!)
Fantastic! Thank you so much for the quick clarification.
We are planning on releasing more Pythia models for different seeds of the smaller models. As @haileyschoelkopf mentioned, the seed is used for both the data and the weights for the new ones as well. Once we've trained all the models, I'll make sure to add more information in the README!
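For anyone wanting to compare these seed variants, a minimal sketch for assembling the Hub repo ids (this assumes the `EleutherAI/pythia-160m-seed<N>` naming pattern seen on the Hub; the number of published seeds may differ, and the `transformers` loading call is shown commented out to avoid an unintended download):

```python
def seed_repo_ids(n_seeds):
    """Build Hugging Face Hub repo ids for the seed variants plus the original model."""
    ids = ["EleutherAI/pythia-160m"]  # the original (default-seed) model
    # Seed variants follow the assumed "...-seed<N>" naming pattern.
    ids += [f"EleutherAI/pythia-160m-seed{i}" for i in range(1, n_seeds + 1)]
    return ids

if __name__ == "__main__":
    for repo in seed_repo_ids(3):
        print(repo)
        # from transformers import AutoModelForCausalLM
        # model = AutoModelForCausalLM.from_pretrained(repo)  # uncomment to load
```

Since both the data order and the initialization are tied to the same seed, differences between any two of these checkpoints reflect the combined effect of both factors, not either one in isolation.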
Hello! Thank you for making this fantastic suite of models; I think this is one of the most important contributions to the research community in recent memory.
I have a question about the training details of the EleutherAI/pythia-160m-seed* models hosted on the HF Hub, and hopefully this is a good place to ask. I'm curious specifically what the seeds that differ between these models (and presumably also EleutherAI/pythia-160m) control. Do they control both the weight initialization and the training data shuffle order, or only one or the other? It seems these were released after the paper, since the paper says there are no experiments over different seeds.

Thank you so much for any clarification you can offer!