Hello! Thank you for making this fantastic suite of models; I think this is one of the most important contributions to the research community in recent memory.
I have a question about the training details of the EleutherAI/pythia-160m-seed* models hosted on the HF Hub, and hopefully this is a good place to ask. Specifically, I'm curious what the seeds that differ between these models (and presumably also the EleutherAI/pythia-160m model) control. Do they control both the weight initialization and the training data shuffle order, or only one of the two? It seems these were released after the paper, since the paper says there are no experiments over different seeds.
Thank you so much for any clarification you can offer!
Hi! These vary in both the training data shuffle and the weight initialization. We did indeed train them after the paper--a couple of 160M models a while ago, and quite a few new seeds more recently for some in-progress work.
We are planning on releasing more Pythia models trained with different seeds at the smaller model sizes. As @haileyschoelkopf mentioned, the seed controls both the data order and the weight initialization for the new ones as well. Once we've trained all the models, I'll make sure to add more information to the README!