Details about "EleutherAI/pythia-160m-seed*" models #142

Closed
IanMagnusson opened this issue Dec 11, 2023 · 3 comments

@IanMagnusson

Hello! Thank you for making this fantastic suite of models; I think this is one of the most important contributions to the research community in recent memory.

I have a question about the training details of the EleutherAI/pythia-160m-seed* models hosted on the HF hub, and hopefully this is a good place to ask. Specifically, I'm curious what the seeds that differ between these models (and presumably also the EleutherAI/pythia-160m model) control. Do they control both the weight initialization and the training data shuffle order? Or perhaps only one or the other? It seems these were released after the paper, since the paper says there are no experiments over different seeds.

Thank you so much for any clarification you can offer!

@haileyschoelkopf
Collaborator

Hi! These vary in both training data shuffle and weight initialization. We did indeed train them after the paper: a couple of 160m models a while ago, and quite a few new seeds more recently for some work in progress.

(Maybe @oskarvanderwal can confirm re: the recent ones!)

@IanMagnusson
Author

Fantastic! Thank you so much for the quick clarification.

@oskarvanderwal

We are planning to release more Pythia models trained with different seeds at the smaller model sizes. As @haileyschoelkopf mentioned, the seed is used for both the data and the weights for the new ones as well. Once we've trained all the models, I'll make sure to add more information in the README!
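
For readers who want a concrete picture of what "the seed is used for both the data and the weights" means, here is a minimal toy sketch. It is not the Pythia training code, and `run_setup` and its parameters are hypothetical names made up for illustration; it only assumes that one seeded generator feeds both the initial weights and the data order, so changing the seed changes both.

```python
import random


def run_setup(seed, n_weights=4, n_examples=8):
    """Toy stand-in (hypothetical helper): derive weights and data order from one seed."""
    rng = random.Random(seed)  # a single generator, seeded once
    # "Weight initialization": small Gaussian draws stand in for real parameters.
    weights = [rng.gauss(0.0, 0.02) for _ in range(n_weights)]
    # "Data shuffle": a permutation of example indices stands in for the training order.
    order = list(range(n_examples))
    rng.shuffle(order)
    return weights, order


w_a, order_a = run_setup(seed=1)
w_a2, order_a2 = run_setup(seed=1)
w_b, order_b = run_setup(seed=2)

# The same seed reproduces both pieces; a different seed changes the weights
# (and, in general, the data order as well).
same_seed_reproduces = (w_a == w_a2) and (order_a == order_a2)
weights_differ = w_a != w_b
print(same_seed_reproduces, weights_differ, order_a, order_b)
```

One consequence of this kind of setup is that differences between two seed runs cannot be attributed to initialization alone or to data order alone, since both change together.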
