
release of checkpoints of different steps #101

Closed
TobiasLee opened this issue May 5, 2023 · 5 comments

@TobiasLee

Hi, thanks for your great work!

I am investigating emergent abilities during LLM training, and I'd like to request the intermediate checkpoints of Pythia-12b or Pythia-7b, i.e., checkpoints saved every 1k steps. Could you kindly upload these checkpoints to Huggingface?

@haileyschoelkopf
Collaborator

Hi, thanks for your interest! Can you say more about what you're looking for?

We currently make all checkpoints (every 1k steps) available in Huggingface format in the repositories on the HF Hub: https://huggingface.co/EleutherAI/pythia-70m/tree/step103000, for example, stores the 70m model at step 103000 (and likewise for Pythia-12b and Pythia-7b).
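
For reference, here is a minimal sketch of loading one of these checkpoints via `transformers`, assuming the `step103000` revision naming shown in the link above (the model size and step number are just examples):

```python
# Minimal sketch: load a specific Pythia training checkpoint from the HF Hub.
# Each step checkpoint lives on its own branch, so it is selected via `revision`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step103000",  # checkpoint branch; checkpoints are saved every 1k steps
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step103000",
)
```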

If there is some analysis you want to do that requires either retraining or examining the optimizer states, we'd be more than happy to upload specific checkpoints upon request! However, we likely would not be able to upload every NeoX checkpoint + optimizer state to Huggingface due to storage constraints (optimizer states increase storage requirements by at least 6x). Does what you intend to do require the non-HF models?

@haileyschoelkopf
Collaborator

Closing because we have HF-format models already public! Please don't hesitate to reopen if you need optimizer states or anything else. :)

@jiahai-feng

Hi Hailey,

Not the OP, but I am in fact wondering if I could have access to the optimizer states. The optimizer state for the last checkpoint would be the most helpful, and optimizer states for ~10 evenly spaced checkpoints throughout training would be more than sufficient for me.

@haileyschoelkopf
Collaborator

Hi @jiahai-feng , yes, I can get these uploaded for you this week!

@haileyschoelkopf
Collaborator

Hi @jiahai-feng I've uploaded all optim states to neox-ckpt-pythia-160m-v1 and likewise for 160m-deduped! Will continue uploading more models' checkpoints.
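
As a rough sketch, the NeoX-format files (including optimizer states) can be pulled locally with `huggingface_hub`. The full repo id `EleutherAI/neox-ckpt-pythia-160m-v1` is an assumption based on the repo name mentioned above; adjust it to the actual Hub path if it differs:

```python
# Sketch: download the NeoX checkpoint + optimizer state files to a local directory.
# Assumes the repo id below matches the Hub repo referenced in the comment above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EleutherAI/neox-ckpt-pythia-160m-v1",
    repo_type="model",
)
print("Checkpoint files downloaded to:", local_dir)
```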
