
Ways to load GPT-NeoX checkpoints in GPT-Neo for TPU training? #475

Closed
frankxu2004 opened this issue Dec 3, 2021 · 2 comments

Comments

@frankxu2004

Thanks for the great project. I wonder if I could load GPT-NeoX checkpoints into GPT-Neo in order to leverage TPU training. Or better yet, does GPT-NeoX support TPU training as well?

@StellaAthena
Member

StellaAthena commented Dec 4, 2021

There is not currently a supported way to convert GPT-NeoX checkpoints to GPT-Neo, and you cannot use GPT-NeoX on a large TPU cluster. I do not anticipate us adding this as a feature in the future either.

For training on TPUs I recommend checking out Mesh Transformer JAX, the library that trained GPT-J.
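Since no supported converter exists, anyone attempting a manual port would have to remap parameter names between the two implementations themselves. A minimal sketch of what such a remapping step looks like is below; the parameter names and prefix rules are invented placeholders, not the real GPT-NeoX or GPT-Neo key names, and a real conversion would also need to handle tensor layouts, sharded checkpoints, and architectural differences.

```python
# Hypothetical sketch: renaming checkpoint parameter keys between two
# model implementations. All key names here are made-up placeholders.

def remap_keys(state_dict, rename_rules):
    """Return a new dict with keys rewritten per (old_prefix, new_prefix) rules.

    The first matching rule wins; keys with no matching prefix pass through
    unchanged.
    """
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in rename_rules:
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        out[new_key] = value
    return out

# Dummy "checkpoint" standing in for real tensors.
source = {
    "transformer.layers.0.attention.weight": [0.1],
    "transformer.layers.0.mlp.weight": [0.2],
    "embed.weight": [0.3],
}
rules = [("transformer.layers.", "h.")]
converted = remap_keys(source, rules)
print(sorted(converted))
# → ['embed.weight', 'h.0.attention.weight', 'h.0.mlp.weight']
```

Even with the keys renamed, the loaded weights are only meaningful if the two architectures are layer-for-layer compatible, which is exactly why no automatic conversion is offered here.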

@nikhilanayak

Can GPT-NeoX-20B be trained with Mesh Transformer JAX?
