Converting Pythia checkpoint from HF to NeoX fails #1161
Comments
Hi! You can get around this via adding

What's your Hugging Face transformers version? It seems the culprit is this change, huggingface/transformers@253f9a3, which made `inv_freq` non-persistent on the HF side. I was under the impression they had reverted that change, but it seems I was wrong about that. We will probably update this buffer to be non-persistent in GPT-NeoX as well, but will need to check that this does not break others' existing checkpoints.
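For context, here is a minimal PyTorch sketch (the class names are illustrative, not GPT-NeoX's or HF's actual modules) of how a buffer's persistence flag decides whether it appears in a checkpoint, and why a strict load then fails with a missing-key error:

```python
import torch
from torch import nn


class NonPersistentRotary(nn.Module):
    """Mimics HF after the linked commit: inv_freq is NOT serialized."""

    def __init__(self, dim, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps the buffer out of state_dict()
        self.register_buffer("inv_freq", inv_freq, persistent=False)


class PersistentRotary(nn.Module):
    """Mimics a loader that expects inv_freq in the checkpoint."""

    def __init__(self, dim, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)  # persistent by default


saved = NonPersistentRotary(16).state_dict()
print(list(saved))  # []: the non-persistent buffer is not serialized

# Loading that checkpoint into a module whose buffer is persistent raises:
# RuntimeError: Error(s) in loading state_dict: Missing key(s): "inv_freq"
PersistentRotary(16).load_state_dict(saved)
```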
Thanks for the quick response. Adding the suggested workaround resolved the error for me.
Reopening this to track it since we haven't merged a fix yet!
Describe the bug
Converting a Pythia checkpoint from HF to NeoX fails with a missing-key error regarding the rotary embeddings.
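A minimal way to check whether your installed transformers version exhibits the change the maintainer points to above (this sketch is an addition, not part of the original report; it only uses the public transformers API and the real `EleutherAI/pythia-410m` checkpoint):

```python
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")

# On transformers versions that register inv_freq as a non-persistent
# buffer, no rotary frequency keys appear in the serialized state dict.
inv_freq_keys = [k for k in model.state_dict() if "inv_freq" in k]
print(inv_freq_keys)  # [] on affected versions
```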
To Reproduce
Steps to reproduce the behavior:
I am running this command to convert the Pythia 410M checkpoint to NeoX (for continued pretraining):
Error trace:
Expected behavior
Conversion to NeoX without any error.
Proposed solution
From my understanding, `attention.rotary_emb.inv_freq` is not a trainable parameter and thus should not be loaded from the state dict (see the sketch after the environment section below).

Environment (please complete the following information):
Thanks for your amazing project!
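As a sketch of the proposed solution: since `inv_freq` is fully determined by the rotary dimension and base, it can be recomputed and injected into the HF state dict before conversion (or, equivalently, dropped from the keys the loader expects). The helper below is hypothetical, not code from this repo; the key pattern follows HF's GPT-NeoX layer naming, and the rotary dimension must account for Pythia's rotary_pct (0.25 of the head dimension):

```python
import torch


def add_missing_inv_freq(state_dict, num_layers, rotary_dim, base=10000.0):
    # Hypothetical helper: recompute the deterministic inv_freq buffer and
    # insert it for every layer, so a strict load_state_dict() no longer
    # reports missing keys.
    inv_freq = 1.0 / (
        base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim)
    )
    for layer in range(num_layers):
        # Key name follows HF GPT-NeoX naming; adjust the prefix if your
        # checkpoint uses a different module path.
        key = f"gpt_neox.layers.{layer}.attention.rotary_emb.inv_freq"
        state_dict.setdefault(key, inv_freq.clone())
    return state_dict
```

The alternative discussed in the comments, registering the buffer as non-persistent on the GPT-NeoX side so it is neither saved nor expected, avoids touching checkpoints at all, at the cost of verifying that existing NeoX checkpoints still load.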