Model Initialization Question #129
Hi, for more information about the initialization we used, please check out the paper, as well as v1.0 of the gpt-neox library, which contains the code used to train these models (it pairs with the config files we provide for the neox library). Depending on the model component, we use either the "wang_init" or the "small_init" function, both defined here: https://github.com/EleutherAI/gpt-neox/blob/71df4d5017f9f4919566a11454fe3a507ffdc632/megatron/model/init_functions.py#L112 Hugging Face is not optimized for training from scratch, so its random initializations are less likely to be well-tested or tuned for this purpose. Hope this helps!
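A minimal sketch of what these two functions compute, written against my reading of the linked `init_functions.py`: "small_init" draws weights from a normal distribution with std = sqrt(2 / (5d)) (after Nguyen & Salazar, "Transformers without Tears"), and "wang_init" uses std = 2 / (L * sqrt(d)), where d is the hidden size and L the number of layers. Which component gets which function is an assumption here: I assume wang_init goes to the attention and MLP output projections (the neox configs appear to select this via `init_method` / `output_layer_init_method`), small_init to every other weight matrix, with biases zeroed and LayerNorm weights set to one. The helper `init_plan` and the Hugging Face GPTNeoX parameter names it matches are illustrative, not part of either library; check them against the actual config files before relying on this.

```python
import math


def small_init_std(hidden_size: int) -> float:
    # "small_init": std = sqrt(2 / (5 * d))
    return math.sqrt(2.0 / (5.0 * hidden_size))


def wang_init_std(num_layers: int, hidden_size: int) -> float:
    # "wang_init": std = 2 / (L * sqrt(d)); shrinks as the model gets deeper
    return 2.0 / (num_layers * math.sqrt(hidden_size))


# ASSUMPTION: suffixes of the residual-branch output projections in
# Hugging Face's GPTNeoX parameter naming; these get wang_init.
OUTPUT_PROJECTIONS = ("attention.dense.weight", "mlp.dense_4h_to_h.weight")


def init_plan(name: str, num_layers: int, hidden_size: int):
    """Hypothetical helper: return (kind, value) for one named parameter."""
    is_norm = "layernorm" in name or "layer_norm" in name
    if is_norm and name.endswith(".weight"):
        return ("constant", 1.0)  # LayerNorm gain starts at 1 (assumption)
    if name.endswith(".bias"):
        return ("constant", 0.0)  # all biases start at 0 (assumption)
    if name.endswith(OUTPUT_PROJECTIONS):
        return ("normal_std", wang_init_std(num_layers, hidden_size))
    # embeddings, QKV, dense_h_to_4h, unembedding: small_init (assumption)
    return ("normal_std", small_init_std(hidden_size))
```

To apply such a plan in PyTorch, one could loop over `model.named_parameters()` inside `torch.no_grad()` and call `torch.nn.init.normal_(param, mean=0.0, std=value)` or `torch.nn.init.constant_(param, value)` for each entry. As far as I know, Hugging Face's default `_init_weights` for GPTNeoX instead uses a single std (`config.initializer_range`) for all weight matrices. That fixed std does not shrink with depth for the output projections, which may explain part of the behavioral gap described below.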
What is the difference between the step 0 model weights you provided and the model weights randomly initialized with Hugging Face (by calling the two functions below)? I've been seeing very different behavior between these two initializations. (For example, your initialization always trains much faster on my custom task.)
What do I need to do to get an initialization more similar to yours?