
Model Initialization Question #129

Closed

yanlai00 opened this issue Nov 3, 2023 · 1 comment

Comments

yanlai00 commented Nov 3, 2023

What is the difference between the step 0 model weights you provided and the model weights randomly initialized by Hugging Face (i.e., by calling the two functions below)?

import transformers

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)

I've been seeing very different behavior between these two initializations (for example, your initialization consistently trains much faster on my custom task).

What do I need to do to get an initialization more similar to yours?

haileyschoelkopf (Collaborator) commented

Hi, for more information about the initialization we used, please check out the paper, as well as v1.0 of the gpt-neox library, which contains the code used to train these models (and pairs with the config files we provide for the neox library). We use the "wang_init" and "small_init" functions, respectively, depending on the model component; they are defined here: https://github.com/EleutherAI/gpt-neox/blob/71df4d5017f9f4919566a11454fe3a507ffdc632/megatron/model/init_functions.py#L112
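For a rough sense of what those two functions do, here is a simplified sketch (not the exact gpt-neox code; the standard deviations follow the "small init" of Nguyen & Salazar, 2019, and the GPT-J-style "wang init"):

import math
import torch

def small_init_(tensor, dim):
    # "small_init": normal with std = sqrt(2 / (5 * d)), used for most linear layers
    std = math.sqrt(2 / (5 * dim))
    return torch.nn.init.normal_(tensor, mean=0.0, std=std)

def wang_init_(tensor, dim, num_layers):
    # "wang_init": normal with std = 2 / (L * sqrt(d)), used for output projections
    std = 2 / (num_layers * math.sqrt(dim))
    return torch.nn.init.normal_(tensor, mean=0.0, std=std)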

The Hugging Face defaults are not optimized for training from scratch, so their random initializations are less likely to be well-tested or tuned for this purpose.
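If you want something closer to our setup without going through gpt-neox, one rough, unofficial sketch is to re-initialize the from_config model yourself. This assumes the standard deviations above and Hugging Face's GPT-NeoX module naming (attention.dense and mlp.dense_4h_to_h for the output projections); please double-check both against the real code before relying on it.

import math
import torch
import transformers

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)

hidden = config.hidden_size
n_layers = config.num_hidden_layers
small_std = math.sqrt(2 / (5 * hidden))        # small_init
wang_std = 2 / (n_layers * math.sqrt(hidden))  # wang_init

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        # Output projections get wang_init; every other linear layer gets small_init.
        is_out_proj = name.endswith(("attention.dense", "mlp.dense_4h_to_h"))
        torch.nn.init.normal_(module.weight, mean=0.0, std=wang_std if is_out_proj else small_std)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)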

Hope this helps!
