Model Initialization Question #129

yanlai00 · 2023-11-03T19:40:25Z

What is the difference between the step-0 model weights you released and model weights randomly initialized with Hugging Face `transformers`, using the two calls below?

```python
import transformers

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)
```

I've been seeing very different behavior between these two initializations. For example, your initialization always trains much faster on my custom task.

What do I need to do to get an initialization more similar to yours?


haileyschoelkopf · 2023-11-04T13:49:16Z

Hi! For more information about the initialization we used, please check out the paper, as well as v1.0 of the gpt-neox library, which contains the code used to train these models (and pairs with the config files we provide for that library). We use the "wang_init" and "small_init" functions, depending on the model component; both are defined here: https://github.com/EleutherAI/gpt-neox/blob/71df4d5017f9f4919566a11454fe3a507ffdc632/megatron/model/init_functions.py#L112
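To make the difference concrete, here is a minimal sketch of the standard deviations involved. It assumes the linked `init_functions.py` draws zero-mean normal weights with std = sqrt(2 / (5 * d)) for small_init and std = 2 / (L * sqrt(d)) for wang_init, where d is the hidden size and L the number of layers, and it assumes pythia-1b's shape of d = 2048 and L = 16. The 0.02 value is assumed to be the `initializer_range` that the Hugging Face GPTNeoX code uses for every weight; check it against your own config.

```python
import math


def small_init_std(dim: int) -> float:
    """Std of the normal used by small_init ("Transformers without Tears"): sqrt(2 / (5 * dim))."""
    return math.sqrt(2.0 / (5.0 * dim))


def wang_init_std(n_layers: int, dim: int) -> float:
    """Std of the normal used by wang_init: 2 / (n_layers * sqrt(dim))."""
    return 2.0 / (n_layers * math.sqrt(dim))


# Assumed pythia-1b shape: hidden size 2048, 16 transformer layers.
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
# Assumed Hugging Face default: one std (initializer_range) for all weights.
HF_INITIALIZER_RANGE = 0.02

small = small_init_std(HIDDEN_SIZE)
wang = wang_init_std(NUM_LAYERS, HIDDEN_SIZE)
print(f"small_init std:  {small:.4f}")  # 0.0140
print(f"wang_init std:   {wang:.4f}")   # 0.0028
print(f"HF default std:  {HF_INITIALIZER_RANGE:.4f}")
```

Under these assumptions, the Hugging Face default is roughly 1.4x larger than small_init and about 7x larger than wang_init, which is one plausible reason the two starting points train differently.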

Hugging Face `transformers` is not optimized for training from scratch, so its random initializations are less likely to be well-tested or tuned for this purpose.
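One way to get closer to the gpt-neox initialization without leaving `transformers` is to re-initialize the model after `from_config`. The sketch below is a hypothetical helper (`neox_style_init_` is not part of either library) and rests on an assumption about the module mapping: wang_init for the residual-branch output projections (`attention.dense`, `mlp.dense_4h_to_h` in the Hugging Face GPTNeoX module names) and small_init for every other `Linear` or `Embedding` weight, with zero biases. It will not reproduce the released step-0 weights exactly (different random draws, and possibly other details of the original training code).

```python
import math

import torch
import transformers
from torch import nn


def neox_style_init_(model: nn.Module, hidden_size: int, num_layers: int) -> None:
    """Hypothetical helper: re-initialize a Hugging Face GPTNeoX model in place.

    Assumed mapping: wang_init for attention.dense and mlp.dense_4h_to_h,
    small_init for all other Linear/Embedding weights. Biases are zeroed and
    LayerNorms are reset to weight=1, bias=0.
    """
    small_std = math.sqrt(2.0 / (5.0 * hidden_size))
    wang_std = 2.0 / (num_layers * math.sqrt(hidden_size))
    with torch.no_grad():
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                # Output projections of the attention and MLP residual branches.
                is_output_proj = name.endswith("attention.dense") or name.endswith(
                    "mlp.dense_4h_to_h"
                )
                std = wang_std if is_output_proj else small_std
                nn.init.normal_(module.weight, mean=0.0, std=std)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
            elif isinstance(module, nn.Embedding):
                nn.init.normal_(module.weight, mean=0.0, std=small_std)
            elif isinstance(module, nn.LayerNorm):
                nn.init.ones_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)


# Demo on a tiny config so nothing is downloaded (sizes are arbitrary).
torch.manual_seed(0)
demo_config = transformers.GPTNeoXConfig(
    vocab_size=128,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=256,
)
demo_model = transformers.GPTNeoXForCausalLM(demo_config)
neox_style_init_(demo_model, demo_config.hidden_size, demo_config.num_hidden_layers)
```

For pythia-1b, the same helper would be called as `neox_style_init_(model, config.hidden_size, config.num_hidden_layers)` after the two lines from the question. If the goal is simply to start from the exact released initialization, the step-0 checkpoint can be loaded directly with `transformers.AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b", revision="step0")`.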

Hope this helps!

haileyschoelkopf closed this as completed Nov 4, 2023
