About gpt-neox-20B model hyperparameter #989
I'm confused about the 20B model hyperparameters: hidden-size=6144, num-attention-heads=64, num-layers=44. LLaMA and GPT models use different hyperparameters; for example, LLaMA-65B has hidden-size=8192, num-attention-heads=64, num-layers=80. It seems that LLaMA-65B is deeper while gpt-neox-20B is wider. Which choice of hyperparameters is better?
Thank you~
Comments
There is no compelling evidence for what precise width-to-depth ratio is best.
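
To make the width-versus-depth comparison concrete, here is a rough back-of-the-envelope sketch (an illustration, not from the thread) that estimates parameter counts from the quoted hyperparameters using the common ~12 · num_layers · hidden_size² approximation and prints each model's width-to-depth ratio. The `approx_params_billions` helper and the `configs` dictionary are hypothetical, introduced only to hold the numbers quoted above:

```python
# Back-of-the-envelope parameter estimate from the hyperparameters quoted above.
# Uses the common approximation ~12 * num_layers * hidden_size^2 (attention + MLP
# weights), ignoring embeddings, biases, and layer norms.

def approx_params_billions(num_layers: int, hidden_size: int) -> float:
    """Approximate transformer parameter count, in billions."""
    return 12 * num_layers * hidden_size ** 2 / 1e9

# Hypothetical config table holding the values from the question.
configs = {
    "gpt-neox-20b": {"num_layers": 44, "hidden_size": 6144},  # wider, shallower
    "llama-65b":    {"num_layers": 80, "hidden_size": 8192},  # deeper, narrower per layer
}

for name, cfg in configs.items():
    ratio = cfg["hidden_size"] / cfg["num_layers"]  # width-to-depth ratio
    print(f"{name}: ~{approx_params_billions(**cfg):.1f}B params, "
          f"width/depth ≈ {ratio:.0f}")
```

Both estimates land close to the advertised 20B and 65B sizes, so the two configurations simply sit at different points on the width-to-depth spectrum, which is consistent with the reply above.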