
Add intermediate_size to GPT-NeoX models #1212

Merged 11 commits on Sep 7, 2024
Update transformer.py -> Add intermediate_size
dtamayo-nlp committed May 10, 2024
commit 6c6a46bbc84f3f66e5019a3f4c3cce69c27ec80c
20 changes: 13 additions & 7 deletions megatron/model/transformer.py
@@ -102,13 +102,19 @@ def __init__(
        self.activation_type = neox_args.activation
        self.bias_gelu_fusion = neox_args.bias_gelu_fusion

-        # auto scale so geglu has equal parameters
-        ff_mult = int(4 * 2 / 3) if self.activation_type == "geglu" else 4
-        ff_dim = (
-            int(ff_mult * neox_args.hidden_size) * 2
-            if self.activation_type == "geglu"
-            else ff_mult * neox_args.hidden_size
-        )
+        if neox_args.intermediate_size:
+            ff_dim = neox_args.intermediate_size
+
+        else:
+            # auto scale so geglu has equal parameters
+            ff_mult = int(4 * 2 / 3) if self.activation_type == "geglu" else 4
+            ff_dim = (
+                int(ff_mult * neox_args.hidden_size) * 2
+                if self.activation_type == "geglu"
+                else int(ff_mult * neox_args.hidden_size)
+            )

        self.dense_h_to_4h = mpu.ColumnParallelLinear(
            neox_args=neox_args,
            input_size=neox_args.hidden_size,
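For readers outside the diff context, here is a minimal standalone sketch of the feed-forward width selection this commit introduces. The helper name feed_forward_dim is hypothetical; its arguments stand in for the corresponding neox_args fields (hidden_size, activation, intermediate_size):

def feed_forward_dim(hidden_size, activation="gelu", intermediate_size=None):
    # An explicit intermediate_size now overrides the auto-scaling entirely.
    if intermediate_size:
        return intermediate_size
    # Otherwise keep the old behavior: a 4x expansion, shrunk to
    # int(4 * 2 / 3) == 2 for geglu and then doubled, since the geglu
    # projection output is later split in half for the gating.
    ff_mult = int(4 * 2 / 3) if activation == "geglu" else 4
    return (
        int(ff_mult * hidden_size) * 2
        if activation == "geglu"
        else int(ff_mult * hidden_size)
    )

# Examples:
# feed_forward_dim(1024)                          -> 4096 (default 4x expansion)
# feed_forward_dim(1024, activation="geglu")      -> 4096 (split into two 2048 halves)
# feed_forward_dim(1024, intermediate_size=2730)  -> 2730 (the override wins)

Note that the two auto-scaled branches produce the same ff_dim; the difference is that the geglu projection is split into value and gate halves downstream, so the effective MLP width differs.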