Fix bug in tools/ckpts/convert_neox_to_hf.py for setting intermediate_size (EleutherAI#1209)

In tools/ckpts/convert_neox_to_hf.py, the 'intermediate_size' argument is not set explicitly for the neox architecture, so the converted config falls back to the GPTNeoXConfig default of 24576, regardless of the model's actual hidden size: https://github.com/huggingface/transformers/blob/9fe3f585bb4ea29f209dc705d269fbe292e1128f/src/transformers/models/gpt_neox/configuration_gpt_neox.py#L48

Proposed solution: read intermediate-size from the neox config when it is present, and otherwise set it to 4 * hidden-size.
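The fallback can be sketched in isolation as below. This is a minimal illustration, not the script's actual code: `get_key` here is a simplified stand-in for the helper of the same name in the conversion script, and `resolve_intermediate_size` and `HF_DEFAULT_INTERMEDIATE_SIZE` are hypothetical names introduced only for this example.

```python
# Default that GPTNeoXConfig applies when intermediate_size is not passed
# (value taken from the commit message above).
HF_DEFAULT_INTERMEDIATE_SIZE = 24576


def get_key(config, key, default=None):
    """Simplified stand-in for the script's helper: look up a key, else return default."""
    return config.get(key, default)


def resolve_intermediate_size(neox_config):
    """Hypothetical helper mirroring the patched logic.

    Prefer an explicit "intermediate-size"; otherwise use 4 * "hidden-size".
    """
    return get_key(
        neox_config,
        "intermediate-size",
        4 * get_key(neox_config, "hidden-size"),
    )


# A small model with no explicit intermediate-size in its neox config.
small = {"hidden-size": 2048}
print(resolve_intermediate_size(small))  # 8192, not the 24576 default

# An explicit value in the neox config takes precedence over the 4x rule.
custom = {"hidden-size": 2048, "intermediate-size": 5632}
print(resolve_intermediate_size(custom))  # 5632
```

As in the patch, the 4 * hidden-size fallback is evaluated eagerly, so this sketch assumes "hidden-size" is always present in the neox config.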
ishandutta2007 · pull · Jul 29, 2024 · Oct 31, 2023 · Oct 31, 2023 · Nov 1, 2023
commit c8149592d936c3a23ecff4c0092d33bd6c64fab5
diff --git a/tools/ckpts/convert_neox_to_hf.py b/tools/ckpts/convert_neox_to_hf.py
@@ -277,6 +277,11 @@ def __init__(self, neox_config):
                 ),
                 "use_parallel_residual": get_key(neox_config, "gpt-j-residual", False),
                 "layer_norm_eps": get_key(neox_config, "layernorm-epsilon", 1e-5),
+                "intermediate_size": get_key(
+                    neox_config,
+                    "intermediate-size",
+                    4 * get_key(neox_config, "hidden-size"),
+                ),
             }
         )
         hf_config = GPTNeoXConfig(**args)