
'intermediate_size' not set in tools/ckpts/convert_neox_to_hf.py for neox model architecture #1208

Closed
jvendrow opened this issue May 3, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@jvendrow
Contributor

jvendrow commented May 3, 2024

Description
When converting neox models to HF format, the 'intermediate_size' argument in the GPTNeoXConfig is not explicitly set, so it defaults to 24576 as per:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/configuration_gpt_neox.py
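For illustration, a minimal sketch (not part of the conversion script, and assuming a pythia-70M-style hidden size of 512) of how the omitted argument plays out:

# Hedged illustration: leaving intermediate_size unset keeps the
# GPTNeoXConfig default of 24576 instead of the expected 4 * 512 = 2048.
from transformers import GPTNeoXConfig

cfg = GPTNeoXConfig(hidden_size=512)  # intermediate_size not passed
print(cfg.intermediate_size)          # 24576 (library default)

cfg = GPTNeoXConfig(hidden_size=512, intermediate_size=4 * 512)
print(cfg.intermediate_size)          # 2048, matching the checkpoint's dense_h_to_4h shape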

To Reproduce
Steps to reproduce the behavior:

  1. Train pythia-70M model
  2. Run the conversion script:
$ python ./tools/ckpts/convert_neox_to_hf.py --input_dir checkpoints/pythia-70M/global_step143000/ --config_file pythia-70m.yml --output_dir hf_model/pythia-70M --precision fp16 --architecture neox  
[2024-05-03 11:17:41,262] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)                                
Detected 'pipe-parallel-size' of 1, assuming model is saved as PipelineModule...                                                                       
> building HFTokenizer tokenizer ...
 > padded vocab (size: 50277) with 27 dummy tokens (new size: 50304) 
 0%|                                                                                                                            | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):                                                                                                                     
  File "./tools/ckpts/convert_neox_to_hf.py", line 732, in <module>                                                                                    
    main()                                                                                                                                             
  File "./tools/ckpts/convert_neox_to_hf.py", line 696, in main                                                                                        
    hf_model = convert(                                                                                                                                
  File "./tools/ckpts/convert_neox_to_hf.py", line 555, in convert                                                                                     
    hf_layer.load_state_dict(state_dict)                                                                                                               
  File "/mnt/xfs/home/jvendrow/conda_envs/pythia/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict                
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(                                                                          
RuntimeError: Error(s) in loading state_dict for GPTNeoXLayer:                                                                                         
        size mismatch for mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([24576, 512]).
        size mismatch for mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([24576]).
        size mismatch for mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([512, 24576]).

Proposed solution
It seems the intermediate size for the neox architecture is generally 4 * hidden size. The suggested edit is to add the following for neox models:

args.update(
    {
        "intermediate_size": get_key(
            neox_config,
            "intermediate-size",
            4 * get_key(neox_config, "hidden-size"),
        ),
    }
)
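For reference, a rough sketch of the fallback behavior this relies on; get_key_stub here is a hypothetical stand-in for the script's get_key helper, not its actual implementation:

def get_key_stub(neox_config, key, default=None):
    # Hypothetical stand-in: return the YAML value if present, else the default.
    return neox_config.get(key, default)

neox_config = {"hidden-size": 512}  # pythia-70M-style config with no explicit intermediate-size
intermediate_size = get_key_stub(
    neox_config,
    "intermediate-size",
    4 * get_key_stub(neox_config, "hidden-size"),
)
print(intermediate_size)  # 2048 = 4 * 512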

Happy to make a PR.

@jvendrow added the bug label May 3, 2024
@Quentin-Anthony
Member

Ah nice catch. Yes I'd welcome this PR.

@jvendrow
Contributor Author

jvendrow commented May 4, 2024

Ok great, created PR #1209.

@Quentin-Anthony
Member

Resolved in #1209.
