
'attention.bias' and 'attention.masked_bias' not in hf_layer.state_dict() when converting gpt-neox model to huggingface #1013

Closed
johntzwei opened this issue Aug 18, 2023 · 2 comments · Fixed by #1024
Labels: bug (Something isn't working)


johntzwei commented Aug 18, 2023

Describe the bug

I encounter the following error when converting GPT-NeoX models to Hugging Face with the tools/convert_module_to_hf.py script.

(gpt-neox) johnny@ink-lucy:~/gpt-neox$ bash haveibeentrainedon/wikitext/pilot/convert_to_hf.sh 
[2023-08-18 23:37:21,695] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
> building GPT2BPETokenizer tokenizer ...
 > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
Saving weights in fp16 precision...
  0%|                                                                                           | 0/24 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./tools/convert_module_to_hf.py", line 307, in <module>
    hf_model = convert(args.input_dir, loaded_config, args.output_dir)
  File "./tools/convert_module_to_hf.py", line 230, in convert
    state_dict["attention.bias"] = hf_layer.state_dict()["attention.bias"]
KeyError: 'attention.bias'

Expected behavior
Successful conversion.

Proposed solution
If you comment out lines 230 and 231, the script runs to completion. From eyeballing the results, language modelling performance does not appear to be seriously degraded. Could this be code that was supposed to be taken out? A guarded variant is sketched below.
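A version-tolerant variant of those two assignments (a sketch only, not the project's actual fix; hf_layer and state_dict are the names already used in tools/convert_module_to_hf.py) would copy the buffers only when they exist:

# Hypothetical replacement for lines 230-231 of tools/convert_module_to_hf.py:
# copy the attention mask buffers only if the installed transformers
# version still exposes them in the layer's state_dict.
hf_state = hf_layer.state_dict()
for key in ("attention.bias", "attention.masked_bias"):
    if key in hf_state:
        state_dict[key] = hf_state[key]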

Additional context
This is for a model trained with the config configs/pythia/410m.yml.

johntzwei added the bug label on Aug 18, 2023

shuheikurita commented Aug 26, 2023

Use pip install transformers==4.30.2

transformers>=4.31.0:
ipdb> hf_layer.state_dict().keys()
odict_keys(['input_layernorm.weight', 'input_layernorm.bias', 'post_attention_layernorm.weight', 'post_attention_layernorm.bias', 'attention.rotary_emb.inv_freq', 'attention.query_key_value.weight', 'attention.query_key_value.bias', 'attention.dense.weight', 'attention.dense.bias', 'mlp.dense_h_to_4h.weight', 'mlp.dense_h_to_4h.bias', 'mlp.dense_4h_to_h.weight', 'mlp.dense_4h_to_h.bias'])

transformers<=4.30.2:
ipdb> hf_layer.state_dict().keys()
odict_keys(['input_layernorm.weight', 'input_layernorm.bias', 'post_attention_layernorm.weight', 'post_attention_layernorm.bias', 'attention.bias', 'attention.masked_bias', 'attention.rotary_emb.inv_freq', 'attention.query_key_value.weight', 'attention.query_key_value.bias', 'attention.dense.weight', 'attention.dense.bias', 'mlp.dense_h_to_4h.weight', 'mlp.dense_h_to_4h.bias', 'mlp.dense_4h_to_h.weight', 'mlp.dense_4h_to_h.bias'])

The 'attention.bias' and 'attention.masked_bias' keys that the conversion script expects are present only up to transformers==4.30.2; from 4.31.0 onward they no longer appear in the layer's state_dict().
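For reference, in transformers<=4.30.2 these two entries are plain buffers created inside GPTNeoXAttention: the boolean causal mask and the masking constant. A minimal sketch of equivalent values, assuming max_positions stands in for the model's max_position_embeddings (in newer versions the tensors are presumably still built internally but no longer registered as persistent buffers, which is why they drop out of state_dict()):

import torch

max_positions = 2048  # stand-in for the model's max_position_embeddings
# 'attention.bias': lower-triangular boolean causal mask, shaped for broadcasting
bias = torch.tril(
    torch.ones((max_positions, max_positions), dtype=torch.bool)
).view(1, 1, max_positions, max_positions)
# 'attention.masked_bias': large negative constant used to mask attention scores
masked_bias = torch.tensor(-1e9)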

dashstander (Contributor) commented

Thanks so much @shuheikurita, I just made a PR to update the transformers version
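The comment suggests pinning transformers; a fail-fast guard along the same lines (hypothetical, not the contents of the linked PR) could also be added to the conversion script:

# Hypothetical guard for tools/convert_module_to_hf.py: abort with a clear
# message when the installed transformers version no longer exposes the
# attention mask buffers this script copies.
from packaging import version
import transformers

if version.parse(transformers.__version__) > version.parse("4.30.2"):
    raise RuntimeError(
        "transformers>4.30.2 removed 'attention.bias' and "
        "'attention.masked_bias' from GPTNeoXLayer.state_dict(); "
        "install transformers==4.30.2."
    )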
