
pythia-13b size mismatch #41

Closed
ejmichaud opened this issue Dec 23, 2022 · 1 comment

@ejmichaud

When I run the following code to load up pythia-13b, I get a bunch of size mismatch errors.

from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-13b",
    revision="step143000",
    cache_dir="./",
)

Errors:

Traceback (most recent call last):
  File "download_pythia_models.py", line 34, in <module>
    model = GPTNeoXForCausalLM.from_pretrained(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2379, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2695, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GPTNeoXForCausalLM:
        size mismatch for gpt_neox.embed_in.weight: copying a param with shape torch.Size([50688, 5120]) from checkpoint, the shape in current model is torch.Size([50432, 4096]).
        size mismatch for gpt_neox.layers.0.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for gpt_neox.layers.0.input_layernorm.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for gpt_neox.layers.0.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for gpt_neox.layers.0.post_attention_layernorm.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
...
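One way to sanity-check the raw checkpoint shapes directly (a sketch of my own, not something I ran for this report: it bypasses the model class with torch.load; the path is wherever from_pretrained cached the weights, and for a sharded checkpoint you would pick the shard that pytorch_model.bin.index.json lists for the key):

import torch

# Load the cached checkpoint on CPU and print one tensor's shape.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print(state_dict["gpt_neox.embed_in.weight"].shape)  # torch.Size([50688, 5120]) per the traceback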

These size mismatch errors continue for every layer of the model. When I pass ignore_mismatched_sizes=True to GPTNeoXForCausalLM.from_pretrained, I get this error instead:

Traceback (most recent call last):
  File "/om2/user/ericjm/the-everything-machine/experiments/pythia-0/eval.py", line 52, in <module>
    model = GPTNeoXForCausalLM.from_pretrained(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2379, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2636, in _load_pretrained_model
    mismatched_keys += _find_mismatched_keys(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2564, in _find_mismatched_keys
    and state_dict[checkpoint_key].shape != model_state_dict[model_key].shape
KeyError: 'embed_out.weight'

I imagine that some config just needs to be updated to reflect the actual model sizes? I do not get this error with any of the smaller models.
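One way to check that hypothesis (a sketch: load just the config for the same revision and compare the dimensions it declares against the tensor shapes in the traceback):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/pythia-13b", revision="step143000")
# The checkpoint tensors are 5120-wide with a 50688-entry vocabulary;
# a stale config would report 4096 and 50432 here instead.
print(config.hidden_size, config.vocab_size)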

@haileyschoelkopf
Collaborator

Does this occur with any 13b checkpoints other than 143000? I looked at a subset of checkpoints and saw the correct config file size for all the others I checked.

This seems to just be the wrong config.json on that repo branch. I'm reconverting and reuploading the model from the NeoX checkpoint to be on the safe side; it should complete momentarily!
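Once the reupload finishes, note that a locally cached copy of the old config.json may still be picked up. A sketch for forcing a fresh download (force_download is an existing from_pretrained argument; the rest mirrors the snippet above):

from transformers import GPTNeoXForCausalLM

# Re-download all files, bypassing any stale cached config.json.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-13b",
    revision="step143000",
    cache_dir="./",
    force_download=True,
)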
