
pythia-13b size mismatch #41

Closed
ejmichaud opened this issue Dec 23, 2022 · 1 comment

@ejmichaud

When I run the following code to load up pythia-13b, I get a bunch of size mismatch errors.

from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-13b",
    revision="step143000",
    cache_dir="./",
)

Errors:

Traceback (most recent call last):
  File "download_pythia_models.py", line 34, in <module>
    model = GPTNeoXForCausalLM.from_pretrained(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2379, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2695, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GPTNeoXForCausalLM:
        size mismatch for gpt_neox.embed_in.weight: copying a param with shape torch.Size([50688, 5120]) from checkpoint, the shape in current model is torch.Size([50432, 4096]).
        size mismatch for gpt_neox.layers.0.input_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for gpt_neox.layers.0.input_layernorm.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for gpt_neox.layers.0.post_attention_layernorm.weight: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for gpt_neox.layers.0.post_attention_layernorm.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
...
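One way to sanity-check the raw checkpoint shapes directly (a sketch of my own, not something I ran for this report: it bypasses the model class with torch.load; the path is wherever from_pretrained cached the weights, and for a sharded checkpoint you would pick the shard that pytorch_model.bin.index.json lists for the key):

import torch

# Load the cached checkpoint on CPU and print one tensor's shape.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print(state_dict["gpt_neox.embed_in.weight"].shape)  # torch.Size([50688, 5120]) per the traceback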

These size mismatch errors continue for every layer of the model. When I pass ignore_mismatched_sizes=True to GPTNeoXForCausalLM.from_pretrained, I get this error instead:

Traceback (most recent call last):
  File "/om2/user/ericjm/the-everything-machine/experiments/pythia-0/eval.py", line 52, in <module>
    model = GPTNeoXForCausalLM.from_pretrained(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2379, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2636, in _load_pretrained_model
    mismatched_keys += _find_mismatched_keys(
  File "/om2/user/ericjm/miniconda3/envs/phase-changes/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2564, in _find_mismatched_keys
    and state_dict[checkpoint_key].shape != model_state_dict[model_key].shape
KeyError: 'embed_out.weight'

I imagine that some config just needs to be updated to reflect the actual model sizes? I do not get this error with any of the smaller models.
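One way to check that hypothesis (a sketch: load just the config for the same revision and compare the dimensions it declares against the tensor shapes in the traceback):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/pythia-13b", revision="step143000")
# The checkpoint tensors are 5120-wide with a 50688-entry vocabulary;
# a stale config would report 4096 and 50432 here instead.
print(config.hidden_size, config.vocab_size)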

@haileyschoelkopf
Collaborator

Does this occur with any 13b checkpoints other than 143000? I looked at a subset of checkpoints and saw the correct config file size for all the others I checked.

This seems to just be the wrong config.json on that repo branch. I'm reconverting and reuploading the model from the NeoX checkpoint to be on the safe side; it should complete momentarily!
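Once the reupload finishes, note that a locally cached copy of the old config.json may still be picked up. A sketch for forcing a fresh download (force_download is an existing from_pretrained argument; the rest mirrors the snippet above):

from transformers import GPTNeoXForCausalLM

# Re-download all files, bypassing any stale cached config.json.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-13b",
    revision="step143000",
    cache_dir="./",
    force_download=True,
)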
