Converting Pythia checkpoint from HF to NeoX fails #1161

Closed
malteos opened this issue Feb 29, 2024 · 3 comments · Fixed by #1168
Comments


malteos commented Feb 29, 2024

Describe the bug

Converting a Pythia checkpoint from HF to NeoX fails with a missing-key error for the rotary embedding buffer.

To Reproduce
Steps to reproduce the behavior:

I am running this command to convert the Pythia 410M checkpoint to NeoX (for continued pretraining):

OMPI_COMM_WORLD_RANK=0 CUDA_VISIBLE_DEVICES=0 python $NEOX_DIR/tools/ckpts/convert_hf_to_sequential.py \
    --hf-model-name pythia-410m \
    --revision 143000 \
    --output-dir $BASE_DIR/data/pythia-410m/neox_converted_checkpoints/ \
    --cache-dir $TRANSFORMERS_CACHE \
    --config $BASE_DIR/neox_configs/continued-pythia-410m_pegasus.yml \
    --test

Error trace:

Traceback (most recent call last):
  File "/netscratch/experiments/gpt-neox/tools/ckpts/convert_hf_to_sequential.py", line 581, in <module>
    load_checkpoint(
  File "/netscratch/experiments/gpt-neox/megatron/checkpointing.py", line 390, in load_checkpoint
    checkpoint_name, state_dict = model.load_checkpoint(
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 2599, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 2662, in _load_checkpoint
    self.load_module_state_dict(checkpoint=checkpoint,
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/pipe/engine.py", line 1274, in load_module_state_dict
    self.module.load_state_dir(load_dir=self._curr_ckpt_path,
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/pipe/module.py", line 598, in load_state_dir
    layer.load_state_dict(checkpoint)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ParallelTransformerLayerPipe:
        Missing key(s) in state_dict: "attention.rotary_emb.inv_freq".

Expected behavior

The checkpoint converts to NeoX format without any error.

Proposed solution

From my understanding, attention.rotary_emb.inv_freq is not a trainable parameter and thus should not need to be loaded from the state dict.
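
As an illustration (a minimal sketch with made-up values, not the actual GPT-NeoX or Pythia code), inv_freq is derived entirely from the head dimension and the rotary base, so it can always be recomputed at module construction time instead of being restored from a checkpoint:

import torch

dim, base = 64, 10000  # illustrative values, not taken from the Pythia 410M config
# inv_freq depends only on dim and base; it carries no learned state.
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
print(inv_freq.shape)  # torch.Size([32])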

Environment (please complete the following information):

Thanks for your amazing project!

malteos added the bug (Something isn't working) label on Feb 29, 2024
@haileyschoelkopf (Contributor)

Hi! For now, you can work around this by adding persistent=False to the register_buffer("inv_freq", ...) calls in the GPT-NeoX library.

What's your Hugging Face Transformers version? It seems the culprit is this change, huggingface/transformers@253f9a3, which made inv_freq non-persistent on the HF side. I was under the impression they had reverted this change, but it seems I was wrong about that.

We will probably update this buffer to be non-persistent in GPT-NeoX as well, but will need to check that this does not break others' existing checkpoints.
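
For anyone hitting this in the meantime, here is a minimal sketch of the workaround described above (class and argument names are illustrative, not the exact GPT-NeoX rotary embedding implementation):

import torch

class RotaryEmbedding(torch.nn.Module):
    # Sketch of the persistent=False workaround, not the exact GPT-NeoX class.
    def __init__(self, dim, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps inv_freq out of state_dict(), so loading a
        # converted checkpoint that omits this key no longer raises a
        # missing-key error.
        self.register_buffer("inv_freq", inv_freq, persistent=False)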


malteos commented Mar 1, 2024

Thanks for the quick response. Adding persistent=False to the register_buffer calls fixed the problem!

malteos closed this as completed Mar 1, 2024
@haileyschoelkopf (Contributor)

Reopening this to track it since we haven't merged a fix yet!
