Fixes gpt mcore conversion to account for _extra_state that may be present #8618

Merged 1 commit on Mar 9, 2024

Conversation

terrykong
Collaborator

What does this PR do?

Fixes the GPT mcore conversion sanity check to account for _extra_state keys that may be present in the state dict.

Changelog

In the 24.01 container, running the following commands:

wget https://huggingface.co/nvidia/GPT-2B-001/resolve/main/GPT-2B-001_bf16_tp1.nemo
mkdir 2b_model_checkpoint && tar -xvf GPT-2B-001_bf16_tp1.nemo -C 2b_model_checkpoint
docker run -v $PWD/2b_model_checkpoint:/inputs -v $PWD:/outputs --rm -it nvcr.io/ea-bignlp/ga-participants/nemofw-training:24.01 python /opt/NeMo/scripts/nlp_language_modeling/convert_nemo_gpt_to_mcore.py --in-folder /inputs --out-file /outputs/2b_mcore_gpt.nemo --cpu-only

will result in this error:

[NeMo I 2024-03-08 23:08:38 nlp_overrides:1108] Model MegatronGPTModel was successfully restored from /outputs/2b_mcore_gpt.nemo.
[NeMo I 2024-03-08 23:08:38 convert_nemo_gpt_to_mcore:260] Sanity checks:
[NeMo I 2024-03-08 23:08:38 convert_nemo_gpt_to_mcore:265] ✅ Number of weights match
[NeMo W 2024-03-08 23:08:38 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1876: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
      warnings.warn(
    
[NeMo I 2024-03-08 23:08:42 convert_nemo_gpt_to_mcore:292] ✅ Weights match
Traceback (most recent call last):
  File "/opt/NeMo/scripts/nlp_language_modeling/convert_nemo_gpt_to_mcore.py", line 326, in <module>
    run_sanity_checks(input_nemo_file, output_nemo_file, cpu_only=cpu_only, ignore_if_missing=ignore_if_missing)
  File "/opt/NeMo/scripts/nlp_language_modeling/convert_nemo_gpt_to_mcore.py", line 295, in run_sanity_checks
    assert len(nemo_state_dict) == 0, f"❌ unexpected items in nemo_state_dict: {nemo_state_dict}"
AssertionError: ❌ unexpected items in nemo_state_dict: OrderedDict([('model.language_model.encoder.layers.0.self_attention.query_key_value._extra_state', None), ('model.language_model.encoder.layers.0.self_attention.dense._extra_state', None), ('model.language_model.encoder.layers.0.mlp.dense_h_to_4h._extra_state', None), ('model.language_model.encoder.layers.0.mlp.dense_4h_to_h._extra_state', None), ('model.language_model.encoder.layers.1.self_attention.query_key_value._extra_state', None), ('model.language_model.encoder.layers.1.self_attention.dense._extra_state', None), 

This change updates the sanity check to ignore keys containing "_extra_state". For reference, nemo_state_dict holds None for these _extra_state keys, while mcore_state_dict holds an empty io.BytesIO object:

(Pdb) mcore_state_dict['model.decoder.layers.15.mlp.linear_fc1._extra_state']
<_io.BytesIO object at 0x7fc01ea8a020>
(Pdb) mcore_state_dict[ 'model.decoder.layers.15.mlp.linear_fc1._extra_state'].read()
b''
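
The idea behind the updated check can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the helper name drop_extra_state is hypothetical, and the sample keys are copied from the traceback above. It assumes the check only needs to confirm that nothing other than _extra_state entries is left over in nemo_state_dict.

```python
from collections import OrderedDict


def drop_extra_state(state_dict):
    """Return a copy of state_dict without any key containing '_extra_state'.

    Hypothetical helper for illustration; the real script may filter differently.
    """
    return OrderedDict(
        (key, value) for key, value in state_dict.items() if "_extra_state" not in key
    )


# Leftover entries like the ones reported in the AssertionError above.
leftover = OrderedDict(
    [
        ("model.language_model.encoder.layers.0.self_attention.query_key_value._extra_state", None),
        ("model.language_model.encoder.layers.0.mlp.dense_h_to_4h._extra_state", None),
    ]
)

# With the _extra_state keys ignored, the "no unexpected items" check passes.
remaining = drop_extra_state(leftover)
assert len(remaining) == 0, f"unexpected items in nemo_state_dict: {remaining}"
```

Filtering on the key name rather than on the value avoids having to compare None on the NeMo side against an empty BytesIO on the mcore side.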

@cuichenx cuichenx self-requested a review March 9, 2024 00:08
Collaborator

@cuichenx cuichenx left a comment

LGTM, thanks for fixing!

@ericharper ericharper merged commit 438db62 into NVIDIA:main Mar 9, 2024
8 of 9 checks passed
@terrykong terrykong deleted the gpt-mcore-conversion-fix branch March 9, 2024 00:19
va290 pushed a commit to va290/NeMo that referenced this pull request Mar 10, 2024
Agoniii pushed a commit to Agoniii/NeMo that referenced this pull request Mar 15, 2024
JRD971000 pushed a commit that referenced this pull request Mar 15, 2024
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024