
Adds a script to convert HF checkpoints to NeoX 2.0 with mp and pp sharding #907

Merged · 3 commits merged into EleutherAI:main on May 19, 2023

Conversation

@bentherien (Contributor) commented Apr 27, 2023

The "convert_hf_to_sequential.py" script allows users to convert model checkpoints from HF format to NeoX 2.0 and cache them in the specified (by the config file) pipe-parallel-size and model-parallel-size.

The main use case is to enable simple caching of pre-trained models for fine-tuning in NeoX 2.0.
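For intuition, here is a minimal sketch of what model-parallel sharding of a single checkpoint tensor involves. This is an illustrative assumption, not the script's actual code: the function name and the choice of split dimension are stand-ins for the layer-specific conventions the real conversion follows.

```python
# Illustrative sketch only -- not the actual logic of convert_hf_to_sequential.py.
# It shows the core idea of model-parallel (MP) sharding: one dense HF weight
# tensor is split into MP contiguous chunks, one per model-parallel rank.
import torch

def shard_for_model_parallel(weight: torch.Tensor, mp_size: int, dim: int = 0):
    # `dim` is an assumption here: column-parallel layers split along the
    # output dimension (0), row-parallel layers along the input dimension (1).
    assert weight.size(dim) % mp_size == 0, "weight must divide evenly across MP ranks"
    return torch.chunk(weight, mp_size, dim=dim)

# Example: a 512x512 projection split across MP=2 ranks.
full = torch.randn(512, 512)
shards = shard_for_model_parallel(full, mp_size=2, dim=0)
assert all(s.shape == (256, 512) for s in shards)
```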

Included functionality:
- `convert_hf_to_sequential()` conversion function, tested with the Pythia suite
- sharding for (MP=1, PP=0), (MP=1, PP=1), (MP>1, PP=0), and (MP>1, PP>1)
- logit testing for comparing against HF models (only available for world_size=1); a sketch of such a comparison follows this list
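A minimal sketch of what the logit comparison can look like, assuming a single GPU. The HF side uses real `transformers` APIs; `neox_forward` is a hypothetical stand-in for the converted NeoX 2.0 model's forward pass, which the script drives internally.

```python
# Single-device logit comparison sketch. The HF side uses real transformers
# APIs; `neox_forward` is a hypothetical stand-in for the converted NeoX 2.0
# model's forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
hf_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m").eval()

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    hf_logits = hf_model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# neox_logits = neox_forward(inputs["input_ids"])      # hypothetical
# print((hf_logits - neox_logits).abs().max())         # max abs difference
```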

Playing with the script and logit testing uncovered some interesting findings:
1. Disabling flash attention for the converted Pythia 70M model leads to numerical overflow in NeoX 2.0.
2. Even with flash attention enabled, the forward passes of the NeoX 2.0 model and the HF model are not identical (we found this to be caused by a difference in how the rotary embedding is applied; see the illustrative sketch below).
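The PR does not specify which aspect of the rotary application differs. As an illustration of how two individually valid rotary implementations can disagree, here is a sketch of two common dimension-pairing conventions; treat the attribution to this specific mismatch as an assumption.

```python
# Illustrative only: two common conventions for pairing dimensions when
# applying rotary embeddings. Mixing conventions across implementations
# yields different (though individually valid) outputs -- one possible
# class of mismatch; the exact NeoX/HF difference is not specified here.
import torch

def rotate_half(x):
    # "Half-split" convention: pairs dimension i with dimension i + d/2.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_interleaved(x):
    # "Interleaved" convention: pairs dimension 2i with dimension 2i + 1.
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

x = torch.randn(1, 8)
print(torch.allclose(rotate_half(x), rotate_interleaved(x)))  # False in general
```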

@CLAassistant commented Apr 27, 2023

CLA assistant check: all committers have signed the CLA.

bentherien and others added 2 commits April 29, 2023 15:11
… related to a hardcoded value within the conversion script (3) fixed possible bugs in the conversion script w.r.t. the MP sharding convention
@StellaAthena (Member) commented:

@bentherien thank you for your contribution! Please sign the CLA and we will review and merge this :)

@Quentin-Anthony merged commit b70d004 into EleutherAI:main May 19, 2023
0 of 3 checks passed