Adds a script to convert HF checkpoints to NeoX 2.0 with mp and pp sharding #907
The "convert_hf_to_sequential.py" script allows users to convert model checkpoints from HF format to NeoX 2.0 and cache them in the specified (by the config file) pipe-parallel-size and model-parallel-size.
The main use case is to enable simple caching of pre-trained models for fine-tuning in NeoX 2.0.
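For intuition, here is a minimal sketch of the kind of tensor partitioning such a conversion performs for model parallelism. The helper name, the toy shapes, and the MP=2 setting are illustrative assumptions, not the script's actual code:

```python
import torch

def shard_for_model_parallel(weight: torch.Tensor, mp_size: int, dim: int):
    """Split one HF weight tensor into mp_size shards along dim, the way
    column-/row-parallel layers expect their partitions.
    (Illustrative helper, not part of convert_hf_to_sequential.py.)"""
    assert weight.size(dim) % mp_size == 0, "weight must divide evenly across ranks"
    return torch.chunk(weight, mp_size, dim=dim)

# Toy MLP weights (PyTorch Linear stores (out_features, in_features)):
# the up-projection is column-parallel (split the output dim),
# the down-projection is row-parallel (split the input dim).
hidden, ffn = 512, 2048
up_proj = torch.randn(ffn, hidden)
down_proj = torch.randn(hidden, ffn)

up_shards = shard_for_model_parallel(up_proj, mp_size=2, dim=0)
down_shards = shard_for_model_parallel(down_proj, mp_size=2, dim=1)

for rank, (u, d) in enumerate(zip(up_shards, down_shards)):
    # Each rank's shards would land in that rank's mp checkpoint file.
    print(f"rank {rank}: up {tuple(u.shape)}, down {tuple(d.shape)}")
```

The split direction matters because Megatron-style layers reassemble shards along different axes: splitting a weight along the wrong dimension still produces loadable shards but a wrong model.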
Included functionality:
- a convert_hf_to_sequential() conversion function, tested with the Pythia suite
- sharding for (MP=1, PP=0), (MP=1, PP=1), (MP>1, PP=0), (MP>1, PP>1)
- logit testing for comparing with HF models (only available for world_size=1); a sketch of what this checks follows the list
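To give a rough picture of what the logit test verifies, here is a hedged sketch that compares logits element-wise. The model name is real, but the tolerance and the second forward pass (a stand-in for the converted NeoX model, which needs a NeoX runtime to execute) are assumptions for illustration:

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m").eval()

inputs = tok("Hello, world!", return_tensors="pt")
with torch.no_grad():
    hf_logits = model(**inputs).logits
    # Placeholder: in the real test these logits come from the converted
    # NeoX 2.0 checkpoint; here we reuse the HF model so the check passes.
    neox_logits = model(**inputs).logits

print("max abs diff:", (hf_logits - neox_logits).abs().max().item())
assert torch.allclose(hf_logits, neox_logits, atol=1e-4), "logit mismatch"
```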
Experimenting with the script and its logit tests uncovered some interesting findings:
1. Disabling flash attention for the converted Pythia 70M model leads to numerical overflow in NeoX 2.0.
2. Even with flash attention enabled, the forward passes of the NeoX 2.0 model and the HF model are not identical (we found this to be caused by a difference in applying the rotary embedding; see the sketch below).
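The exact rotary discrepancy isn't spelled out above, so purely as an illustration of how two rotary conventions can disagree on identical inputs, here is a sketch contrasting the contiguous-halves style (rotate_half) with the interleaved GPT-J style; all names and shapes are illustrative:

```python
import torch

def rotate_half(x):
    # Contiguous-halves convention: pair channel i with channel i + d/2.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_interleaved(x):
    # Interleaved convention: pair even channel 2i with odd channel 2i+1.
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

torch.manual_seed(0)
seq, heads, head_dim = 4, 8, 64
q = torch.randn(seq, heads, head_dim)
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(seq).float(), inv_freq)  # (seq, head_dim/2)

# Each convention lays out cos/sin to match its own channel pairing.
cos_half = torch.cat((freqs, freqs), dim=-1).cos()[:, None, :]
sin_half = torch.cat((freqs, freqs), dim=-1).sin()[:, None, :]
cos_int = freqs.cos().repeat_interleave(2, dim=-1)[:, None, :]
sin_int = freqs.sin().repeat_interleave(2, dim=-1)[:, None, :]

out_half = q * cos_half + rotate_half(q) * sin_half
out_int = q * cos_int + rotate_interleaved(q) * sin_int

# Same inputs, same frequencies -- element-wise different outputs.
print("max abs diff:", (out_half - out_int).abs().max().item())
```

Both conventions are valid rotary implementations related by a channel permutation, so a converter has to match whichever one the target runtime applies, or downstream logits drift.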