
How to convert a model parallel model to hugging face model? #880

Closed

guozhiyao opened this issue Apr 12, 2023 · 7 comments
Labels: feature request

Comments

@guozhiyao

Is your feature request related to a problem? Please describe.
I trained a model with "model-parallel-size": 2 and tried to convert it to a Hugging Face model. I followed tools/convert_to_hf.py for the conversion; the converted model loads its parameters without errors, but the generated text is random.
When I train a model with "model-parallel-size": 1 and use the same code to convert and generate, the results are normal, so I suspect the problem is in the conversion code.
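For context, my (possibly wrong) understanding of what the conversion has to do when MP=2 is sketched below. The file names, the "module" key, and the parameter-name patterns are guesses rather than the actual NeoX checkpoint layout, and getting a concatenation dimension or the fused-QKV head ordering wrong seems like exactly the kind of bug that loads cleanly but generates random text:

```python
# Illustration only: merging two tensor-parallel shards into one state dict.
# File names, the "module" key, and the name patterns below are assumptions;
# inspect your own checkpoints (layouts differ across NeoX/DeepSpeed versions).
import torch

shard0 = torch.load("mp_rank_00_model_states.pt", map_location="cpu")["module"]
shard1 = torch.load("mp_rank_01_model_states.pt", map_location="cpu")["module"]

merged = {}
for key, w0 in shard0.items():
    w1 = shard1[key]
    if any(s in key for s in ("query_key_value", "dense_h_to_4h", "word_embeddings")):
        # Column-parallel weights/biases (and the vocab-sharded embedding) are
        # split along dim 0, so the shards are concatenated back along dim 0.
        # NOTE: the fused QKV weight may additionally need per-head reordering,
        # which is a classic source of "loads fine but outputs garbage".
        merged[key] = torch.cat([w0, w1], dim=0)
    elif any(s in key for s in ("attention.dense", "dense_4h_to_h")) and w0.dim() == 2:
        # Row-parallel weights are split along the input dimension (dim 1).
        merged[key] = torch.cat([w0, w1], dim=1)
    else:
        # Norm scales and row-parallel biases are replicated; keep one copy.
        merged[key] = w0
```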


@haileyschoelkopf
Contributor

Hi! I can look into this--you should be able to convert this to Huggingface without issues regardless of model-parallel size with our current scripts, so this is surprising and concerning.

What conversion script are you running, and what commit of this repository and DeepSpeed version are you using?

@guozhiyao
Author

The commit is 7d682df, the DeepSpeed version is 0.7.5, and I am using tools/convert_to_hf.py. Also, I use RMSNorm; how should its parameters be merged?

@haileyschoelkopf
Contributor

Hi! Thanks for sharing. A couple things:

  1. If you're using DeepSpeed 0.7.5, you should try running tools/convert_sequential_to_hf.py off of the current main branch! That conversion script has changes applied so that it works for v2.0 of our library onwards.

  2. If you are using RMSNorm, it's very surprising to me that your model converts properly to Hugging Face format when MP=1, since HF doesn't support RMSNorm in GPTNeoXModel to my knowledge. I believe you'd first have to write a custom version of the Hugging Face modeling code to support RMSNorm (a sketch of what such a module might look like is below), and then make sure the conversion script ports the RMSNorm parameters properly even with MP>1.
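
For reference, the kind of drop-in module that would have to replace LayerNorm in a patched GPTNeoXModel might look roughly like this (a sketch only; the parameter name and epsilon should match whatever your NeoX config actually uses):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm: scale by the root-mean-square over the hidden dim,
    with a learned per-channel gain (no bias and no mean subtraction)."""

    def __init__(self, hidden_size: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.scale * (x / rms)
```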

@guozhiyao
Author

I modified the Hugging Face code to support RMSNorm in GPT-NeoX. I used PP instead of MP to train the model, loaded the parameters into the Hugging Face model following tools/merge20b.py at 7d682df, and inference is normal.

@haileyschoelkopf
Contributor

I see. I think I'm a bit confused about which cases work and which do not for you, and what code you're using for the conversion. If I'm understanding correctly, you're experiencing the following:

  1. When you train with PP>1 and MP=1, using a copy of tools/merge20b.py that you've edited in your fork, you get the same outputs from your HF fork as from NeoX.
  2. When you train with MP>1, converting with tools/convert_to_hf.py does not work (one quick check to rule out a silent key mismatch is sketched below).
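
As an aside, one cheap thing to rule out in the MP>1 case, where the model "loads parameters normally" but generates random text, is a silent key mismatch during loading; a rough sketch with placeholder paths, using the stock classes for illustration (substitute your RMSNorm-patched classes):

```python
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Placeholder path; point this at your converted checkpoint directory.
ckpt_dir = "/path/to/converted-hf-checkpoint"

config = GPTNeoXConfig.from_pretrained(ckpt_dir)
model = GPTNeoXForCausalLM(config)

state_dict = torch.load(f"{ckpt_dir}/pytorch_model.bin", map_location="cpu")
result = model.load_state_dict(state_dict, strict=False)

# Any key listed here was silently skipped or left at random init, which is
# enough on its own to turn generations into noise.
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```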

Could you try the following?

  • Update your repository copy to the latest commit
  • Try converting your trained model (PP=1 and MP>1 version) using tools/convert_sequential_to_hf.py with whatever RMSNorm modifications you added to tools/merge20b.py on your end
  • See if the outputs are equivalent between HF and NeoX (a quick way to check this is sketched just below)
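
To make the last bullet concrete, you could run the same prompt greedily through the converted checkpoint and through NeoX's own generation script and compare the continuations; a rough sketch (placeholder path; swap in your RMSNorm-patched classes if needed):

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# Placeholder path to the converted HF checkpoint.
ckpt_dir = "/path/to/converted-hf-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = GPTNeoXForCausalLM.from_pretrained(ckpt_dir).eval()

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0]))
# Run the same prompt with greedy decoding on the NeoX side and compare; if the
# HF continuation is gibberish while NeoX's is coherent, the merge is at fault.
```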

For your model with PP>1, you'll want to try using (an edited RMSNorm version of) tools/convert_v1.0_to_hf.py I believe.

@StellaAthena
Member

@guozhiyao Hey, following up on this.

@Quentin-Anthony
Member

Closing this due to inactivity. Feel free to reopen if you'd like to continue investigating, @guozhiyao.
