
How to convert a model parallel model to hugging face model? #880

Closed

guozhiyao opened this issue Apr 12, 2023 · 7 comments
Labels: feature request

Comments

@guozhiyao

Is your feature request related to a problem? Please describe.
I trained a model with "model-parallel-size": 2 and tried to convert it to a Hugging Face model. I followed tools/convert_to_hf.py for the conversion; the converted model loads its parameters without errors, but the generated text is random.
When I train a model with "model-parallel-size": 1 and use the same code to convert and generate, the results are normal, so I suspect the problem is in the conversion code.
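For context, my (possibly wrong) understanding of what the conversion has to do when MP=2 is sketched below. The file names, the "module" key, and the parameter-name patterns are guesses rather than the actual NeoX checkpoint layout, and getting a concatenation dimension or the fused-QKV head ordering wrong seems like exactly the kind of bug that loads cleanly but generates random text:

```python
# Illustration only: merging two tensor-parallel shards into one state dict.
# File names, the "module" key, and the name patterns below are assumptions;
# inspect your own checkpoints (layouts differ across NeoX/DeepSpeed versions).
import torch

shard0 = torch.load("mp_rank_00_model_states.pt", map_location="cpu")["module"]
shard1 = torch.load("mp_rank_01_model_states.pt", map_location="cpu")["module"]

merged = {}
for key, w0 in shard0.items():
    w1 = shard1[key]
    if any(s in key for s in ("query_key_value", "dense_h_to_4h", "word_embeddings")):
        # Column-parallel weights/biases (and the vocab-sharded embedding) are
        # split along dim 0, so the shards are concatenated back along dim 0.
        # NOTE: the fused QKV weight may additionally need per-head reordering,
        # which is a classic source of "loads fine but outputs garbage".
        merged[key] = torch.cat([w0, w1], dim=0)
    elif any(s in key for s in ("attention.dense", "dense_4h_to_h")) and w0.dim() == 2:
        # Row-parallel weights are split along the input dimension (dim 1).
        merged[key] = torch.cat([w0, w1], dim=1)
    else:
        # Norm scales and row-parallel biases are replicated; keep one copy.
        merged[key] = w0
```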


@haileyschoelkopf
Contributor

Hi! I can look into this--you should be able to convert this to Huggingface without issues regardless of model-parallel size with our current scripts, so this is surprising and concerning.

What conversion script are you running, and what commit of this repository and DeepSpeed version are you using?

@guozhiyao
Author

The commit is 7d682df, the DeepSpeed version is 0.7.5, and I am using tools/convert_to_hf.py. Also, I use RMSNorm; how should its parameters be merged?

@haileyschoelkopf
Contributor

Hi! Thanks for sharing. A couple things:

  1. If you're using DeepSpeed 0.7.5, you should try running tools/convert_sequential_to_hf.py off of the current main branch! That conversion script has changes applied so that it works for v2.0 of our library onwards.

  2. If you are using RMSNorm, it's very surprising to me that your model converts properly to Hugging Face format when MP=1, since HF doesn't support RMSNorm in GPTNeoXModel to my knowledge. I believe you'd first have to write a custom version of the Hugging Face modeling code to support RMSNorm (a sketch of what such a module might look like is below), and then make sure the conversion script ports the RMSNorm parameters properly even with MP>1.
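
For reference, the kind of drop-in module that would have to replace LayerNorm in a patched GPTNeoXModel might look roughly like this (a sketch only; the parameter name and epsilon should match whatever your NeoX config actually uses):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm: scale by the root-mean-square over the hidden dim,
    with a learned per-channel gain (no bias and no mean subtraction)."""

    def __init__(self, hidden_size: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.scale * (x / rms)
```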

@guozhiyao
Author

I modified the Hugging Face code to support RMSNorm in GPT-NeoX. I used PP instead of MP to train the model, loaded the parameters into the Hugging Face model following tools/merge20b.py at 7d682df, and inference is normal.

@haileyschoelkopf
Contributor

I see. I think I'm a bit confused about which cases work and which do not for you, and what code you're using for the conversion. If I'm understanding correctly, you're experiencing the following:

  1. When you train with PP>1 and MP=1, using a copy of tools/merge20b.py that you've edited in your fork, you get the same outputs from your HF fork as from NeoX.
  2. When you train with MP>1, converting with tools/convert_to_hf.py does not work (one quick check to rule out a silent key mismatch is sketched below).
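
As an aside, one cheap thing to rule out in the MP>1 case, where the model "loads parameters normally" but generates random text, is a silent key mismatch during loading; a rough sketch with placeholder paths, using the stock classes for illustration (substitute your RMSNorm-patched classes):

```python
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Placeholder path; point this at your converted checkpoint directory.
ckpt_dir = "/path/to/converted-hf-checkpoint"

config = GPTNeoXConfig.from_pretrained(ckpt_dir)
model = GPTNeoXForCausalLM(config)

state_dict = torch.load(f"{ckpt_dir}/pytorch_model.bin", map_location="cpu")
result = model.load_state_dict(state_dict, strict=False)

# Any key listed here was silently skipped or left at random init, which is
# enough on its own to turn generations into noise.
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```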

Could you try the following?

  • Update your repository copy to the latest commit
  • Try converting your trained model (PP=1 and MP>1 version) using tools/convert_sequential_to_hf.py with whatever RMSNorm modifications you added to tools/merge20b.py on your end
  • See if the outputs are equivalent between HF and NeoX (a quick way to check this is sketched just below)
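
To make the last bullet concrete, you could run the same prompt greedily through the converted checkpoint and through NeoX's own generation script and compare the continuations; a rough sketch (placeholder path; swap in your RMSNorm-patched classes if needed):

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# Placeholder path to the converted HF checkpoint.
ckpt_dir = "/path/to/converted-hf-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = GPTNeoXForCausalLM.from_pretrained(ckpt_dir).eval()

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0]))
# Run the same prompt with greedy decoding on the NeoX side and compare; if the
# HF continuation is gibberish while NeoX's is coherent, the merge is at fault.
```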

For your model with PP>1, you'll want to try using (an edited RMSNorm version of) tools/convert_v1.0_to_hf.py I believe.

@StellaAthena
Member

@guozhiyao Hey, following up on this.

@Quentin-Anthony
Member

Closing this due to inactivity. Feel free to reopen if you'd like to continue investigating, @guozhiyao.
