Reconstructing the FP32 weights | HuggingFace conversion #922

Closed

davidvblumenthal opened this issue May 4, 2023 · 3 comments
Labels
feature request New feature or request

Comments

@davidvblumenthal

Is your feature request related to a problem? Please describe.
This might be more of a question than a feature request. As far as I understand, if one trains with mixed precision using DeepSpeed stage 2 or 3, DeepSpeed provides a script to reconstruct the fp32 weights.

Given the scenario that I train my model with mixed precision (fp16 or even bf16) using DeepSpeed stage 0 or 1, is there a way to reconstruct the fp32 weights?
If so, shouldn't it be preferable to share/use the model for inference in its fp32 form?

Thanks for the help :)
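
For context, the stage 2/3 reconstruction referred to above is the `zero_to_fp32.py` helper that DeepSpeed saves alongside its checkpoints; the same logic is also exposed programmatically. A minimal sketch, with placeholder paths, covering the sharded ZeRO stages rather than stage 0/1:

```python
# Rough sketch of DeepSpeed's fp32 reconstruction for ZeRO stage 2/3 checkpoints.
# Paths are placeholders; the zero_to_fp32.py script saved with the checkpoint
# wraps the same helper.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "path/to/deepspeed_checkpoint"  # placeholder: directory DeepSpeed saved into
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)  # consolidated fp32 tensors
torch.save(state_dict, "pytorch_model_fp32.bin")
```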

@davidvblumenthal added the feature request label on May 4, 2023
@StellaAthena
Member

Given the scenario that I train my model with mixed precision (fp16 or even bf16) using DeepSpeed stage 0 or 1, is there a way to reconstruct the fp32 weights?

Yes, the model weights are already stored in fp32 for stability. Typically the arithmetic operations are done in fp16 or bf16, while a master copy of the weights is kept in fp32.
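
To make the "master copy in fp32" point concrete, here is a minimal sketch using plain PyTorch AMP (not the NeoX/DeepSpeed internals); it assumes a CUDA GPU is available:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()      # parameters are created and kept in fp32
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).float().pow(2).mean()       # the matmul itself runs in fp16

scaler.scale(loss).backward()
scaler.step(opt)                                # the optimizer updates the fp32 master copy
scaler.update()

print(next(model.parameters()).dtype)           # torch.float32
```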

If so, shouldn't it be preferable to share/use the model for inference in its fp32 form?

Using weights in fp32 for inference is generally considered strongly undesirable. They take up twice as much space, meaning the largest LLM you can run inference on will be half the size (in parameters) in fp32. Additionally, inference on Volta, Ampere, and Hopper series GPUs will be significantly slower in fp32 than in fp16.
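
In practice that usually means loading the converted checkpoint in half precision for inference. A hedged sketch with Hugging Face transformers (the model name below is just a placeholder, and `device_map="auto"` additionally requires accelerate):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-neox-20b"  # placeholder: substitute your own converted checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,    # half the memory footprint of fp32
    device_map="auto",
)

inputs = tok("Hello", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```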

@davidvblumenthal
Author

Hi Stella, thanks for your reply!

True, you are absolutely right.
What do you think about the loss in precision?
For large models (> 10B parameters), restoring the fp32 weights is probably undesirable due to hardware restrictions.
For smaller models (< 10B parameters), it could make sense to use full-precision weights, or would you say there is no benefit in prediction performance from fp32 weights relative to fp16 weights?

Thanks for your input, highly appreciate it :)
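
One quick way to sanity-check how much fp16 inference drifts from fp32 on a smaller model is to compare the logits directly. A sketch, assuming a CUDA GPU (fp16 matmuls are not reliably supported on CPU) and using `gpt2` purely as a stand-in for any < 10B checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"                  # assumption: a GPU is available
name = "gpt2"                    # stand-in for any smaller checkpoint

tok = AutoTokenizer.from_pretrained(name)
ids = tok("The quick brown fox", return_tensors="pt").input_ids.to(device)

fp32 = AutoModelForCausalLM.from_pretrained(name).to(device).eval()
fp16 = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to(device).eval()

with torch.no_grad():
    diff = (fp32(ids).logits - fp16(ids).logits.float()).abs()

print(diff.max().item(), diff.mean().item())   # judge against the typical logit scale
```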

@StellaAthena
Member

Tim Dettmers and Luke Zettlemoyer have a really excellent paper on the tradeoff between fitting more parameters in VRAM and using higher precision. I view Tim as the leading expert on this subject and mostly just repeat what he recommends as best practices :)
