Reconstructing the FP32 weights | HuggingFace conversion #922
Comments
Yes, the model weights are stored in fp32 for stability. Typically the arithmetic operations are done in fp16 or bf16, while a master copy of the weights is kept in fp32.
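For anyone finding this later, here is a minimal sketch (plain PyTorch, not DeepSpeed's actual internals) of what "fp16 compute with an fp32 master copy" looks like; the layer shape, learning rate, and dummy loss are illustrative, and loss scaling is omitted for brevity:

```python
import torch

# fp16 working copy for compute, fp32 master copy for the optimizer update.
model = torch.nn.Linear(1024, 1024).cuda()

# fp32 master copy, owned by the optimizer
master_params = [p.detach().clone().float() for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=1e-3)

# fp16 working copy used for the forward/backward pass
model.half()

x = torch.randn(8, 1024, device="cuda", dtype=torch.float16)
loss = model(x).float().pow(2).mean()
loss.backward()

# Copy fp16 gradients into the fp32 master params, step, then copy back.
for p, mp in zip(model.parameters(), master_params):
    mp.grad = p.grad.detach().float()
optimizer.step()
optimizer.zero_grad(set_to_none=True)
with torch.no_grad():
    for p, mp in zip(model.parameters(), master_params):
        p.copy_(mp.half())
```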
Using fp32 weights for inference is generally considered strongly undesirable. They take up twice as much space as fp16, meaning the largest LLM you can run inference on will be half the size (in parameters) in fp32. Additionally, inference on Volta, Ampere, and Hopper series GPUs will be significantly slower in fp32 than in fp16.
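To make the factor of two concrete, here is a rough back-of-the-envelope calculation for the weights alone (ignoring activations, optimizer state, and KV cache); the model sizes are just illustrative:

```python
def weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GB (weights only, no activations/KV cache)."""
    return n_params * bytes_per_param / 1e9

for n_params in (6.9e9, 20e9):  # illustrative sizes, e.g. roughly 6.9B / 20B models
    print(f"{n_params / 1e9:.0f}B params: "
          f"fp16 ≈ {weight_gb(n_params, 2):.0f} GB, "
          f"fp32 ≈ {weight_gb(n_params, 4):.0f} GB")
```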
Hi Stella, thanks for your reply! True, you are absolutely right. Thanks for your input, I highly appreciate it :)
Tim Dettmers and Luke Zettlemoyer have a really excellent paper on the tradeoffs between fitting more parameters in VRAM and using higher precision. I view Tim as the leading expert on this subject and mostly just repeat what he recommends as best practices :)
Is your feature request related to a problem? Please describe.
This might be more of a question than a feature request. As far as I understand, if one trains with mixed precision using DeepSpeed stage 2 or 3, DeepSpeed provides a script to reconstruct the fp32 weights.
Given the scenario that I train my model with mixed precision in fp16 or even bf16 using DeepSpeed stage 0 or 1, is there a way to reconstruct the fp32 weights?
If so, shouldn't sharing/using the model for inference in its fp32 form be the preferred approach?
Thanks for the help :)
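For context, this is roughly how fp32 weights are recovered from a ZeRO checkpoint using the helper DeepSpeed ships; the checkpoint path is illustrative, and whether this also applies to stage 0/1 checkpoints depends on the DeepSpeed version and how the checkpoint was written:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Illustrative path to a DeepSpeed checkpoint directory.
checkpoint_dir = "checkpoints/global_step10000"

# Merge the sharded fp32 optimizer states back into a single fp32 state_dict.
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
torch.save(state_dict, "pytorch_model_fp32.bin")
```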