Reconstructing the FP32 weights | HuggingFace conversion #922

Closed

davidvblumenthal opened this issue May 4, 2023 · 3 comments
Labels
feature request New feature or request

Comments

@davidvblumenthal

Is your feature request related to a problem? Please describe.
This might be more of a question than a feature request. As far as I understand, if one trains with mixed precision using DeepSpeed stage 2 or 3, DeepSpeed provides a script to reconstruct the fp32 weights.

Given the scenario that I train my model with mixed precision (fp16 or even bf16) using DeepSpeed stage 0 or 1, is there a way to reconstruct the fp32 weights?
If so, shouldn't it be preferable to share/use the model for inference in its fp32 form?

Thanks for the help :)
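
For context, the stage 2/3 reconstruction referred to above is the `zero_to_fp32.py` helper that DeepSpeed saves alongside its checkpoints; the same logic is also exposed programmatically. A minimal sketch, with placeholder paths, covering the sharded ZeRO stages rather than stage 0/1:

```python
# Rough sketch of DeepSpeed's fp32 reconstruction for ZeRO stage 2/3 checkpoints.
# Paths are placeholders; the zero_to_fp32.py script saved with the checkpoint
# wraps the same helper.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "path/to/deepspeed_checkpoint"  # placeholder: directory DeepSpeed saved into
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)  # consolidated fp32 tensors
torch.save(state_dict, "pytorch_model_fp32.bin")
```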

@davidvblumenthal added the feature request label on May 4, 2023
@StellaAthena
Member

Given the scenario that I train my model with mixed precision (fp16 or even bf16) using DeepSpeed stage 0 or 1, is there a way to reconstruct the fp32 weights?

Yes, the model weights are already stored in fp32 for stability. Typically the arithmetic operations are done in fp16 or bf16, while a master copy of the weights is kept in fp32.
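
To make the "master copy in fp32" point concrete, here is a minimal sketch using plain PyTorch AMP (not the NeoX/DeepSpeed internals); it assumes a CUDA GPU is available:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()      # parameters are created and kept in fp32
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).float().pow(2).mean()       # the matmul itself runs in fp16

scaler.scale(loss).backward()
scaler.step(opt)                                # the optimizer updates the fp32 master copy
scaler.update()

print(next(model.parameters()).dtype)           # torch.float32
```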

If so, shouldn't it be preferable to share/use the model for inference in its fp32 form?

Using weights in fp32 for inference is generally considered strongly undesirable. They take up twice as much space, meaning the largest LLM you can run inference on will be half the size (in parameters) in fp32. Additionally, inference on Volta, Ampere, and Hopper series GPUs will be significantly slower in fp32 than in fp16.
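
In practice that usually means loading the converted checkpoint in half precision for inference. A hedged sketch with Hugging Face transformers (the model name below is just a placeholder, and `device_map="auto"` additionally requires accelerate):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-neox-20b"  # placeholder: substitute your own converted checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,    # half the memory footprint of fp32
    device_map="auto",
)

inputs = tok("Hello", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```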

@davidvblumenthal
Author

Hi Stella, thanks for your reply!

True, you are absolutely right.
What do you think about the loss in precision?
For large models (> 10B parameters), restoring the fp32 weights is probably undesirable due to hardware restrictions.
For smaller models (< 10B parameters), it could make sense to use full-precision weights, or would you say there is no benefit in prediction performance from fp32 weights relative to fp16 weights?

Thanks for your input, highly appreciate it :)
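
One quick way to sanity-check how much fp16 inference drifts from fp32 on a smaller model is to compare the logits directly. A sketch, assuming a CUDA GPU (fp16 matmuls are not reliably supported on CPU) and using `gpt2` purely as a stand-in for any < 10B checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"                  # assumption: a GPU is available
name = "gpt2"                    # stand-in for any smaller checkpoint

tok = AutoTokenizer.from_pretrained(name)
ids = tok("The quick brown fox", return_tensors="pt").input_ids.to(device)

fp32 = AutoModelForCausalLM.from_pretrained(name).to(device).eval()
fp16 = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to(device).eval()

with torch.no_grad():
    diff = (fp32(ids).logits - fp16(ids).logits.float()).abs()

print(diff.max().item(), diff.mean().item())   # judge against the typical logit scale
```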

@StellaAthena
Member

Tim Dettmers and Luke Zettlemoyer have a really excellent paper on the tradeoff between fitting more parameters in VRAM and using higher precision. I view Tim as the leading expert on this subject and mostly just repeat what he recommends as best practices :)
