
Save hidden states in bfloat16 #208

Open
norabelrose opened this issue Apr 21, 2023 · 0 comments
Labels: bug (Something isn't working)

@norabelrose (Member)

Right now it appears that we're getting numerical overflows when casting to float16 in float32_to_int16 for some models (RWKV-LM, T0pp, UnifiedQA). These models were trained in bfloat16, not float16; bfloat16 has a much higher dynamic range and can represent far larger numbers than float16 can.

bfloat16 has the nice property that its dynamic range is effectively the same as float32's; it just has lower precision within that range. So if we save everything in bfloat16, we should essentially never turn large numbers into inf.
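As a rough illustration (a minimal PyTorch sketch, not code from this repo), the same large activations that overflow in float16 stay finite in bfloat16:

```python
import torch

x = torch.tensor([1e5, 1e30], dtype=torch.float32)

# float16 maxes out around 6.55e4, so both values overflow to inf
print(x.to(torch.float16))   # tensor([inf, inf], dtype=torch.float16)

# bfloat16 keeps float32's 8 exponent bits, so the same values stay finite,
# just rounded to fewer mantissa bits
print(x.to(torch.bfloat16))  # finite, slightly rounded values
```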

So I'm proposing we save everything in bfloat16 across the board. Since this isn't supported natively by datasets or its Apache Arrow backend, we'd still use the reinterpret-as-int16 hack. The other option would be to choose which precision to use dynamically based on the precision of the model's weights, but I'd prefer not to deal with that complexity right now.
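For concreteness, here's a minimal sketch of what the bfloat16 version of that hack could look like (assuming PyTorch tensors; the actual float32_to_int16 helper and its inverse in the repo may be shaped differently, and int16_to_float32 is just a hypothetical name here):

```python
import torch

def float32_to_int16(x: torch.Tensor) -> torch.Tensor:
    # Round to bfloat16, then reinterpret the raw 16-bit patterns as int16
    # so the datasets / Apache Arrow backend can store a plain integer column.
    return x.to(torch.bfloat16).view(torch.int16)

def int16_to_float32(x: torch.Tensor) -> torch.Tensor:
    # Inverse: reinterpret the int16 bits as bfloat16, then upcast for downstream use.
    return x.view(torch.bfloat16).to(torch.float32)

# Round trip: activations well outside float16's range survive without inf
h = torch.randn(4, 8) * 1e6
assert torch.isfinite(int16_to_float32(float32_to_int16(h))).all()
```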

norabelrose added the bug (Something isn't working) label on Apr 21, 2023