Load fp32 models in bfloat16 when possible #231
Merged
Several models that we'd like to evaluate on, like `bigscience/mt0-xxl` and `allenai/unifiedqa-t5-11b`, have float32 checkpoints but were actually trained in bfloat16 on TPUs. Because they're float32, we get out-of-memory errors when trying to run inference on them. This PR automatically detects whether a checkpoint is (likely) float32 before downloading it, and sets `torch_dtype=torch.bfloat16` if `torch.cuda.is_bf16_supported()` is True.
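
Roughly, the loading logic looks like the sketch below. This is illustrative, not the exact code in this PR: the `load_maybe_bf16` helper name, the use of `AutoModel`, and the `config.torch_dtype`-based dtype check are assumptions made for the example.

```python
import torch
from transformers import AutoConfig, AutoModel


def load_maybe_bf16(model_name: str):
    """Load a model, downcasting an fp32 checkpoint to bfloat16 when the GPU supports it.

    Sketch only; the actual detection in this PR may differ.
    """
    config = AutoConfig.from_pretrained(model_name)
    # config.torch_dtype records the dtype the weights were saved in, when the
    # config provides it; it may be None (older checkpoints), a torch.dtype, or a string.
    saved_dtype = getattr(config, "torch_dtype", None)
    likely_fp32 = saved_dtype in (None, torch.float32, "float32")

    dtype = None  # None means "use the default loading behavior"
    if likely_fp32 and torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        print(f"Warning: loading {model_name} in bfloat16 rather than float32 to save memory.")
        dtype = torch.bfloat16

    return AutoModel.from_pretrained(model_name, torch_dtype=dtype)


# e.g. model = load_maybe_bf16("bigscience/mt0-xxl")
```
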
Some older models, like `gpt2`, have fp32 checkpoints and were just trained in full precision. But it's nearly impossible for an overflow to occur when running these models in bfloat16, since bf16 has a dynamic range almost equal to that of fp32. There is a bit of precision loss, but empirically neural nets are highly robust to this, as long as there aren't any overflows. So this should be fine. We also print a warning when the downcasting does occur. Maybe we should add a flag to turn off this automatic downcasting, but I haven't included it in this PR for simplicity.