This seems like a set of tools we can learn from. As summarized on HN:
> This is exciting because:
>
> - "Parameter Efficient" finetuning methods let you customize LLMs without having to train all the parameters
> - But LoRA (the most popular method) didn't match full finetuning performance on some tasks
> - DoRA closed the gap while still being very efficient
> - Quantization (representing the original weights with fewer bits per parameter) makes things even more memory-efficient
> - FSDP lets you spread the work over multiple GPUs, using less memory on each one.
>
> The upshot is that where you previously needed, say, 8 fancy Nvidia A100s to fine-tune an LLM, you can now do so on a few 3090s. While it might take a little longer, you're at least getting something almost as good as (or in some cases possibly better than) the full finetuning equivalent.
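To make the LoRA vs. DoRA distinction above concrete: LoRA learns a low-rank update `B @ A` on top of a frozen pretrained weight, while DoRA additionally decomposes the merged weight into a trainable per-output-channel magnitude and a normalized direction. Here is a toy PyTorch sketch of both (our own illustration, not code from the post; the rank, scaling, and init choices are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update (B @ A) * scale."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

class DoRALinear(LoRALinear):
    """DoRA: W' = m * (W + BA) / ||W + BA||, with a trainable
    per-output-channel magnitude m initialized from ||W||."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__(base, rank, alpha)
        self.m = nn.Parameter(self.base.weight.norm(dim=1, keepdim=True))

    def forward(self, x):
        merged = self.base.weight + (self.B @ self.A) * self.scale
        direction = merged / merged.norm(dim=1, keepdim=True)
        return F.linear(x, self.m * direction, self.base.bias)
```

Only `A`, `B`, and (for DoRA) `m` receive gradients, which is where the "parameter efficient" savings come from: for a 4096x4096 layer at rank 8, that is roughly 70K trainable parameters instead of ~16.8M.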
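The FSDP piece is orthogonal to the adapter method: it shards parameters, gradients, and optimizer state across GPUs so no single card has to hold the whole model. A minimal sketch with PyTorch's built-in `FullyShardedDataParallel` (generic usage, not the post's custom QDoRA/FSDP integration; the toy model and sizes are placeholders). Launch it with `torchrun --nproc_per_node=<num_gpus> fsdp_demo.py`:

```python
import functools
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

dist.init_process_group("nccl")         # torchrun supplies rank/world-size env vars
torch.cuda.set_device(dist.get_rank())  # single node: rank == local rank

model = nn.Sequential(                  # toy stand-in for a transformer
    nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)
).cuda()

# Shard parameters, gradients, and optimizer state across ranks; the wrap
# policy splits the model so full weights are gathered one submodule at a
# time, keeping peak memory per GPU low.
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(size_based_auto_wrap_policy,
                                       min_num_params=1_000_000),
)

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()           # dummy objective, just to exercise backward
loss.backward()
opt.step()
dist.destroy_process_group()
```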
- https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html
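The post uses its own training script, but the same ingredients are also exposed in off-the-shelf Hugging Face tooling. A hedged sketch, assuming a recent `peft` release with DoRA support for bitsandbytes-quantized layers (the model id, rank, and target modules below are placeholders, not values from the post):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base weights to 4-bit NF4; compute happens in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# use_dora=True switches the adapter from plain LoRA to DoRA.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

From here the model can go into an ordinary training loop or `transformers.Trainer`; the FSDP part of the post's setup is what you would add to shard this across multiple GPUs.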