CUDA OOM with 20B model #616
I believe it's not possible to finetune the 20B model on only 2 x 80GB cards. A 20B model requires roughly 20 × 16 = 320 GB of memory to finetune, so with 80 GB per GPU you will need at least 320/80 = 4 cards.
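The arithmetic above can be sketched as a back-of-the-envelope calculation. The 16 bytes per parameter is a common rule of thumb for naive mixed-precision Adam training (fp16 weights + fp16 gradients + fp32 master weights + two fp32 Adam states); the helper below is illustrative, not part of this repository:

```python
import math

# Assumed breakdown of the 16 bytes/parameter rule of thumb:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + Adam momentum (4) + Adam variance (4)
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16

def min_gpus(n_params: float, gpu_mem_gb: float) -> int:
    """Minimum GPU count to hold model + optimizer state, ignoring activations."""
    total_gb = n_params * BYTES_PER_PARAM / 1e9
    return math.ceil(total_gb / gpu_mem_gb)

print(20e9 * BYTES_PER_PARAM / 1e9)  # 320.0 GB for a 20B model
print(min_gpus(20e9, 80))            # 4  (A100 80GB)
print(min_gpus(20e9, 32))            # 10 (V100 32GB)
```

Note this ignores activation memory, so the real requirement is higher; it explains why 2 x 80 GB cannot work even in principle.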
This is correct. There are parameter-efficient finetuning techniques such as LoRA and Adapters, but naive finetuning as supported in this library requires the same resources as pretraining, not as inference.
I am trying to finetune the 20B model on the APPS dataset using the slim weights. The config is identical to the one you provided in the repository, with some tweaks (listed below), but I constantly get an OOM error.
Changes to the configuration:
Setups I tried:
The only setup that worked was 8 x NVIDIA A100 80 GB SXM. Sadly, that run failed because of an unrelated configuration mistake. The problem is that I now have to wait days or weeks to run the finetuning again: I am using my university cluster, which has only 6 nodes with that configuration, and they are always occupied.
Could you please comment on how to finetune the model properly with 2 x NVIDIA Tesla V100 32 GB NVLink or 2 x NVIDIA A100 80 GB SXM? What should the configuration be? Is it even possible?