llama_7b model OOM issue #2051
Comments
We only guarantee the runnability of models in PT eager mode on an A100 40GB in our CI. It is possible that inductor uses more GPU memory than eager mode, causing the OOM. Optimizing GPU memory usage with inductor is an open question.
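One way to check this claim on your own hardware is to compare peak CUDA memory between eager and inductor-compiled runs. A minimal sketch, assuming PyTorch >= 2.0 with a CUDA device; the tiny stand-in model and input shapes here are illustrative, not the torchbench llama_7b definition:

```python
import torch

GIB = 1024 ** 3

def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB

def peak_cuda_gib(fn, *args):
    """Run fn(*args) once and return the peak allocated CUDA memory in GiB."""
    torch.cuda.reset_peak_memory_stats()
    fn(*args)
    torch.cuda.synchronize()
    return to_gib(torch.cuda.max_memory_allocated())

if torch.cuda.is_available():
    # Small stand-in model; the real llama_7b would be loaded here instead.
    model = torch.nn.TransformerEncoderLayer(
        d_model=512, nhead=8, batch_first=True
    ).cuda().half()
    x = torch.randn(1, 32, 512, device="cuda", dtype=torch.float16)
    with torch.no_grad():
        eager_gib = peak_cuda_gib(model, x)
        inductor_gib = peak_cuda_gib(torch.compile(model, backend="inductor"), x)
    print(f"eager peak: {eager_gib:.3f} GiB, inductor peak: {inductor_gib:.3f} GiB")
```

If the inductor peak is markedly higher than eager, the gap plus the model's weights is roughly the headroom you need on the device.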
@xuzhao9 I tried to use 4x A100-40G to avoid the OOM issue, but it looks like torchbench.py only uses one GPU's memory. I tried options like --device-index and --multiprocess; both failed. Do you have any advice on multi-GPU support? Thanks.
@jinsong-mao Unfortunately, we don't have multi-GPU support right now due to a lack of CI infra to test it. If you are interested, you could try the
Hi,
I duplicated the llama model, renamed it llama_7b, and changed the model parameters according to the llama_7b specification. It looks like this:
I skipped CPU eager mode and only ran the CUDA model.
It reports the following issue when running with this command:
```
python userbenchmark/dynamo/dynamobench/torchbench.py -dcuda --float16 -n1 --inductor --performance --inference --filter "llama" --batch_size 1 --in_slen 32 --out_slen 3 --output-dir=torchbench_llama_test_logs
```
If I want to run this model, how should I fix it? My hardware is an A100-40G.
Thanks.
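For a sense of why a 7B model can exhaust 40 GB once inductor overhead is added, here is a rough stdlib-only estimate of the weight footprint. It assumes the published LLaMA-7B shape (hidden 4096, intermediate 11008, 32 layers, vocab 32000); it is a back-of-the-envelope sketch, not the torchbench model code:

```python
# Hypothetical parameter-count estimate for a LLaMA-7B-shaped transformer.
hidden, intermediate, layers, vocab = 4096, 11008, 32, 32000

embed = vocab * hidden              # token embedding table
attn = 4 * hidden * hidden          # q, k, v, o projections per layer
mlp = 3 * hidden * intermediate     # gate, up, down projections per layer
norms = 2 * hidden                  # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

# Total = embeddings + decoder layers + final norm + lm_head.
total = embed + layers * per_layer + hidden + vocab * hidden
fp16_gib = total * 2 / 1024 ** 3    # 2 bytes per fp16 weight

print(f"{total:,} params, ~{fp16_gib:.1f} GiB of fp16 weights")
```

This gives roughly 6.7B parameters and about 12.6 GiB of fp16 weights, so the weights alone fit comfortably on an A100-40G; the OOM headroom is consumed by activations, the KV cache, and inductor's additional compile-time and workspace memory.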