GPU memory very high and unbalanced when testing Gemma #1892
Comments
The figure shows a run with batch_size 1.
Hi! Is this using tensor parallelism? If so, I think this is sometimes expected behavior from HF. Adding …
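The suggestion in the reply above is truncated; as a minimal, illustrative sketch only, capping per-device memory might look like the command below. The model name and the `parallelize=True` / `max_memory_per_gpu` model args are assumptions here, not confirmed from the thread:

```bash
# Illustrative only: limit how much memory HF accelerate may place on each GPU
# when the model is sharded across devices (args assumed, not taken from the thread).
lm_eval --model hf \
  --model_args pretrained=google/gemma-7b,parallelize=True,max_memory_per_gpu=20GiB \
  --tasks hellaswag \
  --batch_size 1
```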
I'm running Gemma-7B on an RTX 4090 and it takes up a lot of GPU memory, but Llama-3-8B runs normally with the same prompt. When I enable the tensor parallel strategy, GPU memory usage is very unbalanced (as shown in the figure below). Is this a Gemma bug, or is there something wrong with my usage?
My platform: Linux
Python version: 3.10.14
lm-evaluation-harness version: 0.4.2
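For context, the exact command is not given in the issue; a run along the lines below would match the described setup, with the model name, task, and `dtype`/`parallelize` model args assumed rather than taken from the report:

```bash
# Hypothetical reproduction of the reported setup (exact invocation not shown in the issue).
# parallelize=True shards the model across visible GPUs via HF accelerate's device_map.
lm_eval --model hf \
  --model_args pretrained=google/gemma-7b,parallelize=True,dtype=bfloat16 \
  --tasks hellaswag \
  --batch_size 1
```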