
GPU memory very high and unbalanced when testing Gemma #1892

Closed
smartliuhw opened this issue May 27, 2024 · 2 comments

Comments

@smartliuhw

I'm running Gemma-7B on RTX 4090s. It takes up a lot of GPU memory, while Llama-3-8B runs normally with the same prompt. When I enable the tensor parallel strategy, the GPU memory usage is very unbalanced (as shown in the figure below). Is this a Gemma bug, or is there something wrong with my usage?

My platform: Linux
Python version: 3.10.14
lm-evaluation-harness version: 0.4.2

[Screenshot: GPU memory usage across GPUs, showing unbalanced allocation]

@smartliuhw
Author


The figure above was captured while running with batch_size 1.
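For context, a command along these lines matches the setup described (a minimal sketch; the model path and task below are assumptions, not stated in the report):

```bash
# Hypothetical invocation matching the report: hf backend, model sharded
# across GPUs via parallelize=True, batch size 1.
# pretrained= and --tasks values are placeholders.
lm_eval --model hf \
    --model_args pretrained=google/gemma-7b,parallelize=True \
    --tasks hellaswag \
    --batch_size 1
```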

@haileyschoelkopf
Contributor

Hi! Is this using --model hf with parallelize=True?

If so, I think this is sometimes expected behavior from HF. Adding device_map_option='balanced' to --model_args may fix this, or simply upping the batch size.
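A sketch of that suggestion, assuming the hf backend with parallelize=True (model path and task are placeholders, not from the original report):

```bash
# Sketch of the suggested adjustment: request a balanced device_map from
# accelerate and raise the batch size. Placeholders for model and task.
lm_eval --model hf \
    --model_args pretrained=google/gemma-7b,parallelize=True,device_map_option=balanced \
    --tasks hellaswag \
    --batch_size 8
```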
