no longer can load 72b llava qwen on 4*H100 80GB #485
Comments
Please check again with my PR #487 and vllm 0.4.3 to see if the issue is resolved. It may have been fixed there and/or in vllm since your last report. I have tested multi-GPU loading and did not see an obvious regression in VRAM usage, though under a different environment and with a different model/GPU.
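For reference, picking up the suggested vllm version and the latest sglang main might look like the following. This is a sketch: the exact install commands and the `subdirectory=python` layout of the sglang repo are assumptions, so adapt them to your environment.

```shell
# Sketch: pin vllm to the version suggested above (assumes a pip environment)
pip install "vllm==0.4.3"

# Reinstall sglang from the latest main so recent fixes are included
# (repo layout with the package under python/ is an assumption)
pip install --upgrade "git+https://github.com/sgl-project/sglang.git#subdirectory=python"
```

After reinstalling, retry the same launch command that previously hit the OOM to confirm whether the regression is gone.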
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
After updating from the March 24 version of main to the latest main, I can no longer run 72b without some kind of OOM.

Launching now always leads to the errors below. I also tried --mem-fraction-static=0.9 or --mem-fraction-static=0.99; the latter gets further but still fails later on. Before, I didn't have this option set at all and it worked.

Failure with --mem-fraction-static=0.9:

With 0.98 or 0.99:

With no option:
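For context, the launch I'm describing is along these lines. This is a sketch, not the exact command: the model path is an assumption (any 72B llava-qwen checkpoint), while `--tp-size 4` matches the 4×H100 setup from the title; `--mem-fraction-static` is the flag being varied across the failures above.

```shell
# Sketch of the 4-GPU launch (model path is an assumption)
python -m sglang.launch_server \
  --model-path lmms-lab/llava-next-72b \
  --tp-size 4 \
  --mem-fraction-static 0.9
```

Lowering `--mem-fraction-static` leaves more VRAM headroom for activations but shrinks the KV cache; raising it toward 0.99 does the opposite, which matches the behavior above where 0.99 gets further through startup but still OOMs later.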