lmms-lab/llava-next-72b CUDA out of memory #483
Comments
I am not sure about your GPUs. Just providing a data point: I could run it with 4×80G A100 or 8×40G A100.
Thanks for your reply. I use NVIDIA A100-SXM4-80GB.
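A rough back-of-the-envelope estimate (my numbers, not from the issue) makes it clear why a single 80 GB A100 cannot even hold the weights of a 72B model in fp16/bf16, while the 4×80G setup mentioned above can:

```python
# Approximate memory needed just for the model weights of a ~72B-parameter
# model stored in 16-bit precision (2 bytes per parameter).
params = 72e9          # ~72B parameters in the language model
bytes_per_param = 2    # fp16/bf16

weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.0f} GiB")

single_gpu_gb = 80
gpus_needed = -(-weights_gb // single_gpu_gb)  # ceiling division
print(f"minimum 80G GPUs just for weights: {int(gpus_needed)}")
```

Even two 80G GPUs only cover the weights; the KV cache, activations, and the vision tower need additional headroom, which is why configurations around 4×80G are reported to work.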
When I run sglang/examples/usage/llava/srt_llava_next_test.py and change the model to "lmms-lab/llava-next-72b" instead of "lmms-lab/llama3-llava-next-8b", it reports OOM as below.
Could anyone take some time to give suggestions? Thank you very much!
Initialization failed. router_init_state: Traceback (most recent call last):
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/manager.py", line 71, in start_router_process
    model_client = ModelRpcClient(server_args, port_args, model_overide_args)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_rpc.py", line 739, in __init__
    self.model_server = ModelRpcService().exposed_ModelRpcServer(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_rpc.py", line 73, in __init__
    self.model_runner = ModelRunner(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_runner.py", line 256, in __init__
    self.load_model()
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_runner.py", line 279, in load_model
    self.model = get_model(
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
    return loader.load_model(model_config=model_config,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
    return model_class(config=model_config.hf_config,
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llava.py", line 298, in __init__
    super().__init__(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llava.py", line 37, in __init__
    self.language_model = LlamaForCausalLM(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 261, in __init__
    self.model = LlamaModel(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 221, in __init__
    [
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 222, in <listcomp>
    LlamaDecoderLayer(config, i, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 170, in __init__
    self.mlp = LlamaMLP(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 45, in __init__
    self.down_proj = RowParallelLinear(
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 633, in __init__
    self.quant_method.create_weights(self,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 81, in create_weights
    weight = Parameter(torch.empty(output_size_per_partition,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU
Initialization failed. detoken_init_state: init ok
Traceback (most recent call last):
  File "/home/ubuntu/wubing/sglang/examples/usage/llava/srt_llava_next_test.py", line 64, in <module>
    runtime = sgl.Runtime(
  File "/home/ubuntu/wubing/sglang/python/sglang/api.py", line 39, in Runtime
    return Runtime(*args, **kwargs)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/server.py", line 291, in __init__
    raise RuntimeError(
RuntimeError: Initialization failed. Please see the error messages above.
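Since the OOM happens while loading the layer weights, the usual fix is tensor parallelism so the 72B model is sharded across several GPUs. As a hedged sketch (assuming this sglang version forwards a `tp_size` keyword from `sgl.Runtime` to its server args, and that 4 GPUs are available as in the working setup reported above), the `Runtime` call in the test script could be adjusted like this:

```python
# Hypothetical adjustment to srt_llava_next_test.py, not a confirmed fix:
# shard the model across 4 GPUs instead of loading it onto one 80G A100.
runtime_kwargs = dict(
    model_path="lmms-lab/llava-next-72b",
    tp_size=4,  # tensor-parallel degree; matches the 4x80G A100 data point above
)
# With sglang installed and 4 GPUs visible, this would be:
# runtime = sgl.Runtime(**runtime_kwargs)
print(runtime_kwargs)
```

If the `Runtime` wrapper in your version does not accept `tp_size` directly, the equivalent flag on the standalone server is worth checking in the sglang server-args documentation.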