
lmms-lab/llava-next-72b CUDA out of memory #483

Open
bingwork opened this issue May 27, 2024 · 2 comments

Comments

@bingwork
Contributor

When I run sglang/examples/usage/llava/srt_llava_next_test.py with "lmms-lab/llava-next-72b" instead of "lmms-lab/llama3-llava-next-8b", it reports OOM as below.
Could anyone take the time to give some suggestions? Thank you very much!

Initialization failed. router_init_state: Traceback (most recent call last):
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/manager.py", line 71, in start_router_process
    model_client = ModelRpcClient(server_args, port_args, model_overide_args)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_rpc.py", line 739, in __init__
    self.model_server = ModelRpcService().exposed_ModelRpcServer(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_rpc.py", line 73, in __init__
    self.model_runner = ModelRunner(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_runner.py", line 256, in __init__
    self.load_model()
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_runner.py", line 279, in load_model
    self.model = get_model(
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
    return loader.load_model(model_config=model_config,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
    return model_class(config=model_config.hf_config,
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llava.py", line 298, in __init__
    super().__init__(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llava.py", line 37, in __init__
    self.language_model = LlamaForCausalLM(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 261, in __init__
    self.model = LlamaModel(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 221, in __init__
    [
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 222, in <listcomp>
    LlamaDecoderLayer(config, i, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 170, in __init__
    self.mlp = LlamaMLP(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 45, in __init__
    self.down_proj = RowParallelLinear(
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 633, in __init__
    self.quant_method.create_weights(self,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 81, in create_weights
    weight = Parameter(torch.empty(output_size_per_partition,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU

Initialization failed. detoken_init_state: init ok
Traceback (most recent call last):
  File "/home/ubuntu/wubing/sglang/examples/usage/llava/srt_llava_next_test.py", line 64, in <module>
    runtime = sgl.Runtime(
  File "/home/ubuntu/wubing/sglang/python/sglang/api.py", line 39, in Runtime
    return Runtime(*args, **kwargs)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/server.py", line 291, in __init__
    raise RuntimeError(
RuntimeError: Initialization failed. Please see the error messages above.
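A back-of-envelope estimate makes the OOM unsurprising: the 72B-parameter weights alone, in half precision, exceed the memory of any single current GPU. This sketch only counts language-model weights; the vision tower, KV cache, and activations (not estimated here) add further overhead, which is why more than the bare minimum of GPUs is needed in practice.

```python
# Rough estimate of why llava-next-72b cannot load on one GPU.
# Assumption: ~72e9 parameters stored in fp16/bf16 (2 bytes each).
params = 72e9
bytes_per_param = 2  # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gib:.0f} GiB")  # → ~134 GiB

# Ceiling division: minimum number of 80 GiB cards just for the weights,
# before any KV cache, activations, or the vision tower.
gpus_needed = -(-int(weights_gib) // 80)
print(f"minimum 80 GiB GPUs for weights only: {gpus_needed}")  # → 2
```

Since the weights alone need two 80 GiB cards, and serving also needs room for the KV cache and activations, a larger tensor-parallel group (e.g. 4×80G) is the practical minimum.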

@Luodian
Contributor

Luodian commented May 28, 2024

I am not sure about your GPUs. Just providing a data point: I could run it with 4×80G A100 or 8×40G A100.

@bingwork
Contributor Author

> I am not sure about your GPUs. Just providing a data point: I could run it with 4×80G A100 or 8×40G A100.

Thanks for your reply. I use an NVIDIA A100-SXM4-80GB.
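On a multi-GPU box, the usual way to fit a 72B model is to shard it with tensor parallelism rather than loading it on one card. The snippet below is a hypothetical sketch (not from this thread) that just assembles the launch command; the exact flag names (`--tp-size` here) vary by sglang version, so check `python -m sglang.launch_server --help` before using it.

```python
# Hypothetical launch command: shard lmms-lab/llava-next-72b across
# 4 x 80G A100s with tensor parallelism. Flag names are an assumption
# about the sglang CLI and should be verified against your installed version.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "lmms-lab/llava-next-72b",
    "--tp-size", "4",
    "--port", "30000",
]
print(" ".join(cmd))
```

The same idea applies when using the Python API in srt_llava_next_test.py: pass a tensor-parallel size when constructing the runtime instead of the default single-GPU load.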
