
lmms-lab/llava-next-72b CUDA out of memory #483

Open
bingwork opened this issue May 27, 2024 · 2 comments

Comments

@bingwork
Contributor

When I run sglang/examples/usage/llava/srt_llava_next_test.py with "lmms-lab/llava-next-72b" instead of "lmms-lab/llama3-llava-next-8b", it reports OOM as below.
Could anyone take the time to give some suggestions? Thank you very much!

Initialization failed. router_init_state: Traceback (most recent call last):
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/manager.py", line 71, in start_router_process
    model_client = ModelRpcClient(server_args, port_args, model_overide_args)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_rpc.py", line 739, in __init__
    self.model_server = ModelRpcService().exposed_ModelRpcServer(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_rpc.py", line 73, in __init__
    self.model_runner = ModelRunner(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_runner.py", line 256, in __init__
    self.load_model()
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/managers/router/model_runner.py", line 279, in load_model
    self.model = get_model(
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
    return loader.load_model(model_config=model_config,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
    return model_class(config=model_config.hf_config,
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llava.py", line 298, in __init__
    super().__init__(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llava.py", line 37, in __init__
    self.language_model = LlamaForCausalLM(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 261, in __init__
    self.model = LlamaModel(config, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 221, in __init__
    [
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 222, in <listcomp>
    LlamaDecoderLayer(config, i, quant_config=quant_config)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 170, in __init__
    self.mlp = LlamaMLP(
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/models/llama2.py", line 45, in __init__
    self.down_proj = RowParallelLinear(
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 633, in __init__
    self.quant_method.create_weights(self,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 81, in create_weights
    weight = Parameter(torch.empty(output_size_per_partition,
  File "/home/ubuntu/anaconda3/envs/llava_py310/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU

Initialization failed. detoken_init_state: init ok
Traceback (most recent call last):
  File "/home/ubuntu/wubing/sglang/examples/usage/llava/srt_llava_next_test.py", line 64, in <module>
    runtime = sgl.Runtime(
  File "/home/ubuntu/wubing/sglang/python/sglang/api.py", line 39, in Runtime
    return Runtime(*args, **kwargs)
  File "/home/ubuntu/wubing/sglang/python/sglang/srt/server.py", line 291, in __init__
    raise RuntimeError(
RuntimeError: Initialization failed. Please see the error messages above.
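A back-of-envelope estimate makes the OOM unsurprising: the 72B-parameter weights alone, in half precision, exceed the memory of any single current GPU. This sketch only counts language-model weights; the vision tower, KV cache, and activations (not estimated here) add further overhead, which is why more than the bare minimum of GPUs is needed in practice.

```python
# Rough estimate of why llava-next-72b cannot load on one GPU.
# Assumption: ~72e9 parameters stored in fp16/bf16 (2 bytes each).
params = 72e9
bytes_per_param = 2  # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gib:.0f} GiB")  # → ~134 GiB

# Ceiling division: minimum number of 80 GiB cards just for the weights,
# before any KV cache, activations, or the vision tower.
gpus_needed = -(-int(weights_gib) // 80)
print(f"minimum 80 GiB GPUs for weights only: {gpus_needed}")  # → 2
```

Since the weights alone need two 80 GiB cards, and serving also needs room for the KV cache and activations, a larger tensor-parallel group (e.g. 4×80G) is the practical minimum.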

@Luodian
Contributor

Luodian commented May 28, 2024

I am not sure about your GPUs. Just providing a data point: I could run it with 4×80G A100 or 8×40G A100.

@bingwork
Contributor Author

> I am not sure about your GPUs. Just providing a data point: I could run it with 4×80G A100 or 8×40G A100.

Thanks for your reply. I use an NVIDIA A100-SXM4-80GB.
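On a multi-GPU box, the usual way to fit a 72B model is to shard it with tensor parallelism rather than loading it on one card. The snippet below is a hypothetical sketch (not from this thread) that just assembles the launch command; the exact flag names (`--tp-size` here) vary by sglang version, so check `python -m sglang.launch_server --help` before using it.

```python
# Hypothetical launch command: shard lmms-lab/llava-next-72b across
# 4 x 80G A100s with tensor parallelism. Flag names are an assumption
# about the sglang CLI and should be verified against your installed version.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "lmms-lab/llava-next-72b",
    "--tp-size", "4",
    "--port", "30000",
]
print(" ".join(cmd))
```

The same idea applies when using the Python API in srt_llava_next_test.py: pass a tensor-parallel size when constructing the runtime instead of the default single-GPU load.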
