Llava CUDA error: device-side assert triggered #543

Open
dmilcevski opened this issue Jun 13, 2024 · 3 comments

@dmilcevski

I am trying to deploy llava-v1.6-34b on an A100 80GB GPU, but I am getting the following error:

../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [395,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [395,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
2024-06-13 20:58:40 | ERROR | srt.tp_worker | Exception in ModelTpServer:
Traceback (most recent call last):
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 422, in forward
    return self.forward_extend_multi_modal(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 411, in forward_extend_multi_modal
    return self.model.forward(
  File "/sglang/python/sglang/srt/models/llava.py", line 110, in forward
    .cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


2024-06-13 20:58:40 | ERROR | srt.controller | Exception in ControllerSingle:
Traceback (most recent call last):
  File "/sglang/python/sglang/srt/managers/controller/manager_single.py", line 93, in start_controller_process
    loop.run_until_complete(controller.loop_for_forward())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/sglang/python/sglang/srt/managers/controller/manager_single.py", line 44, in loop_for_forward
    out_pyobjs = await self.model_client.step(next_step_input)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 753, in _func
    return f(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 422, in forward
    return self.forward_extend_multi_modal(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 411, in forward_extend_multi_modal
    return self.model.forward(
  File "/sglang/python/sglang/srt/models/llava.py", line 110, in forward
    .cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Does anyone have an idea how to fix this issue? Thanks
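
Since CUDA errors are reported asynchronously, the Python trace above may not point at the kernel that actually failed. A minimal sketch for getting a synchronous, accurate traceback, assuming the server is started from a Python entry point where the environment can still be set before CUDA is initialized:

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback names the
# operation that actually failed instead of a later, unrelated API call.
# This must be set before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the variable is set, so it takes effect
```

The same effect can be had from the shell by prefixing the launch command with `CUDA_LAUNCH_BLOCKING=1`.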

@dmilcevski
Author

There were many hanging processes, so I had to kill them and redeploy sglang. However, now I get a different error, again coming from the LLaVA implementation:

2024-06-14 08:36:05 | ERROR | srt.tp_worker | Exception in ModelTpServer:
Traceback (most recent call last):
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 422, in forward
    return self.forward_extend_multi_modal(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 411, in forward_extend_multi_modal
    return self.model.forward(
  File "/sglang/python/sglang/srt/models/llava.py", line 105, in forward
    input_embeds = self.language_model.model.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 100, in forward
    output_parallel = F.embedding(masked_input, self.weight)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


2024-06-14 08:36:05 | ERROR | srt.controller | Exception in ControllerSingle:
Traceback (most recent call last):
  File "/sglang/python/sglang/srt/managers/controller/manager_single.py", line 93, in start_controller_process
    loop.run_until_complete(controller.loop_for_forward())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/sglang/python/sglang/srt/managers/controller/manager_single.py", line 44, in loop_for_forward
    out_pyobjs = await self.model_client.step(next_step_input)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 753, in _func
    return f(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 422, in forward
    return self.forward_extend_multi_modal(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 411, in forward_extend_multi_modal
    return self.model.forward(
  File "/sglang/python/sglang/srt/models/llava.py", line 105, in forward
    input_embeds = self.language_model.model.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 100, in forward
    output_parallel = F.embedding(masked_input, self.weight)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Any ideas on how to fix this? Thanks
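
The failing call is `F.embedding(masked_input, self.weight)`, and the earlier `indexSelectLargeIndex ... srcIndex < srcSelectDimSize` assert usually means one of the input ids lies outside the embedding table, i.e. some token id >= the model's vocab size (for LLaVA this can happen with the image placeholder token or a tokenizer/checkpoint vocab mismatch). A minimal CPU-side sketch of that failure mode, with hypothetical sizes:

```python
import torch
import torch.nn.functional as F

vocab_size, hidden = 64000, 8              # hypothetical sizes for illustration
weight = torch.randn(vocab_size, hidden)   # stand-in for the embedding table

# One id is >= vocab_size, e.g. an image placeholder token that the
# language model's embedding table does not actually contain.
input_ids = torch.tensor([1, 2, vocab_size])

# On CPU this raises a clean "IndexError: index out of range in self";
# on GPU the same lookup surfaces as the indexSelectLargeIndex assert and
# then "CUDA error: device-side assert triggered".
F.embedding(input_ids, weight)
```

Comparing `input_ids.max()` against the checkpoint's `vocab_size` (and `len(tokenizer)` against the embedding table size) right before the forward pass would confirm or rule this out.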

@taroshi

taroshi commented Jun 19, 2024

Using one GPU card works fine, but with two GPUs I have the same problem.

@dmilcevski
Author

I explicitly restricted access to one GPU with CUDA_VISIBLE_DEVICES=0. I do have more GPUs on the node, but the process should only use that device. I am also seeing the following in the logs, which suggests it is indeed using a single device:

2024-06-12 08:03:55 | INFO | srt.model_runner | [gpu_id=0] Set cuda device.
2024-06-12 08:03:55 | INFO | srt.model_runner | [gpu_id=0] Init nccl begin.
2024-06-12 08:03:56 | INFO | srt.model_runner | [gpu_id=0] Load weight begin. avail mem=78.59 GB
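
For what it's worth, a quick sanity check that the process really sees only one device, assuming the variable is set before torch initializes CUDA (e.g. at the top of the launch script):

```python
import os

# Restrict the process to the first GPU; must be set before CUDA init.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

# With CUDA_VISIBLE_DEVICES=0 this should report exactly 1 device.
# If it reports more, the variable was set too late or overridden.
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
```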
