"Assertion srcIndex < srcSelectDimSize failed" in Docker on some systems #1568

Closed
Careiner opened this issue Jun 18, 2024 · 3 comments

Careiner commented Jun 18, 2024

I run a LLaVA system, as presented in this repository, in a Docker Compose setup using the official CUDA Docker images, and on some systems I run into an error with my custom-trained models.
On a server with an NVIDIA A100 my setup works: everything is fine and all models behave as expected.
On a server with an NVIDIA RTX A6000, the model https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b works, but a custom-trained LLaVA-Mistral-7B gives the following error during inference (on the A100 server the custom model runs without problems):

Log:

llava_worker-1 | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [434,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed.
llava_worker-1 | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [434,0,0], thread: [65,0,0] Assertion srcIndex < srcSelectDimSize failed.
llava_worker-1 | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [434,0,0], thread: [66,0,0] Assertion srcIndex < srcSelectDimSize failed.
[...]
llava_worker-1 | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [434,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
llava_worker-1 | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [434,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | Exception in thread Thread-3:
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | Traceback (most recent call last):
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | self.run()
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/lib/python3.8/threading.py", line 870, in run
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | self._target(*self._args, **self._kwargs)
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | return func(*args, **kwargs)
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1736, in generate
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | result = self._sample(
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2375, in _sample
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | outputs = self(
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | return self._call_impl(*args, **kwargs)
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | return forward_call(*args, **kwargs)
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1139, in forward
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | outputs = self.model(
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | return self._call_impl(*args, **kwargs)
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | return forward_call(*args, **kwargs)
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/models/mistral/modeling_mistral.py", line 985, in forward
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_attn_mask_utils.py", line 372, in _prepare_4d_causal_attention_mask_for_sdpa
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | ignore_causal_mask = AttentionMaskConverter._ignore_causal_mask_sdpa(
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_attn_mask_utils.py", line 279, in _ignore_causal_mask_sdpa
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | elif (is_training or not is_tracing) and torch.all(attention_mask == 1):
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | RuntimeError: CUDA error: device-side assert triggered
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
llava_worker-1 | 2024-06-18 14:27:21 | ERROR | stderr | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Can you give me any advice on what could cause this different behavior on different machines despite using the same Docker setup?
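
A note for anyone reproducing this: CUDA_LAUNCH_BLOCKING=1 can be set in the compose file's environment section, or directly in the worker entrypoint before anything touches CUDA. A minimal sketch, assuming the entrypoint is a plain Python script:

```python
# Minimal sketch (assumption: the worker entrypoint is a plain Python script).
# CUDA_LAUNCH_BLOCKING must be in the environment before the CUDA context is
# created, so set it before importing/using torch on the GPU.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the env var is in place
```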

Thank you!

@Careiner
Author

In case this error log is more helpful (run with CUDA_LAUNCH_BLOCKING=1):

[...]
llava_worker-1      | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [431,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
llava_worker-1      | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [431,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
llava_worker-1      | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [431,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
llava_worker-1      | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [431,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
llava_worker-1      | ../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [431,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr | Exception in thread Thread-3 (generate):
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr | Traceback (most recent call last):
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     self.run()
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/lib/python3.10/threading.py", line 953, in run
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return func(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1736, in generate
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     result = self._sample(
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2375, in _sample
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     outputs = self(
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return self._call_impl(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return forward_call(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1139, in forward
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     outputs = self.model(
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return self._call_impl(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return forward_call(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 968, in forward
llava_worker-1      |  2024-06-19 13:11:02 | ERROR | stderr |     inputs_embeds = self.embed_tokens(input_ids)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return self._call_impl(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return forward_call(*args, **kwargs)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 163, in forward
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return F.embedding(
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2264, in embedding
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr |     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr | RuntimeError: CUDA error: device-side assert triggered
llava_worker-1      | 2024-06-19 13:11:02 | ERROR | stderr | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
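
With the synchronous trace, the assert now points at F.embedding, i.e. at least one input token ID lies outside the embedding table. A quick sanity check along these lines (a hypothetical snippet; `model` stands for whatever the worker has loaded) shows whether the IDs handed to the language model fit its vocabulary:

```python
import torch

def check_input_ids(input_ids: torch.Tensor, model) -> None:
    """Hypothetical helper: flag token IDs that have no row in the embedding table."""
    vocab_size = model.get_input_embeddings().weight.shape[0]
    bad = (input_ids < 0) | (input_ids >= vocab_size)
    if bad.any():
        offending = input_ids[bad].unique().tolist()
        raise ValueError(
            f"out-of-range token IDs {offending}; embedding table has only {vocab_size} rows"
        )
```

(Note that LLaVA-style wrappers use a negative image-placeholder ID that is supposed to be replaced before the embedding lookup, so this check is only meaningful on the IDs that actually reach embed_tokens.)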


Careiner commented Jul 26, 2024

I found a solution to my problem.
In my case it was not a CUDA bug, and the fact that it worked on the other machine was pure coincidence.

The model architecture is picked based on the name of the folder the model lives in. If that naming convention is not met, the model is loaded with the wrong architecture and crashes during its first inference.
In my case I had a LLaVA-Mistral model and had simply named its folder "mistral_XXXX", which led to the error shown above.
When I renamed the model folder to "llava_mistral_XXXX", the model worked fine.
In gradio_web_server.py you can see which keywords are looked for in the model folder name.

To prevent this, a log message stating which model architecture is being loaded would be a big help.
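
To illustrate what goes wrong, here is a hypothetical sketch of that kind of substring matching (not the actual code from this repo, just the pattern: a folder name without "llava" in it falls through to a plain language-model path, which cannot handle the multimodal inputs):

```python
# Hypothetical sketch of keyword-based dispatch on the model folder name.
# The real logic lives in the repo (model loader / gradio_web_server.py);
# this only illustrates why "mistral_XXXX" vs. "llava_mistral_XXXX" matters.
def pick_architecture(model_folder_name: str) -> str:
    name = model_folder_name.lower()
    if "llava" in name:
        if "mistral" in name:
            return "LlavaMistralForCausalLM"  # multimodal Mistral wrapper
        return "LlavaLlamaForCausalLM"        # default multimodal wrapper
    return "AutoModelForCausalLM"             # plain LM, no image-token handling


print(pick_architecture("mistral_XXXX"))        # plain LM -> crashes on LLaVA inputs
print(pick_architecture("llava_mistral_XXXX"))  # multimodal Mistral wrapper -> works
```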

@Careiner
Author

Hopefully the solution in my last comment will help someone.
