"Assertion srcIndex < srcSelectDimSize failed" in Docker on some systems
#1568
Comments
In case this error log is more helpful (run with CUDA_LAUNCH_BLOCKING=1):
I found a solution for my problem. The model architecture is picked based on the name of the folder the model is stored in. If the folder name does not follow the naming conventions, the model is loaded with the wrong architecture and crashes during its first inference. A log message showing which model architecture was loaded would go a long way toward catching this early.
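To illustrate the idea, here is a minimal sketch of name-based architecture selection that logs its decision. It is not the repository's actual loader: the function names, the keyword table, and the architecture labels are all hypothetical, and the real matching rules may differ.

```python
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_loader")

# Hypothetical keyword -> architecture table (an assumption for illustration,
# not the list used by the repository).
ARCH_KEYWORDS = {
    "mistral": "llava-mistral",
    "mpt": "llava-mpt",
}
DEFAULT_ARCH = "llava-llama"
NON_LLAVA_ARCH = "plain-language-model"


def model_name_from_path(model_path: str) -> str:
    """Return the last path component, ignoring trailing slashes."""
    return Path(model_path.rstrip("/")).name


def guess_architecture(model_path: str) -> str:
    """Pick an architecture label from the folder name and log the choice."""
    name = model_name_from_path(model_path).lower()

    if "llava" not in name:
        # The folder name gives no hint that this is a multimodal model.
        logger.warning(
            "Folder %r does not contain 'llava'; loading as %s",
            name, NON_LLAVA_ARCH,
        )
        return NON_LLAVA_ARCH

    for keyword, arch in ARCH_KEYWORDS.items():
        if keyword in name:
            logger.info("Folder %r matched %r; loading as %s", name, keyword, arch)
            return arch

    # No backbone keyword found: say so loudly instead of failing later.
    logger.warning(
        "Folder %r matched no backbone keyword; falling back to %s",
        name, DEFAULT_ARCH,
    )
    return DEFAULT_ARCH
```

With a log line like this, a folder whose name lacks the expected keywords produces a visible warning at load time rather than a CUDA assertion during the first inference.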
Hopefully the solution in my last comment will help someone.
I run a LLaVA system as presented in this repository in a Docker Compose setup using the official CUDA Docker images, and I run into an error on some systems with my custom-trained models.
On a server with an NVIDIA A100, my setup works: all models behave as expected.
On a server with an NVIDIA RTX A6000, this model works: https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b , but a custom-trained LLaVA-Mistral-7B gives this error during inference (on the A100 server the custom model runs without problems):
Log:
Can you give me any advice on what could cause this different behavior on different machines despite using the same Docker setup?
Thank you!