CUDA error 35 at ggml-cuda.cu:452: CUDA driver version is insufficient for CUDA runtime version #226

Open
dynamite9999 opened this issue Jun 3, 2023 · 0 comments


Hi ggml experts

I am trying to get this to work on a public cloud GPU VM, inside a Docker container, and I am having issues with the CUDA driver.

It runs fine directly on my host VM (A40 GPU), but when I create a Docker container and run it on the same host VM, I get the following error:

CUDA error 35 at ggml-cuda.cu:452: CUDA driver version is insufficient for CUDA runtime version
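
To compare the two environments, it may help to print the CUDA version reported by the driver library and by the runtime library in each one. The following is a minimal diagnostic sketch, not part of ggml. It assumes Python 3 is available on both the host and in the container, and that the driver and runtime can be loaded as libcuda.so.1 and libcudart.so.11.0 (the runtime name is taken from the ldd output in this report). The helper names decode_cuda_version, query_version and report are hypothetical; cuDriverGetVersion and cudaRuntimeGetVersion are the CUDA functions called through ctypes.

```python
import ctypes


def decode_cuda_version(encoded):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def query_version(library_name, function_name):
    """Call a CUDA query of the form `int f(int *version)`.

    Returns the encoded version, or None if the library or symbol
    cannot be loaded or the call reports a non-zero status.
    """
    try:
        lib = ctypes.CDLL(library_name)
        func = getattr(lib, function_name)
    except (OSError, AttributeError):
        return None  # library or symbol is not visible in this environment
    version = ctypes.c_int(0)
    status = func(ctypes.byref(version))
    return version.value if status == 0 else None


def report():
    """Return one human-readable line each for the driver and the runtime."""
    checks = (
        # Library names are assumptions; adjust them to match your system.
        ("driver", "libcuda.so.1", "cuDriverGetVersion"),
        ("runtime", "libcudart.so.11.0", "cudaRuntimeGetVersion"),
    )
    lines = []
    for label, library_name, function_name in checks:
        encoded = query_version(library_name, function_name)
        if encoded is None:
            lines.append(f"{label}: not found ({library_name})")
        else:
            major, minor = decode_cuda_version(encoded)
            lines.append(f"{label}: CUDA {major}.{minor} ({library_name})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If the driver line reads "not found" inside the container but shows a version on the host, the container probably cannot see the host's driver library at all, which would point at how the container is started rather than at the CUDA packages installed in the image.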

BACKGROUND:

  1. DOCKER CONTAINER LIBRARIES AND CUDA PACKAGES

[ GENERATES CUDA ERROR ABOVE ]

DOCKER CONTAINER LIBRARIES

root@d06011521e48:/app/llm# ldd main
linux-vdso.so.1 (0x00007ffe01d46000)
libcublas.so.11 => /usr/local/cuda/lib64/libcublas.so.11 (0x00007f270bc00000)
libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007f270b800000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f270b5d6000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2711927000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2711907000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f270b3ae000)
libcublasLt.so.11 => /usr/local/cuda/lib64/libcublasLt.so.11 (0x00007f26e6e00000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f2711900000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f27118fb000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f27118f6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2711b2e000)

DOCKER CONTAINER CUDA PACKAGES

root@d06011521e48:/app/# dpkg-query -l |grep cuda
ii cuda-cccl-11-8 11.8.89-1 amd64 CUDA CCCL
ii cuda-command-line-tools-11-8 11.8.0-1 amd64 CUDA command-line tools
ii cuda-compat-11-8 520.61.05-1 amd64 CUDA Compatibility Platform
ii cuda-compiler-11-8 11.8.0-1 amd64 CUDA compiler
ii cuda-cudart-11-8 11.8.89-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-11-8 11.8.89-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-11-8 11.8.86-1 amd64 CUDA cuobjdump
ii cuda-cupti-11-8 11.8.87-1 amd64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-11-8 11.8.87-1 amd64 CUDA profiling tools interface.
ii cuda-cuxxfilt-11-8 11.8.86-1 amd64 CUDA cuxxfilt
ii cuda-driver-dev-11-8 11.8.89-1 amd64 CUDA Driver native dev stub library
ii cuda-gdb-11-8 11.8.86-1 amd64 CUDA-GDB
ii cuda-keyring 1.0-1 all GPG keyring for the CUDA repository
ii cuda-libraries-11-8 11.8.0-1 amd64 CUDA Libraries 11.8 meta-package
ii cuda-libraries-dev-11-8 11.8.0-1 amd64 CUDA Libraries 11.8 development meta-package
ii cuda-memcheck-11-8 11.8.86-1 amd64 CUDA-MEMCHECK
ii cuda-minimal-build-11-8 11.8.0-1 amd64 Minimal CUDA 11.8 toolkit build packages.
ii cuda-nvcc-11-8 11.8.89-1 amd64 CUDA nvcc
ii cuda-nvdisasm-11-8 11.8.86-1 amd64 CUDA disassembler
ii cuda-nvml-dev-11-8 11.8.86-1 amd64 NVML native dev links, headers
ii cuda-nvprof-11-8 11.8.87-1 amd64 CUDA Profiler tools
ii cuda-nvprune-11-8 11.8.86-1 amd64 CUDA nvprune
ii cuda-nvrtc-11-8 11.8.89-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-11-8 11.8.89-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-11-8 11.8.86-1 amd64 NVIDIA Tools Extension
ii cuda-profiler-api-11-8 11.8.86-1 amd64 CUDA Profiler API
ii cuda-sanitizer-11-8 11.8.86-1 amd64 CUDA Sanitizer
ii cuda-toolkit-11-8-config-common 11.8.89-1 all Common config package for CUDA Toolkit 11.8.
ii cuda-toolkit-11-config-common 11.8.89-1 all Common config package for CUDA Toolkit 11.
ii cuda-toolkit-config-common 12.0.146-1 all Common config package for CUDA Toolkit.
hi libnccl-dev 2.15.5-1+cuda11.8 amd64 NVIDIA Collective Communication Library (NCCL) Development Files
hi libnccl2 2.15.5-1+cuda11.8 amd64 NVIDIA Collective Communication Library (NCCL) Runtime

  2. HOST VM A40 GPU LIBRARIES AND CUDA PACKAGES

[ WORKS FINE ]

HOST VM LIBRARIES:

root@ggmls:~/app/app/ggmls# ldd main
linux-vdso.so.1 (0x00007ffd333a8000)
libcublas.so.11 => /lib/x86_64-linux-gnu/libcublas.so.11 (0x00007ff94de00000)
libcudart.so.11.0 => /lib/x86_64-linux-gnu/libcudart.so.11.0 (0x00007ff94da00000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff94d7d6000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff9578d6000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff9578b6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff94d5ae000)
libcublasLt.so.11 => /lib/x86_64-linux-gnu/libcublasLt.so.11 (0x00007ff937e00000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff9578af000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff9578aa000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff9578a5000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff957ae3000)

HOST VM PACKAGES:

root@ggmls:/app/app/netai_llm# dpkg-query -l |grep cuda
rc cuda-repo-ubuntu2204-12-1-local 12.1.1-530.30.02-1 amd64 cuda repository configuration files
ii libcudart11.0:amd64 11.5.117~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Runtime Library
ii nvidia-cuda-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-gdb 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.5.1-1ubuntu1 all NVIDIA CUDA and OpenCL documentation
root@netaisyslog:/app/app/netai_llm#

QUESTIONS:

  1. I used this in my Dockerfile. Does anyone see any issues?

    FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

    # Install necessary packages
    RUN apt-get update && \
        apt-get install -y cuda-11.0

  2. I am confused by the different ways to install CUDA in a Docker container, such as sudo apt-get install -y nvidia-container-runtime. Can anyone please help me understand how to get this working correctly inside a Docker container that will run on an A40 or A100 public cloud VM? Does anyone have an example Dockerfile that installs the correct CUDA drivers for this?

Thank you
