
local-ai: ggml_cuda_init: failed to initialize CUDA: CUDA driver is a stub library #320145

Closed
teto opened this issue Jun 15, 2024 · 2 comments · Fixed by #324387

Comments

@teto
Member

teto commented Jun 15, 2024

Describe the bug

Not sure if the culprit is local-ai itself or one of its dependencies (llama-cpp/gpt4all), but I can't leverage GPU inference on my NVIDIA RTX 3060, I think because of this error: ggml_cuda_init: failed to initialize CUDA: CUDA driver is a stub library.

➜ nix run github:teto/nixpkgs/local-ai-with-cuda#local-ai -- --debug
11:51PM INF loading environment variables from file envFile=/home/teto/.config/localai.env
11:51PM INF Setting logging to info
11:51PM INF Starting LocalAI using 4 threads, with models path: /home/teto/models
11:51PM INF LocalAI version: v2.16.0 ()
WARNING: failed to read int from file: open /sys/class/drm/card0/device/numa_node: no such file or directory
11:51PM INF Preloading models from /home/teto/models

  Model name: mistral
11:51PM ERR error establishing configuration directory watcher error="unable to establish watch on the LocalAI Configuration Directory: no such file or directory"
11:51PM INF core/startup process completed!
11:51PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=https://0.0.0.0:11111
....
ggml_cuda_init: failed to initialize CUDA: CUDA driver is a stub library
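
One way to check whether the stub is baked into the backend binary's RUNPATH, rather than coming from the environment, is to inspect the extracted backend directly. A sketch; the /tmp path is the one LocalAI extracts to at runtime (taken from the LD_DEBUG trace further down) and may differ per run:

  # Print the RUNPATH of the extracted llama-cpp backend and check which
  # libcuda the dynamic linker actually resolves. A .../cuda_cudart-*/lib/stubs
  # entry in the RUNPATH would be the smoking gun.
  bin=/tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2
  patchelf --print-rpath "$bin"
  ldd "$bin" | grep -i cuda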

nvidia-smi output looks ok:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |

Following advice from the nixpkgs CUDA Matrix room, I re-ran the previous command with LD_DEBUG=libs and ended up with

10:21PM DBG GRPC(mistral-7b-openorca.Q6_K.gguf-127.0.0.1:37653): stderr     633360:	 search path=/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib/glibc-hwcaps/x86-64-v3:/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib/glibc-hwcaps/x86-64-v2:/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/zx9yfgv4ag607b8m3dgcp5p94b6vd13c-cuda_cudart-12.2.140-lib/lib/stubs/glibc-hwcaps/x86-64-v3:/nix/store/zx9yfgv4ag607b8m3dgcp5p94b6vd13c-cuda_cudart-12.2.140-lib/lib/stubs/glibc-hwcaps/x86-64-v2:/nix/store/zx9yfgv4ag607b8m3dgcp5p94b6vd13c-cuda_cudart-12.2.140-lib/lib/stubs		(RUNPATH from file /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2)

if that helps
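
For reference, that was roughly the following invocation, filtered down to the libcuda lookups (a sketch; the grep is just to keep the firehose of LD_DEBUG output manageable):

  # Trace the dynamic linker's library searches and keep only the libcuda
  # lines; a hit inside a .../lib/stubs directory confirms the wrong pick.
  LD_DEBUG=libs nix run github:teto/nixpkgs/local-ai-with-cuda#local-ai -- --debug 2>&1 | grep -i libcuda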

Steps To Reproduce

Steps to reproduce the behavior:
I've pushed a branch on top of master with one commit enabling CUDA and unfree by default, so one can test with:

  1. run nix run github:teto/nixpkgs/local-ai-with-cuda#local-ai -- --debug on a cuda machine
  2. send a request (you can do this from the browser at localhost:8080)
  3. check the output

Expected behavior

GPU inference should work. It used to, by the way, but I would have a hard time tracking down when it broke: NVIDIA drivers are hit and miss, and there seem to have been a lot of NVIDIA-related changes in nixpkgs recently.

Additional context

I found a related issue, horovod/horovod#3831, but I'm not sure what to do with it; I think we want to keep using shared libraries.

➜ ls -l /run/opengl-driver/lib | head
lrwxrwxrwx - root  1 janv.  1970 d3d -> /nix/store/fh3p3s1gg0ick2f295zfwi2jlr78166r-mesa-24.1.1-drivers/lib/d3d
dr-xr-xr-x - root  1 janv.  1970 dri
lrwxrwxrwx - root  1 janv.  1970 gbm -> /nix/store/w7fcnyxkxara9fixrmigzrir3k8fbdb3-nvidia-x11-550.90.07-6.8.12/lib/gbm
lrwxrwxrwx - root  1 janv.  1970 nvidia -> /nix/store/w7fcnyxkxara9fixrmigzrir3k8fbdb3-nvidia-x11-550.90.07-6.8.12/lib/nvidia
lrwxrwxrwx - root  1 janv.  1970 systemd -> /nix/store/w7fcnyxkxara9fixrmigzrir3k8fbdb3-nvidia-x11-550.90.07-6.8.12/lib/systemd
dr-xr-xr-x - root  1 janv.  1970 vdpau
lrwxrwxrwx - root  1 janv.  1970 libcuda.so -> /nix/store/w7fcnyxkxara9fixrmigzrir3k8fbdb3-nvidia-x11-550.90.07-6.8.12/lib/libcuda.so
lrwxrwxrwx - root  1 janv.  1970 libcuda.so.1 -> /nix/store/w7fcnyxkxara9fixrmigzrir3k8fbdb3-nvidia-x11-550.90.07-6.8.12/lib/libcuda.so.1
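
As a possible workaround (untested): since the trace above says the stub comes from a DT_RUNPATH entry, and glibc consults LD_LIBRARY_PATH before DT_RUNPATH, pointing LD_LIBRARY_PATH at the real driver directory should shadow the stub:

  # Untested sketch: LD_LIBRARY_PATH is searched before DT_RUNPATH, so the
  # driver's real libcuda.so.1 should win over the cudart stub.
  LD_LIBRARY_PATH=/run/opengl-driver/lib nix run github:teto/nixpkgs/local-ai-with-cuda#local-ai -- --debug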

Feel free to close this if it's the wrong place to submit it, but I would appreciate any workaround/tip.

Notify maintainers

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
output here

Add a 👍 reaction to issues you find important.

@teto
Member Author

teto commented Jun 22, 2024

Using llama-cpp directly works fine, so I suspect a problem in how local-ai is built. I will stick to llama-cpp for now.

teto mentioned this issue Jun 22, 2024
@SomeoneSerge
Contributor

ggml_cuda_init: failed to initialize CUDA: CUDA driver is a stub library

This is ${cudaPackages.cuda_cudart.stubs}/lib/libcuda.so being loaded instead of ${addDriverRunpath.driverLink}/lib/libcuda.so
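
A quick way to verify that on an affected machine (a sketch, not the proper nixpkgs-side fix, which would be to give these vendored backend binaries the driver runpath at build time): strip the stubs directory out of the backend's RUNPATH with patchelf so resolution falls through to ${addDriverRunpath.driverLink}/lib:

  # Sketch: drop any .../stubs entry from the RUNPATH of the extracted
  # backend so the dynamic linker falls through to /run/opengl-driver/lib.
  # Note: LocalAI re-extracts this file on each run, so the edit is transient.
  bin=/tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2
  old=$(patchelf --print-rpath "$bin")
  new=$(printf '%s' "$old" | tr ':' '\n' | grep -v '/stubs' | paste -sd: -)
  patchelf --set-rpath "$new" "$bin"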
