Have an NVIDIA GPU, but cannot use it #4726
Comments
Hi @pengyuxiang1, sorry you hit this. Do you see anything in the logs regarding GPU detection?
---
"I have the same problem. After I typed the journalctl -fu ollama command and entered a prompt, the logs appeared as follows:"
I have an NVIDIA 3050 6 GB mobile GPU.
---
@pengyuxiang1 @AzizEmir unfortunately your logs only show recent output and omit the earlier messages where we try to discover the GPUs. Can you try the following instead, so we can isolate the failure to discover your GPUs:
```sh
sudo systemctl stop ollama
OLLAMA_DEBUG=1 ollama serve 2>&1 | tee server.log
```

Then in another terminal, try to run one model, and share the results of the server log.

---

I ran the commands, then started the Ollama service. When I typed `ollama list`, no models were listed. Is this normal? I reinstalled it and ran the tests again.
---
@AzizEmir the 999 CUDA errors in your logs are "unknown" low-level driver errors from the NVIDIA stack. You can most likely resolve this by following the guide here: https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#container-fails-to-run-on-nvidia-gpu If you recently upgraded to the 555 driver, you may want to re-run our install script, as there have been some changes in the way NVIDIA sets up the drivers which required changes to our install flow to make sure the uvm driver is properly loaded.
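(For readers hitting the same 999 errors: the core of that guide is checking that the `nvidia_uvm` kernel module is actually loaded and reloading it after a driver upgrade. A minimal sketch; these are standard Linux module commands, not Ollama-specific tooling:)

```sh
# Check whether the NVIDIA kernel modules, including nvidia_uvm, are loaded
lsmod | grep nvidia

# Reload nvidia_uvm (stop Ollama and any GPU containers first)
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
```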
---

I ran the command and the following log appeared:

```
2024/06/03 10:25:01 routes.go:1028: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-03T10:25:01.929+08:00 level=INFO source=images.go:729 msg="total blobs: 0"
time=2024-06-03T10:25:01.929+08:00 level=INFO source=images.go:736 msg="total unused blobs removed: 0"
time=2024-06-03T10:25:01.930+08:00 level=INFO source=routes.go:1074 msg="Listening on 127.0.0.1:11434 (version 0.1.39)"
time=2024-06-03T10:25:01.930+08:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama4014666769/runners
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-03T10:25:01.930+08:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4014666769/runners/cpu
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4014666769/runners/cpu_avx
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4014666769/runners/cpu_avx2
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4014666769/runners/cuda_v11
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4014666769/runners/rocm_v60002
time=2024-06-03T10:25:05.102+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]"
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=gpu.go:122 msg="Detecting GPUs"
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=gpu.go:261 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-03T10:25:05.102+08:00 level=DEBUG source=gpu.go:280 msg="gpu library search" globs="[/usr/local/cuda-11.0/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-03T10:25:05.110+08:00 level=DEBUG source=gpu.go:313 msg="discovered GPU libraries" paths="[/usr/lib/libcuda.so.515.65.01 /usr/lib64/libcuda.so.515.65.01]"
library /usr/lib/libcuda.so.515.65.01 load err: /usr/lib/libcuda.so.515.65.01: wrong ELF class: ELFCLASS32
time=2024-06-03T10:25:05.111+08:00 level=DEBUG source=gpu.go:342 msg="Unable to load nvcuda" library=/usr/lib/libcuda.so.515.65.01 error="Unable to load /usr/lib/libcuda.so.515.65.01 library to query for Nvidia GPUs: /usr/lib/libcuda.so.515.65.01: wrong ELF class: ELFCLASS32"
CUDA driver version: 11.7
time=2024-06-03T10:25:05.175+08:00 level=DEBUG source=gpu.go:127 msg="detected GPUs" count=1 library=/usr/lib64/libcuda.so.515.65.01
time=2024-06-03T10:25:05.175+08:00 level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
[GPU-08bf9adf-c6f1-1ed9-641d-573b522a9d68] CUDA totalMem 7680 mb
[GPU-08bf9adf-c6f1-1ed9-641d-573b522a9d68] CUDA freeMem 7583 mb
[GPU-08bf9adf-c6f1-1ed9-641d-573b522a9d68] Compute Capability 7.5
time=2024-06-03T10:25:05.390+08:00 level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing nvcuda library
time=2024-06-03T10:25:05.390+08:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-08bf9adf-c6f1-1ed9-641d-573b522a9d68 library=cuda compute=7.5 driver=11.7 name="Tesla T4" total="7.5 GiB" available="7.4 GiB"
[GIN] 2024/06/03 - 10:25:25 | 200 | 33.667µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/06/03 - 10:25:25 | 200 | 174.645µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2024/06/03 - 10:25:36 | 200 | 24.344µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/06/03 - 10:25:36 | 200 | 45.405µs | 127.0.0.1 | GET "/api/tags"
```

There seems to be an error in the log:

```
time=2024-06-03T10:25:05.111+08:00 level=DEBUG source=gpu.go:342 msg="Unable to load nvcuda" library=/usr/lib/libcuda.so.515.65.01 error="Unable to load /usr/lib/libcuda.so.515.65.01 library to query for Nvidia GPUs: /usr/lib/libcuda.so.515.65.01: wrong ELF class: ELFCLASS32"
```

Then, when I run `ollama list` in another terminal, there is no model list output.
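(Side note on that `ELFCLASS32` message: it is harmless here. The scanner found a 32-bit `libcuda.so` in `/usr/lib` before the 64-bit copy in `/usr/lib64`, failed to load the 32-bit one, and then succeeded with the 64-bit one, as the subsequent "detected GPUs" line shows. To confirm which ELF class a given library is, the standard `file` utility reports it; the paths below are the ones from this log:)

```sh
# A 32-bit build reports "ELF 32-bit ...", a 64-bit build "ELF 64-bit ..."
file /usr/lib/libcuda.so.515.65.01
file /usr/lib64/libcuda.so.515.65.01
```

---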
I have the same issue. I regularly used Ollama with Docker, but for the past few days, it has stopped utilizing the GPU. I tried the images ollama/ollama:0.1.39 and ollama/ollama:0.1.41, but the problem persists.
---
Ollama is not working in a Docker container; I think something is missing in Debian 12. For testing, I set up GPU virtualization and installed Fedora on the virtual machine. I performed a network installation of CUDA Toolkit 12.5 and then installed the NVIDIA driver (555.42.02). There are no issues on the virtual machine :D Both Ollama and LM Studio work there.

---
I'm running Ollama via the `nvidia/cuda:12.5.0-runtime-ubuntu22.04` image on WSL2. I ran the `nvidia-smi` command in the container.
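(For anyone comparing container setups: the command below is the GPU-enabled invocation from the Ollama Docker documentation; it assumes the NVIDIA Container Toolkit is installed on the host, and the volume and container names are the documented defaults:)

```sh
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```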
---
@pengyuxiang1 your log shows it did discover your GPU. It looks like you haven't pulled or run any models. What happens if you run a model? If that doesn't load onto the GPU, please share the updated logs or any errors you see.
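(A concrete way to exercise this; the model name is only an illustration, any pulled model works. While the model is loaded, the server log should show it being offloaded to CUDA, and `nvidia-smi` should show VRAM in use:)

```sh
ollama pull llama3
ollama run llama3 "hello"

# In another terminal, while the model is loaded:
nvidia-smi
```

---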
I'm facing a (possibly) related issue.
Seemed to fix the issue for me. Logs for posterity.
I am not running Ollama from a container:
---
FYI, I had the same issue running a Proxmox VM with Ubuntu 24.04; I had to make sure to use the `host` CPU type and not the default `x86-64-v2-AES` setting. The CPU needs to expose the vector extensions AVX and AVX2, which you can verify with `lscpu | grep -i avx` (see below).
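(If it helps anyone, the Proxmox CPU type can also be switched from the CLI; `100` is a placeholder VM ID, and `qm set --cpu` is the standard Proxmox tool, though check your version's docs:)

```sh
# Switch the VM's CPU type to "host" so AVX/AVX2 pass through to the guest
qm set 100 --cpu host

# Then verify inside the guest; the Flags line should include avx and avx2
lscpu | grep -i avx
```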
---

Docker Desktop 4.31 was released 2024-06-06 and includes NVIDIA Container Toolkit 1.15.0, which resolves my issue.
---

I believe all the issues have been resolved now with the troubleshooting steps. If anyone is still having problems, please make sure to upgrade to the latest version, and if that doesn't clear it, share your latest server log and I'll reopen the issue.

---
What is the issue?
The install script is functioning normally.
However, the current program does not utilize the GPU. Two days ago I found that there was a problem with install.sh (#4679), but now it seems the problem is not only with the script.
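(For context, the install script referred to here is the standard Ollama Linux installer; assuming the documented one-liner:)

```sh
curl -fsSL https://ollama.com/install.sh | sh
```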
OS: Linux
GPU: Nvidia
CPU: No response
Ollama version: 0.1.39