sync : llama.cpp (CUDA ReLU, CPU-only with CUDA, bloom fix, etc) #607

Merged: 1 commit into master from sync, Nov 13, 2023

Conversation

ggerganov (Owner) commented Nov 13, 2023

Also reverts the CUDA memory pool changes, which were synced last time but were later reverted in llama.cpp.

copilot:all

FSSRepo (Collaborator) commented Nov 13, 2023

@slaren I think ggml_init_cublas() should be removed from ggml_init(...) by now. I was writing a model converter for stable-diffusion.cpp. The only thing I need is the gguf module, but it requires creating a ggml_context in order to add tensors to the gguf_context. However, when the project is compiled with GGML_CUBLAS=ON, ggml_init calls ggml_init_cublas unnecessarily, leading to the error mentioned in this comment.
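For reference, a minimal sketch of the converter use case described above: only the gguf writer is needed, but it still requires a ggml_context to host the tensors, so ggml_init() is unavoidable. The file name, metadata key, and tensor name below are placeholders, not from the original report.

```c
#include "ggml.h"

#include <stdio.h>
#include <string.h>

int main(void) {
    // a small context just to host the tensors added to the gguf file;
    // this ggml_init(...) call is the one that also triggers ggml_init_cublas()
    // when the library is built with GGML_CUBLAS=ON
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // placeholder tensor standing in for a converted model weight
    struct ggml_tensor * w = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 16);
    ggml_set_name(w, "my.weight");
    memset(w->data, 0, ggml_nbytes(w));

    // build the gguf file: some metadata plus the tensor
    struct gguf_context * gctx = gguf_init_empty();
    gguf_set_val_str(gctx, "general.architecture", "example");
    gguf_add_tensor(gctx, w);
    gguf_write_to_file(gctx, "output.gguf", /*only_meta=*/ false);
    printf("wrote output.gguf\n");

    gguf_free(gctx);
    ggml_free(ctx);
    return 0;
}
```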

ggerganov (Owner, Author)

With this change, you can now run with CUDA_VISIBLE_DEVICES=-1, and ggml_init_cublas() should do nothing in that case.
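A sketch of how a standalone tool might apply this workaround, assuming a POSIX environment for setenv(); the essential part is that the variable is set before the first ggml_init() call, which is where ggml_init_cublas() runs. It is equivalent to launching the binary with CUDA_VISIBLE_DEVICES=-1 in the shell.

```c
#include "ggml.h"

#include <stdlib.h>

int main(void) {
    // hide all CUDA devices before ggml is initialized
    setenv("CUDA_VISIBLE_DEVICES", "-1", /*overwrite=*/ 1);

    struct ggml_init_params params = {
        /*.mem_size   =*/ 1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    // with no visible devices, ggml_init_cublas() should do nothing here
    struct ggml_context * ctx = ggml_init(params);

    ggml_free(ctx);
    return 0;
}
```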

slaren (Collaborator) commented Nov 13, 2023

Once we move llama.cpp to ggml-backend, we will be able to remove the ggml-cuda calls from ggml.c and make the backend more self-contained, but as it is now, they are required to support the "old" way of using ggml-cuda. As @ggerganov says, though, it should be possible to use a CUDA build for CPU-only execution after ggerganov/llama.cpp#3946.

ggerganov merged commit 844dbb8 into master on Nov 13, 2023
10 checks passed
ggerganov deleted the sync branch on November 13, 2023 at 14:54
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request on Dec 18, 2023:
* It seems some new warnings were added recently that exposed this. I wrote the code that originally included this unused variable, and it is indeed not needed.