
cuda: set backend type to GPU when init tensor #792

Closed
wants to merge 1 commit into from

Conversation

danbev (Contributor) commented Apr 9, 2024

This commit sets the backend type to GPU when initializing a tensor in the CUDA backend.

The motivation for this change is that, currently, the backend type of a tensor is still set to CPU after the tensor has been initialized by the CUDA backend. Other backends, such as the SYCL and Kompute backends, set the backend type to GPU; this change makes the CUDA backend consistent with them.


This can be reproduced using the following steps:

  1. Patch examples/simple/simple-backend.cpp
$ cat simple-backend.cpp.patch 
diff --git a/examples/simple/simple-backend.cpp b/examples/simple/simple-backend.cpp
index 4ae6f3c..844914e 100644
--- a/examples/simple/simple-backend.cpp
+++ b/examples/simple/simple-backend.cpp
@@ -81,8 +81,10 @@ void load_model(simple_model & model, float * a, float * b, int rows_A, int cols
     model.a = ggml_new_tensor_2d(model.ctx, GGML_TYPE_F32, cols_A, rows_A);
     model.b = ggml_new_tensor_2d(model.ctx, GGML_TYPE_F32, cols_B, rows_B);
 
+    printf("a before alloc_ctx_tensors: %d\n", model.a->backend);
     // create a backend buffer (backend memory) and alloc the tensors from the context
     model.buffer = ggml_backend_alloc_ctx_tensors(model.ctx, model.backend);
+    printf("a after alloc_ctx_tensors: %d\n", model.a->backend);
 
     // load data from cpu memory to backend buffer
     ggml_backend_tensor_set(model.a, a, 0, ggml_nbytes(model.a));
$ git apply simple-backend.cpp.patch 
  2. Build with CUDA support enabled:
$ mkdir build && cd build
$ cmake .. -DGGML_CUDA=ON
$ make -j8 simple-backend
  3. Run the example without the change in this pull request:
$ ./bin/simple-backend 
load_model: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
a before alloc_ctx_tensors: 0
a after alloc_ctx_tensors: 0
main: compute buffer size: 0.1250 KB
mul mat (4 x 3) (transposed result):
[ 60.00 110.00 54.00 29.00
 55.00 90.00 126.00 28.00
 50.00 54.00 42.00 64.00 ]
  4. Run the example with the change in this pull request:
$ ./bin/simple-backend 
load_model: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
a before alloc_ctx_tensors: 0
a after alloc_ctx_tensors: 10
main: compute buffer size: 0.1250 KB
mul mat (4 x 3) (transposed result):
[ 60.00 110.00 54.00 29.00
 55.00 90.00 126.00 28.00
 50.00 54.00 42.00 64.00 ]

Signed-off-by: Daniel Bevenius <[email protected]>
slaren (Collaborator) commented Apr 9, 2024

ggml_tensor::backend is deprecated and will be removed once all the backends stop depending on it.

slaren closed this Apr 9, 2024