
cuda: set backend type to GPU when init tensor #792

Closed
wants to merge 1 commit into from

Conversation

danbev (Contributor) commented Apr 9, 2024

This commit sets the backend type to GPU when initializing a tensor in the CUDA backend.

The motivation for this change is that, currently, the backend type of a tensor is still set to CPU after the tensor has been initialized by the CUDA backend. Other backends, such as the SYCL and Kompute backends, set the backend type to GPU; this change makes the CUDA backend consistent with them.


This can be reproduced using the following steps:

  1. Patch examples/simple/simple-backend.cpp
$ cat simple-backend.cpp.patch 
diff --git a/examples/simple/simple-backend.cpp b/examples/simple/simple-backend.cpp
index 4ae6f3c..844914e 100644
--- a/examples/simple/simple-backend.cpp
+++ b/examples/simple/simple-backend.cpp
@@ -81,8 +81,10 @@ void load_model(simple_model & model, float * a, float * b, int rows_A, int cols
     model.a = ggml_new_tensor_2d(model.ctx, GGML_TYPE_F32, cols_A, rows_A);
     model.b = ggml_new_tensor_2d(model.ctx, GGML_TYPE_F32, cols_B, rows_B);
 
+    printf("a before alloc_ctx_tensors: %d\n", model.a->backend);
     // create a backend buffer (backend memory) and alloc the tensors from the context
     model.buffer = ggml_backend_alloc_ctx_tensors(model.ctx, model.backend);
+    printf("a after alloc_ctx_tensors: %d\n", model.a->backend);
 
     // load data from cpu memory to backend buffer
     ggml_backend_tensor_set(model.a, a, 0, ggml_nbytes(model.a));
$ git apply simple-backend.cpp.patch 
  2. Build with CUDA support enabled:
$ mkdir build && cd build
$ cmake .. -DGGML_CUDA=ON
$ make -j8 simple-backend
  3. Run the example without the change in this pull request:
$ ./bin/simple-backend 
load_model: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
a before alloc_ctx_tensors: 0
a after alloc_ctx_tensors: 0
main: compute buffer size: 0.1250 KB
mul mat (4 x 3) (transposed result):
[ 60.00 110.00 54.00 29.00
 55.00 90.00 126.00 28.00
 50.00 54.00 42.00 64.00 ]
  4. Run the example with the change in this pull request:
$ ./bin/simple-backend 
load_model: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
a before alloc_ctx_tensors: 0
a after alloc_ctx_tensors: 10
main: compute buffer size: 0.1250 KB
mul mat (4 x 3) (transposed result):
[ 60.00 110.00 54.00 29.00
 55.00 90.00 126.00 28.00
 50.00 54.00 42.00 64.00 ]

Signed-off-by: Daniel Bevenius <[email protected]>
slaren (Collaborator) commented Apr 9, 2024

ggml_tensor::backend is deprecated and will be removed once all the backends stop depending on it.

slaren closed this Apr 9, 2024