feat: cuda implementation for ggml_conv_transpose_1d
#854
base: master
Conversation
Alright, this is ready for review. The key questions/concerns I have are as follows: I added some tests for … to …; … subsequent tests to fail.
The GitHub runners cannot run CUDA code. There is the ggml-ci, which can run CUDA tests, but it only monitors branches in this repository. A test should be added to …
src/ggml-cuda/conv-transpose-1d.cu
Outdated

    int out_index = global_index / dst_ne0;
    int accumulator = 0;
int -> float
Yeah, that seems to fix the issue I was experiencing; it's bizarre that the tests still passed even in CUDA mode with the types accidentally set to int.
Thanks so much!
src/ggml-cuda/conv-transpose-1d.cu
Outdated

    int kernel_weight = src0[kernel_offset + weight_idx];
    int input_value = src1[input_offset + i];
int -> float
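Applying both suggested type fixes, here is a minimal single-channel, stride-1 sketch of the accumulation. This is not the PR's actual kernel; the function name and the simplified indexing are illustrative, and the multi-channel/stride/offset bookkeeping from the real kernel is omitted.

```cuda
// Sketch only: accumulate in float, and load weights/inputs as float, per the
// review comments above (was: int accumulator / int kernel_weight / int input_value).
__global__ void conv_transpose_1d_sketch(
        const float * src0,     // kernel weights, length K
        const float * src1,     // input, length L
        float       * dst,      // output, length L + K - 1
        const int K,
        const int L,
        const int dst_ne0) {
    const int global_index = blockIdx.x * blockDim.x + threadIdx.x;
    if (global_index >= dst_ne0) {
        return;
    }

    float accumulator = 0.0f;                         // was: int accumulator = 0;
    for (int i = 0; i < L; i++) {
        const int weight_idx = global_index - i;      // stride 1, no padding
        if (weight_idx < 0 || weight_idx >= K) {
            continue;
        }
        const float kernel_weight = src0[weight_idx]; // was: int
        const float input_value   = src1[i];          // was: int
        accumulator += kernel_weight * input_value;
    }
    dst[global_index] = accumulator;
}
```

With int types, each product is truncated toward zero before accumulation, so the float-valued reference results can only be matched by accident; accumulating in float fixes that.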
I fixed the accumulator bug and added a test to test-backend-ops; ready for review again :^)
    cudaStream_t stream = ctx.stream();

    GGML_ASSERT(src0->type == GGML_TYPE_F32);
    GGML_ASSERT( dst->type == GGML_TYPE_F32);
Also assert contiguous src0 and src1.
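Concretely, that would be something like the following next to the existing type asserts (ggml_is_contiguous is the standard ggml helper for this check; the exact placement within the PR's function is assumed):

```cpp
// Reject non-contiguous inputs, since the kernel indexes src0/src1 as flat arrays.
GGML_ASSERT(ggml_is_contiguous(src0));
GGML_ASSERT(ggml_is_contiguous(src1));
```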
I don't see an update of ggml_backend_cuda_supports_op() to indicate that this operation is now supported. While at it, update the rest of the GPU backends to return false in their corresponding supports_op() functions so that the tests do not fail.
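A hedged sketch of the shape of those changes; the surrounding switch statements differ per backend, so this fragment is illustrative rather than a drop-in patch:

```cpp
// In ggml-cuda.cu, inside ggml_backend_cuda_supports_op():
// report the new op as supported.
case GGML_OP_CONV_TRANSPOSE_1D:
    return true;

// In each other GPU backend's supports_op() (Metal, Vulkan, SYCL, ...),
// explicitly return false for now so test-backend-ops skips the op there:
case GGML_OP_CONV_TRANSPOSE_1D:
    return false;
```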
Maybe add a few more small tests to test-backend-ops, just in case.
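For reference, a sketch of what such cases might look like, assuming the test follows the existing test_case pattern in tests/test-backend-ops.cpp (the struct name, shapes, and strides here are illustrative, not necessarily the PR's actual test):

```cpp
// Illustrative only: a test_case subclass in the style of tests/test-backend-ops.cpp.
// ne_kernel = {K, C_out, C_in}, ne_input = {L, C_in}; s0 = stride.
struct test_conv_transpose_1d : public test_case {
    const std::array<int64_t, 3> ne_kernel;
    const std::array<int64_t, 2> ne_input;
    const int s0;

    test_conv_transpose_1d(std::array<int64_t, 3> ne_kernel = {3, 2, 4},
                           std::array<int64_t, 2> ne_input  = {8, 4},
                           int s0 = 1)
        : ne_kernel(ne_kernel), ne_input(ne_input), s0(s0) {}

    ggml_tensor * build_graph(ggml_context * ctx) override {
        ggml_tensor * kernel = ggml_new_tensor_3d(ctx, GGML_TYPE_F32,
            ne_kernel[0], ne_kernel[1], ne_kernel[2]);
        ggml_tensor * input  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32,
            ne_input[0], ne_input[1]);
        // p0 (padding) must be 0 and d0 (dilation) must be 1 in the current op.
        return ggml_conv_transpose_1d(ctx, kernel, input, s0, 0 /*p0*/, 1 /*d0*/);
    }
};

// Registered with a few small shape/stride variations, e.g.:
// test_cases.emplace_back(new test_conv_transpose_1d());
// test_cases.emplace_back(new test_conv_transpose_1d({3, 2, 4}, {8, 4}, 2));
// test_cases.emplace_back(new test_conv_transpose_1d({2, 1, 1}, {5, 1}, 3));
```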
Adds a CUDA implementation for ggml_conv_transpose_1d.
May extend the CPU version of the op to allow padding (currently, padding must be zero).
Currently a draft; posting now to prevent duplication of effort. I will remove draft status when ready for review.
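For context, a minimal sketch of how the op is invoked from user code; the shapes and stride are illustrative, and per the note above p0 (padding) must be 0 and d0 (dilation) must be 1:

```cpp
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // kernel: [K = 3, C_out = 2, C_in = 4], input: [L = 8, C_in = 4]
    struct ggml_tensor * kernel = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 3, 2, 4);
    struct ggml_tensor * input  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4);

    // output length = (L - 1)*s0 + K = 17, shape [17, C_out = 2]
    struct ggml_tensor * out = ggml_conv_transpose_1d(ctx, kernel, input,
        /*s0=*/2, /*p0=*/0, /*d0=*/1);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    // ... fill kernel/input data, then compute on the CPU backend:
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    ggml_free(ctx);
    return 0;
}
```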