
feat: cuda implementation for ggml_conv_transpose_1d #854

Open · balisujohn wants to merge 11 commits into master
Conversation

balisujohn (Contributor)

Adds a CUDA implementation for ggml_conv_transpose_1d.

We may later extend the CPU version of the op to allow padding (currently, padding must be zero).
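
For context, a minimal sketch of how the op is invoked through the public ggml API, assuming an initialized ggml_context * ctx; the shapes below are illustrative, and with this PR the padding argument p0 must be 0:

    // illustrative shapes: kernel is [K, Cout, Cin], input is [L, Cin]
    struct ggml_tensor * kernel = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 3, 16, 8);
    struct ggml_tensor * input  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 8);
    // stride s0 = 2, padding p0 = 0 (required for now), dilation d0 = 1
    struct ggml_tensor * out    = ggml_conv_transpose_1d(ctx, kernel, input, 2, 0, 1);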

Currently a draft; posting now to prevent duplication of effort. I will remove draft status when ready for review.

@balisujohn balisujohn marked this pull request as draft June 10, 2024 07:00
balisujohn (Contributor Author) commented Jun 13, 2024

Alright, this is ready for review. The key questions/concerns I have are as follows:

I added some tests for ggml_conv_transpose_1d in test-conv-transpose-1d.cpp; some of these mirror the conv transpose 1d tests in test-conv-transpose.c. I have verified that the tests in test-conv-transpose-1d.cpp pass in both CUDA and CPU mode, but I'm not sure whether what's currently in the PR will run the tests in CUDA mode in CI/CD; I had to remove the manual #define GGML_USE_CUBLAS I had added in order for the CI/CD build to work. I'm curious how best to combine the new tests with what's already there, and how to make sure the tests I added actually run with CUDA in the CI/CD process.

There is an extremely bizarre problem I'm encountering when trying to use this op with tortoise: the result is all zeros. But if this line is changed from

    dst[global_index] = accumulator;

to

    dst[global_index] += accumulator;

then the result is no longer all zeros.
I'm going to try to isolate the error into a self-contained example in a ggml fork so other people can take a look at it.

Some more background: the last test in test-conv-transpose-1d.cpp uses tensors of the same shape and calls the op with the same params as the call that produces the all-zero result in tortoise, and it seems to work correctly, so I don't think the problem is just the shape of the tensors or the parameters to the op. It seems that src0 and src1 are pointing to zero-valued vectors in this case, so even though the kernel logic matches the op's semantics (sketched just below this comment), the accumulator is being set to zero. And for some reason dst is pre-populated with non-zero values, so both += and no assignment to dst result in non-zero values being present in the result.

(Edit: the weird bug is fixed.)
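
For readers unfamiliar with the op, here is a scalar reference sketch of the transposed-convolution semantics the tests exercise, for a single channel with zero padding and unit dilation; conv_transpose_1d_ref is a hypothetical helper for illustration, not ggml code:

    // Hypothetical single-channel reference, not ggml's implementation.
    // For p0 = 0 and d0 = 1, the output length is (ni - 1)*s0 + nk.
    void conv_transpose_1d_ref(const float * kernel, int nk,
                               const float * input,  int ni,
                               float * out, int s0) {
        const int no = (ni - 1)*s0 + nk;
        for (int o = 0; o < no; ++o) {
            out[o] = 0.0f;                             // dst must start zeroed...
        }
        for (int i = 0; i < ni; ++i) {
            for (int k = 0; k < nk; ++k) {
                out[i*s0 + k] += input[i] * kernel[k]; // ...because contributions overlap
            }
        }
    }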

@balisujohn balisujohn marked this pull request as ready for review June 13, 2024 22:41
slaren (Collaborator) commented Jun 14, 2024

The GitHub runners cannot run CUDA code. There is the ggml-ci, which can run CUDA tests, but it only monitors branches in this repository. A test should be added to test-backend-ops as well.
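
As a rough sketch of what such a case could look like, modeled on the test_case pattern already used in test-backend-ops.cpp (the exact base-class interface and member names here are illustrative, not verbatim):

    // Sketch only: follows the spirit of test-backend-ops.cpp's test_case pattern.
    struct test_conv_transpose_1d : public test_case {
        const std::array<int64_t, 4> ne_input;   // [L, Cin, 1, 1]
        const std::array<int64_t, 4> ne_kernel;  // [K, Cout, Cin, 1]
        const int s0; // stride
        const int p0; // padding (must be 0 for now)
        const int d0; // dilation

        test_conv_transpose_1d(std::array<int64_t, 4> ne_input, std::array<int64_t, 4> ne_kernel,
                               int s0, int p0, int d0)
            : ne_input(ne_input), ne_kernel(ne_kernel), s0(s0), p0(p0), d0(d0) {}

        ggml_tensor * build_graph(ggml_context * ctx) override {
            ggml_tensor * input  = ggml_new_tensor(ctx, GGML_TYPE_F32, 4, ne_input.data());
            ggml_tensor * kernel = ggml_new_tensor(ctx, GGML_TYPE_F32, 4, ne_kernel.data());
            return ggml_conv_transpose_1d(ctx, kernel, input, s0, p0, d0);
        }
    };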


Review comment on:

    int out_index = global_index / dst_ne0;

    int accumulator = 0;

int -> float

balisujohn (Contributor Author) replied:

Yeah, that seems to fix the issue I was experiencing; it's bizarre that the tests still passed even in CUDA mode with the types accidentally set to int. Thanks so much!



Review comment on:

    int kernel_weight = src0[kernel_offset + weight_idx];
    int input_value = src1[input_offset+i];

int -> float

balisujohn (Contributor Author)

I fixed the accumulator bug and added a test to test-backend-ops; ready for review again :^)

Review comment on:

    cudaStream_t stream = ctx.stream();

    GGML_ASSERT(src0->type == GGML_TYPE_F32);
    GGML_ASSERT( dst->type == GGML_TYPE_F32);

Owner:

Also assert contiguous src0 and src1.
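
A sketch of the requested asserts, using ggml's existing ggml_is_contiguous helper:

    GGML_ASSERT(ggml_is_contiguous(src0));
    GGML_ASSERT(ggml_is_contiguous(src1));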

ggerganov (Owner) left a comment:

I don't see an update to ggml_backend_cuda_supports_op() to indicate that this operation is now supported. While at it, update the rest of the GPU backends to return false in their corresponding supports_op() functions so that the tests do not fail.
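
Roughly the kind of change being requested; the exact signature and switch layout in ggml-cuda.cu differ, so treat this as a sketch:

    // Sketch: advertise the op in the CUDA backend's supports_op, and return
    // false for GGML_OP_CONV_TRANSPOSE_1D in the other GPU backends for now.
    static bool ggml_backend_cuda_supports_op(ggml_backend_t backend, const ggml_tensor * op) {
        switch (op->op) {
            // ... existing cases ...
            case GGML_OP_CONV_TRANSPOSE_1D:
                return true;
            default:
                return false;
        }
    }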

Maybe add a few more small tests to test-backend-ops, just in case.
