
Matmul on 4d tensors with cuda backend #672

Closed
balisujohn opened this issue Dec 29, 2023 · 8 comments

Comments

@balisujohn
Contributor

balisujohn commented Dec 29, 2023

Thanks to all contributors for your work on ggml so far.

I have tensors with shape [18,18,16,4] and [18,64,16,4] and I am trying to multiply them together to get a tensor of shape [64,18,16,4]. It seems like the resulting tensor in the computational graph has the right shape, but I get this error

GGML_ASSERT: /home/john/ggml/src/ggml-cuda.cu:7387: src0->ne[3] == 1

I am using a version of ggml effectively the same as this commit. Is there a canonical way to broadcast matmul the first 2 dimensions of 4d tensors with ggml using the CUDA backend, or would I need to add this?

@balisujohn
Contributor Author

(It occurs to me I might be able to consolidate dimensions 3 and 4 to make them into 3d tensors before multiplying. I'll give that a try)

@ggerganov
Owner

Yes, you should be able to ggml_reshape_3d to achieve this.

However, where is this assert? I cannot seem to find it. The only one I can find is in ggml_cuda_cpy:

GGML_ASSERT(src0->ne[3] == 1);

@balisujohn
Contributor Author

balisujohn commented Dec 29, 2023

It looks like the assert was actually triggered by a ggml_cont call: I have a 4d tensor that is the result of a ggml_permute, so it is not contiguous. If I try ggml_reshape_3d I get

GGML_ASSERT: /home/john/ggml/src/ggml.c:6882: ggml_is_contiguous(a)

And if I try ggml_cont_3d, I get

GGML_ASSERT: /home/john/ggml/src/ggml-cuda.cu:7387: src0->ne[3] == 1

So this seems like a bit of a catch-22. Is there a way to get around this?

If I try to use the raw 4d tensor in the ggml_mul_mat directly, I get

GGML_ASSERT: /home/john/ggml/src/ggml.c:6496: !ggml_is_transposed(a)

Which is what motivated the ggml_cont in the first place.

@balisujohn
Contributor Author

OK, so the deal is that GGML_OP_CONT results in ggml_cuda_cpy being called in ggml-cuda.cu, and ggml_cuda_cpy asserts that both tensors it operates on are at most 3d. Would there be interest in making ggml_cuda_cpy support 4d tensors? I might be willing to work on it myself.

@balisujohn
Contributor Author

balisujohn commented Dec 31, 2023

(It seems like 4d cont is implemented in ggml.c, so this is mostly a task of porting the existing for-loop implementation into a CUDA kernel.)

@slaren
Collaborator

slaren commented Dec 31, 2023

Yes, it would be great to fix these issues in the CUDA backend. Most of the code in the CUDA backend was written specifically for llama.cpp, and over time it has been expanded to be more general, but there is still a lot of work to do.

@balisujohn
Contributor Author

I think I got it to work with ggml_view_3d, so I'm closing this issue.

@balisujohn
Contributor Author

Reopening pending a PR to fix this.

@balisujohn balisujohn reopened this Jan 8, 2024
3 participants