Matmul on 4d tensors with cuda backend #672
Comments
(It occurs to me I might be able to consolidate dimensions 3 and 4 to make them into 3d tensors before multiplying. I'll give that a try.)
Yes, you should be able to. However, where is this assert? I cannot seem to find it. There is only this one, but it is at line 9061 in 6474641.
So it looks like the issue was actually coming from a
And if I try I get
So this seems like a bit of a catch-22. Is there a way to get around this? If I try to use the raw 4d tensor in the
Which is what motivated the ggml_cont in the first place.
OK, so the deal is GGML_OP_CONT results in
(It seems like 4d cont is implemented in ggml.c, so this is mostly a task of reverse-engineering a for-loop implementation into a CUDA kernel.)
Yes, it would be great to fix these issues in the CUDA backend. Most of the code in the CUDA backend was written specifically for llama.cpp, and over time it has been expanded to be more general, but there is still a lot of work to do.
I think I got it to work with |
Reopening pending a PR to fix this.
Thanks to all contributors for your work on ggml so far.
I have tensors with shape [18,18,16,4] and [18,64,16,4], and I am trying to multiply them together to get a tensor of shape [64,18,16,4]. It seems like the resulting tensor in the computational graph has the right shape, but I get this error.

I am using a version of ggml effectively the same as this commit. Is there a canonical way to broadcast-matmul the first 2 dimensions of 4d tensors with ggml using the CUDA backend, or would I need to add this?