[Question] How do I force a computation of a tensor/force a dependency between 2 tensors? #610
Comments
The short answer is that you cannot do what you want with the current custom op API, but you can kind of hack it to make it work for your case. You can simulate additional dependencies by assigning the extra tensors to the unused `src` slots of the tensor returned by `ggml_map_custom1`:

```c
struct ggml_tensor * t = ggml_map_custom1(ctx, dep1, fun, ...);
t->src[1] = dep2;
t->src[2] = dep3;
t->src[3] = dep4;
t->src[4] = dep5;
t->src[5] = dep6;
```

In the custom op function, the extra tensors can then be recovered from `dst`:

```c
void fun(struct ggml_tensor * dst, const struct ggml_tensor * dep1, int ith, int nth, void * userdata) {
    struct ggml_tensor * dep2 = dst->src[1];
    struct ggml_tensor * dep3 = dst->src[2];
    // etc
}
```

The size of the `src` array is `GGML_MAX_SRC`, which limits how many dependencies you can attach this way.

To propagate the op as a dependency in the graph, you should write the result to the tensor returned by `ggml_map_custom1`. If you need the result in a tensor you allocate yourself, you can use the `_inplace` variant:

```c
struct ggml_tensor * t = ggml_map_custom1_inplace(ctx, ggml_new_tensor_4d(...), ...);
```

The tensor returned by `ggml_map_custom1_inplace` is a view of the tensor passed to it, so writing the result to `dst` writes to that tensor. It is also possible to use `ggml_map_custom2` or `ggml_map_custom3` to pass a second or third tensor dependency directly.
Thank you very much! It works. I had to up …
I will use `ggml_map_custom1` to perform a complicated custom operation which is too costly to be implemented naively using `ggml` operations. This operation depends not only on the value of an argument `struct ggml_tensor * a`, but will also require 6 other tensors, one of which will be written to.

I plan to pass these additional 6 tensors in `void * userdata`. However, I expect that `ggml` would not see that these tensors need to be computed before computing the custom op, so I need some way to force `ggml` to compute them.

As an ugly crutch, I've considered "entangling" tensors through mathematically useless operations, but this crutch fails.
Would be great to have an answer to either of 2 questions:

- How do I pass these tensors to `ggml_map_custom1`, while having them properly computed?

For context: the operation is `wkv` in the RWKV v5 model. The current implementation uses matmuls and is too slow and memory-hungry for serious usage.