One of the major demons I fought while working on RWKV/rwkv.cpp#74 is ggml's mysterious computation graph work tensor, which is allocated the first time `ggml_graph_compute` is called. I was trying to estimate the memory usage of the graph exactly, so I manually counted objects and calls to ggml functions while the graph was being built. But once I had the memory usage pinned down to the last byte, `ggml_graph_compute` went and allocated a seemingly arbitrary amount of memory on top of it. I didn't want to over-estimate for smaller models, or, worse, under-estimate for larger models. It took a while to debug which tensor was the culprit (the largest mat-mul); in the end I hardcoded the dimensions of that tensor to estimate an upper bound for the work tensor, but that is not a great solution.
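For context, here is a minimal sketch of the failure mode, assuming the ggml API of that era, where `ggml_graph_compute()` took the context it allocates from. The graph and byte counts are illustrative, not rwkv.cpp's actual budget:

```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /* .mem_size   = */ 64 * 1024 * 1024,  // counted "down to the last byte"
        /* .mem_buffer = */ NULL,
        /* .no_alloc   = */ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Build a toy graph; in rwkv.cpp this is the full model forward pass.
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1024, 1024);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1024, 1024);
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    struct ggml_cgraph gf = ggml_build_forward(c);

    // First call: ggml_graph_compute() sizes a "work" tensor internally
    // (driven largely by the biggest mat-mul) and allocates it from ctx,
    // on top of whatever the caller so carefully budgeted.
    ggml_graph_compute(ctx, &gf);

    ggml_free(ctx);
    return 0;
}
```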
If ggml provided a library function to estimate the size of the computation graph work tensor, then instead of guessing I could call that function and allocate a new scratch buffer to contain it. It's slightly less optimal than doing it during context construction, but at that point I don't have a context or a graph, and can't create one because that itself requires memory (go figure).
It would also be nice if I could tell ggml to allocate that work tensor early without having to actually do any graph computation.
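For what it's worth, ggml later moved in exactly this direction by splitting planning from computation: `ggml_graph_plan()` returns a `ggml_cplan` whose `work_size` field reports the work buffer size before any computation happens. Below is a sketch of the requested flow assuming that newer API; exact signatures vary across ggml revisions:

```c
#include <stdint.h>
#include <stdlib.h>
#include "ggml.h"

// Sketch assuming a later ggml revision where planning is split from
// computation; exact signatures vary across ggml revisions.
static void compute_with_own_work_buffer(struct ggml_cgraph * gf, int n_threads) {
    // Estimate only: no computation happens here, so this can run right
    // after the graph is built, before any scratch layout is finalized.
    struct ggml_cplan plan = ggml_graph_plan(gf, n_threads);

    // Allocate the work buffer wherever the caller wants (a scratch
    // buffer, an arena, plain malloc) instead of having it carved out
    // of the context behind the caller's back.
    uint8_t * work = NULL;
    if (plan.work_size > 0) {
        work = malloc(plan.work_size);
        plan.work_data = work;
    }

    ggml_graph_compute(gf, &plan);

    free(work);
}
```

With this split, the estimate is available as soon as the graph is built, and the caller decides where the work buffer lives, so a context sized to the byte stays sized to the byte.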
I agree: the current creation of the "work" tensor by `ggml_graph_compute()` is a bad design decision. I also had trouble with it recently. Will fix this.