
Add library function to estimate size of computation graph work tensor #214

Open
LoganDark opened this issue May 30, 2023 · 1 comment

@LoganDark
Contributor

One of the major demons I fought while working on RWKV/rwkv.cpp#74 is ggml's mysterious computation graph work tensor, which is allocated the first time ggml_graph_compute is called. I was trying to estimate the graph's memory usage exactly, so I manually counted objects and calls to ggml functions while the graph was being built. But once I had the memory usage accounted for down to the last byte, ggml_graph_compute still tried to allocate a seemingly arbitrary extra amount of memory.

I didn't want to over-estimate for smaller models, or especially to under-estimate for larger models. It took a while to debug which tensor was the culprit (the largest mat-mul). I ended up hardcoding the dimensions of that tensor to estimate an upper bound for the computation graph work tensor, as sketched below, but this is not a great solution.
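
For illustration, a minimal sketch of that kind of hardcoded upper bound. The dimensions, the F32 assumption, and the safety margin are all placeholders chosen here, not ggml's actual internal work-size formula:

```c
#include <stddef.h>

// Placeholder dimensions for the largest mat-mul in the graph; not taken
// from any particular model.
#define LARGEST_MATMUL_ROWS  4096u
#define LARGEST_MATMUL_COLS  50277u

// Rough upper bound for the work tensor: assume it needs roughly one F32
// copy of the largest mat-mul operand, plus a safety margin. This is an
// illustrative guess, not the formula ggml_graph_compute() actually uses.
static size_t estimate_work_tensor_size(void) {
    const size_t largest_elems = (size_t) LARGEST_MATMUL_ROWS * LARGEST_MATMUL_COLS;
    const size_t safety_margin = 64u * 1024u; // arbitrary slack
    return largest_elems * sizeof(float) + safety_margin;
}
```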

If ggml provided a library function to estimate the size of the computation graph work tensor, then instead of guessing I could call that function and allocate a new scratch buffer to hold it. That's slightly less optimal than doing it during context construction, but at that point I don't have a context or a graph yet, and can't create one because doing so itself requires memory (go figure).
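
A rough sketch of what such an API could look like. The function name ggml_graph_compute_work_size and its signature are hypothetical, invented here for illustration; no such function exists in ggml at the time of writing:

```c
#include <stdlib.h>
#include "ggml.h"

// Hypothetical API: given a built graph and the thread count, return the
// number of bytes ggml_graph_compute() would need for its work tensor.
// (Name and signature are made up for this sketch.)
size_t ggml_graph_compute_work_size(const struct ggml_cgraph * graph, int n_threads);

// Sketch of how a caller could then size a dedicated work buffer up front
// instead of letting ggml_graph_compute() allocate it inside the context.
static void * alloc_work_buffer(const struct ggml_cgraph * graph, int n_threads,
                                size_t * out_size) {
    *out_size = ggml_graph_compute_work_size(graph, n_threads);
    return malloc(*out_size);
}
```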

It would also be nice if I could tell ggml to allocate that work tensor early without having to actually do any graph computation.

@ggerganov ggerganov added the enhancement New feature or request label May 30, 2023
@ggerganov
Owner

I agree - the current creation of the "work" tensor by ggml_graph_compute() is a bad design decision.
I also had trouble with it recently. Will fix this
