Estimate memory requirements for graph #260
Comments
This is mostly possible if you don't mind reading the implementation of every function to figure out exactly what it does.
Yes, it is currently annoying that you have to pre-compute the necessary size. I'm thinking about ways to solve this.
The latest version of GGML trashed this so severely (WHY do
Was just wondering if there was any update on this - I can also start looking into this myself.
There is an implementation in llama.cpp that does this, among other things. It is not entirely automated as you are suggesting here: you have to avoid writing to the tensors while creating a dummy graph for measuring the memory requirements.
Well, rwkv.cpp has a new implementation, if you're interested, that uses "future tensors": basically predicting the number of objects and the amount of memory that each tensor operation will use. The prediction functions get quite a bit nicer. Other than that, I have nothing.
OK, perhaps I can try to backport ggerganov/llama.cpp#2411 to here?
Created #433. I want to try to implement this in ggml-gobject as well, just to test that it works correctly (no reason why it shouldn't, since the allocator parts are relatively standalone).
@ggerganov occasionally syncs the ggml code in ggml/whisper.cpp/llama.cpp; I suppose you just have to poke him and he will do it... sometime when he has time :)
This is in a similar vein to #214, but a bit more general.
It would be useful to be able to estimate the total context memory requirement given some computation graph or a list of tensor descriptions. This would make the implementation of newer models that much easier, since the implementer doesn't need to estimate all the memory usage manually.
For computation graphs, this wouldn't add any overhead as long as the computation graph size stays constant between invocations. In that case the context's memory buffer can be re-used (I've successfully done this for GPT2 in https://github.com/smspillaz/ggml-gobject).
I think that in order to implement this, you could have a flag on `ggml_context` such that when new tensors are created in that context, they don't actually allocate any memory for the data (the object overhead can either go into its own memory pool or onto the stack/heap). Writing to the tensors would be a no-op, as would `ggml_graph_compute`. Once the computation graph has been created, the library consumer can query the context's estimated memory usage, which could be done by walking all the objects in the `ggml_object` list and tallying up their sizes.

I haven't looked very closely at the details - maybe data allocations are needed in order to build the graph somehow, which would make this infeasible. But if not, I could try doing this myself and submitting a pull request, if it belongs in the library.