
Allow prechecking how much memory tensor creation will require #1349

Closed

Conversation

@KerfuffleV2 (Collaborator) commented May 7, 2023

Add ggml_tensor_required_memory function to calculate how much memory creating a tensor will require.

Also add ggml_used_scratch_mem function so applications can determine how much scratch memory was used. This trivial function will just return 0 if there's no current scratch buffer.

ggml_tensor_required_memory takes the same arguments as ggml_new_tensor_impl with the addition of two size_t pointers (required_ctx and required_scratch). The logic is basically a cut-and-paste from ggml_new_tensor_impl: it calculates how much scratch and context memory is required and increments the supplied size_t pointers by that amount, then returns true/false depending on whether those values fit in the context/scratch memory.

The reason it increments rather than just setting the required_ctx/_scratch arguments is to allow checking that multiple tensors will fit in advance of actually trying to create them. This could be considered a partial replacement for #1325 since memory limits are probably the most common reason for GGML to bail out during graph construction.
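For illustration, here is a minimal usage sketch. The prototype is only my reading of the description above (the same parameters as ggml_new_tensor_impl plus the two size_t pointers), and it assumes an existing ggml_context * ctx:

    size_t need_ctx     = 0;
    size_t need_scratch = 0;
    const int64_t ne_a[2] = { 1024, 1024 };
    const int64_t ne_b[1] = { 1024 };

    // Each call adds its requirement to need_ctx/need_scratch and reports
    // whether the running totals still fit, so several prospective tensors
    // can be checked before any of them is actually created.
    bool fits =
        ggml_tensor_required_memory(ctx, GGML_TYPE_F32, 2, ne_a, NULL,
                                    &need_ctx, &need_scratch) &&
        ggml_tensor_required_memory(ctx, GGML_TYPE_F32, 1, ne_b, NULL,
                                    &need_ctx, &need_scratch);

    if (!fits) {
        // report the failure to the caller instead of letting tensor
        // creation abort the whole process
    }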

It is also something that could help #1325 recover in the case of operations that create multiple tensors, like the map ops.

I converted ggml_new_tensor_impl to use the new function for its memory validation. Right now I think it's not taking full advantage of the change and recalculates some stuff.

Note: This isn't quite a draft, but it is lightly tested. It compiles without warnings for me and llama.cpp built on it can load/run inference on models. If this is something that would get merged, it would be a good idea to have better C programmers than me look at it closely.

@KerfuffleV2 added the enhancement label on May 7, 2023
@KerfuffleV2 (Collaborator, Author)

@ggerganov If you have a moment, I could use some direction here and in #1325.

Right now there are merge conflicts since quite a bit of time has passed, but it's not clear if these pulls are something you'd ever want to merge, whether you want me to change my approach, etc.

If I'm on the right track, of course I would be happy to fix them and get these pulls into a mergeable state. But right now it's not clear whether that would be wasted effort.

@ggerganov (Owner)

I don't see how the proposed function will be used in practice.
Let's say I have:

c = ggml_op(ctx, a, b);

How do I use ggml_tensor_required_memory() to calculate in advance the memory needed for this operation?

I still think there is a better solution by using the no_alloc parameter of the ggml_context:

bool no_alloc; // don't allocate memory for the tensor data

The idea is that you run the eval function once with no_alloc == true, without calling ggml_graph_compute() at the end, to record how much memory is needed. After that you run the same eval function, but this time with no_alloc == false.
I guess we might need to add some extra size counters to the ggml_context to keep track of the required memory during the first pass, but that should be a simple change.
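Roughly, the pattern would look like this (a sketch only: build_eval_graph() stands in for the application's graph-building code, and as noted the measurement pass would still need those extra counters, since ggml_used_mem() alone doesn't cover the skipped data allocations):

    // Pass 1: measure. With no_alloc == true only tensor metadata is
    // allocated, and ggml_graph_compute() is not called.
    struct ggml_init_params measure_params = {
        /*.mem_size   =*/ 16*1024*1024,   // small buffer, metadata only
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx0 = ggml_init(measure_params);
    build_eval_graph(ctx0);
    size_t mem_needed = ggml_used_mem(ctx0);  // plus the extra size counters
    ggml_free(ctx0);

    // Pass 2: run the same eval function for real, with a buffer sized
    // from the first pass and no_alloc == false.
    struct ggml_init_params eval_params = {
        /*.mem_size   =*/ mem_needed,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx1 = ggml_init(eval_params);
    struct ggml_cgraph * gf = build_eval_graph(ctx1);
    ggml_graph_compute(ctx1, gf);
    ggml_free(ctx1);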

@KerfuffleV2 (Collaborator, Author) commented May 21, 2023

Thanks for the reply!

I don't see how the proposed function will be used in practice.

It's not really something that's aimed at individual developers consuming the GGML library directly.

To use it, you have to know fairly internal details about what the dimensions of the tensor for that op will be, whether it creates other sub-tensors, etc.

The main use case I was thinking of is for #1325:

Right now there are two factors that make actually recovering consistently impractical:

  1. It's impossible to free a tensor after it's created without freeing the entire context.
  2. Some tensor ops create other tensors behind the scenes. For example, my own ggml_map_binary_f32 operation: it needs to create another tensor to hold the function pointer.

Because of point 2, if the second tensor creation fails there is just no way to recover without freeing the whole context and starting again.
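Roughly what that looks like inside such an op (an illustrative sketch, not the actual ggml source):

    // The map op needs a second, hidden tensor just to carry the callback
    // pointer. If the context runs out of memory on this second allocation,
    // `result` has already been carved out of the context and there is no
    // way to free it individually.
    struct ggml_tensor * result = ggml_dup_tensor(ctx, a);
    struct ggml_tensor * addr   = ggml_new_tensor_1d(ctx, GGML_TYPE_I32,
                                      sizeof(void *) / sizeof(int32_t));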

So the place where it would be useful immediately is for internal use, assuming the recoverable errors pull is something that would actually get merged.

It is also something that could be useful for larger projects like language bindings or wrappers for GGML. A binding can't require its consumers to go through the no_alloc song and dance. Crashing the entire application for a recoverable error that originates in some library is very unexpected behavior in that sort of situation, especially for languages where correctness/safety is a big selling point, like Rust: someone using a Rust crate is not going to expect a call into an API with a recoverable error to bring their entire application down.

That also would be the case for high level languages like Python. Imagine you run a Python script to parse some JSON file and because of a typo in the file it actually calls abort() and kills the whole program without the chance for recovery. Someone using a Python package wouldn't expect (or want) that behavior.

Anyway, that's my case for allowing errors to be recoverable in general. I want to be clear that I'm not talking about situations where one of GGML's internal invariants was violated (implying a bug in GGML). In that case, it makes perfect sense to abort() and force the problem to be dealt with promptly.

edit: In case it wasn't clear, although someone writing a binding wouldn't really want to have to dig through the GGML source code to find internal details like whether an op creates sub-tensors and then ensure that they fit, ggml_tensor_required_memory would make it possible.

@KerfuffleV2 (Collaborator, Author)

Closing this, at least for now, as it doesn't seem like there's interest.

@ggerganov (Owner)

Apologies for ignoring these - the proposed changes require more thought from my side and are mainly relevant for 3rd-party software integration. This is currently low priority, but will become important in the future.

Will eventually get back to this after finishing some pending core features.

@KerfuffleV2 (Collaborator, Author)

No problem (and no passive-aggressive motivation for closing this; it just doesn't make much sense for one person to keep pushing for it). I've also been looking at alternative ways to solve the problem in my own project without needing these changes.
