Allow prechecking how much memory tensor creation will require #1349
Conversation
@ggerganov If you have a moment, I could use some direction here and in #1325. Right now there are merge conflicts since quite a bit of time has passed, and it's not clear whether these pulls are something you'd ever want to merge, whether you want me to change my approach, etc. If I'm on the right track, of course I would be happy to fix them up and get these pulls into a mergeable state. But right now it's not clear whether that would be wasted effort or not.
I don't see how the proposed function will be used in practice:

```c
c = ggml_op(ctx, a, b);
```

How do I use the proposed function here? I still think there is a better solution, using the approach shown here (Line 409 in 265db98).

The idea is you run the eval function once with …
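(For reference, the measure-then-allocate pattern being suggested looks roughly like this. This is only a sketch built on `ggml_used_mem`; the sizes and the tiny stand-in graph are purely illustrative:)

```c
#include "ggml.h"
#include <stdio.h>

int main(void) {
    // Measuring pass: build the graph once in a deliberately oversized
    // context, then ask ggml how much memory was actually consumed.
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024, // illustrative; oversized on purpose
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * c = ggml_add(ctx, a, b); // stand-in for "ggml_op"
    (void) c;

    size_t mem_needed = ggml_used_mem(ctx); // what the graph actually used
    ggml_free(ctx);
    printf("context bytes needed: %zu\n", mem_needed);

    // Real runs would now ggml_init() a context of exactly mem_needed bytes
    // and rebuild the same graph in it.
    return 0;
}
```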
Thanks for the reply!
It's not really something that's aimed at individual developers consuming the GGML library directly. To use it, you have to know fairly internal details, like what the dimensions of the tensor for that op will be, whether it creates other sub-tensors, etc. The main use case I was thinking of is for #1325. Right now there are two factors that make actually recovering consistently impractical:

1. From outside of GGML, there's no way to tell in advance how much memory creating an op's tensors will require.
2. Some ops (the map ops, for example) create more than one tensor internally.
Because of point 2, if the second tensor create fails, there is just no way to recover without having to free the whole context and start again. So the place where it would be useful immediately is for internal use, assuming the recoverable-errors pull is something that would actually get merged.

It is also something that could be useful for larger projects like language bindings or wrappers for GGML. A binding can't require its consumers to do something like the run-the-eval-twice approach described above. That also would be the case for high-level languages like Python. Imagine you run a Python script to parse some JSON file and, because of a typo in the file, it ends up calling abort() deep inside GGML: the whole process dies instead of raising an exception the consumer could handle.

Anyway, that's my case for just allowing errors to be recoverable in general. I want to be clear that I'm not talking about situations where one of GGML's internal invariants was violated (implying a bug in GGML). In that case, it makes perfect sense to abort.

edit: In case it wasn't clear, although someone writing a binding wouldn't really want to have to dig through the GGML source code to find internal details like whether an op creates sub-tensors and then ensure that it fits, a binding at least could do that on behalf of its consumers.
Closing this, at least for now, as it doesn't seem like there's interest.
Apologies for ignoring these - the proposed changes require more thought from my side and are mainly relevant for 3rd party software integration. This is currently low priority, but will become important in the future. Will eventually get back to this after finishing some pending core features.
No problem (also no passive-aggressive motivation for closing this; it just doesn't really make too much sense for one person to be pushing for it). I've also been looking at alternative ways to solve the problem in my own project without needing those changes.
Add `ggml_tensor_required_memory` function to calculate how much memory creating a tensor will require. Also add `ggml_used_scratch_mem` function so applications can determine how much scratch memory was used. This trivial function will just return 0 if there's no current scratch buffer.

`ggml_tensor_required_memory` takes the same arguments as `ggml_new_tensor_impl` with the addition of two `size_t` pointers (`required_ctx` and `required_scratch`). The logic is basically just a cut-and-paste from `ggml_new_tensor_impl`: it will calculate how much scratch and context memory is required and increment the supplied `size_t` pointers by that amount. Then it returns `true`/`false` depending on whether those values fit in the context/scratch memory.

The reason it increments rather than just setting the `required_ctx`/`_scratch` arguments is to allow checking that multiple tensors will fit in advance of actually trying to create them (see the sketch below). This could be considered a partial replacement for #1325, since memory limits are probably the most common reason for GGML to bail out during graph construction. It also is something that can facilitate #1325 recovering in the case of operations that require creating multiple tensors, like the map ops.
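To make the accumulate-then-check flow concrete, here is a rough usage sketch. The parameter order below is just my reading of the description (the same arguments as `ggml_new_tensor_impl`, plus the two out-pointers); the actual header in the diff is authoritative:

```c
// Precheck two tensor creations before performing either one. The helper
// *increments* req_ctx/req_scratch, so the second call checks the combined
// requirement against the context/scratch memory that is still free.
size_t req_ctx     = 0;
size_t req_scratch = 0;

const int64_t ne[2] = { 1024, 1024 };

bool fits = ggml_tensor_required_memory(ctx, GGML_TYPE_F32, 2, ne, NULL, &req_ctx, &req_scratch)
         && ggml_tensor_required_memory(ctx, GGML_TYPE_F32, 2, ne, NULL, &req_ctx, &req_scratch);

if (fits) {
    // Both creations are now known to succeed, memory-wise.
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1024, 1024);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1024, 1024);
}
```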
I converted `ggml_new_tensor_impl` to use the new function for its memory validation. Right now I think it's not taking full advantage of the change and recalculates some stuff.

Note: This isn't quite a draft, but it is lightly tested. It compiles without warnings for me, and llama.cpp built on it can load/run inference on models. If this is something that would get merged, it would be a good idea to have better C programmers than me look at it closely.
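For context on the conversion, the validation inside `ggml_new_tensor_impl` now amounts to something like this (a simplified sketch of the idea, not the literal diff):

```c
// Inside ggml_new_tensor_impl (simplified): delegate the size math to the
// new helper and keep the old hard failure when the tensor does not fit.
size_t req_ctx     = 0;
size_t req_scratch = 0;

if (!ggml_tensor_required_memory(ctx, type, n_dims, ne, data, &req_ctx, &req_scratch)) {
    GGML_ASSERT(false); // would not fit in context/scratch memory
    return NULL;
}
```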