why add 512 : ggml_backend_alloc_buffer(backend_kv, memory_size + 512*2); #637
Comments
Edit: ignore this comment, I was confused. See the next comment by @slaren. (Referenced code: Lines 556 to 560 in 1f6c60d)
The code in the first screenshot that you posted should be rewritten like this:
// old
model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, memory_size + 512*2);
// new
model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, memory_size + 2*ggml_tensor_overhead());
In some cases, you can create the
model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, 2*ggml_tensor_overhead());
I think there is some confusion here. The 512 is added to account for any possible padding and alignment overhead that the backend may require. The way to calculate the exact memory requirements can be seen in Lines 766 to 777 in 3f66942.
It was added recently and not every example has been updated to use it, but generally, if you have a fixed list of tensors that you need to allocate into a backend buffer, using this function is the preferred way to allocate them (when using ggml-backend). The backend buffers can only be used to store the tensor data, not the tensor metadata, so it is never necessary to add any
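To make the padding argument concrete, here is a minimal sketch in plain C that does not call ggml at all. It assumes that each tensor placed in a buffer may need at most `align` extra bytes for alignment and padding, which is how the `512*2` term reads for two tensors. The helper names `pad_to`, `worst_case_buffer_size`, and `packed_size` are hypothetical and exist only for this illustration; the real alignment value depends on the backend.

```c
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (align must be > 0). */
size_t pad_to(size_t size, size_t align) {
    return (size + align - 1) / align * align;
}

/* Conservative upper bound: every tensor gets `align` spare bytes.
 * With n = 2 and align = 512 this is memory_size + 512*2. */
size_t worst_case_buffer_size(const size_t *sizes, size_t n, size_t align) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += sizes[i] + align;
    }
    return total;
}

/* Bytes actually consumed when the tensors are placed one after another,
 * each starting at an aligned offset, in a buffer whose first usable byte
 * sits at `base_offset` (which may itself be unaligned). */
size_t packed_size(const size_t *sizes, size_t n, size_t base_offset, size_t align) {
    size_t offset = base_offset;
    for (size_t i = 0; i < n; i++) {
        offset = pad_to(offset, align); /* skip to the next aligned address */
        offset += sizes[i];             /* tensor data only, no metadata */
    }
    return offset - base_offset;
}
```

For two 1000-byte tensors and a 512-byte alignment, the bound is 3024 bytes, while the packed layout needs 2024 bytes from an aligned base and 2535 bytes from a base that is off by one. The bound always covers the packed layout, but it over-allocates, which is why computing the exact requirement is preferable when the tensor list is fixed.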
Thank you all for answering my questions! I was also confused, hahaha.
You should only use
Yes, this is correct. If possible, use
I don't understand why 512 needs to be added, but it looks like a routine operation. May I ask what it aims at?