
why add 512 : ggml_backend_alloc_buffer(backend_kv, memory_size + 512*2); #637

Closed
EveningLin opened this issue Dec 6, 2023 · 4 comments

@EveningLin

[two screenshots of the code in question, including the call ggml_backend_alloc_buffer(backend_kv, memory_size + 512*2)]

I don't understand why 512 needs to be added here; it looks like a routine operation. May I ask what it is for?

@ggerganov
Owner

ggerganov commented Dec 6, 2023

Edit: ignore this comment, I was confused. See next comment by @slaren


The ggml_context contains the "meta" information for each tensor - this is the ggml_tensor struct. It can also optionally contain the tensor data, if the ggml_context is created with .no_alloc = false.

The 512 was the old way to account for the size overhead of the meta information.
Today, you should use the dedicated ggml_tensor_overhead() and ggml_graph_overhead() functions for that. Here is an example:

// since we are using ggml-alloc, this buffer only needs enough space to hold the ggml_tensor and ggml_cgraph structs, but not the tensor data
static size_t buf_size = ggml_tensor_overhead()*GPT2_MAX_NODES + ggml_graph_overhead_custom(GPT2_MAX_NODES, false);
static std::vector<uint8_t> buf(buf_size);

The code in the first screenshot that you posted should be rewritten like this:

// old
model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, memory_size + 512*2);

// new
model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, memory_size + 2*ggml_tensor_overhead());

The backend_kv context contains 2 tensors:

  • the total data size of the 2 tensors is memory_size
  • the overhead from their meta information is 2*ggml_tensor_overhead()

In some cases, you can create the ggml_context with .no_alloc = true. In that case, the ggml_context will contain only the meta information, without the tensor data, so the buffer size in the example above would be just:

model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, 2*ggml_tensor_overhead());

@slaren
Collaborator

slaren commented Dec 6, 2023

I think there is some confusion here. The 512 is added to account for any possible padding and alignment overhead that the backend may require. The way to calculate the exact memory requirements can be seen in ggml_backend_alloc_ctx_tensors_from_buft:

ggml/src/ggml-alloc.c (lines 766 to 777 at commit 3f66942):

ggml_backend_buffer_t ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_context * ctx, ggml_backend_buffer_type_t buft) {
    GGML_ASSERT(ggml_get_no_alloc(ctx) == true);

    size_t alignment = ggml_backend_buft_get_alignment(buft);

    size_t nbytes = 0;
    for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
        if (t->data == NULL && t->view_src == NULL) {
            nbytes += GGML_PAD(ggml_backend_buft_get_alloc_size(buft, t), alignment);
        }
    }
    // ... (the rest of the function allocates a buffer of nbytes and places the tensors into it)

It was added recently and not every example has been updated to use it, but generally, if you have a fixed list of tensors that you need to allocate into a backend buffer, using this function is the preferred way to allocate them (when using ggml-backend).

The backend buffers can only be used to store the tensor data, not the tensor metadata, so it is never necessary to add any ggml_tensor_overhead to the size of the buffers.

@EveningLin
Author

Thank all of you for answering my questions! I am also confused haha.
Is model.buffer_kv = ggml_backend_alloc_buffer(backend_kv, 2*ggml_tensor_overhead()); an incorrect way of expressing it?
Can 512 be replaced by ggml_tensor_overhead?
Is 512 just an estimate of the memory used to handle alignment and other issues?

@slaren
Collaborator

slaren commented Dec 7, 2023

You should only use ggml_tensor_overhead in the context buffers, which is where the tensor metadata is allocated, not in the backend buffers that are only used for the tensor data.

512 is just an estimate of the memory used to solve alignment and other issues??

Yes, this is correct. If possible, use ggml_backend_alloc_ctx_tensors_from_buft or ggml_backend_alloc_ctx_tensors instead of calculating the buffer size yourself.

@slaren slaren closed this as completed Jan 29, 2024