Magic number in example #791
Comments
@FSSRepo First off, thanks for contributing this example. I just want to include you on this issue to discuss it. Do you recall why you picked 1024 for this overhead? Can we calculate it instead?
That number is a small amount of extra space for the data, since some operations require padding; this is necessary when performing calculations with the context directly (without ggml-alloc, which internally adds that small overhead). As for calculating it, it's just a matter of experimenting: try removing it and see what happens.
I was gdb'ing last night, and I saw that when building the graph, memory is allocated from the context's memory pool for the output tensor; it happened somewhere under ggml_mul_mat(). This logic doesn't account for that, correct? If the inputs are 4096x2 and 2x4096, the output is 4096x4096, so ctx_size would not have enough space if we don't account for the output tensor size. (This example highlights how the output size can be far greater than the sum of the two inputs.) Also, do we even need to reserve space for the two inputs, given that they are already allocated in the example?
You're right, that 1024 should be the size of the output tensor's data. Honestly, I'm not sure how to calculate it correctly before creating the context. @slaren Any idea how to calculate the compute buffer size before creating the compute graph with the legacy API? The maximum memory buffer in the gpt-2 example is 256 MB:
ggml/examples/gpt-2/main-ctx.cpp
Lines 409 to 429 in 98875cd
You would have to pad the size of the tensor to the alignment value. My recommendation is to use ggml-alloc for compute buffers, and …
Tangentially, I also wanted to profile the matrix multiplication over 1000 iterations:
ggml/examples/simple/simple-ctx.cpp
Line 66 in bb8d8cf
Again, I see the context running out of memory. How could this example be modified to run iteratively?
ggml/examples/simple/simple-ctx.cpp
Line 29 in bb8d8cf
Can this magic number (1024) be explained, or perhaps replaced by a calculation? Does it depend on the size of the output?
(I notice that if I increase the size of the input tensors, this example stops working.)