Magic number in example #791
Comments
@FSSRepo First off, thanks for contributing this example. I just want to include you on this issue to discuss it. Do you recall why you picked 1024 for this overhead? Can we calculate it instead?
That number is a small amount of extra space for the data, since some operations require padding; this is necessary when performing calculations with the context directly (without ggml-alloc, which internally adds that small overhead). As for calculating it, it's just a matter of experimenting: try removing it and see what happens.
I was gdb'ing last night, and I saw that when building the graph, memory is allocated from the context's memory pool for the output tensor; it happened somewhere under ggml_mul_mat(). This logic doesn't account for that, correct? If the inputs are 4096x2 and 2x4096, the output is 4096x4096, so ctx_size would not have enough space if we don't account for the output tensor size. (This example highlights how the output size can be far greater than the sum of the two inputs.) Also, do we even need to reserve space for the two inputs, given that they are already allocated in the example?
You're right, that 1024 should be the size of the output tensor's data. Honestly, I'm not sure how to calculate it correctly before creating the context. @slaren Any idea how to calculate the compute buffer size before creating the compute graph with the legacy API? The maximum memory buffer in the gpt-2 example is 256 MB:
ggml/examples/gpt-2/main-ctx.cpp
Lines 409 to 429 in 98875cd
You would have to pad the size of the tensor to the alignment value. My recommendation is to use ggml-alloc for compute buffers, and …
Tangentially, I also wanted to profile the matrix multiplication over 1000 iterations:
ggml/examples/simple/simple-ctx.cpp
Line 66 in bb8d8cf
Again, I see the context running out of memory. How could this example be modified to run iteratively?
ggml/examples/simple/simple-ctx.cpp
Line 29 in bb8d8cf
Can this magic number (1024) be explained, or perhaps replaced by a calculation? Does it depend on the size of the output?
(I notice that if I increase the size of the input tensors, this example stops working.)