ggml-alloc v3 #727
Conversation
@ggerganov There are some changes here from llama.cpp, I will rebase after the next sync
Ok, will sync tomorrow morning
That will be very useful. Great. I have macro-guarded hacks in all backends in order to do this easily :D
Force-pushed from 6efa534 to 8005421
Should be OK to rebase now
Thank you! Other than some cleanup and removing some prints, this should be good to review. I have also updated whisper.cpp and made a few more changes to it, such as using
I am not sure why the mpt test in the ggml CI is failing; it works for me locally, and it shouldn't be affected by the changes. From the logs I suspect that something is failing during the model conversion.
It needs some python module:
Nevermind, let's remove it #728
Nice improvements and simpler API 👍
Merge at will
Will merge after CI. @ggerganov what would be the best way to sync these changes in llama.cpp? I am thinking that either you could open a sync PR and I would add the changes necessary to llama.cpp there, or I could open a new PR that includes all the changes here.
I'll open a sync PR in llama.cpp
Overview of the changes

Graph allocator
- Graph allocation is now performed in two steps:
  - reserve (`ggml_gallocr_reserve`): calculates the offsets within the buffer where to allocate all the tensors in the graph
  - allocate (`ggml_gallocr_alloc_graph`): allocates the tensors using the list of offsets generated in the reserve step
- It is not necessary to call `ggml_gallocr_reserve` manually; however, doing so with a worst-case graph will avoid buffer reallocations
- Graphs can be allocated directly with `ggml_gallocr_alloc_graph`. When only one graph needs to be evaluated, there is no need to create a different copy for measure.
- Graph inputs should be flagged with `ggml_set_input`, and set after the graph has been allocated. Setting the input flag will ensure that the tensors are not overwritten before they are used in the graph.
- Graph outputs should be flagged with `ggml_set_output`. This will ensure that the outputs are never overwritten, removing the need for hacks such as adding a dummy dependency at the end of the graph.

Tensor allocator
- There is still a tensor allocator, `ggml_tallocr`, that can be used to allocate tensors, but it has been reworked
- Use `ggml_backend_alloc_ctx_tensors` when possible, since it handles all the details of tensor allocation, including splitting the tensors into multiple buffers if necessary, but `ggml_tallocr` can still be used for more advanced cases

Other
- Renamed the `ggml_backend_sched` example target to `gpt-2-sched` (was `gpt-2-backend2`) and its source file to `main-sched.cpp` (was `main.cpp`).
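The reserve/allocate flow and the input/output flags described above can be sketched roughly as follows. This is a hedged sketch, not code from this PR: the graph contents are illustrative, and the helper calls (`ggml_gallocr_new`, `ggml_backend_cpu_init`, `ggml_backend_get_default_buffer_type`, `ggml_backend_tensor_set`) are assumed from the ggml headers of this era — check the exact signatures against `ggml-alloc.h` and `ggml-backend.h`.

```c
// Sketch of the v3 graph allocation flow (assumed API; verify against the headers).
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

int main(void) {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // Build the graph with a no_alloc context: tensor metadata only, no data yet.
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 16 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    ggml_set_input(a);      // flag inputs; set their data only after allocation
    ggml_set_input(b);
    struct ggml_tensor * out = ggml_add(ctx, a, b);
    ggml_set_output(out);   // flag outputs so they are never overwritten

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);

    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_reserve(galloc, gf);      // optional here; useful with a worst-case graph
    ggml_gallocr_alloc_graph(galloc, gf);  // assigns tensor data from the reserved buffer

    // Only now is it safe to set the inputs and compute.
    float data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    ggml_backend_tensor_set(a, data, 0, sizeof(data));
    ggml_backend_tensor_set(b, data, 0, sizeof(data));
    ggml_backend_graph_compute(backend, gf);

    ggml_gallocr_free(galloc);
    ggml_free(ctx);
    ggml_backend_free(backend);
    return 0;
}
```

For statically allocated tensors such as model weights, the overview suggests `ggml_backend_alloc_ctx_tensors` on the weight context instead of manual `ggml_tallocr` use; the graph allocator above is only for the per-evaluation compute tensors.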