
ggml-alloc v3 #727

Merged — 6 commits merged into master from sl/alloc-v3 on Feb 11, 2024
Conversation

@slaren (Collaborator) commented Feb 9, 2024

Overview of the changes

Graph allocator

  • Measure allocators have been removed
  • The graph allocator works in two steps:
    • Reserve (ggml_gallocr_reserve): calculates the offsets within the buffer at which all the tensors in the graph will be allocated
    • Allocate (ggml_gallocr_alloc_graph): allocates the tensors using the list of offsets generated in the reserve step
  • The reserve step is done automatically when the graph topology changes or the tensor sizes increase
  • It is not necessary to call ggml_gallocr_reserve manually; however, doing so with a worst-case graph will avoid buffer reallocations later
  • Unlike the measure graphs in the previous version, the graphs used to reserve are not modified, and can be used directly with ggml_gallocr_alloc_graph. When only one graph needs to be evaluated, there is no need to create a different copy for measure.
  • Graph allocation can no longer fail due to running out of space in the buffers (though the buffer allocation itself may still fail)
  • The buffers are private to the graph allocator and cannot be accessed directly
  • It is no longer possible to allocate tensors manually. Instead, inputs must be flagged with ggml_set_input, and set after the graph has been allocated. Setting the input flag will ensure that the tensors are not overwritten before they are used in the graph.
  • It is possible to set a tensor as an output with ggml_set_output. This will ensure that the outputs are never overwritten, removing the need for hacks such as adding a dummy dependency at the end of the graph. A usage sketch of the new workflow follows this list.
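
A minimal end-to-end sketch of the two-step workflow described above, assuming a CPU backend. The tensor shapes, sizes, and variable names are illustrative and not taken from the PR, error checking is omitted, and the include paths may differ depending on how ggml is vendored:

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

int main(void) {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // metadata-only context: the tensor data is placed by the graph allocator
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_input(w);   // inputs are flagged so they are not overwritten before use
    ggml_set_input(x);

    struct ggml_tensor * y = ggml_mul_mat(ctx, w, x);
    ggml_set_output(y);  // outputs are flagged so they are never overwritten

    struct ggml_cgraph * graph = ggml_new_graph(ctx);
    ggml_build_forward_expand(graph, y);

    // reserve: compute the tensor offsets for this (worst-case) graph;
    // optional, ggml_gallocr_alloc_graph reserves on demand when needed
    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_reserve(galloc, graph);

    // allocate: place the graph tensors at the precomputed offsets
    // inside the allocator's private buffer
    ggml_gallocr_alloc_graph(galloc, graph);

    // inputs are set only after the graph has been allocated
    float w_data[16] = {0};
    float x_data[4]  = {0};
    ggml_backend_tensor_set(w, w_data, 0, sizeof(w_data));
    ggml_backend_tensor_set(x, x_data, 0, sizeof(x_data));

    ggml_backend_graph_compute(backend, graph);

    ggml_gallocr_free(galloc);
    ggml_free(ctx);
    ggml_backend_free(backend);
    return 0;
}
```

Since the reserve step does not modify the graph, the same graph object can be reserved once with worst-case shapes and then reused for allocation on every evaluation.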

Tensor allocator

  • There is still a ggml_tallocr that can be used to allocate tensors, but it has been reworked
  • This is now a very lightweight allocator that cannot free tensors, and its only state is a buffer and the current offset within the buffer
  • Applications should use ggml_backend_alloc_ctx_tensors whenever possible, since it handles all the details of tensor allocation, including splitting the tensors across multiple buffers if necessary (see the sketch after this list); ggml_tallocr can still be used for more advanced cases
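
A minimal sketch of the recommended path using ggml_backend_alloc_ctx_tensors, again assuming a CPU backend; the tensor names and shapes are hypothetical:

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

int main(void) {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // metadata-only context: the actual data lives in the backend buffer below
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 2,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * weight = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1024, 1024);
    struct ggml_tensor * bias   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);

    // allocates every tensor created in ctx into one (or more) backend buffers
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);

    // ... load the weights with ggml_backend_tensor_set, build graphs, etc. ...
    (void) weight; (void) bias;

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    ggml_backend_free(backend);
    return 0;
}
```

For more advanced cases, the reworked ggml_tallocr can still be used to place tensors into a caller-provided buffer manually, keeping only the buffer and the current offset as state.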

Other

  • Renamed gpt-2 ggml_backend_sched example target to gpt-2-sched (was gpt-2-backend2), source file to main-sched.cpp (was main.cpp).

@slaren (Collaborator, Author) commented Feb 9, 2024

@ggerganov There are some changes here from llama.cpp; I will rebase after the next sync

@ggerganov (Owner)

Ok, will sync tomorrow morning

@YavorGIvanov (Collaborator)

> It is possible to set a tensor as an output with ggml_set_output. This will ensure that the outputs are never overwritten, removing the need for hacks such as adding a dummy dependency at the end of the graph.

That will be very useful. Great. I have macro-guarded hacks in all backends in order to do this easily :D

@slaren force-pushed the sl/alloc-v3 branch 3 times, most recently from 6efa534 to 8005421 on February 10, 2024 01:09
@ggerganov (Owner)

Should be OK to rebase now

@slaren marked this pull request as ready for review on February 10, 2024 12:41
@slaren (Collaborator, Author) commented Feb 10, 2024

Thank you! Other than some cleanup and removing some prints, this should be good to review. I have also updated whisper.cpp and made a few more changes to it, such as using ggml_backend_alloc_ctx_tensors.

@slaren (Collaborator, Author) commented Feb 10, 2024

I am not sure why the mpt test in the ggml CI is failing; it works for me locally, and it shouldn't be affected by these changes. From the logs, I suspect that something is failing during the model conversion.

@ggerganov (Owner) commented Feb 10, 2024

It needs some Python module:

https://github.com/ggml-org/ci/blob/2e349ee53c4f858b48b8f8e222ad0e46f118928e/ggml/16/ac25c2fa308202c32927d131b33265f5588bc3/ggml-3-arm64-cpu/stdall#L3042C1-L3044C46

Nevermind, let's remove it #728

examples/whisper/whisper.cpp — review comment (outdated, resolved)
@ggerganov (Owner) left a review comment


Nice improvements and simpler API 👍

Merge at will

include/ggml/ggml.h — review comment (outdated, resolved)
@slaren (Collaborator, Author) commented Feb 11, 2024

Will merge after CI.

@ggerganov what would be the best way to sync these changes into llama.cpp? I am thinking that either you could open a sync PR and I would add the necessary llama.cpp changes there, or I could open a new PR that includes all the changes here.

@ggerganov merged commit 5070f07 into master on Feb 11, 2024 (10 checks passed)
@ggerganov deleted the sl/alloc-v3 branch on February 11, 2024 12:38
@ggerganov (Owner)

I'll open a sync PR in llama.cpp now
