Problems with Metal and iOS #647
Comments
Hi @ggerganov, I know you must be busy with a lot of things. Is there any chance you could take a look at this problem? Thank you very much.
I have the same issue, and based on the discussion referenced above, it seems this could be solved by using multiple buffers.
Somehow I've completely missed this issue, apologies. Does it work if you apply this patch?

```diff
diff --git a/src/ggml-metal.m b/src/ggml-metal.m
index 6c2a8d0..87ee5bd 100644
--- a/src/ggml-metal.m
+++ b/src/ggml-metal.m
@@ -741,8 +741,10 @@ bool ggml_metal_add_buffer(
             size_aligned += (size_page - (size_aligned % size_page));
         }
 
+        const size_t maxBufferLength = 1024 * 1024 * 1024; // 1 GiB
+
         // the buffer fits into the max buffer size allowed by the device
-        if (size_aligned <= ctx->device.maxBufferLength) {
+        if (size_aligned <= maxBufferLength) {
             ctx->buffers[ctx->n_buffers].name = name;
             ctx->buffers[ctx->n_buffers].data = data;
             ctx->buffers[ctx->n_buffers].size = size;
@@ -761,8 +763,8 @@ bool ggml_metal_add_buffer(
             // this overlap between the views will guarantee that the tensor with the maximum size will fully fit into
             // one of the views
             const size_t size_ovlp = ((max_size + size_page - 1) / size_page + 1) * size_page; // round-up 2 pages just in case
-            const size_t size_step = ctx->device.maxBufferLength - size_ovlp;
-            const size_t size_view = ctx->device.maxBufferLength;
+            const size_t size_step = maxBufferLength - size_ovlp;
+            const size_t size_view = maxBufferLength;
 
             for (size_t i = 0; i < size; i += size_step) {
                 const size_t size_step_aligned = (i + size_view <= size) ? size_view : (size_aligned - i);
@@ -2648,8 +2650,10 @@ ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t siz
     id<MTLDevice> device = ggml_backend_metal_get_device();
 
+    const size_t maxBufferLength = 1024 * 1024 * 1024; // 1 GiB
+
     // the buffer fits into the max buffer size allowed by the device
-    if (size_aligned <= device.maxBufferLength) {
+    if (size_aligned <= maxBufferLength) {
         ctx->buffers[ctx->n_buffers].data = data;
         ctx->buffers[ctx->n_buffers].size = size;
@@ -2667,8 +2671,8 @@ ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t siz
         // this overlap between the views will guarantee that the tensor with the maximum size will fully fit into
         // one of the views
         const size_t size_ovlp = ((max_size + size_page - 1) / size_page + 1) * size_page; // round-up 2 pages just in case
-        const size_t size_step = device.maxBufferLength - size_ovlp;
-        const size_t size_view = device.maxBufferLength;
+        const size_t size_step = maxBufferLength - size_ovlp;
+        const size_t size_view = maxBufferLength;
 
         for (size_t i = 0; i < size; i += size_step) {
             const size_t size_step_aligned = (i + size_view <= size) ? size_view : (size_aligned - i);
```
ggml-backend only attempts to split the buffer in …
This is more serious now because it is affecting other libs, like whisper.cpp, since we are now using the same ggml repository:
I created the patch, but it doesn't solve it; the problem is still the same.

```diff
diff --git a/vendor/ggml/src/ggml-metal.m b/vendor/ggml/src/ggml-metal.m
index a17ef46..292dd44 100644
--- a/vendor/ggml/src/ggml-metal.m
+++ b/vendor/ggml/src/ggml-metal.m
@@ -742,7 +742,9 @@ bool ggml_metal_add_buffer(
         }
 
         // the buffer fits into the max buffer size allowed by the device
-        if (size_aligned <= ctx->device.maxBufferLength) {
+        const size_t maxBufferLength = 1024 * 1024 * 1024; // 1 GiB
+
+        if (size_aligned <= maxBufferLength) {
             ctx->buffers[ctx->n_buffers].name = name;
             ctx->buffers[ctx->n_buffers].data = data;
             ctx->buffers[ctx->n_buffers].size = size;
@@ -761,8 +763,8 @@ bool ggml_metal_add_buffer(
             // this overlap between the views will guarantee that the tensor with the maximum size will fully fit into
             // one of the views
             const size_t size_ovlp = ((max_size + size_page - 1) / size_page + 1) * size_page; // round-up 2 pages just in case
-            const size_t size_step = ctx->device.maxBufferLength - size_ovlp;
-            const size_t size_view = ctx->device.maxBufferLength;
+            const size_t size_step = maxBufferLength - size_ovlp;
+            const size_t size_view = maxBufferLength;
 
             for (size_t i = 0; i < size; i += size_step) {
                 const size_t size_step_aligned = (i + size_view <= size) ? size_view : (size_aligned - i);
@@ -2653,7 +2655,9 @@ ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t siz
     id<MTLDevice> device = ggml_backend_metal_get_device();
 
     // the buffer fits into the max buffer size allowed by the device
-    if (size_aligned <= device.maxBufferLength) {
+    const size_t maxBufferLength = 1024 * 1024 * 1024; // 1 GiB
+
+    if (size_aligned <= maxBufferLength) {
         ctx->buffers[ctx->n_buffers].data = data;
         ctx->buffers[ctx->n_buffers].size = size;
@@ -2671,8 +2675,8 @@ ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t siz
         // this overlap between the views will guarantee that the tensor with the maximum size will fully fit into
         // one of the views
         const size_t size_ovlp = ((max_size + size_page - 1) / size_page + 1) * size_page; // round-up 2 pages just in case
-        const size_t size_step = device.maxBufferLength - size_ovlp;
-        const size_t size_view = device.maxBufferLength;
+        const size_t size_step = maxBufferLength - size_ovlp;
+        const size_t size_view = maxBufferLength;
 
         for (size_t i = 0; i < size; i += size_step) {
             const size_t size_step_aligned = (i + size_view <= size) ? size_view : (size_aligned - i);
```
I'm also seeing this in whisper.cpp when using it on iOS. @ggerganov, is it possible to fix this and allocate multiple buffers like @slaren suggested?
I'll take a look now.
Here is a fix for …. The general-purpose solution for automatically splitting buffers in … will come later. Let me know if you give it a try and confirm that it works. I will merge it later today.
I tested the fix for ….
I opened an issue there to try to get some help with the fix:
Hi @ggerganov, can you help us with a general solution for this problem? Thanks.
A fix for this has been merged in llama.cpp (ggerganov/llama.cpp#5181). Applications that use …
Was it added to ggml and whisper.cpp too? My project uses ggml, whisper.cpp and llama.cpp together.
Yes. This issue is solved in ggml, and all we can do at this point is wait for third-party applications to update their model loading code to take advantage of it, so I am closing this issue.
Please @leejet, add support for this in the latest ggml version.
Hi,
We have a problem with Metal on iOS.
Reference:
leejet/stable-diffusion.cpp#108
Can anyone help us with this?
Thanks.