Fix ggml-alloc after the addition of new operations #475

Closed
YavorGIvanov opened this issue Aug 25, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@YavorGIvanov
Collaborator

YavorGIvanov commented Aug 25, 2023

There is special logic in ggml-alloc for inplace operations and for operations that use a view as their result. With the addition of SAM/Stable Diffusion and others we now have new operations doing both. Fix ggml-alloc to take this into account and implement a mechanism that makes it easier to add operations without breaking ggml-alloc in the future.

@YavorGIvanov YavorGIvanov added the bug Something isn't working label Aug 25, 2023
@slaren
Collaborator

slaren commented Aug 25, 2023

I assume that the issue is with this code:

ggml/src/ggml.c

Lines 14634 to 14638 in 415dbf1

const bool inplace = (bool) ((int32_t *) dst->op_params)[0];
if (!inplace && params->type == GGML_TASK_INIT) {
memcpy((char *) dst->data, (char *) src0->data, ggml_nbytes(dst));
return;
}

The problem is that it is not possible to know if a tensor is a view with a no_alloc context. The implementation of ggml-alloc was designed to be as non-intrusive as possible, but if it is going to become a part of the core of ggml, then it would be good to have a generic way to tell if a tensor is a view.

This could be solved by adding a field or two to ggml_tensor: a pointer to the parent of the view (or NULL if it is not a view), and possibly another field to specify the offset within the view parent. I think this would also help simplify other code that needs to handle views in a different way, such as the graph import/export code.
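
Roughly something like this (the field names are only illustrative, not a final design):

struct ggml_tensor {
    // ... existing fields ...

    // NULL if the tensor owns its own data, otherwise the tensor this one is a view of
    struct ggml_tensor * view_src;

    // byte offset of this view's data inside view_src->data
    size_t               view_offs;

    // ... existing fields ...
};

Then "is this tensor a view?" becomes a simple tensor->view_src != NULL check that works even in a no_alloc context.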

@YavorGIvanov
Collaborator Author

YavorGIvanov commented Aug 26, 2023

I assume that the issue is with this code:

This is a good guess, but currently I am trying to use ggml-alloc only for the prompt encoder and mask decoder parts of SAM. The ADD_REL_POS operation is used in the image encoder. However, when I start using ggml-alloc for the image encoder, this operation will probably be a problem. Can I fix it by adding it to the proper switch cases in the ggml-alloc implementation?

This could be solved by adding a field or two to ggml_tensor: a pointer to the parent of the view (or NULL if it is not a view), and possibly another field to specify the offset within the view parent. I think this would also help simplify other code that needs to handle views in a different way, such as the graph import/export code.

Something of this sort should indeed be done. I think we have to simplify the ggml-alloc API, make it easier to debug and find problems, and check that all operations are supported, since we have added a lot of new ones. The code segment you linked appears in other operations' implementations as well. Then we can add it to the other examples, as this approach is indeed more memory efficient. I can help with that.

@slaren
Collaborator

slaren commented Aug 26, 2023

You could fix your problem by replacing the inplace flag in op_params with a check that determines whether dst and src0 point to the same data (i.e. dst->data == src0->data). You may also want to check that dst and src0 have the same layout in memory (same dimensions in ne and same strides in nb).
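
As a rough sketch (untested, just to illustrate the check), the TASK_INIT branch above would become something like:

// treat the op as inplace only if dst and src0 actually alias the same data
// and have the same memory layout (same ne and same nb)
const bool inplace = dst->data == src0->data &&
    memcmp(dst->ne, src0->ne, sizeof(dst->ne)) == 0 &&
    memcmp(dst->nb, src0->nb, sizeof(dst->nb)) == 0;

if (!inplace && params->type == GGML_TASK_INIT) {
    memcpy((char *) dst->data, (char *) src0->data, ggml_nbytes(dst));
    return;
}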

The allocator can also make operations inplace automatically when possible, but the operation needs to be in the list of operations that can run inplace (in ggml_op_can_inplace). However, this does not set any flag in op_params.
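
For reference, ggml_op_can_inplace is just a switch over the ops, roughly of this shape (abbreviated here, the exact list depends on the ggml version):

static bool ggml_op_can_inplace(enum ggml_op op) {
    switch (op) {
        case GGML_OP_ADD:
        case GGML_OP_SCALE:
        // ... other ops whose kernels give the same result whether or not dst aliases src0 ...
            return true;
        default:
            return false;
    }
}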

Generally, the manual _inplace operations are not supported by the allocator due to the issue pointed out before: it is just not possible to tell whether a tensor created with ggml_view_tensor is a view in a no_alloc context. In most cases this is not an issue because the allocator will run the operation as if it weren't inplace, and the implementations of most ops that can run inplace don't care whether they are inplace or not; they behave the same way regardless. But your ops set a flag when they are supposed to be inplace and behave differently. That's the core of the issue.

@YavorGIvanov
Collaborator Author

YavorGIvanov commented Aug 26, 2023

Ok, I get it. Thank you. I will try making the image encoder work too.

I just made the prompt encoder and mask decoder work with ggml-alloc, but I had to put ggml_allocr_alloc(..) calls in some places where I don't think I am supposed to, as their data is not "input". However, when I removed them I either got a crash or wrong results. There is probably something wrong with the operations used around those unexpectedly needed ggml_allocr_alloc(..) calls. I figured out where to put them by going through the tensors step by step and seeing where the results go wrong.

Here is one example. I have added the comment "TODO: This alloc shouldn't be needed" in all three such places in the code.
https://github.com/YavorGIvanov/sam.cpp/blob/ac06e46dd2f4e29d6c9772af177dd6b7badf2b23/sam.cpp#L1806

@slaren
Collaborator

slaren commented Aug 26, 2023

I think that the issue is the ggml_add_inplace. This seems to work:

diff --git a/ggml b/ggml
--- a/ggml
+++ b/ggml
@@ -1 +1 @@
-Subproject commit e77c6ba5b394e558c0e00eee40018207235f91ae
+Subproject commit e77c6ba5b394e558c0e00eee40018207235f91ae-dirty
diff --git a/sam.cpp b/sam.cpp
index a17bec1..2ff7381 100644
--- a/sam.cpp
+++ b/sam.cpp
@@ -1480,7 +1480,7 @@ prompt_encoder_result sam_encode_prompt(
         struct ggml_tensor * t_cos = ggml_map_custom1(ctx0, cur, ggml_sam_cos, GGML_N_TASKS_MAX, NULL);

         cur = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, t_sin->ne[0] + t_cos->ne[0], cur->ne[1]);
-        ggml_allocr_alloc(state.allocr, cur); // TODO: This alloc should not be needed
+        //ggml_allocr_alloc(state.allocr, cur); // TODO: This alloc should not be needed

         ggml_build_forward_expand(gf, ggml_cpy(ctx0, t_sin, ggml_view_2d(ctx0, cur, t_sin->ne[0], t_sin->ne[1], cur->nb[1], 0)));
         ggml_build_forward_expand(gf, ggml_cpy(ctx0, t_cos, ggml_view_2d(ctx0, cur, t_sin->ne[0], t_sin->ne[1], cur->nb[1], t_sin->nb[1])));
@@ -1493,7 +1493,8 @@ prompt_encoder_result sam_encode_prompt(

     // add point_embeddings[1] to label == 1
     // ref: https://github.com/facebookresearch/segment-anything/blob/main/segment_anything/modeling/prompt_encoder.py#L90
-    ggml_build_forward_expand(gf, ggml_add_inplace(ctx0, ggml_view_2d(ctx0, cur, cur->ne[0], 1, cur->nb[1], 0), enc.pt_embd[1]));
+    struct ggml_tensor * v = ggml_view_2d(ctx0, cur, cur->ne[0], 1, cur->nb[1], 0);
+    ggml_build_forward_expand(gf, ggml_cpy(ctx0, ggml_add_inplace(ctx0, v, enc.pt_embd[1]), v));

     struct ggml_tensor * embd_prompt_sparse = cur;
     ggml_build_forward_expand(gf, embd_prompt_sparse);
@@ -1803,7 +1804,7 @@ bool sam_decode_mask(
     {
         // ConvTranspose2d
         keys = ggml_conv_transpose_2d_p0(ctx0, dec.output_upscaling_0_w, keys, 2);
-        ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
+        //ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
         keys = ggml_add(ctx0, keys, ggml_repeat(ctx0,
                                      ggml_reshape_3d(ctx0, dec.output_upscaling_0_b, 1, 1, dec.output_upscaling_0_b->ne[0]),
                                      keys));
@@ -1815,7 +1816,7 @@ bool sam_decode_mask(

         // ConvTranspose2d
         keys = ggml_conv_transpose_2d_p0(ctx0, dec.output_upscaling_3_w, keys, 2);
-        ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
+        //ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
         keys = ggml_add(ctx0, ggml_repeat(ctx0,
                                 ggml_reshape_3d(ctx0, dec.output_upscaling_3_b, 1, 1, dec.output_upscaling_3_b->ne[0]),
                                 keys), keys);

Additionally, the ggml_new_i32 in ggml_conv_transpose_2d_p0 is an issue, since this is a no_alloc context. The stride should be moved to op_params instead:

diff --git a/src/ggml.c b/src/ggml.c
index f883da8..0f5b962 100644
--- a/src/ggml.c
+++ b/src/ggml.c
@@ -7100,12 +7100,15 @@ struct ggml_tensor * ggml_conv_transpose_2d_p0(
         a->ne[2], b->ne[3],
     };

+
     struct ggml_tensor* result = ggml_new_tensor(ctx, GGML_TYPE_F32, 4, ne);
+
+    ggml_set_op_params_i32(result, 0, stride);
+
     result->op = GGML_OP_CONV_TRANSPOSE_2D;
     result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
     result->src[0] = a;
     result->src[1] = b;
-    result->src[2] = ggml_new_i32(ctx, stride);

     return result;
 }
@@ -13501,7 +13504,6 @@ static void ggml_compute_forward_conv_transpose_2d(
         const struct ggml_compute_params * params,
         const struct ggml_tensor * src0,
         const struct ggml_tensor * src1,
-        const struct ggml_tensor * opt0,
               struct ggml_tensor * dst) {
     GGML_ASSERT(src0->type == GGML_TYPE_F16);
     GGML_ASSERT(src1->type == GGML_TYPE_F32);
@@ -13561,7 +13563,7 @@ static void ggml_compute_forward_conv_transpose_2d(
         return;
     }

-    const int32_t stride = ((const int32_t*)(opt0->data))[0];
+    const int32_t stride = ggml_get_op_params_i32(dst, 0);

     // total patches in dst
     const int np = ne2;
@@ -15734,7 +15736,7 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm
             } break;
         case GGML_OP_CONV_TRANSPOSE_2D:
             {
-                ggml_compute_forward_conv_transpose_2d(params, tensor->src[0], tensor->src[1], tensor->src[2], tensor);
+                ggml_compute_forward_conv_transpose_2d(params, tensor->src[0], tensor->src[1], tensor);
             } break;
         case GGML_OP_POOL_1D:
             {

@slaren
Collaborator

slaren commented Aug 26, 2023

Another minor issue I noticed while looking into this, when trying to convert the model with:

python convert-pth-to-ggml.py sam_vit_b_01ec64.pth 1 

It tried to write the output to the root directory (since os.path.dirname(fname_model) returns an empty string).

@slaren
Collaborator

slaren commented Aug 26, 2023

I noticed that, while it no longer crashes after these changes, the results are still wrong. So this still isn't solved; I am not sure where the issue is.

@YavorGIvanov
Collaborator Author

+    struct ggml_tensor * v = ggml_view_2d(ctx0, cur, cur->ne[0], 1, cur->nb[1], 0);
+    ggml_build_forward_expand(gf, ggml_cpy(ctx0, ggml_add_inplace(ctx0, v, enc.pt_embd[1]), v));

The fix for the add_inplace works and the result is still correct. I am not sure why we need the additional ggml_cpy there, though. However, I had already removed the ggml_new_i32 in ggml_conv_transpose_2d_p0 locally before writing here, and that doesn't remove the need for the ggml_allocr_alloc(..) calls for the result to be correct. You noticed that too.

If you keep the ggml_allocr_alloc(..) calls after the ggml_conv_transpose_2d_p0 operation, the result is still correct. I also tried to debug this, as I wrote this operation, and I implemented the fix for the ggml_new_i32 usage, but forgot to commit it in the ggml submodule. Really sorry for that, as it wasted some of your time. Additionally, I found that while computing the ggml_conv_transpose_2d_p0 operation the result is correct (printf in the _forward function), but if I save the tensor after the operation and print it later, it is totally wrong. Obviously some memory reuse happens there. I tried using GGML_ALLOCATOR_DEBUG in ggml-alloc and AT_PRINTF to find out what happens, but it is hard for me to fully understand the debug allocator output (I can find the operation where the problem happens, but I am not sure what memory gets reused there and why, when it shouldn't be needed).

For now I will keep these two ggml_allocr_alloc(..) calls until I figure out what goes wrong, and I will try to make the image encoder work. I will tell you once I know what happened (I am going on vacation tomorrow, so if I don't manage today, it may take until next week).

Will also fix the convert script issue you mentioned.

Thanks for the help. I appreciate it 🙏🏻

@slaren
Collaborator

slaren commented Aug 28, 2023

The fix for the add_inplace works and the result is still correct. Not sure why we need the additional ggml_cpy there though.

The reason is that with the allocator, _inplace operations don't actually happen inplace for the reasons explained earlier. Without the ggml_cpy, the results of the add operation wouldn't be reflected in cur. Sorry, I should have changed the ggml_add_inplace to ggml_add as well to make it clear what is happening.

Your solution of pre-allocating cur also works because then the ggml_view_tensor in the ggml_add_inplace will be able to set the data member of the view correctly (otherwise it would just be NULL, and indistinguishable from a non-allocated tensor).
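
To make that concrete, what ggml_view_tensor does amounts to roughly this (a simplified sketch, not the exact ggml code):

// the view shares the parent's type, shape, strides and data pointer
view->type = src->type;
for (int i = 0; i < GGML_MAX_DIMS; i++) {
    view->ne[i] = src->ne[i];
    view->nb[i] = src->nb[i];
}
view->data = src->data; // still NULL in a no_alloc context until src is allocated,
                        // so the "view" looks exactly like an unallocated tensor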

@YavorGIvanov
Collaborator Author

The fix for the add_inplace works and the result is still correct. Not sure why we need the additional ggml_cpy there though.

The reason is that with the allocator, _inplace operations don't actually happen inplace for the reasons explained earlier. Without the ggml_cpy, the results of the add operation wouldn't be reflected in cur. Sorry, I should have changed the ggml_add_inplace to ggml_add as well to make it clear what is happening.

Your solution of pre-allocating cur also works because then the ggml_view_tensor in the ggml_add_inplace will be able to set the data member of the view correctly (otherwise it would just be NULL, and indistinguishable from a non-allocated tensor).

I think I finally understand it. I only need to figure out the problem around the ggml_conv_transpose_2d_p0(..) operation.
SAM's inference performance got a bit slower and I will investigate why. It may be due to some operations not actually happening inplace.

I made a PR with my progress and your suggested fixes #490. Thank you again. I will keep grinding :D

@YavorGIvanov
Collaborator Author

YavorGIvanov commented Sep 10, 2023

@slaren

I will close this as I don't think it is relevant anymore, and I will open another issue if a concrete problem with ggml-alloc is found.
