Fix ggml-alloc after the addition of new operations #475
I assume that the issue is with this code: lines 14634 to 14638 in 415dbf1.

The problem is that it is not possible to know if a tensor is a view with a […]. This could be solved by adding a field or two to […].
This is a good guess, but currently I am trying to use ggml-alloc only for the prompt encoder and mask decoder parts of SAM. The ADD_REL_POS operation is used in the image encoder, so when I start using ggml-alloc for the image encoder, this operation will probably be a problem. Can I fix it by adding it to the proper switch cases in the ggml-alloc implementation?
Something of this sort should indeed be done. I think we have to simplify the ggml-alloc API, make it easier to debug/find problems, and check that all operations are supported, since we added a lot of new ones. The code segment you linked is the same as in other operations' implementations. Then we can add it to the other examples, as this approach is indeed more memory efficient. I can help with that.
You could fix your problem by replacing the inplace flag in […].

The allocator can also make operations inplace automatically when possible, but the operation needs to be in the list of operations that can run inplace (in […]).

Generally, the manual […].
Ok, I get it. Thank you. I will try making the image encoder work too. I just made the prompt encoder and mask decoder work with ggml-alloc, but I had to put ggml_allocr_alloc(..) calls in some places, which I don't think I am supposed to, as their data is not "input". However, I either got a crash when removing them or wrong results; probably something is wrong with the operations used around those unexpectedly needed ggml_allocr_alloc(..) calls. I figured out where to put them by going through the tensors step by step and seeing where the results go wrong. Here is one example. I have added the comment "TODO: This alloc shouldn't be needed" in all three such places in the code.
I think that the issue is the […]:

```diff
diff --git a/ggml b/ggml
--- a/ggml
+++ b/ggml
@@ -1 +1 @@
-Subproject commit e77c6ba5b394e558c0e00eee40018207235f91ae
+Subproject commit e77c6ba5b394e558c0e00eee40018207235f91ae-dirty
diff --git a/sam.cpp b/sam.cpp
index a17bec1..2ff7381 100644
--- a/sam.cpp
+++ b/sam.cpp
@@ -1480,7 +1480,7 @@ prompt_encoder_result sam_encode_prompt(
 struct ggml_tensor * t_cos = ggml_map_custom1(ctx0, cur, ggml_sam_cos, GGML_N_TASKS_MAX, NULL);
 cur = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, t_sin->ne[0] + t_cos->ne[0], cur->ne[1]);
- ggml_allocr_alloc(state.allocr, cur); // TODO: This alloc should not be needed
+ //ggml_allocr_alloc(state.allocr, cur); // TODO: This alloc should not be needed
 ggml_build_forward_expand(gf, ggml_cpy(ctx0, t_sin, ggml_view_2d(ctx0, cur, t_sin->ne[0], t_sin->ne[1], cur->nb[1], 0)));
 ggml_build_forward_expand(gf, ggml_cpy(ctx0, t_cos, ggml_view_2d(ctx0, cur, t_sin->ne[0], t_sin->ne[1], cur->nb[1], t_sin->nb[1])));
@@ -1493,7 +1493,8 @@ prompt_encoder_result sam_encode_prompt(
 // add point_embeddings[1] to label == 1
 // ref: https://github.com/facebookresearch/segment-anything/blob/main/segment_anything/modeling/prompt_encoder.py#L90
- ggml_build_forward_expand(gf, ggml_add_inplace(ctx0, ggml_view_2d(ctx0, cur, cur->ne[0], 1, cur->nb[1], 0), enc.pt_embd[1]));
+ struct ggml_tensor * v = ggml_view_2d(ctx0, cur, cur->ne[0], 1, cur->nb[1], 0);
+ ggml_build_forward_expand(gf, ggml_cpy(ctx0, ggml_add_inplace(ctx0, v, enc.pt_embd[1]), v));
 struct ggml_tensor * embd_prompt_sparse = cur;
 ggml_build_forward_expand(gf, embd_prompt_sparse);
@@ -1803,7 +1804,7 @@ bool sam_decode_mask(
 {
 // ConvTranspose2d
 keys = ggml_conv_transpose_2d_p0(ctx0, dec.output_upscaling_0_w, keys, 2);
- ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
+ //ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
 keys = ggml_add(ctx0, keys, ggml_repeat(ctx0,
 ggml_reshape_3d(ctx0, dec.output_upscaling_0_b, 1, 1, dec.output_upscaling_0_b->ne[0]),
 keys));
@@ -1815,7 +1816,7 @@ bool sam_decode_mask(
 // ConvTranspose2d
 keys = ggml_conv_transpose_2d_p0(ctx0, dec.output_upscaling_3_w, keys, 2);
- ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
+ //ggml_allocr_alloc(state.allocr, keys); // TODO: This alloc shouldn't be needed
 keys = ggml_add(ctx0, ggml_repeat(ctx0,
 ggml_reshape_3d(ctx0, dec.output_upscaling_3_b, 1, 1, dec.output_upscaling_3_b->ne[0]),
 keys), keys);
```

Additionally, the […]:

```diff
diff --git a/src/ggml.c b/src/ggml.c
index f883da8..0f5b962 100644
--- a/src/ggml.c
+++ b/src/ggml.c
@@ -7100,12 +7100,15 @@ struct ggml_tensor * ggml_conv_transpose_2d_p0(
 a->ne[2], b->ne[3],
 };
+
 struct ggml_tensor* result = ggml_new_tensor(ctx, GGML_TYPE_F32, 4, ne);
+
+ ggml_set_op_params_i32(result, 0, stride);
+
 result->op = GGML_OP_CONV_TRANSPOSE_2D;
 result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
 result->src[0] = a;
 result->src[1] = b;
- result->src[2] = ggml_new_i32(ctx, stride);
 return result;
 }
@@ -13501,7 +13504,6 @@ static void ggml_compute_forward_conv_transpose_2d(
 const struct ggml_compute_params * params,
 const struct ggml_tensor * src0,
 const struct ggml_tensor * src1,
- const struct ggml_tensor * opt0,
 struct ggml_tensor * dst) {
 GGML_ASSERT(src0->type == GGML_TYPE_F16);
 GGML_ASSERT(src1->type == GGML_TYPE_F32);
@@ -13561,7 +13563,7 @@ static void ggml_compute_forward_conv_transpose_2d(
 return;
 }
- const int32_t stride = ((const int32_t*)(opt0->data))[0];
+ const int32_t stride = ggml_get_op_params_i32(dst, 0);
 // total patches in dst
 const int np = ne2;
@@ -15734,7 +15736,7 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm
 } break;
 case GGML_OP_CONV_TRANSPOSE_2D:
 {
- ggml_compute_forward_conv_transpose_2d(params, tensor->src[0], tensor->src[1], tensor->src[2], tensor);
+ ggml_compute_forward_conv_transpose_2d(params, tensor->src[0], tensor->src[1], tensor);
 } break;
 case GGML_OP_POOL_1D:
 {
```
Another minor issue I noticed while looking into this: when trying to convert the model with

```
python convert-pth-to-ggml.py sam_vit_b_01ec64.pth 1
```

it tried to write the output to the root directory (since […]).
I noticed that while it no longer crashes after these changes, the results are still wrong. So this isn't solved yet; I am not sure where the issue is.
The fix for the add_inplace works and the result is still correct. I am not sure why we need the additional ggml_cpy there, though.

However, I had already removed the ggml_new_i32 usage from ggml_conv_transpose_2d_p0 locally before writing here, and that does not remove the need for the ggml_allocr_alloc(..) calls in order for the result to be correct. You noticed that too: if you keep the ggml_allocr_alloc(..) calls after the ggml_conv_transpose_2d_p0 operation, the result is still correct. I also tried to debug this, as I wrote this operation and implemented the fix for the ggml_new_i32 usage, but I forgot to commit it in the ggml submodule. Really sorry for that, as I wasted some of your time.

Additionally, I found that while computing the ggml_conv_transpose_2d_p0 operation the result is correct (printf in the _forward function), but if I save the tensor after the operation and print it later, it is totally wrong. Obviously some memory reuse happens there. I tried using GGML_ALLOCATOR_DEBUG in ggml-alloc and AT_PRINTF to find out what happens, but it is hard for me to fully understand the debug allocator output (I can find the operation where the problem happens, but I am not sure why the memory gets reused when there is no need for it).

For now I will keep these two ggml_allocr_alloc(..) calls until I figure out what goes wrong, and will try to make the image encoder work. I will tell you once I know what happened (I am going on vacation tomorrow, so if I don't manage today, it may take until next week). I will also fix the convert script issue you mentioned.

Thanks for the help. I appreciate it 🙏🏻
The reason is that with the allocator, […].

Your solution of pre-allocating […].
I think I finally understood it. I only need to figure out the problem around the ggml_conv_transpose_2d_p0(..) operation. I made a PR with my progress and your suggested fixes: #490. Thank you again. I will keep grinding :D
I will close this, as I don't think it is relevant anymore, and open another issue if some concrete problem with ggml-alloc is found.
There is special logic there for inplace operations and for operations that use a view as their result. With the addition of SAM/Stable Diffusion and others, we have new operations doing both. Fix ggml-alloc to take this into consideration, and implement a mechanism to make sure it is easy to add operations without breaking ggml-alloc in the future.