
fix: memset dst to 0 in ggml_conv_transpose_1d #591

Merged (3 commits) Oct 24, 2023

Conversation

@PABannier (Contributor) commented Oct 23, 2023

I came across a nasty bug in the Encodec.cpp library when using ggml_conv_transpose_1d.

If dst->data is not memset to 0, pre-existing values can remain there, resulting in wrong output. This is because values are accumulated into dst->data: dst_data[i10*s0 + i00] += v;.

This PR introduces two fixes:

  • A memset of dst->data to 0.
  • A fix for the incorrect indexing of the wdata buffer in ggml_compute_forward_conv_transpose_1d_f32.

@slaren (Collaborator) commented Oct 23, 2023

#589 may be related, but the conv_transpose_2d test still fails. Does it need the same fix?

@slaren (Collaborator) commented Oct 23, 2023

@YavorGIvanov I wonder if this is also the cause of the issues in sam.cpp with ggml-alloc that required some of the tensors to be pre-allocated.

@PABannier (Contributor, Author)

@slaren Very possible. I can push a fix for ggml_conv_transpose_2d if needed.

@ggerganov (Owner)

Thanks for fixing this - let's fix ggml_conv_transpose_2d dst memset in this PR as well

@PABannier (Contributor, Author)

@slaren @YavorGIvanov FYI, I just pushed a fix for conv_transpose_2d as well.

@YavorGIvanov (Collaborator)

> @YavorGIvanov I wonder if this is also the cause of the issues in sam.cpp with ggml-alloc that required some of the tensors to be pre-allocated.

That is probably indeed the reason it happened. I will update to the latest ggml after this fix is merged and verify over in the sam.cpp repo by removing them.

@slaren (Collaborator) commented Oct 24, 2023

I have verified that this fixes #589.

@slaren slaren linked an issue Oct 24, 2023 that may be closed by this pull request
@ggerganov ggerganov merged commit 4300996 into ggerganov:master Oct 24, 2023
4 checks passed
@PABannier PABannier deleted the fix_nasty_conv_transpose_1d branch October 24, 2023 19:26
@YavorGIvanov (Collaborator) commented Oct 24, 2023

Yup. This was the reason I "needed" the two allocs after the conv_transpose_2d op in sam.cpp.
I updated to the latest ggml, removed them, and everything works fine - YavorGIvanov/sam.cpp@8100281

@FSSRepo (Collaborator) commented Nov 2, 2023

@slaren In stable diffusion, something similar also happened when using the im2col function, because it does not always traverse the entire dst array. It assumes dst always holds zeros, but I have encountered non-zero values there (they are always the same, and I am not sure why; possibly a kernel writes beyond its allowed address range). When I use cudaMemset, ggml_conv_2d gives the correct values, which fixes my bug.

On reflection, I believe a kernel is writing to memory already owned by another operation: before the conv2d operation there is ggml_silu_inplace, and I think that operation needs a ggml_allocr_alloc so it does not write over the next block of memory, which belongs to im2col.

I separated ggml_silu_inplace and ggml_conv2d into a standalone example to test whether my assumption was correct. I get the correct result there, so I still don't understand how the im2col dst memory block ends up holding data when it should be all zeros.

This only happens in the CUDA backend.

inline void ggml_cuda_op_im2col(
    const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst,
    const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) {

    GGML_ASSERT(src0->type == GGML_TYPE_F16);
    GGML_ASSERT(src1->type == GGML_TYPE_F32);
    GGML_ASSERT( dst->type == GGML_TYPE_F16);

    const int32_t s0 = ((const int32_t*)(dst->op_params))[0];
    const int32_t s1 = ((const int32_t*)(dst->op_params))[1];
    const int32_t p0 = ((const int32_t*)(dst->op_params))[2];
    const int32_t p1 = ((const int32_t*)(dst->op_params))[3];
    const int32_t d0 = ((const int32_t*)(dst->op_params))[4];
    const int32_t d1 = ((const int32_t*)(dst->op_params))[5];

    const bool is_2D = ((const int32_t*)(dst->op_params))[6] == 1;

    const int64_t N = src1->ne[is_2D ? 3 : 2];
    const int64_t IC = src1->ne[is_2D ? 2 : 1];
    const int64_t IH = is_2D ? src1->ne[1] : 1;
    const int64_t IW = src1->ne[0];

    const int64_t KH = is_2D ? src0->ne[1] : 1;
    const int64_t KW = src0->ne[0];

    const int64_t OH = is_2D ? dst->ne[2] : 1;
    const int64_t OW = dst->ne[1];

    // unexpected behavior: sometimes dst_dd has data
    cudaMemsetAsync(dst_dd, 0, ggml_nbytes(dst), main_stream);

    im2col_f32_f16_cuda(src1_dd, (half*) dst_dd,
        OH, IW, IH, OW, IC, KH, KW, N,
        src1->nb[is_2D ? 3 : 2] / 4, // nb is byte offset, src is type float32
        src1->nb[is_2D ? 2 : 1] / 4, // nb is byte offset, src is type float32
        s0, s1, p0, p1, d0, d1, main_stream);

    (void) src0;
    (void) src0_dd;
}

@ggerganov (Owner)

In general, ggml operations should never assume that the dst data is zero, so if your operation does not write to all destination elements, it still has to zero out the rest. Note that this is the reason such operations on the CPU first call memset on dst->data. For example:

ggml/src/ggml.c, lines 12796 to 12798 in 54cec9f:

if (params->type == GGML_TASK_INIT) {
    memset(dst->data, 0, ggml_nbytes(dst));
}


Successfully merging this pull request may close these issues.

test-conv-transpose fails when building with sanitizers enabled
5 participants