
fix: memset dst to 0 in ggml_conv_transpose_1d #591

Merged (3 commits) Oct 24, 2023

Conversation

@PABannier (Contributor) commented Oct 23, 2023

I came across a nasty bug in the Encodec.cpp library when using ggml_conv_transpose_1d.

If dst->data is not memset to 0, pre-existing values can remain there, resulting in wrong output. This is because values are accumulated into dst->data: dst_data[i10*s0 + i00] += v;.

This PR introduces two fixes:

  • A memset of dst->data to 0.
  • A fix for the incorrect indexing of the wdata buffer in ggml_compute_forward_conv_transpose_1d_f32.

@slaren (Collaborator) commented Oct 23, 2023

#589 may be related, but the conv_transpose_2d test still fails. Does it need the same fix?

@slaren (Collaborator) commented Oct 23, 2023

@YavorGIvanov I wonder if this is also the cause of the issues in sam.cpp with ggml-alloc that required some of the tensors to be pre-allocated.

@PABannier (Contributor, Author)

@slaren Very possible. I can push a fix for ggml_conv_transpose_2d if needed.

@ggerganov (Owner)

Thanks for fixing this - let's fix ggml_conv_transpose_2d dst memset in this PR as well

@PABannier (Contributor, Author)

@slaren @YavorGIvanov FYI, I just pushed a fix for conv_transpose_2d as well.

@YavorGIvanov (Collaborator)

> @YavorGIvanov I wonder if this is also the cause of the issues in sam.cpp with ggml-alloc that required some of the tensors to be pre-allocated.

That is probably indeed the reason it happened. I will update to the latest ggml after this fix is merged and verify over in the sam.cpp repo by removing them.

@slaren (Collaborator) commented Oct 24, 2023

I have verified that this fixes #589.

@slaren slaren linked an issue Oct 24, 2023 that may be closed by this pull request
@ggerganov ggerganov merged commit 4300996 into ggerganov:master Oct 24, 2023
4 checks passed
@PABannier PABannier deleted the fix_nasty_conv_transpose_1d branch October 24, 2023 19:26
@YavorGIvanov (Collaborator) commented Oct 24, 2023

Yup. This was the reason I "needed" the two allocs after the conv_transpose_2d op in sam.cpp.
I updated to the latest ggml, removed them, and everything works fine - YavorGIvanov/sam.cpp@8100281

@FSSRepo (Collaborator) commented Nov 2, 2023

@slaren In stable diffusion, something similar also happened when using the im2col function, because it does not always traverse the entire dst array. It assumes dst always holds zeros, but I have encountered non-zero values there (they are always the same, and I am not sure why; possibly a kernel writes beyond its allowed address range). When I use cudaMemset, ggml_conv_2d gives the correct values, which fixes my bug.

On reflection, I believe a kernel is writing to memory already owned by another operation: before the conv2d operation there is ggml_silu_inplace, and I think that operation needs a ggml_allocr_alloc so it does not write over the next block of memory, which belongs to im2col.

I separated ggml_silu_inplace and ggml_conv2d into a standalone example to test whether my assumption was correct. I get the correct result there, so I still don't understand how the im2col dst memory block ends up holding data when it should be all zeros.

This only happens in the CUDA backend.

inline void ggml_cuda_op_im2col(
    const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst,
    const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) {

    GGML_ASSERT(src0->type == GGML_TYPE_F16);
    GGML_ASSERT(src1->type == GGML_TYPE_F32);
    GGML_ASSERT( dst->type == GGML_TYPE_F16);

    const int32_t s0 = ((const int32_t*)(dst->op_params))[0];
    const int32_t s1 = ((const int32_t*)(dst->op_params))[1];
    const int32_t p0 = ((const int32_t*)(dst->op_params))[2];
    const int32_t p1 = ((const int32_t*)(dst->op_params))[3];
    const int32_t d0 = ((const int32_t*)(dst->op_params))[4];
    const int32_t d1 = ((const int32_t*)(dst->op_params))[5];

    const bool is_2D = ((const int32_t*)(dst->op_params))[6] == 1;

    const int64_t N = src1->ne[is_2D ? 3 : 2];
    const int64_t IC = src1->ne[is_2D ? 2 : 1];
    const int64_t IH = is_2D ? src1->ne[1] : 1;
    const int64_t IW = src1->ne[0];

    const int64_t KH = is_2D ? src0->ne[1] : 1;
    const int64_t KW = src0->ne[0];

    const int64_t OH = is_2D ? dst->ne[2] : 1;
    const int64_t OW = dst->ne[1];

    // unexpected behavior: sometimes dst_dd has data
    cudaMemsetAsync(dst_dd, 0, ggml_nbytes(dst), main_stream);

    im2col_f32_f16_cuda(src1_dd, (half*) dst_dd,
        OH, IW, IH, OW, IC, KH, KW, N,
        src1->nb[is_2D ? 3 : 2] / 4, // nb is byte offset, src is type float32
        src1->nb[is_2D ? 2 : 1] / 4, // nb is byte offset, src is type float32
        s0, s1, p0, p1, d0, d1, main_stream);

    (void) src0;
    (void) src0_dd;
}

@ggerganov (Owner)

In general, ggml operations should never assume that the dst data is zero, so if your operation does not write to all destination elements, it still has to zero out the rest. Note that this is the reason such operations on the CPU first call memset on dst->data. For example:

ggml/src/ggml.c, lines 12796 to 12798 in 54cec9f:

if (params->type == GGML_TASK_INIT) {
    memset(dst->data, 0, ggml_nbytes(dst));
}


Successfully merging this pull request may close these issues.

test-conv-transpose fails when building with sanitizers enabled
5 participants