fix: memset dst to 0 in ggml_conv_transpose_1d #591
Conversation
#589 may be related, but the …

@YavorGIvanov I wonder if this is also the cause of the issues in …

@slaren Very possible. I can push a fix for `ggml_conv_transpose_2d` if needed.

Thanks for fixing this - let's fix …

@slaren @YavorGIvanov FYI, I just pushed a fix for …

Probably it is indeed the reason it happened. I will update to the latest ggml after this fix is merged and verify in the Sam.cpp repo by removing them.

I have verified that this fixes #589.

Yup. This was the reason I "needed" the two allocs after the `conv_transpose_2d` op in Sam.cpp.
@slaren In stable diffusion, something similar also happened when using the im2col function, because it doesn't always traverse the entire dst tensor. This only happens in the CUDA backend:

```cpp
inline void ggml_cuda_op_im2col(
    const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst,
    const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) {

    GGML_ASSERT(src0->type == GGML_TYPE_F16);
    GGML_ASSERT(src1->type == GGML_TYPE_F32);
    GGML_ASSERT( dst->type == GGML_TYPE_F16);

    const int32_t s0 = ((const int32_t*)(dst->op_params))[0];
    const int32_t s1 = ((const int32_t*)(dst->op_params))[1];
    const int32_t p0 = ((const int32_t*)(dst->op_params))[2];
    const int32_t p1 = ((const int32_t*)(dst->op_params))[3];
    const int32_t d0 = ((const int32_t*)(dst->op_params))[4];
    const int32_t d1 = ((const int32_t*)(dst->op_params))[5];

    const bool is_2D = ((const int32_t*)(dst->op_params))[6] == 1;

    const int64_t N  = src1->ne[is_2D ? 3 : 2];
    const int64_t IC = src1->ne[is_2D ? 2 : 1];
    const int64_t IH = is_2D ? src1->ne[1] : 1;
    const int64_t IW = src1->ne[0];

    const int64_t KH = is_2D ? src0->ne[1] : 1;
    const int64_t KW = src0->ne[0];

    const int64_t OH = is_2D ? dst->ne[2] : 1;
    const int64_t OW = dst->ne[1];

    // unexpected behavior: sometimes dst_dd has data
    cudaMemsetAsync(dst_dd, 0, ggml_nbytes(dst), main_stream);

    im2col_f32_f16_cuda(src1_dd, (half *) dst_dd,
        OH, IW, IH, OW, IC, KH, KW, N,
        src1->nb[is_2D ? 3 : 2] / 4, // nb is byte offset, src is type float32
        src1->nb[is_2D ? 2 : 1] / 4, // nb is byte offset, src is type float32
        s0, s1, p0, p1, d0, d1, main_stream);

    (void) src0;
    (void) src0_dd;
}
```
In general, see Lines 12796 to 12798 in 54cec9f.
I came across a nasty bug in the Encodec.cpp library using `ggml_conv_transpose_1d`. If `dst->data` is not memset to 0, pre-existing values can be stored there, resulting in wrong output. This is due to the fact that values are accumulated into `dst->data`: `dst_data[i10*s0 + i00] += v;`.

This PR introduces two fixes:

- memset of `dst->data` to 0
- memset of the `wdata` buffer in `ggml_compute_forward_conv_transpose_1d_f32`