
ggml: replace conv 1D - 2D stage_0 and stage_1 with im2col and mul_mat #564

Merged (31 commits) on Nov 12, 2023

Conversation

FSSRepo (Collaborator) commented Oct 9, 2023

Fixes #559

This pull request aims to add a ggml_im2col operator, which I have no idea what it actually is (I'm serious). I only know that it converts images into columns, which eliminates stage 0 of conv1d and conv2d. It also refactors ggml_mul_mat to support float16 source data types, presumably to reduce memory usage, which removes stage 1 of conv1d and conv2d. Additionally, I want to implement CUDA kernels for these operations.
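
For reference, here is a minimal scalar sketch of the im2col idea for a single-channel 2D input. This is only an illustration of the transform, not the PR's implementation; the function name and layout are assumptions:

static void im2col_2d_sketch(const float * img, int IH, int IW,
                             int KH, int KW, int s, int p, int d,
                             float * cols /* [OH*OW, KH*KW] */) {
    // output spatial size for the given stride (s), padding (p) and dilation (d)
    const int OH = (IH + 2*p - d*(KH - 1) - 1)/s + 1;
    const int OW = (IW + 2*p - d*(KW - 1) - 1)/s + 1;
    for (int oh = 0; oh < OH; ++oh) {
        for (int ow = 0; ow < OW; ++ow) {
            // each output position gets one row containing its KHxKW input window
            float * row = cols + (oh*OW + ow)*KH*KW;
            for (int kh = 0; kh < KH; ++kh) {
                for (int kw = 0; kw < KW; ++kw) {
                    const int ih = oh*s + kh*d - p;
                    const int iw = ow*s + kw*d - p;
                    // zero padding outside the image bounds
                    row[kh*KW + kw] = (ih < 0 || ih >= IH || iw < 0 || iw >= IW)
                        ? 0.0f : img[ih*IW + iw];
                }
            }
        }
    }
}

The convolution then reduces to multiplying the flattened kernel [OC, KH*KW] against this matrix, which is exactly the mul_mat stage below.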

Conv1D = im2col (1D) + Matrix Multiplication

GGML_API struct ggml_tensor * ggml_conv_1d(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        int                   s0,
        int                   p0,
        int                   d0) {
    struct ggml_tensor * result = ggml_im2col(ctx, a, b, s0, 0, p0, 0, d0, 0, false);
    result = ggml_mul_mat(ctx, a, result);
    return result;
}

Conv2D = im2col (2D) + Matrix Multiplication

struct ggml_tensor * ggml_conv_2d(
    struct ggml_context * ctx,
    struct ggml_tensor  * a,
    struct ggml_tensor  * b,
    int                  s0,
    int                  s1,
    int                  p0,
    int                  p1,
    int                  d0,
    int                  d1) {

    struct ggml_tensor * result = ggml_im2col(ctx, a, b, s0, s1, p0, p1, d0, d1, true); // [N, OH, OW, IC * KH * KW]
    result = ggml_mul_mat(ctx, a, result);

    return result;

}

Where:

GGML_API struct ggml_tensor * ggml_im2col(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,
            struct ggml_tensor  * b,
            int                  s0,
            int                  s1,
            int                  p0,
            int                  p1,
            int                  d0,
            int                  d1,
            bool             is_2D);
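
As a usage sketch (given an existing struct ggml_context * ctx; the shapes are taken from the Conv2D GEMM example further down in this thread, and the tensor types are my assumption):

// kernel a: KW=3, KH=3, IC=10, OC=10 ; input b: IW=10, IH=8, IC=10, N=1
struct ggml_tensor * a = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, 3, 3, 10, 10);
struct ggml_tensor * b = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 10, 8, 10, 1);

// 2D im2col with stride 1, no padding, no dilation
struct ggml_tensor * cols = ggml_im2col(ctx, a, b, 1, 1, 0, 0, 1, 1, true);
// cols->ne = [IC * KH * KW, OW, OH, N] = [90, 8, 6, 1]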

Here's the plan to finish this PR. I'll be working on this as soon as possible, since I won't have much time due to my studies:

  • remove ggml_conv_1d_stage_0 and ggml_conv_1d_stage_1.
  • remove ggml_conv_2d_stage_0 and ggml_conv_2d_stage_1.
  • add ggml_im2col for 1D and 2D; I am going to investigate the difference between them in order to refactor the code better and avoid duplication.
  • add ggml_compute_forward_mul_mat_f16_f32 to support float16 matrix multiplication, which reduces memory usage.
  • research how stage 1 (General Matrix Multiply) of Conv1D differs from Conv2D, to see whether it involves the same matrix multiplication or not.
  • add the CUDA kernels; I already have them done, I just want to finish the previous items.
  • perform final tests on the conv1d, conv2d, and im2col functions, and optimize the results.
  • reuse ggml_mul_mat and remove gemm_fp16_out_fp32; I'm testing the differences.
  • refactor the code (remove unused comments and the leftover GGML_OP_CONV_1D and GGML_OP_CONV_2D implementations that will no longer be used):
        GGML_OP_ALIBI,
        GGML_OP_CLAMP,
        GGML_OP_CONV_1D, // this
        GGML_OP_CONV_TRANSPOSE_1D,
        GGML_OP_CONV_2D, // this
        GGML_OP_IM2COL,

Suggestions and feedback are welcome.

FSSRepo commented Oct 9, 2023

Added some 'unit' tests:

master:

load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
main: compute buffer size: 0.00 MB
ggml_conv2d (64): PASS

PR:

load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
GGML_ASSERT: D:\proyectos\ggml\src\ggml.c:6425: ggml_can_mul_mat(a, b)

Incompatible matrix multiplication (the current implementation doesn't support 4D tensors?)

😂

Edit:

Fixed:

struct ggml_tensor * ggml_mul_mat(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b) {
    bool special_case = (a->ne[0] * a->ne[1] * a->ne[2]) == b->ne[0]; // allow the custom (im2col) multiplication case
    GGML_ASSERT(ggml_can_mul_mat(a, b) || special_case);
    GGML_ASSERT(!ggml_is_transposed(a));

    bool is_node = false;

    if (a->grad || b->grad) {
        is_node = true;
    }

    const int64_t ne[4] = {
        special_case ? b->ne[1] : a->ne[1],
        b->ne[special_case ? 2 : 1],
        special_case ? a->ne[3] : b->ne[2],
        b->ne[3] };
    struct ggml_tensor * result = ggml_new_tensor(ctx, GGML_TYPE_F32, MAX(a->n_dims, b->n_dims), ne);

    result->op   = GGML_OP_MUL_MAT;
    result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
    result->src[0] = a;
    result->src[1] = b;

    return result;
}

Result:

D:\proyectos\ggml\build\bin\Release>test-conv2d.exe
load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
Tensor im2col: [36, 4, 4, 1]
Tensor a: [3, 3, 4, 4]
Tensor mul_mat: [4, 4, 4, 1]
main: compute buffer size: 0.00 MB
Tensor im2col: [36, 4, 4, 1]
Tensor a: [3, 3, 4, 4]
Tensor mul_mat: [4, 4, 4, 1]
ggml_conv2d (64): PASS

FSSRepo commented Oct 9, 2023

Fixed (CPU Backend):

D:\proyectos\ggml\build\bin\Release>test-conv1d.exe
load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
main: compute buffer size: 0.00 MB
ggml_conv1d (15): PASS
D:\proyectos\ggml\build\bin\Release>test-conv2d.exe
load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
main: compute buffer size: 0.00 MB
ggml_conv2d (64): PASS

FSSRepo commented Oct 10, 2023

Final test with new data and results:


ggerganov (Owner) left a comment

Overall, I think this is great. I'm surprised you pulled it off so quickly!

However, there are some things in ggml.c that should be done in a simpler way. See my comments.

FSSRepo commented Oct 11, 2023

@ggerganov @slaren I need some feedback about ggml_mul_mat

For the following multiplication of matrices with shapes (3, 2) x (2, 3):

$$ \begin{bmatrix} 2 & 8 \\ 5 & 1 \\ 4 & 2 \\ \end{bmatrix} \times \begin{bmatrix} 10 & 9 & 5 \\ 5 & 9 & 4 \\ \end{bmatrix} = \begin{bmatrix} 60 & 90 & 42 \\ 55 & 54 & 29 \\ 50 & 54 & 28 \\ \end{bmatrix} $$

To solve it with gemm_f16_out_f32 I need to transpose B from (2, 3) to (3, 2), but ggml_mul_mat fails:

D:\proyectos\cpp-projects\ggml-test\build\Release>mult-mat-test.exe
Transposed B:
[10.0 5.0
9.0 9.0
5.0 4.0]
load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 1084 bytes
Performing ggml_gemm test:
60.0 90.0 42.0
55.0 54.0 29.0
50.0 54.0 28.0
gemm_mult: PASSED
main: compute buffer size: 0.0625 KB

Performing ggml_mul_mat test:
105.0 48.0 -123612456.0
78.0 37.0 -54790864.0
-1873442825329845586624512.0 -0.0 0.0
ggml_mul_mat (6): FAILED

Am I making some mistake? See the test-mul-mat test that I added. I tried using ggml_transpose and ggml_reshape, but with no good result; the ggml_mul_mat result shape is wrong.

ggerganov (Owner):

In ggml, ne[0] is always the number of columns in the tensor.
Also, ggml_mul_mat returns a transposed result.
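
To illustrate that convention with the (3, 2) x (2, 3) example above (a sketch of one reading of it, not authoritative documentation): for ggml_mul_mat(ctx, a, b) both operands must share ne[0], the dot-product dimension, and the result has ne = { a->ne[1], b->ne[1], ... }:

// A is 3x2, stored with ne = {2, 3}  (ne[0] = 2 columns, ne[1] = 3 rows)
struct ggml_tensor * a  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 3);
// B is 2x3; to share ne[0] with a it must be stored as B^T, also with ne = {2, 3}
struct ggml_tensor * bt = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 3);

// r->ne = { a->ne[1], bt->ne[1] } = { 3, 3 }; read with ne[0] as columns,
// r holds (A*B)^T, i.e. the transposed result mentioned above
struct ggml_tensor * r  = ggml_mul_mat(ctx, a, bt);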

FSSRepo commented Oct 11, 2023

In ggml, ne[0] is always the number of columns in the tensor. Also, ggml_mul_mat returns a transposed result

I closed the PR accidentally. That explains many things 😂😂😂

FSSRepo closed this Oct 11, 2023
FSSRepo reopened this Oct 11, 2023
FSSRepo commented Oct 14, 2023

Conv2D GEMM

Matrix A: [3, 3, 10, 10]
Matrix B (im2col): [90, 8, 6, 1]
Result: [8, 6, 10, 1]

m: 10, n: 48, k: 90

Where:
Matrix A (m, k) = (10, 10 * 3 * 3) -> reshaped internally -> [10, 90]
Matrix B (im2col), not transposed: (n, k) = (1 * 6 * 8, 10 * 3 * 3) -> reshaped internally -> [48, 90]

Conv2D result (n, m) = [48, 10] -> reshaped internally -> (1 * 10, 6 * 8)

Conv2D with ggml_mul_mat

Remember: shapes are (columns, rows) in ggml_mul_mat.
Matrix A = [90, 10] (needs a reshape)
Matrix B = [90, 48] (no transpose required, but needs a reshape)
Transposed result = [10, 48]

Conv2D result = Transpose([10, 48]) = [48, 10] (needs a reshape)

FSSRepo commented Oct 14, 2023

@ggerganov

Conv2D now looks like this in order to reuse ggml_mul_mat, and im2col must be F32.

We need to implement F16 x F16 = F32 in ggml_mul_mat to reduce memory usage. This is urgent because projects like SAM, CLIP, and LLaVA (now in llama.cpp) require im2col to be F16.

struct ggml_tensor* im2col_res = ggml_im2col(ctx0, model.a, model.b, s0, s1, p0, p1, d0, d1, true); // F32, because ggml_mul_mat doesn't support fp16 sources with fp32 output
struct ggml_tensor* conv2d_res = ggml_reshape_4d(ctx0,
            ggml_cont(ctx0, ggml_transpose(ctx0,
            ggml_mul_mat(ctx0,
            ggml_cont(ctx0, ggml_reshape_2d(ctx0, model.a, (model.a->ne[0] * model.a->ne[1] * model.a->ne[2]),  model.a->ne[3])),
            ggml_cont(ctx0, ggml_reshape_2d(ctx0, im2col_res, im2col_res->ne[0],  (im2col_res->ne[3] * im2col_res->ne[2] * im2col_res->ne[1])))))),
            im2col_res->ne[1], im2col_res->ne[2], model.a->ne[3], im2col_res->ne[3]);
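
On the F16 x F16 = F32 point, here is a minimal CPU-side sketch of what such a GEMM could look like. This is an illustration only, not the PR's gemm_fp16_out_fp32; it assumes ggml's public ggml_fp16_t type and ggml_fp16_to_fp32() conversion:

#include "ggml.h"

// C (m x n, F32) = A (m x k, F16) * B^T, with B stored row-major as (n x k, F16);
// sources stay in F16, accumulation and output are F32
static void gemm_f16_out_f32_sketch(int m, int n, int k,
                                    const ggml_fp16_t * A,
                                    const ggml_fp16_t * B,
                                    float * C) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                sum += ggml_fp16_to_fp32(A[i*k + p]) * ggml_fp16_to_fp32(B[j*k + p]);
            }
            C[i*n + j] = sum;
        }
    }
}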

FSSRepo commented Oct 14, 2023

Error because im2col is F16 and ggml_mul_mat doesn't support fp16 sources:

CPU Backend:

load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
main: compute buffer size: 0.02 MB
GGML_ASSERT: D:\proyectos\ggml\src\ggml.c:11730: nb10 == sizeof(float) // fixed

GPU Backend:

load_model: ggml tensor size    = 320 bytes
load_model: backend buffer size = 0.00 MB
load_model: using CUDA backend
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6
main: compute buffer size: 0.02 MB
ggml_cuda_cpy: unsupported type combination (f16 to f16)
GGML_ASSERT: D:\proyectos\ggml\src\ggml-cuda.cu:7168: false

FSSRepo commented Oct 14, 2023

@slaren Does the CUDA implementation of ggml_mul_mat support data sources of type fp16?

I tried to fix that GGML_ASSERT (D:\proyectos\ggml\src\ggml-cuda.cu:7168: false) by implementing ggml_cpy_f16_f16_cuda, but now it fails with cuBLAS error 7 at D:\proyectos\ggml\src\ggml-cuda.cu:6337: an unsupported value or parameter was passed to the function.

CUBLAS_CHECK(
    cublasGemmEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
                    row_diff, src1_ncols, ne10,
                    &alpha_f16, src0_ptr, CUDA_R_16F, ne00,
                                src1_ptr, CUDA_R_16F, ne10,
                    &beta_f16,   dst_f16, CUDA_R_16F, ldc,
                    CUBLAS_COMPUTE_16F,
                    CUBLAS_GEMM_DEFAULT_TENSOR_OP));

slaren commented Oct 14, 2023

It should work, but maybe there is a bug with fp16 src1. Uncommenting this line may show more details about the error:

// CUBLAS_CHECK(cublasLoggerConfigure(1, 1, 0, nullptr));

FSSRepo commented Oct 14, 2023

@slaren

To avoid the assertion:

ggml_cuda_cpy: unsupported type combination (f16 to f16)
GGML_ASSERT: D:\proyectos\ggml\src\ggml-cuda.cu:7168: false

I added this case to ggml_cuda_cpy:

    else if (src0->type == GGML_TYPE_F16 && src1->type == GGML_TYPE_F16) {
        ggml_cpy_f16_f16_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02,
                              ne10, ne11, nb10, nb11, nb12, main_stream);
    }

together with:
static __device__ void cpy_1_f16_f16(const char * cxi, char * cdsti) {
    const half * xi = (const half *) cxi;
    half * dsti = (half *) cdsti;

    *dsti = *xi;
}

static void ggml_cpy_f16_f16_cuda(
    const char * cx, char * cdst, const int ne,
    const int ne00, const int ne01, const int nb00, const int nb01, const int nb02,
    const int ne10, const int ne11, const int nb10, const int nb11, const int nb12, cudaStream_t stream) {

    const int num_blocks = (ne + CUDA_CPY_BLOCK_SIZE - 1) / CUDA_CPY_BLOCK_SIZE;
    cpy_f32_f16<cpy_1_f16_f16><<<num_blocks, CUDA_CPY_BLOCK_SIZE, 0, stream>>>
        (cx, cdst, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12);
}
With the cuBLAS logger enabled, the output is:

I! cuBLAS (v12.0) function cublasStatus_t __cdecl cublasSetStream_v2(struct cublasContext *,struct CUstream_st *) called:
i!  handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0000015CD6DC1730)
i!  streamId: type=SOME TYPE; val=POINTER (IN HEX:0x0000015CBBBD36B0)
i! Time: 2023-10-14T19:38:20 elapsed from start 0.016667 minutes or 1.000000 seconds
i!Process=12976; Thread=18488; GPU=0; Handle=POINTER (IN HEX:0x0000015CD6DC1730); StreamId=POINTER (IN HEX:0x0000000000000000) (defaultStream); MathMode=CUBLAS_TF32_TENSOR_OP_MATH
i! COMPILED WITH: Microsoft Visual Studio / 192628806.0
I! cuBLAS (v12.0) function cublasStatus_t __cdecl cublasGemmEx(struct cublasContext *,cublasOperation_t,cublasOperation_t,int,int,int,const void *,const void *,enum cudaDataType_t,int,const void *,enum cudaDataType_t,int,const void *,void *,enum cudaDataType_t,int,cublasComputeType_t,cublasGemmAlgo_t) called:
i!  handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0000015CD6DC1730)
i!  transa: type=cublasOperation_t; val=CUBLAS_OP_T(1)
i!  transb: type=cublasOperation_t; val=CUBLAS_OP_N(0)
i!  m: type=int; val=10
i!  n: type=int; val=8
i!  k: type=int; val=30
i!  alpha: type=void; val=POINTER (IN HEX:0x0000000F12CFEE98)
i!  A: type=void; val=POINTER (IN HEX:0x000000071A820E00)
i!  Atype: type=cudaDataType_t; val=CUDA_R_16F(2)
i!  lda: type=int; val=30
i!  B: type=void; val=NULL_PTR(POINTER (IN HEX:0x0000000000000000))
i!  Btype: type=cudaDataType_t; val=CUDA_R_16F(2)
i!  ldb: type=int; val=30
i!  beta: type=void; val=POINTER (IN HEX:0x0000000F12CFEF28)
i!  C: type=void; val=POINTER (IN HEX:0x000000071A821600)
i!  Ctype: type=cudaDataType_t; val=CUDA_R_16F(2)
i!  ldc: type=int; val=10
i!  computeType: type=cublasComputeType_t; val=CUBLAS_COMPUTE_16F(64)
i!  algo: type=SOME TYPE; val=CUBLAS_GEMM_DEFAULT_TENSOR_OP(99)
i! Time: 2023-10-14T19:38:20 elapsed from start 0.016667 minutes or 1.000000 seconds
i!Process=12976; Thread=18488; GPU=0; Handle=POINTER (IN HEX:0x0000015CD6DC1730); StreamId=POINTER (IN HEX:0x0000015CBBBD36B0); MathMode=CUBLAS_TF32_TENSOR_OP_MATH
i! COMPILED WITH: Microsoft Visual Studio / 192628806.0

cuBLAS error 7 at D:\proyectos\ggml\src\ggml-cuda.cu:6337: an unsupported value or parameter was passed to the function
current device: 0

ggml_mul_mat on the CPU backend works with the same matrices.

slaren commented Oct 14, 2023

Try this change:

diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index 654d363..96e63e0 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -6295,7 +6295,7 @@ inline void ggml_cuda_op_mul_mat_cublas(
             src1_as_f16 = (half *) ggml_cuda_pool_malloc(ne * sizeof(half), &src1_as);
             to_fp16_cuda(src1_ddf_i, src1_as_f16, ne, stream);
         }
-        const half * src1_ptr = src1->type == GGML_TYPE_F16 ? (const half *) src1_ddq_i : src1_as_f16;
+        const half * src1_ptr = src1->type == GGML_TYPE_F16 ? (const half *) src1_ddf_i : src1_as_f16;

         size_t dst_as = 0;
         half * dst_f16 = (half *) ggml_cuda_pool_malloc(row_diff*src1_ncols * sizeof(half), &dst_as);

FSSRepo commented Oct 16, 2023

As a final opinion on the topic, I would like to hear from the people who actually need these features whether the current way of returning the result is suitable, or whether the one you propose is, as long as they don't have to use ggml_cont and ggml_transpose. If they would have to do that anyway when using the regular version of ggml_conv, then the ideal scenario would be to return the matrix without transposing it.

ggerganov commented Oct 16, 2023

The whisper example produces wrong results on this branch:

ggml/ci/run.sh (lines 199 to 219 in 97809b3):

# whisper
function gg_run_whisper {
    cd ${SRC}

    gg_wget models-mnt/whisper/ https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
    gg_wget models-mnt/whisper/ https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav

    cd build-ci-release

    set -e

    path_models="../models-mnt/whisper/"
    model_f16="${path_models}/ggml-base.en.bin"
    audio_0="${path_models}/jfk.wav"

    (time ./bin/whisper -m ${model_f16} -f ${audio_0} 2>&1 | tee -a $OUT/${ci}-main.log) || true

    set +e
}

It uses ggml_conv_1d_ph() so likely the problem is there:

// convolution + gelu
{
    cur = ggml_conv_1d_ph(ctx0, model.e_conv_1_w, mel, 1, 1);
    cur = ggml_add(ctx0,
            ggml_repeat(ctx0,
                model.e_conv_1_b,
                cur),
            cur);

    cur = ggml_gelu(ctx0, cur);

    cur = ggml_conv_1d_ph(ctx0, model.e_conv_2_w, cur, 2, 1);
    cur = ggml_add(ctx0,
            ggml_repeat(ctx0,
                model.e_conv_2_b,
                cur),
            cur);

    cur = ggml_gelu(ctx0, cur);
}

In ggml, we have the following identity:

z0 = ggml_mul_mat(ctx, x, y);
z1 = ggml_cont(ctx, ggml_transpose(ctx, ggml_mul_mat(ctx, y, x)));

z0 == z1

The ggml_cont operation is very expensive, so it definitely should not be implicitly applied.
The conv operators have to be implemented to output the correct results without ggml_cont and ggml_transpose.

FSSRepo commented Oct 16, 2023

The ggml_cont operation is very expensive, so it definitely should not be implicitly applied.
The conv operators have to be implemented to output the correct results without ggml_cont and ggml_transpose.

I will see whether ggml_cont can be omitted in both conv1d and conv2d, but ggml_transpose definitely needs to be included for them to produce the same results as the implementation on the master branch. I will also try to identify the issue with the examples that require these operations. This is quite a complex task, and I'm not very familiar with how to reimplement PyTorch's conv1d and conv2d functionality here.

ggerganov (Owner):

@FSSRepo I think you can leave the branch as it is and I will try to fix it

FSSRepo commented Oct 16, 2023

@FSSRepo I think you can leave the branch as it is and I will try to fix it

I'm going to review the whisper issue with this PR for now, so I won't be adding any more commits to this branch.

FSSRepo commented Oct 16, 2023

@ggerganov I don't get any error with the latest commit.

master:

And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

d8539f3:

 And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


ggerganov (Owner):

The computation was failing when using BLAS, which on the CPU supports only F32. I fixed that in 53f805e.

I now applied the identity transformation that I told you about earlier:

diff --git a/src/ggml.c b/src/ggml.c
index 927f03a..b33a741 100644
--- a/src/ggml.c
+++ b/src/ggml.c
@@ -7482,10 +7482,10 @@ GGML_API struct ggml_tensor * ggml_conv_1d(
         int                   p0,
         int                   d0) {
     struct ggml_tensor * result = ggml_im2col(ctx, a, b, s0, 0, p0, 0, d0, 0, false); // [N, OL, IC * K]
-    result = ggml_reshape_3d(ctx, ggml_cont(ctx, ggml_transpose(ctx,
+    result = ggml_reshape_3d(ctx,
             ggml_mul_mat(ctx,
-            ggml_reshape_2d(ctx, a, (a->ne[0] * a->ne[1]),  a->ne[2]), // [OC,IC, K] => [OC, IC * K]
-            ggml_reshape_2d(ctx, result, result->ne[0],  (result->ne[2] * result->ne[1]))))), // [N, OL, IC * K] => [N*OL, IC * K]
+            ggml_reshape_2d(ctx, result, result->ne[0],  (result->ne[2] * result->ne[1])), // [N, OL, IC * K] => [N*OL, IC * K]
+            ggml_reshape_2d(ctx, a, (a->ne[0] * a->ne[1]),  a->ne[2])), // [OC,IC, K] => [OC, IC * K]
             result->ne[1],  a->ne[2], result->ne[2]); // [N, OC, OL]
     return result;
 }

This again produces correct results for test-conv-1d and for whisper.
And we no longer need ggml_cont and ggml_transpose.

FSSRepo commented Oct 16, 2023

In Conv2D

you can replace (a->ne[0] * a->ne[1] * a->ne[2]) with result->ne[0]; I forgot that.

ggml_reshape_2d(ctx, a, result->ne[0],  a->ne[3])), // [OC,IC, KH, KW] => [OC, IC * KH * KW]

which is the same as:

ggml_reshape_2d(ctx, a, (a->ne[0] * a->ne[1] * a->ne[2]),  a->ne[3])), // [OC,IC, KH, KW] => [OC, IC * KH * KW]

The same applies to Conv1D: (a->ne[0] * a->ne[1]) -> result->ne[0].
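
Putting those suggestions together with the operand swap from the conv_1d diff above, a ggml_conv_2d sketch might look like this (based on this thread, not necessarily the exact code that was merged):

struct ggml_tensor * ggml_conv_2d(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        int s0, int s1,
        int p0, int p1,
        int d0, int d1) {
    struct ggml_tensor * im2col = ggml_im2col(ctx, a, b, s0, s1, p0, p1, d0, d1, true); // [N, OH, OW, IC * KH * KW]

    struct ggml_tensor * result =
        ggml_mul_mat(ctx,
            ggml_reshape_2d(ctx, im2col, im2col->ne[0], im2col->ne[3] * im2col->ne[2] * im2col->ne[1]), // [N, OH, OW, IC * KH * KW] => [N*OH*OW, IC * KH * KW]
            ggml_reshape_2d(ctx, a, im2col->ne[0], a->ne[3]));                                          // [OC, IC, KH, KW]          => [OC, IC * KH * KW]

    return ggml_reshape_4d(ctx, result, im2col->ne[1], im2col->ne[2], a->ne[3], im2col->ne[3]); // [N, OC, OH, OW]
}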

FSSRepo commented Oct 16, 2023

So the identity you refer to is just swapping the matrix operands? 😂 Good to know, I hadn't gotten that.

ggerganov (Owner):

Will get back to this PR soon, sorry for the delay.

FSSRepo commented Oct 23, 2023

Will get back to this PR soon, sorry for delay

Don't worry. Next week I will add CUDA kernels for ggml_concat, ggml_upscale, and ggml_group_norm.

Today I made the kernels for these ops and will do some tests in Stable Diffusion. This PR's implementation of conv2d works perfectly in stable-diffusion.cpp.

ggerganov merged commit ba779f1 into ggerganov:master on Nov 12, 2023. 4 checks passed.