
ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_argsort, ggml_top_k #625

Merged
slaren merged 45 commits into master from mul-add-bcast on Dec 5, 2023

Conversation

slaren
Collaborator

@slaren slaren commented Nov 30, 2023

Ref: #621

The CPU implementation should be done; I will add CUDA and Metal next.

@slaren slaren marked this pull request as ready for review December 1, 2023 00:10
@slaren
Collaborator Author

slaren commented Dec 1, 2023

It should work now, but I still need to test the performance.

Owner

@ggerganov ggerganov left a comment


whisper.cpp can go back to using a broadcasted ggml_add for dim 0 after this change. I will update it and check the performance later today.
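
For reference, a minimal sketch (not from this PR; tensor shapes and names are made up) of what the broadcasted add looks like from the API side, assuming the new rule that src1 is repeated along any dimension where it is smaller than src0 and divides it:

// Minimal sketch: dim-0 broadcast in ggml_add (illustrative shapes, not whisper.cpp code)
#include "ggml.h"

static struct ggml_tensor * add_broadcast_example(struct ggml_context * ctx) {
    // "activations": 4 rows of 64 elements
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 4);
    // "bias": a single value per row, broadcast along dim 0
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1, 4);
    // Before this change, b would have needed a ggml_repeat to match x's shape;
    // with full broadcasting, ggml_add can consume the smaller tensor directly.
    return ggml_add(ctx, x, b);
}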

@FSSRepo
Collaborator

FSSRepo commented Dec 1, 2023

I will test this in stable diffusion with CUDA.

@FSSRepo
Collaborator

FSSRepo commented Dec 1, 2023

It works, but we will have to sacrifice some performance for the sake of maximizing compatibility.

Your implementation

UNET Model

========== CUDA Timings =========
[       ADD] - 38.222000 ms - 419 - 0.091222 ms
[       MUL] - 13.909000 ms - 125 - 0.111272 ms
[    CONCAT] - 2.308000 ms - 12 - 0.192333 ms
[      NORM] - 2.567000 ms - 48 - 0.053479 ms
[GROUP_NORM] - 5.310000 ms - 61 - 0.087049 ms
[   MUL_MAT] - 121.505997 ms - 361 - 0.336582 ms
[     SCALE] - 2.243000 ms - 32 - 0.070094 ms
[      CONT] - 11.947000 ms - 160 - 0.074669 ms
[  SOFT_MAX] - 102.802002 ms - 32 - 3.212563 ms
[    IM2COL] - 66.832001 ms - 97 - 0.688990 ms
[   UPSCALE] - 0.577000 ms - 3 - 0.192333 ms
[     UNARY] - 4.678000 ms - 84 - 0.055690 ms
Total Time: 372.901001 ms

AutoEncoder

========== CUDA Timings =========
[       ADD] - 123.287003 ms - 85 - 1.450435 ms
[       MUL] - 48.032001 ms - 30 - 1.601067 ms
[GROUP_NORM] - 56.298000 ms - 30 - 1.876600 ms
[   MUL_MAT] - 167.600006 ms - 41 - 4.087805 ms
[     SCALE] - 0.856000 ms - 1 - 0.856000 ms
[      CONT] - 0.718000 ms - 3 - 0.239333 ms
[  SOFT_MAX] - 2.402000 ms - 1 - 2.402000 ms
[    IM2COL] - 951.328003 ms - 39 - 24.393026 ms
[   UPSCALE] - 4.278000 ms - 3 - 1.426000 ms
[     UNARY] - 22.635000 ms - 29 - 0.780517 ms
Total Time: 1377.433960 ms

My implementation:

UNET model

[       ADD] - 21.768999 ms - 419 - 0.051955 ms
[       MUL] - 8.439000 ms - 125 - 0.067512 ms
[    CONCAT] - 2.272000 ms - 12 - 0.189333 ms
[      NORM] - 2.565000 ms - 48 - 0.053438 ms
[GROUP_NORM] - 5.298000 ms - 61 - 0.086852 ms
[   MUL_MAT] - 120.456001 ms - 361 - 0.333673 ms
[     SCALE] - 2.285000 ms - 32 - 0.071406 ms
[      CONT] - 11.619000 ms - 160 - 0.072619 ms
[  SOFT_MAX] - 101.762001 ms - 32 - 3.180063 ms
[    IM2COL] - 64.297997 ms - 97 - 0.662866 ms
[   UPSCALE] - 0.560000 ms - 3 - 0.186667 ms
[     UNARY] - 4.590000 ms - 84 - 0.054643 ms
Total Time: 345.912994 ms

AutoEncoder

========== CUDA Timings =========
[       ADD] - 65.053001 ms - 85 - 0.765329 ms
[       MUL] - 23.393000 ms - 30 - 0.779767 ms
[GROUP_NORM] - 56.570999 ms - 30 - 1.885700 ms
[   MUL_MAT] - 170.505997 ms - 41 - 4.158683 ms
[     SCALE] - 0.872000 ms - 1 - 0.872000 ms
[      CONT] - 0.731000 ms - 3 - 0.243667 ms
[  SOFT_MAX] - 2.396000 ms - 1 - 2.396000 ms
[    IM2COL] - 945.870972 ms - 39 - 24.253101 ms
[   UPSCALE] - 4.749000 ms - 3 - 1.583000 ms
[     UNARY] - 22.777000 ms - 29 - 0.785414 ms
Total Time: 1292.918945 ms

@slaren
Collaborator Author

slaren commented Dec 1, 2023

This should be faster. If it isn't, we can add different versions of the kernels when broadcasting is not required.
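
For context, a rough sketch of that fast-path idea (kernel names, shapes, and launch parameters here are hypothetical, not the kernels in this PR): when src1 has the same shape as src0, indexing is a plain linear offset, and the modulo-based index math is only needed on the broadcast path.

// Hypothetical illustration of contiguous vs broadcast add kernels (not the PR's code)
__global__ void add_f32_contiguous(const float * x, const float * y, float * dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = x[i] + y[i]; // no index math needed when shapes match
}

__global__ void add_f32_bcast_dim0(const float * x, const float * y, float * dst,
                                   int ne0, int ne1, int y_ne0) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ne0 * ne1) return;
    int i0 = i % ne0;
    int i1 = i / ne0;
    // src1 is repeated along dim 0 when y_ne0 < ne0
    dst[i] = x[i] + y[i1 * y_ne0 + i0 % y_ne0];
}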

@slaren slaren changed the title ggml : support broadcasting in dim 0 in add and mul ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_sort, ggml_top_k, Dec 2, 2023
@slaren slaren changed the title ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_sort, ggml_top_k, ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_sort, ggml_top_k Dec 2, 2023
@slaren slaren changed the title ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_sort, ggml_top_k ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_argsort, ggml_top_k Dec 2, 2023
@FSSRepo
Collaborator

FSSRepo commented Dec 3, 2023

CUDA with the latest changes shows no performance improvement; I have run the same image generation several times and the times remain the same.

========== CUDA Timings =========
[       ADD] - 40.991001 ms - 419 - 0.097831 ms
[       MUL] - 14.959000 ms - 125 - 0.119672 ms
[    CONCAT] - 2.246000 ms - 12 - 0.187167 ms
[      NORM] - 2.598000 ms - 48 - 0.054125 ms
[GROUP_NORM] - 5.277000 ms - 61 - 0.086508 ms
[   MUL_MAT] - 120.263000 ms - 361 - 0.333138 ms
[     SCALE] - 2.360000 ms - 32 - 0.073750 ms
[      CONT] - 11.507000 ms - 160 - 0.071919 ms
[  SOFT_MAX] - 102.699997 ms - 32 - 3.209375 ms
[    IM2COL] - 64.056999 ms - 97 - 0.660381 ms
[   UPSCALE] - 0.550000 ms - 3 - 0.183333 ms
[     UNARY] - 4.660000 ms - 84 - 0.055476 ms
Total Time: 372.167999 ms

AutoEncoder

========== CUDA Timings =========
[       ADD] - 75.974998 ms - 85 - 0.893824 ms
[       MUL] - 28.681999 ms - 30 - 0.956067 ms
[GROUP_NORM] - 55.887001 ms - 30 - 1.862900 ms
[   MUL_MAT] - 166.731995 ms - 41 - 4.066634 ms
[     SCALE] - 0.884000 ms - 1 - 0.884000 ms
[      CONT] - 0.730000 ms - 3 - 0.243333 ms
[  SOFT_MAX] - 2.397000 ms - 1 - 2.397000 ms
[    IM2COL] - 948.924011 ms - 39 - 24.331385 ms
[   UPSCALE] - 4.275000 ms - 3 - 1.425000 ms
[     UNARY] - 22.627001 ms - 29 - 0.780241 ms
Total Time: 1307.113037 ms

@slaren
Collaborator Author

slaren commented Dec 3, 2023

Ok, thanks for testing. I am adding a performance option to test-backend-ops that will make it easier to test the performance of the kernels.
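
Not the actual test-backend-ops code, just a minimal sketch of how an op can be timed on a backend, assuming the graph gf and the backend have already been set up (the real option may measure things differently):

// Sketch: average wall-clock time per graph compute on a given backend
#include "ggml.h"
#include "ggml-backend.h"

static double measure_op_ms(ggml_backend_t backend, struct ggml_cgraph * gf, int n_runs) {
    ggml_backend_graph_compute(backend, gf); // warm-up run
    const int64_t t0 = ggml_time_us();
    for (int i = 0; i < n_runs; i++) {
        ggml_backend_graph_compute(backend, gf);
    }
    const int64_t t1 = ggml_time_us();
    return (t1 - t0) / 1000.0 / n_runs; // average ms per run
}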

@ggerganov
Owner

Updated the whisper example to use the broadcasted ggml_add and I don't observe any performance difference with either Metal or CUDA compared to the pre-broadcast version we had. So I think it is good.

@ggerganov
Owner

Are there any of the new ops that I can help implement for Metal?

@slaren
Collaborator Author

slaren commented Dec 4, 2023

We need to implement these ops in Metal:

  • ggml_div
  • ggml_sum_rows
  • ggml_argsort
  • ggml_mul_mat_id

src/ggml-cuda.cu (review comment, outdated and resolved)
@slaren
Collaborator Author

slaren commented Dec 4, 2023

@ggerganov I am adding tests for quants and I am seeing some unexpected errors, even with just the CPU backend.

q5_0 and q5_1 generate some UB warnings:

/home/diego/code/ggml/src/ggml.c:17957:48: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
/home/diego/code/ggml/src/ggml.c:17957:62: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
  MUL_MAT(type_a=q5_0,type_b=f32,m=32,n=32,k=256,bs=[1,1],nr=[1,1]): OK
/home/diego/code/ggml/src/ggml.c:17987:48: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
/home/diego/code/ggml/src/ggml.c:17987:62: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
  MUL_MAT(type_a=q5_1,type_b=f32,m=32,n=32,k=256,bs=[1,1],nr=[1,1]): OK
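
For illustration only (this is not the actual fix in ggml.c): shifting a 32-bit value by 32 is undefined behaviour in C, so a shift amount that can reach the type width has to be avoided, e.g. by widening the operand or guarding the shift.

// Illustrative example of the UB and one way to avoid it (hypothetical helper)
#include <stdint.h>

static inline uint32_t extract_bit(uint32_t qh, int shift) {
    // UB when shift == 32:
    //   return (qh >> shift) & 1u;
    // Safe: widen to 64 bits so the maximum shift stays below the type width.
    return (uint32_t)(((uint64_t)qh >> shift) & 1u);
}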

q2_k and q3_k generate NaNs:

  Error: MUL_MAT: NaN at index 6175
  MUL_MAT(type_a=q2_K,type_b=f32,m=32,n=32,k=256,bs=[10,1],nr=[1,1]): FAIL
  MUL_MAT(type_a=q2_K,type_b=f32,m=32,n=32,k=256,bs=[10,1],nr=[2,1]): OK
    Error: MUL_MAT: NaN at index 26655
  MUL_MAT(type_a=q2_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[1,1]): FAIL
    Error: MUL_MAT: NaN at index 95263
  MUL_MAT(type_a=q2_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[2,1]): FAIL
  MUL_MAT(type_a=q2_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[1,2]): OK
    Error: MUL_MAT: NaN at index 35871
  MUL_MAT(type_a=q2_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[2,2]): FAIL
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[1,1],nr=[1,1]): OK
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[10,1],nr=[1,1]): OK
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[10,1],nr=[2,1]): OK
    Error: MUL_MAT: NaN at index 26655
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[1,1]): FAIL
    Error: MUL_MAT: NaN at index 59423
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[2,1]): FAIL
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[1,2]): OK
    Error: MUL_MAT: NaN at index 208927
  MUL_MAT(type_a=q3_K,type_b=f32,m=32,n=32,k=256,bs=[10,10],nr=[2,2]): FAIL

@slaren
Collaborator Author

slaren commented Dec 4, 2023

Looks like the q2_k and q3_k NaNs don't happen on the M3, so I guess this is an issue with the AVX implementation.

The Metal backend fails every quant test due to too much error. This doesn't happen with CUDA.

@ggerganov
Owner

The Q2_K and Q3_K NaNs do not occur on my Ubuntu machine with AVX.

Yeah, the Metal matrix multiplications have NMSE of ~0.000015 (excluding those that are not supported -- we should assert for those). Not sure why.

@slaren
Collaborator Author

slaren commented Dec 4, 2023

Does your Ubuntu machine support AVX512? My flags are AVX, AVX2, F16C, FMA.

@slaren
Collaborator Author

slaren commented Dec 4, 2023

Building without CUDA solves the issue; I didn't realize that the matrices I am testing with are big enough to enable automatic offloading. So this is a CUDA issue.

@ggerganov
Owner

Yes, same flags + sse3:

[  1%] Building C object src/CMakeFiles/ggml.dir/ggml.c.o
cd /home/ggerganov/development/github/ggml/build/src && /usr/bin/cc -DGGML_BUILD -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/home/ggerganov/development/github/ggml/src/. -I/home/ggerganov/development/github/ggml/src/../include -I/home/ggerganov/development/github/ggml/src/../include/ggml -mavx -mavx2 -mfma -mf16c -msse3 -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wpedantic -Wformat=2 -Wno-unused -Wstrict-prototypes -Werror=vla -Wunused -Wextra -Wcast-qual -Wdouble-promotion -Wshadow -Wno-unused-function -Wmissing-prototypes -MD -MT src/CMakeFiles/ggml.dir/ggml.c.o -MF CMakeFiles/ggml.dir/ggml.c.o.d -o CMakeFiles/ggml.dir/ggml.c.o -c /home/ggerganov/development/github/ggml/src/ggml.c
[  2%] Building C object src/CMakeFiles/ggml.dir/ggml-alloc.c.o
cd /home/ggerganov/development/github/ggml/build/src && /usr/bin/cc -DGGML_BUILD -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/home/ggerganov/development/github/ggml/src/. -I/home/ggerganov/development/github/ggml/src/../include -I/home/ggerganov/development/github/ggml/src/../include/ggml -mavx -mavx2 -mfma -mf16c -msse3 -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wpedantic -Wformat=2 -Wno-unused -Wstrict-prototypes -Werror=vla -Wunused -Wextra -Wcast-qual -Wdouble-promotion -Wshadow -Wno-unused-function -Wmissing-prototypes -MD -MT src/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o CMakeFiles/ggml.dir/ggml-alloc.c.o -c /home/ggerganov/development/github/ggml/src/ggml-alloc.c
[  3%] Building C object src/CMakeFiles/ggml.dir/ggml-backend.c.o
cd /home/ggerganov/development/github/ggml/build/src && /usr/bin/cc -DGGML_BUILD -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/home/ggerganov/development/github/ggml/src/. -I/home/ggerganov/development/github/ggml/src/../include -I/home/ggerganov/development/github/ggml/src/../include/ggml -mavx -mavx2 -mfma -mf16c -msse3 -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wpedantic -Wformat=2 -Wno-unused -Wstrict-prototypes -Werror=vla -Wunused -Wextra -Wcast-qual -Wdouble-promotion -Wshadow -Wno-unused-function -Wmissing-prototypes -MD -MT src/CMakeFiles/ggml.dir/ggml-backend.c.o -MF CMakeFiles/ggml.dir/ggml-backend.c.o.d -o CMakeFiles/ggml.dir/ggml-backend.c.o -c /home/ggerganov/development/github/ggml/src/ggml-backend.c
[  4%] Building C object src/CMakeFiles/ggml.dir/ggml-quants.c.o
cd /home/ggerganov/development/github/ggml/build/src && /usr/bin/cc -DGGML_BUILD -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/home/ggerganov/development/github/ggml/src/. -I/home/ggerganov/development/github/ggml/src/../include -I/home/ggerganov/development/github/ggml/src/../include/ggml -mavx -mavx2 -mfma -mf16c -msse3 -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wpedantic -Wformat=2 -Wno-unused -Wstrict-prototypes -Werror=vla -Wunused -Wextra -Wcast-qual -Wdouble-promotion -Wshadow -Wno-unused-function -Wmissing-prototypes -MD -MT src/CMakeFiles/ggml.dir/ggml-quants.c.o -MF CMakeFiles/ggml.dir/ggml-quants.c.o.d -o CMakeFiles/ggml.dir/ggml-quants.c.o -c /home/ggerganov/development/github/ggml/src/ggml-quants.c
[  5%] Linking C shared library libggml.so
cd /home/ggerganov/development/github/ggml/build/src && /home/ggerganov/.local/lib/python3.10/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/ggml.dir/link.txt --verbose=1
/usr/bin/cc -fPIC  -mavx -mavx2 -mfma -mf16c -msse3 -O3 -DNDEBUG -shared -Wl,-soname,libggml.so -o libggml.so CMakeFiles/ggml.dir/ggml.c.o "CMakeFiles/ggml.dir/ggml-alloc.c.o" "CMakeFiles/ggml.dir/ggml-backend.c.o" "CMakeFiles/ggml.dir/ggml-quants.c.o"  -Wl,-rpath,:::::::::::::: -lm -ldl 
make[2]: Leaving directory '/home/ggerganov/development/github/ggml/build'

AMD Ryzen 9 5950X 16-Core Processor

@FSSRepo
Collaborator

FSSRepo commented Dec 4, 2023

@slaren I'm not exactly sure what ggml_mul_mat_id is for; it might be the indirect GEMM operation. If so, according to this paper, it could make it possible to optimize the convolution algorithms in terms of both performance and memory (avoiding the memory overhead of im2col), although I could be mistaken.

@slaren
Collaborator Author

slaren commented Dec 4, 2023

ggml_mul_mat_id is an indirect GEMM, but at the matrix level: it allows choosing a matrix at run time based on the result of some operation. This paper seems to require indirection at the row level, so I don't think we could use mul_mat_id to implement it. They mention that im2col is often used to take advantage of highly optimized BLAS libraries, but their algorithm doesn't seem compatible with a BLAS GEMM.
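
For anyone following along, a conceptual reference of matrix-level indirect GEMM (this is not the ggml API; the function name, layouts, and parameters are illustrative): an ids array computed at run time selects which candidate matrix multiplies each input vector, whereas row-level indirection would gather individual rows, which a plain BLAS GEMM cannot express.

// Conceptual reference implementation of matrix-level indirection (illustrative only)
static void mul_mat_id_ref(
        const float * const as[],  // candidate matrices, each m x k (row-major)
        const int   * ids,         // per-vector matrix index, length n
        const float * b,           // n input vectors of length k (row-major)
        float       * dst,         // n output vectors of length m (row-major)
        int n, int m, int k) {
    for (int j = 0; j < n; j++) {
        const float * a = as[ids[j]]; // the matrix chosen at run time for vector j
        for (int i = 0; i < m; i++) {
            float sum = 0.0f;
            for (int l = 0; l < k; l++) {
                sum += a[i*k + l] * b[j*k + l];
            }
            dst[j*m + i] = sum;
        }
    }
}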

@slaren
Collaborator Author

slaren commented Dec 4, 2023

After fixing the sizes of the matrices, the error with Metal is lower than with CUDA in most cases, even with MMQ. Without MMQ this is probably because we do the mat mul in F16, and with MMQ it may be because CUDA quantizes src1 to q8_1 instead of q8_0. I guess we should just increase the NMSE threshold for mat mul.
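
For reference, a sketch of the kind of normalized mean squared error check used to compare a backend's output against the CPU reference (the exact formula and threshold in test-backend-ops may differ):

// Sketch: NMSE between a reference output and a backend output
#include <stddef.h>

static double nmse(const float * ref, const float * out, size_t n) {
    double err = 0.0, ref2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double d = (double)out[i] - (double)ref[i];
        err  += d * d;
        ref2 += (double)ref[i] * (double)ref[i];
    }
    return ref2 > 0.0 ? err / ref2 : err;
}
// A MUL_MAT result would pass if nmse(ref, out, n) stays below the threshold,
// e.g. on the order of 1e-5 for quantized matrix multiplication.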

@slaren
Collaborator Author

slaren commented Dec 5, 2023

I have changed the mul mat id implementation in ggml-cuda to force a synchronization, copy the id to the CPU, and use the standard ggml_cuda_mul_mat. It is probably going to add a bit of latency compared to making the choice entirely on the GPU, but it is a much simpler implementation. I think we need to do some refactoring of ggml-cuda before we can replicate the entire behavior of ggml_cuda_mul_mat on the device side.
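
Roughly, the approach looks like this (not the actual ggml-cuda code; the helper names and the layout of the op's sources are hypothetical):

// Sketch: copy the id to the host, synchronize, then reuse the standard mul mat path
int32_t row_id = 0;
// ids_dev points to the ids tensor on the device, stream is the CUDA stream in use
CUDA_CHECK(cudaMemcpyAsync(&row_id, ids_dev, sizeof(row_id), cudaMemcpyDeviceToHost, stream));
CUDA_CHECK(cudaStreamSynchronize(stream)); // wait for the id before choosing the matrix

// pick the selected matrix on the host and fall back to the existing dispatch
const struct ggml_tensor * src0 = dst->src[2 + row_id]; // hypothetical source layout for the op
ggml_cuda_mul_mat(src0, dst->src[1], dst);              // reuse the standard mul mat path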

@ggerganov
Owner

Yes, some CUDA refactoring is needed soon. I'm also planning a refactoring of the Metal backend.

What do you think should be the next steps? Should we start integrating ggml_backend into llama.cpp or are there more things that need to be done first?

Btw, test-backend-ops is really awesome! I see you are planning to also add performance stats.

@slaren
Collaborator Author

slaren commented Dec 5, 2023

I think the main blockers left to be able to use ggml-backend in llama.cpp are OpenCL support and CUDA split tensors. It shouldn't take too long to add these, but I don't think it is quite ready yet.

@slaren
Collaborator Author

slaren commented Dec 5, 2023

I think this can be merged already if it is blocking #621 or anything else. I still want to add performance testing to test-backend-ops and add optimized kernels for add/mul/div when no broadcasting is required, but that can be done in another PR.

ci/run.sh (outdated), comment on lines 92 to 93:
# TODO: disabled until we fix test-backend-ops
#set -e
Collaborator Author

Since the tests are passing now, maybe this can be enabled again.

@slaren slaren merged commit 703825f into master Dec 5, 2023
4 checks passed
@slaren slaren deleted the mul-add-bcast branch December 5, 2023 12:56