sync : llama.cpp #677

ggerganov · 2024-01-03T09:30:57Z

I can't decide whether we should squash these sync PRs or merge the commits as is.
On the one hand, keeping the commits preserves a more detailed history; on the other hand, each commit is replicated across all synced repos.

Signed-off-by: hydai <[email protected]>

* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c
  Fix indentation upgate
  Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

ggml-ci

* ggml : disable fast-math for Metal (cmake build only)
  ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
  ggml-ci

* ggml : disable fast-math for Metal (cmake build only)
  ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id

ggml-ci

slaren · 2024-01-03T12:11:14Z

I think it is worth keeping the commits. Currently, when we look at the last commit that changed a line to understand why the change is there, we often only get "sync". It would be useful to have the correct history.
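To make the trade-off above concrete, here is a minimal toy model of per-line blame, not real git. The commit messages are taken from this PR; the line numbers, line contents, and the `blame` and `squash` helpers are hypothetical and exist only for illustration. It sketches why a squashed sync leaves every touched line attributed to "sync : llama.cpp".

```python
from typing import Dict, List, Tuple

# A commit is modeled as (message, {line_number: new_text}). This is a toy
# stand-in for real git history, not how git stores data.
Commit = Tuple[str, Dict[int, str]]


def blame(history: List[Commit]) -> Dict[int, str]:
    """Map each line number to the message of the last commit that changed it."""
    last_touch: Dict[int, str] = {}
    for message, changes in history:
        for line_no in changes:
            last_touch[line_no] = message
    return last_touch


def squash(history: List[Commit], message: str) -> List[Commit]:
    """Collapse a list of commits into one commit carrying a single message."""
    merged: Dict[int, str] = {}
    for _, changes in history:
        merged.update(changes)
    return [(message, merged)]


# Hypothetical line numbers and contents; the messages come from this PR.
upstream: List[Commit] = [
    ("cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)", {10: "a"}),
    ("ggml : add ggml_vdotq_s32 alias (llama/4715)", {20: "b"}),
]

kept = blame(upstream)
squashed = blame(squash(upstream, "sync : llama.cpp"))

for line_no in sorted(kept):
    print(f"line {line_no}: kept={kept[line_no]!r} squashed={squashed[line_no]!r}")
```

With the commits kept, each line maps back to the upstream change that explains it; after squashing, both lines map to the same generic message.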

src/ggml-cuda.cu

Co-authored-by: slaren <[email protected]>

ggerganov and others added 10 commits January 3, 2024 11:24

scripts : fix sync order + metal sed

67ff4b1

cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)

1e51d18

Signed-off-by: hydai <[email protected]>

CUDA: fix tensor core logic for Pascal and HIP (llama/4682)

e60302e

CUDA: fixed tensor cores not being used on RDNA3 (llama/4697)

6aab606

ggml : add ggml_vdotq_s32 alias (llama/4715)

d95c7d5

ggml-ci

sync : llama.cpp

b5422f4

ggml-ci

metal : add kernel_get_rows_i32

5b6f3ae

ggml-ci

ggerganov changed the title ~~scripts : fix sync order + metal sed~~ sync : llama.cpp Jan 3, 2024

cuda : mark I16 and I32 ops as unsupported

ec9d3e5

ggml-ci

ggerganov force-pushed the sync branch from c9c4a0d to ec9d3e5 Compare January 3, 2024 11:02

ggerganov requested a review from slaren January 3, 2024 11:17

slaren approved these changes Jan 3, 2024

src/ggml-cuda.cu

Update src/ggml-cuda.cu

3f6bcb6

Co-authored-by: slaren <[email protected]>

ggerganov merged commit 3fd01e0 into master Jan 3, 2024
4 checks passed
