k-quants with super-block size of 64 #2001

Merged: 35 commits, Jun 26, 2023

Commits
d2f12ac
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
9fe2a2b
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
1f6195c
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
aebd547
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
2b2ab31
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
bcf8c5c
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
c6c3536
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 21, 2023
5aae4b8
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
41e46ec
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
460dd84
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
3bd9ae7
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
03f30c8
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
cda47a6
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
80c75fe
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
2b2a13c
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
9d27d8d
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
2ff543c
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 22, 2023
d92c5a9
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
fae24af
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
e1bbcfc
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
167a0bb
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
6081a65
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
ff83e32
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
285eeb1
k_quants: WIP super-blocks with 64 weights
Kawrakow Jun 23, 2023
8b98d01
k_quants: call them _K, not _k, also on Metal
Kawrakow Jun 23, 2023
558a194
k_quants: correctly define QK_K in llama.cpp
Kawrakow Jun 23, 2023
333ffcc
Fixed bug in q4_K quantization added with the 64-block addition
Kawrakow Jun 23, 2023
88412a1
Simplify via lambda
Kawrakow Jun 23, 2023
aeefd4e
k_quants: switch Q3_K to 4-bit scales when QK_K = 64
Kawrakow Jun 24, 2023
ce19b96
k_quants: switch Q4_K to 4-bit scales when QK_K = 64
Kawrakow Jun 24, 2023
4f61506
k_quants: forgot to add the Metal changes in last commit
Kawrakow Jun 24, 2023
ccf4901
k_quants: change Q5_K to be type 0 when QK_K = 64
Kawrakow Jun 24, 2023
2da3a59
k_quants: AVX2 implementation for new 64-weight Q5_K
Kawrakow Jun 24, 2023
53e81ca
k_quants: 10% faster ARM_NEON Q5_K dot product
Kawrakow Jun 24, 2023
5fd8337
k_quants: fixed issue caused by merging with master
Kawrakow Jun 26, 2023
14 changes: 9 additions & 5 deletions CMakeLists.txt
@@ -75,6 +75,7 @@ set(LLAMA_CUDA_KQUANTS_ITER "2" CACHE STRING "llama: iters./thread per block for
 option(LLAMA_CLBLAST "llama: use CLBlast" OFF)
 option(LLAMA_METAL "llama: use Metal" OFF)
 option(LLAMA_K_QUANTS "llama: use k-quants" ON)
+option(LLAMA_QKK_64 "llama: use super-block size of 64 for k-quants" OFF)
 
 option(LLAMA_BUILD_TESTS "llama: build tests" ${LLAMA_STANDALONE})
 option(LLAMA_BUILD_EXAMPLES "llama: build examples" ${LLAMA_STANDALONE})
@@ -225,6 +226,14 @@ if (LLAMA_BLAS)
     endif()
 endif()
 
+if (LLAMA_K_QUANTS)
+    set(GGML_SOURCES_EXTRA ${GGML_SOURCES_EXTRA} k_quants.c k_quants.h)
+    add_compile_definitions(GGML_USE_K_QUANTS)
+    if (LLAMA_QKK_64)
+        add_compile_definitions(GGML_QKK_64)
+    endif()
+endif()
+
 if (LLAMA_CUBLAS)
     cmake_minimum_required(VERSION 3.17)
 
@@ -289,11 +298,6 @@ if (LLAMA_METAL)
     )
 endif()
 
-if (LLAMA_K_QUANTS)
-    set(GGML_SOURCES_EXTRA ${GGML_SOURCES_EXTRA} k_quants.c k_quants.h)
-    add_compile_definitions(GGML_USE_K_QUANTS)
-endif()
-
 if (LLAMA_CLBLAST)
     find_package(CLBlast)
     if (CLBlast_FOUND)
9 changes: 8 additions & 1 deletion Makefile
@@ -43,8 +43,11 @@ endif
 
 # keep standard at C11 and C++11
 # -Ofast tends to produce faster code, but may not be available for some compilers.
-#OPT = -Ofast
+ifdef LLAMA_FAST
+OPT = -Ofast
+else
 OPT = -O3
+endif
 CFLAGS = -I. $(OPT) -std=c11 -fPIC
 CXXFLAGS = -I. -I./examples $(OPT) -std=c++11 -fPIC
 LDFLAGS =
@@ -131,6 +134,10 @@ ifndef LLAMA_NO_K_QUANTS
     CFLAGS += -DGGML_USE_K_QUANTS
     CXXFLAGS += -DGGML_USE_K_QUANTS
     OBJS += k_quants.o
+ifdef LLAMA_QKK_64
+    CFLAGS += -DGGML_QKK_64
+    CXXFLAGS += -DGGML_QKK_64
+endif
 endif
 
 ifndef LLAMA_NO_ACCELERATE
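
Both build files do the same thing: when `LLAMA_QKK_64` is set, they pass the `GGML_QKK_64` definition to the compiler. The k-quants sources are not part of this excerpt, so the following is only a minimal sketch of how such a definition could select the super-block size at compile time. The macro name `QK_K` is taken from the commit titles; the fallback value of 256 and the helper `superblocks_per_row` are assumptions made for illustration, not code from this PR.

```c
/* Sketch only: choose the super-block size from the GGML_QKK_64
 * definition that CMakeLists.txt and the Makefile add.
 * Assumption: 256 is the super-block size when the flag is absent. */
#ifdef GGML_QKK_64
#define QK_K 64
#else
#define QK_K 256
#endif

/* Hypothetical helper (not from the PR): number of super-blocks needed
 * for a row of n weights, or -1 when n is not a positive multiple of
 * QK_K and the row therefore cannot be split into whole super-blocks. */
int superblocks_per_row(int n)
{
    if (n <= 0 || n % QK_K != 0) {
        return -1;
    }
    return n / QK_K;
}
```

Compiled without the flag, a row of 4096 weights gives 16 super-blocks and a row of 3200 weights is rejected. Building with `make LLAMA_QKK_64=1` or `cmake -DLLAMA_QKK_64=ON` adds `-DGGML_QKK_64` as shown in the diff; under the assumptions above, the same 3200-weight row would then split into 50 super-blocks.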