4-bit Integer quantisation #27

Merged Mar 29, 2023 · 38 commits
Changes from 1 commit
Commits
8f45628
gq : attempt at n-bit quantization
ggerganov Feb 21, 2023
b0a46fd
gq : add amax based method 3
ggerganov Feb 22, 2023
da2de94
gq : progress on method 2
ggerganov Feb 22, 2023
aa5506c
gq : method 4 (AVX2)
ggerganov Feb 23, 2023
1fc11de
gq : method 4 (ARM)
ggerganov Feb 23, 2023
dae323c
gq : method 4 (AVX2 attempt) + method 5 (no min)
ggerganov Feb 24, 2023
349e917
gq : method 5 (ARM)
ggerganov Feb 24, 2023
ff4c653
gpt-2 : model conversion for Q4_0 quantization
ggerganov Feb 25, 2023
21514b7
ggml : Q4_0 quantization support (ggml_get_rows())
ggerganov Feb 25, 2023
ff54fda
gpt-2 : loading Q4_0 quantized model
ggerganov Feb 25, 2023
2219c11
ggml : q4_0 quantization support
ggerganov Feb 25, 2023
5bdfce2
ggml : q4_1 quantization support (seems to work for bigger models)
ggerganov Feb 25, 2023
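The two commits above introduce the block-wise 4-bit formats the PR is named for. As a minimal sketch of the idea (the block size QK = 32 and the exact rounding are assumptions based on the commit titles, not the PR's code): Q4_0 stores one FP32 scale per block of weights with 4-bit codes symmetric around zero, while Q4_1 also stores a per-block minimum so the codes cover [min, max].

#include <math.h>
#include <stdint.h>

#define QK 32  // assumed block size: one scale (plus a min for Q4_1) per 32 weights

// Q4_0-style block: symmetric around zero; codes stored biased by +8, two per byte.
void quantize_block_q4_0(const float x[QK], float *d, uint8_t q[QK/2]) {
    float amax = 0.0f;                          // absolute max of the block
    for (int i = 0; i < QK; ++i) {
        const float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    *d = amax/7.0f;                             // map [-amax, amax] onto codes [-7, 7]
    const float id = *d != 0.0f ? 1.0f/(*d) : 0.0f;
    for (int i = 0; i < QK; i += 2) {           // pack two 4-bit codes per byte
        const uint8_t v0 = (uint8_t)((int8_t)roundf(x[i+0]*id) + 8);
        const uint8_t v1 = (uint8_t)((int8_t)roundf(x[i+1]*id) + 8);
        q[i/2] = v0 | (v1 << 4);
    }
}

// Q4_1-style block: per-block min m and scale d; codes in [0, 15].
void quantize_block_q4_1(const float x[QK], float *d, float *m, uint8_t q[QK/2]) {
    float lo = x[0], hi = x[0];
    for (int i = 1; i < QK; ++i) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    *m = lo;
    *d = (hi - lo)/15.0f;
    const float id = *d != 0.0f ? 1.0f/(*d) : 0.0f;
    for (int i = 0; i < QK; i += 2) {
        const uint8_t v0 = (uint8_t)roundf((x[i+0] - lo)*id);
        const uint8_t v1 = (uint8_t)roundf((x[i+1] - lo)*id);
        q[i/2] = v0 | (v1 << 4);
    }
}

Dequantization recovers x ≈ d*(q - 8) for Q4_0 and x ≈ m + d*q for Q4_1; the "(seems to work for bigger models)" note suggests the extra per-block minimum pays off on less symmetric weight distributions.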
b0cab89
gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
ggerganov Feb 25, 2023
2f11888
ggml : 4-bit quantization works (only scalar for now)
ggerganov Feb 25, 2023
b82c27f
gq : add method 6 (ARM)
ggerganov Feb 25, 2023
2e75d8f
ggml : vectorized mad q4_0 (ARM)
ggerganov Feb 25, 2023
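The "vectorized mad" commits replace a scalar multiply-add loop with NEON intrinsics. A simplified f32 sketch of the pattern (the actual kernel operates on dequantized Q4_0 blocks; the function name and signature here are illustrative):

#include <arm_neon.h>

// Illustrative NEON multiply-add: y[i] += x[i]*v over n floats,
// four lanes at a time, with a scalar tail for the remainder.
void mad_f32(const float *x, float *y, float v, int n) {
    const float32x4_t vv = vdupq_n_f32(v);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t acc = vld1q_f32(y + i);
        acc = vmlaq_f32(acc, vld1q_f32(x + i), vv);  // acc += x*v
        vst1q_f32(y + i, acc);
    }
    for (; i < n; ++i) {
        y[i] += x[i]*v;
    }
}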
b0c22a4
ggml : vectorized quantize_row_q4_0 (ARM)
ggerganov Feb 26, 2023
3c757a4
ggml : simplify mad q4_0 (ARM)
ggerganov Feb 26, 2023
e3ad879
ggml : minor indentations
ggerganov Feb 26, 2023
8abcab4
gpt-j : support for 4-bit quantized model inference
ggerganov Feb 26, 2023
c21972c
ggml : GGML_ASSERT() instead of assert() where appropriate
ggerganov Feb 26, 2023
4a56c5b
gpt : avoid ggml_transpose on model tensors (new models!)
ggerganov Feb 26, 2023
904605c
gpt-2 : minor
ggerganov Feb 26, 2023
441a38f
gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
ggerganov Feb 26, 2023
5336828
ggml : add ggml_compute_forward_rope_f16()
ggerganov Feb 26, 2023
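ggml_compute_forward_rope_f16() is the half-precision counterpart of the RoPE forward pass, which the FP16 model path needs. In scalar f32 form, RoPE rotates each consecutive pair of features by an angle that shrinks geometrically with the pair index (a sketch, with the common base 10000 assumed):

#include <math.h>

// Rotary position embedding on one row of even length n at position pos:
// pair (x[2i], x[2i+1]) is rotated by theta_i = pos * base^(-2i/n).
void rope_row(float *x, int n, int pos, float base /* e.g. 10000.0f */) {
    for (int i = 0; i < n; i += 2) {
        const float theta = pos * powf(base, -(float)i/n);
        const float c = cosf(theta);
        const float s = sinf(theta);
        const float x0 = x[i], x1 = x[i+1];
        x[i]   = x0*c - x1*s;
        x[i+1] = x0*s + x1*c;
    }
}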
99af48e
gpt : fix memory usage computation
ggerganov Feb 26, 2023
6aae09e
ggml : fix ggml_is_contiguous() to take into account blck size
ggerganov Feb 26, 2023
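The contiguity fix matters because a row of a block-quantized tensor no longer occupies ne[0]*sizeof(float) bytes: every blck_size elements share one block of type_size bytes, so the stride check has to be scaled by the block size. A sketch of a block-aware check (nb[] are byte strides and ne[] element counts, as in ggml; treat the exact expression as an assumption):

#include <stdbool.h>
#include "ggml.h"

// Block-aware contiguity for a 4-D tensor: the innermost stride must be
// one block, and each outer stride the full extent of the dimension below.
static bool is_contiguous_blck(const struct ggml_tensor *t) {
    const int    blck = ggml_blck_size(t->type);  // e.g. 32 for Q4_0
    const size_t tsz  = ggml_type_size(t->type);  // bytes per block

    return
        t->nb[0] == tsz &&
        t->nb[1] == (t->nb[0]*t->ne[0])/blck &&
        t->nb[2] == t->nb[1]*t->ne[1] &&
        t->nb[3] == t->nb[2]*t->ne[2];
}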
d0ac5eb
whisper : add whisper-quantize tool
ggerganov Feb 26, 2023
37d427d
whisper : add support for quantized models
ggerganov Feb 26, 2023
e904a58
whisper : mem usage based on model format type
ggerganov Feb 26, 2023
63a8f62
gpt : seems not worth using FP16 for KV cache
ggerganov Feb 26, 2023
519ce47
gpt : support quantisation of f16 model files
ggerganov Feb 26, 2023
9881c2b
ggml : fixes for rpi4
ggerganov Feb 26, 2023
a85bc0f
whisper : add Q4_1 model sizes
ggerganov Feb 26, 2023
c4f1403
ggml : add WASM SIMD for Q4_0
ggerganov Feb 27, 2023
331a862
utils : print quantization histograms
ggerganov Mar 6, 2023
154fcc3
ggml : sync all changes from llama.cpp and whisper.cpp
ggerganov Mar 29, 2023
724c45d
ggml : finalize the Q4_1 quantization for ARM_NEON
ggerganov Mar 29, 2023
ggml : GGML_ASSERT() instead of assert() where appropriate
ggerganov committed Mar 29, 2023
commit c21972cb86dc090c1631fba22efcf65ea51d8e24
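The motivation for the switch: assert() is compiled out when NDEBUG is defined, so invariants that must hold even in release builds need a macro that always fires. A macro of roughly this shape (a sketch; the exact definition in ggml.h may differ):

#include <stdio.h>
#include <stdlib.h>

// Always-on assert: unlike assert(), NDEBUG does not disable it.
// On failure it reports the file, line, and failed condition, then aborts.
#define GGML_ASSERT(x) \
    do { \
        if (!(x)) { \
            fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
            abort(); \
        } \
    } while (0)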