
ggml : vectorize Q8_0 quantization #127

Merged
merged 1 commit into master from simd-q8_0 on May 3, 2023

Conversation

ggerganov (Owner) commented May 3, 2023

close #124

ggerganov (Owner, Author) commented

@skeskinen

Please try this branch and see if the performance for Q4_0 is improved
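For context on why a Q8_0 change shows up in the Q4_0 numbers: ggml quantizes the activations to Q8_0 on the fly before the Q4_0 × Q8_0 dot product, so quantize_row_q8_0() sits on the matrix-multiplication hot path when evaluating Q4_0 models. Below is a minimal scalar sketch of that routine, written from ggml's Q8_0 block layout at the time (one f32 scale plus 32 int8 values per block); the struct fields and the `_ref` name are assumptions for illustration, not the PR diff itself.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define QK8_0 32  // elements per Q8_0 block

// assumed layout of ggml's block_q8_0 at the time of this PR
typedef struct {
    float  d;          // block scale
    int8_t qs[QK8_0];  // quantized values in [-127, 127]
} block_q8_0;

// scalar reference: per block, d = max|x| / 127 and q = round(x / d)
static void quantize_row_q8_0_ref(const float * x, block_q8_0 * y, int k) {
    assert(k % QK8_0 == 0);
    const int nb = k / QK8_0;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max of the block

        for (int l = 0; l < QK8_0; l++) {
            amax = fmaxf(amax, fabsf(x[i*QK8_0 + l]));
        }

        const float d  = amax / 127.0f;
        const float id = d ? 1.0f/d : 0.0f; // inverse scale, guarded for all-zero blocks

        y[i].d = d;

        for (int l = 0; l < QK8_0; l++) {
            y[i].qs[l] = (int8_t) roundf(x[i*QK8_0 + l] * id);
        }
    }
}
```

The inner loops above are exactly the part a SIMD version replaces: the absolute-max reduction and the scale-and-round of 32 contiguous floats.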

skeskinen (Contributor) commented

Performance of q4_0 has indeed improved with this branch. 👍 All four data types now seem pretty even in performance.

all-MiniLM-L6-v2

| Data type | STSBenchmark | eval time | EmotionClassification | eval time |
| --- | --- | --- | --- | --- |
| f32 | 0.8201 | 6.04 | 0.4082 | 10.00 |
| f16 | 0.8201 | 5.57 | 0.4085 | 9.17 |
| q4_0 | 0.8175 | 5.39 | 0.3911 | 11.41 |
| q4_1 | 0.8223 | 6.03 | 0.4027 | 10.08 |
| sbert | 0.8203 | 2.64 | 0.4085 | 5.08 |
| sbert-batchless | 0.8203 | 11.93 | 0.4085 | 14.61 |

all-MiniLM-L12-v2

| Data type | STSBenchmark | eval time | EmotionClassification | eval time |
| --- | --- | --- | --- | --- |
| f32 | 0.8306 | 13.75 | 0.4117 | 20.93 |
| f16 | 0.8306 | 10.94 | 0.4119 | 18.23 |
| q4_0 | 0.8310 | 10.71 | 0.4183 | 19.87 |
| q4_1 | 0.8325 | 11.14 | 0.4093 | 18.57 |
| sbert | 0.8309 | 4.99 | 0.4117 | 8.91 |
| sbert-batchless | 0.8309 | 23.70 | 0.4117 | 27.96 |

ggerganov marked this pull request as ready for review on May 3, 2023 at 20:21
ggerganov merged commit 94a24c9 into master on May 3, 2023
ggerganov deleted the simd-q8_0 branch on May 3, 2023 at 20:22
ggerganov added a commit to ggerganov/llama.cpp that referenced this pull request May 3, 2023
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request May 4, 2023
== Relevant log messages from source repo:

commit 799fdc1b5d888b8a8682baf112e1c2a2df0df1c4
Author: Georgi Gerganov <[email protected]>
Date:   Wed May 3 23:24:20 2023 +0300

    ggml : vectorize Q8_0 quantization

    ggerganov/ggml#127 (comment)
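The log entry above carries only the commit title; the diff itself is not reproduced in this thread. As an illustration of what vectorizing this routine typically looks like on ARM, here is a hedged NEON sketch of the same block loop. It reuses the QK8_0 and block_q8_0 definitions assumed in the scalar sketch above and is not a copy of commit 94a24c9.

```c
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// QK8_0 and block_q8_0 as in the scalar sketch above (assumed layout)

// NEON sketch: treat each 32-element block as 8 float32x4_t vectors,
// reduce to the absolute max, then scale and round-convert to int8
static void quantize_row_q8_0_neon(const float * x, block_q8_0 * y, int k) {
    const int nb = k / QK8_0;

    for (int i = 0; i < nb; i++) {
        float32x4_t srcv[8];
        float32x4_t amaxv = vdupq_n_f32(0.0f);

        for (int l = 0; l < 8; l++) {
            srcv[l] = vld1q_f32(x + i*QK8_0 + 4*l);
            amaxv   = vmaxq_f32(amaxv, vabsq_f32(srcv[l]));
        }

        const float amax = vmaxvq_f32(amaxv); // horizontal max of |x|

        const float d  = amax / 127.0f;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = d;

        for (int l = 0; l < 8; l++) {
            const float32x4_t v  = vmulq_n_f32(srcv[l], id);
            const int32x4_t   vi = vcvtnq_s32_f32(v); // round to nearest

            y[i].qs[4*l + 0] = (int8_t) vgetq_lane_s32(vi, 0);
            y[i].qs[4*l + 1] = (int8_t) vgetq_lane_s32(vi, 1);
            y[i].qs[4*l + 2] = (int8_t) vgetq_lane_s32(vi, 2);
            y[i].qs[4*l + 3] = (int8_t) vgetq_lane_s32(vi, 3);
        }
    }
}
#endif
```

An AVX2 variant would follow the same shape: a vectorized abs-max reduction over the block, a broadcast multiply by the inverse scale, and a packed float-to-int8 conversion.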
Successfully merging this pull request may close these issues.

Vectorize quantize_row_q8_0()