
ggml : vectorize Q8_0 quantization #127

Merged
merged 1 commit into master from simd-q8_0 on May 3, 2023

Conversation

ggerganov (Owner) commented May 3, 2023

close #124

ggerganov (Owner, Author) commented

@skeskinen

Please try this branch and see if the performance for Q4_0 is improved
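For context on why a Q8_0 change shows up in the Q4_0 numbers: ggml quantizes the activations to Q8_0 on the fly before the Q4_0 × Q8_0 dot product, so quantize_row_q8_0() sits on the matrix-multiplication hot path when evaluating Q4_0 models. Below is a minimal scalar sketch of that routine, written from ggml's Q8_0 block layout at the time (one f32 scale plus 32 int8 values per block); the struct fields and the `_ref` name are assumptions for illustration, not the PR diff itself.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define QK8_0 32  // elements per Q8_0 block

// assumed layout of ggml's block_q8_0 at the time of this PR
typedef struct {
    float  d;          // block scale
    int8_t qs[QK8_0];  // quantized values in [-127, 127]
} block_q8_0;

// scalar reference: per block, d = max|x| / 127 and q = round(x / d)
static void quantize_row_q8_0_ref(const float * x, block_q8_0 * y, int k) {
    assert(k % QK8_0 == 0);
    const int nb = k / QK8_0;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max of the block

        for (int l = 0; l < QK8_0; l++) {
            amax = fmaxf(amax, fabsf(x[i*QK8_0 + l]));
        }

        const float d  = amax / 127.0f;
        const float id = d ? 1.0f/d : 0.0f; // inverse scale, guarded for all-zero blocks

        y[i].d = d;

        for (int l = 0; l < QK8_0; l++) {
            y[i].qs[l] = (int8_t) roundf(x[i*QK8_0 + l] * id);
        }
    }
}
```

The inner loops above are exactly the part a SIMD version replaces: the absolute-max reduction and the scale-and-round of 32 contiguous floats.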

skeskinen (Contributor) commented

Performance of q4_0 has indeed improved with this branch. 👍 All four data types now seem pretty even in performance.

all-MiniLM-L6-v2

| Data type | STSBenchmark | eval time | EmotionClassification | eval time |
| --- | --- | --- | --- | --- |
| f32 | 0.8201 | 6.04 | 0.4082 | 10.00 |
| f16 | 0.8201 | 5.57 | 0.4085 | 9.17 |
| q4_0 | 0.8175 | 5.39 | 0.3911 | 11.41 |
| q4_1 | 0.8223 | 6.03 | 0.4027 | 10.08 |
| sbert | 0.8203 | 2.64 | 0.4085 | 5.08 |
| sbert-batchless | 0.8203 | 11.93 | 0.4085 | 14.61 |

all-MiniLM-L12-v2

| Data type | STSBenchmark | eval time | EmotionClassification | eval time |
| --- | --- | --- | --- | --- |
| f32 | 0.8306 | 13.75 | 0.4117 | 20.93 |
| f16 | 0.8306 | 10.94 | 0.4119 | 18.23 |
| q4_0 | 0.8310 | 10.71 | 0.4183 | 19.87 |
| q4_1 | 0.8325 | 11.14 | 0.4093 | 18.57 |
| sbert | 0.8309 | 4.99 | 0.4117 | 8.91 |
| sbert-batchless | 0.8309 | 23.70 | 0.4117 | 27.96 |

ggerganov marked this pull request as ready for review on May 3, 2023 at 20:21
ggerganov merged commit 94a24c9 into master on May 3, 2023
ggerganov deleted the simd-q8_0 branch on May 3, 2023 at 20:22
ggerganov added a commit to ggerganov/llama.cpp that referenced this pull request May 3, 2023
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request May 4, 2023
== Relevant log messages from source repo:

commit 799fdc1b5d888b8a8682baf112e1c2a2df0df1c4
Author: Georgi Gerganov <[email protected]>
Date:   Wed May 3 23:24:20 2023 +0300

    ggml : vectorize Q8_0 quantization

    ggerganov/ggml#127 (comment)
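The log entry above carries only the commit title; the diff itself is not reproduced in this thread. As an illustration of what vectorizing this routine typically looks like on ARM, here is a hedged NEON sketch of the same block loop. It reuses the QK8_0 and block_q8_0 definitions assumed in the scalar sketch above and is not a copy of commit 94a24c9.

```c
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// QK8_0 and block_q8_0 as in the scalar sketch above (assumed layout)

// NEON sketch: treat each 32-element block as 8 float32x4_t vectors,
// reduce to the absolute max, then scale and round-convert to int8
static void quantize_row_q8_0_neon(const float * x, block_q8_0 * y, int k) {
    const int nb = k / QK8_0;

    for (int i = 0; i < nb; i++) {
        float32x4_t srcv[8];
        float32x4_t amaxv = vdupq_n_f32(0.0f);

        for (int l = 0; l < 8; l++) {
            srcv[l] = vld1q_f32(x + i*QK8_0 + 4*l);
            amaxv   = vmaxq_f32(amaxv, vabsq_f32(srcv[l]));
        }

        const float amax = vmaxvq_f32(amaxv); // horizontal max of |x|

        const float d  = amax / 127.0f;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = d;

        for (int l = 0; l < 8; l++) {
            const float32x4_t v  = vmulq_n_f32(srcv[l], id);
            const int32x4_t   vi = vcvtnq_s32_f32(v); // round to nearest

            y[i].qs[4*l + 0] = (int8_t) vgetq_lane_s32(vi, 0);
            y[i].qs[4*l + 1] = (int8_t) vgetq_lane_s32(vi, 1);
            y[i].qs[4*l + 2] = (int8_t) vgetq_lane_s32(vi, 2);
            y[i].qs[4*l + 3] = (int8_t) vgetq_lane_s32(vi, 3);
        }
    }
}
#endif
```

An AVX2 variant would follow the same shape: a vectorized abs-max reduction over the block, a broadcast multiply by the inverse scale, and a packed float-to-int8 conversion.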
Successfully merging this pull request may close these issues.

Vectorize quantize_row_q8_0()