
Unit test for quantization functions #953

Merged (4 commits) on Apr 22, 2023

Conversation

unbounded (Collaborator):

Use ggml_internal_get_quantize_fn to loop through all quantization formats and run sanity checks on the implemented functions.
The tests are run by ctest, but they also accept a few command-line parameters for more verbose output.

This is a quick test with generated data, so the measurements are not very useful for guiding perplexity work, but they might surface issues like #876.

Also add a microbenchmark that times these functions directly without running the rest of the GGML graph.

There is some overlap with #653, but I think there is value both in tests that run the full GGML graph and in tests that target specific issues in the SIMD implementations.
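
The overall shape of the test loop is roughly the following. This is a sketch based on the description above, assuming quantize_fns_t and ggml_internal_get_quantize_fn are exposed via ggml.h (as the extracted PR #970 did) and that the struct carries quantize_row_q/dequantize_row_q function pointers; it is not the PR's exact code:

#include <cstdio>

#include "ggml.h"

int main() {
    int num_failed = 0;
    for (int i = 0; i < GGML_TYPE_COUNT; i++) {
        quantize_fns_t qfns = ggml_internal_get_quantize_fn(i);
        // skip formats that have no implementation
        if (!qfns.quantize_row_q || !qfns.dequantize_row_q) {
            continue;
        }
        // ... quantize generated data, dequantize it back, and check that the
        // round-trip error and the vec_dot_q error stay below fixed
        // thresholds, bumping num_failed otherwise ...
    }
    printf("%d tests failed\n", num_failed);
    return num_failed > 0 ? 1 : 0;
}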

Example output:

test-quantize-fns -v
 q4_0 absolute quantization error: ok (0.001466)
 q4_0 reference implementation error: ok (0.000000)
 q4_0 dot product error: ok (0.002492)
 q4_1 absolute quantization error: ok (0.001296)
 q4_1 reference implementation error: ok (0.000000)
 q4_1 dot product error: ok (0.012034)
0 tests failed
test-quantize-perf -3 --op vec_dot_q
q4_0
  vec_dot_q
    3200 values (0.01 MB)
      min cycles/32 vals   :      2.95
      avg cycles/32 vals   :      2.97
      float32 throughput   :     59.60 GB/s
      quantized throughput :      9.31 GB/s
    64000 values (0.24 MB)
      min cycles/32 vals   :      2.54
      avg cycles/32 vals   :      3.89
      float32 throughput   :     45.85 GB/s
      quantized throughput :      7.16 GB/s
    640000 values (2.44 MB)
      min cycles/32 vals   :      2.52
      avg cycles/32 vals   :      2.77
      float32 throughput   :     64.26 GB/s
      quantized throughput :     10.04 GB/s

q4_1
  vec_dot_q
    3200 values (0.01 MB)
      min cycles/32 vals   :      5.44
      avg cycles/32 vals   :      5.48
      float32 throughput   :     29.80 GB/s
      quantized throughput :      5.59 GB/s
    64000 values (0.24 MB)
      min cycles/32 vals   :      5.21
      avg cycles/32 vals   :      6.79
      float32 throughput   :     26.20 GB/s
      quantized throughput :      4.91 GB/s
    640000 values (2.44 MB)
      min cycles/32 vals   :      5.05
      avg cycles/32 vals   :      5.06
      float32 throughput   :     35.32 GB/s
      quantized throughput :      6.62 GB/s

prusnak (Collaborator) commented on Apr 14, 2023:

Please fix build failures.

If the build failures turn out to be more problematic, we can extract the first commit and submit it as a separate pull request, which can be reviewed and merged pretty quickly.

Then we can rebase this branch/PR and try to figure out why the quantization tests fail.

prusnak (Collaborator) commented on Apr 14, 2023:

I went ahead and extracted the first commit (including my suggestion from #953 (comment) above) as pull request #970.

sw (Collaborator) commented on Apr 14, 2023:

You might remove test-quantize.c; that was my rather lazy attempt at a unit test.

prusnak (Collaborator) commented on Apr 14, 2023:

#970 has been merged.

Please rebase the branch on top of current master:

git checkout master
git pull
git checkout quantize-tests
git rebase master
git push --force

Alternatively, you can rebase interactively with git rebase -i master and drop the first commit.

unbounded force-pushed the quantize-tests branch 2 times, most recently from 2a0ffeb to cd3bc37 on April 14, 2023 at 21:44.
#include <math.h>   // for cosf
#include <stddef.h> // for size_t

// Generate synthetic data
void generate_data(float offset, size_t n, float * dst) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = 0.1 + 2*cosf(i + offset);
    }
}
sw (Collaborator) commented on Apr 15, 2023:

I think this (or the maximum errors) needs improvement.
I tried varying this slightly, and with -0.2 + 2*cosf(i + offset), q4_0 dot product fails.

We should try to create data that matches the distribution in the actual model, maybe using std::normal_distribution. @prusnak made some histograms of the models: #397 (comment)

Since Q4_0 and Q4_1 effectively differ in how they handle a bias in the data (0.1 in your case), we might want to test separately with and without bias.
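
A minimal sketch of the kind of generator sw suggests here; the function name, seed, and parameters are illustrative, not from the PR:

#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Normally distributed values with an optional bias; the fixed seed keeps
// the test data deterministic across runs.
std::vector<float> generate_normal_data(size_t n, float bias, uint32_t seed = 1234) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> data(n);
    for (float & v : data) {
        v = bias + dist(rng);
    }
    return data;
}

Testing once with bias = 0.0f and once with, say, bias = 0.1f would exercise the Q4_0/Q4_1 difference described above.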

unbounded (Collaborator, Author) replied:

I can try to match the distribution better, but I somewhat disagree with the reasoning here: it doesn't matter whether the data matches the model, as long as the test fails when an implementation is broken.
If anything, it might be good to add some "unusual" patterns: all zeroes, all negative/positive, etc.
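
For illustration, such patterns could look like this (hypothetical helpers, not part of the PR):

#include <cstddef>
#include <vector>

// Degenerate inputs that a broken SIMD path might mishandle.
std::vector<float> all_zeroes(size_t n)   { return std::vector<float>(n,  0.0f); }
std::vector<float> all_negative(size_t n) { return std::vector<float>(n, -1.5f); }
std::vector<float> all_positive(size_t n) { return std::vector<float>(n,  1.5f); }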

sw (Collaborator) commented on Apr 22, 2023:

Yes, maybe it's better to have deterministic test data. So it's just a matter of the thresholds being set too tight?

Edit: I can't seem to reproduce the problem right now. So I guess the maximum errors are okay as they are.

Commits

- Use the ggml_internal_get_quantize_fn function to loop through all quantization formats and run a sanity check on the result. Also add a microbenchmark that times these functions directly, without running the rest of the GGML graph.
- Fix issues uncovered in CI: use sizes divisible by 32*8 for loop unrolling, and use an intrinsics header that should also work on Mac.
- Remove test-quantize.c: per PR comment, subsumed by test-quantize-fns.
ggerganov (Owner) commented:

Somehow I've lost track of this PR, sorry.

What is left to be done before merge?
I see a comment by @sw that does not seem to be addressed yet.

ggerganov added the labels testing (Everything test related) and high priority (Very important issue) on Apr 22, 2023.
ggerganov merged commit 5f93949 into ggerganov:master on Apr 22, 2023.