
Unit test for quantization functions #953

Merged (4 commits) on Apr 22, 2023

Conversation

unbounded (Collaborator):

Use ggml_internal_get_quantize_fn to loop through all quantization formats and run sanity checks on the implemented functions.
The tests are run by ctest, but they also accept a few command-line parameters for more verbose output.

This is a quick test with generated data, so the measurements are not very useful for guiding perplexity work, but they might surface issues like #876.

Also add a microbenchmark that times these functions directly without running the rest of the GGML graph.

There is some overlap with #653, but I think there is value both in tests that run the full GGML graph and in tests that target specific issues in the SIMD implementations.
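
The overall shape of the test loop is roughly the following. This is a sketch based on the description above, assuming quantize_fns_t and ggml_internal_get_quantize_fn are exposed via ggml.h (as the extracted PR #970 did) and that the struct carries quantize_row_q/dequantize_row_q function pointers; it is not the PR's exact code:

#include <cstdio>

#include "ggml.h"

int main() {
    int num_failed = 0;
    for (int i = 0; i < GGML_TYPE_COUNT; i++) {
        quantize_fns_t qfns = ggml_internal_get_quantize_fn(i);
        // skip formats that have no implementation
        if (!qfns.quantize_row_q || !qfns.dequantize_row_q) {
            continue;
        }
        // ... quantize generated data, dequantize it back, and check that the
        // round-trip error and the vec_dot_q error stay below fixed
        // thresholds, bumping num_failed otherwise ...
    }
    printf("%d tests failed\n", num_failed);
    return num_failed > 0 ? 1 : 0;
}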

Example output:

test-quantize-fns -v
 q4_0 absolute quantization error: ok (0.001466)
 q4_0 reference implementation error: ok (0.000000)
 q4_0 dot product error: ok (0.002492)
 q4_1 absolute quantization error: ok (0.001296)
 q4_1 reference implementation error: ok (0.000000)
 q4_1 dot product error: ok (0.012034)
0 tests failed
test-quantize-perf -3 --op vec_dot_q
q4_0
  vec_dot_q
    3200 values (0.01 MB)
      min cycles/32 vals   :      2.95
      avg cycles/32 vals   :      2.97
      float32 throughput   :     59.60 GB/s
      quantized throughput :      9.31 GB/s
    64000 values (0.24 MB)
      min cycles/32 vals   :      2.54
      avg cycles/32 vals   :      3.89
      float32 throughput   :     45.85 GB/s
      quantized throughput :      7.16 GB/s
    640000 values (2.44 MB)
      min cycles/32 vals   :      2.52
      avg cycles/32 vals   :      2.77
      float32 throughput   :     64.26 GB/s
      quantized throughput :     10.04 GB/s

q4_1
  vec_dot_q
    3200 values (0.01 MB)
      min cycles/32 vals   :      5.44
      avg cycles/32 vals   :      5.48
      float32 throughput   :     29.80 GB/s
      quantized throughput :      5.59 GB/s
    64000 values (0.24 MB)
      min cycles/32 vals   :      5.21
      avg cycles/32 vals   :      6.79
      float32 throughput   :     26.20 GB/s
      quantized throughput :      4.91 GB/s
    640000 values (2.44 MB)
      min cycles/32 vals   :      5.05
      avg cycles/32 vals   :      5.06
      float32 throughput   :     35.32 GB/s
      quantized throughput :      6.62 GB/s

prusnak (Collaborator) commented on Apr 14, 2023:

Please fix build failures.

If the build failures turn out to be more problematic, we can extract the first commit and submit it as a separate pull request, which can be reviewed and merged pretty quickly.

Then we can rebase this branch/PR and try to figure out why the quantization tests fail.

prusnak (Collaborator) commented on Apr 14, 2023:

I went ahead and extracted the first commit (including my suggestion from #953 (comment) above) as pull request #970.

sw (Collaborator) commented on Apr 14, 2023:

You might remove test-quantize.c; that was my rather lazy attempt at a unit test.

prusnak (Collaborator) commented on Apr 14, 2023:

#970 has been merged.

Please rebase the branch on top of current master:

git checkout master
git pull
git checkout quantize-tests
git rebase master
git push --force

Alternatively, you can rebase interactively with git rebase -i master and drop the first commit.

unbounded force-pushed the quantize-tests branch 2 times, most recently from 2a0ffeb to cd3bc37 on April 14, 2023 at 21:44.
#include <math.h>   // for cosf
#include <stddef.h> // for size_t

// Generate synthetic data
void generate_data(float offset, size_t n, float * dst) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = 0.1 + 2*cosf(i + offset);
    }
}
sw (Collaborator) commented on Apr 15, 2023:

I think this (or the maximum errors) needs improvement.
I tried varying this slightly, and with -0.2 + 2*cosf(i + offset), q4_0 dot product fails.

We should try to create data that matches the distribution in the actual model, maybe using std::normal_distribution. @prusnak made some histograms of the models: #397 (comment)

Since Q4_0 and Q4_1 effectively differ in how they handle a bias in the data (0.1 in your case), we might want to test separately with and without bias.
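
A minimal sketch of the kind of generator sw suggests here; the function name, seed, and parameters are illustrative, not from the PR:

#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Normally distributed values with an optional bias; the fixed seed keeps
// the test data deterministic across runs.
std::vector<float> generate_normal_data(size_t n, float bias, uint32_t seed = 1234) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> data(n);
    for (float & v : data) {
        v = bias + dist(rng);
    }
    return data;
}

Testing once with bias = 0.0f and once with, say, bias = 0.1f would exercise the Q4_0/Q4_1 difference described above.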

unbounded (Collaborator, Author) replied:

I can try to match the distribution better, but I somewhat disagree with the reasoning here: it doesn't matter whether the data matches the model, as long as the test fails when an implementation is broken.
If anything, it might be good to add some "unusual" patterns: all zeroes, all negative/positive, etc.
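
For illustration, such patterns could look like this (hypothetical helpers, not part of the PR):

#include <cstddef>
#include <vector>

// Degenerate inputs that a broken SIMD path might mishandle.
std::vector<float> all_zeroes(size_t n)   { return std::vector<float>(n,  0.0f); }
std::vector<float> all_negative(size_t n) { return std::vector<float>(n, -1.5f); }
std::vector<float> all_positive(size_t n) { return std::vector<float>(n,  1.5f); }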

sw (Collaborator) commented on Apr 22, 2023:

Yes, maybe it's better to have deterministic test data. So it's just a matter of the thresholds being set too tight?

Edit: I can't seem to reproduce the problem right now. So I guess the maximum errors are okay as they are.

Commits

- Use the ggml_internal_get_quantize_fn function to loop through all quantization formats and run a sanity check on the result. Also add a microbenchmark that times these functions directly, without running the rest of the GGML graph.
- Fix issues uncovered in CI: use sizes divisible by 32*8 for loop unrolling, and use an intrinsics header that should also work on Mac.
- Remove test-quantize.c: per PR comment, subsumed by test-quantize-fns.
ggerganov (Owner) commented:

Somehow I've lost track of this PR, sorry.

What is left to be done before merge?
I see a comment by @sw that does not seem to be addressed yet.

ggerganov added the labels testing (Everything test related) and high priority (Very important issue) on Apr 22, 2023.
ggerganov merged commit 5f93949 into ggerganov:master on Apr 22, 2023.