TL;DR See #5055
Before the recent two-bit quantization and importance matrix related changes, there were two low-bit quantization types available in `llama.cpp`: `Q2_K` and `Q3_K_S`. `Q2_K` was basically a 3-bit quantization, with just the `attn_k` and `attn_q` tensors quantized to 2 bits. The table shows their model sizes and perplexities (`wiki.test.raw`, `n_ctx = 512`) for LLaMA-v2-70B:

After the recent changes, `Q2_K` has become an actual 2-bit quantization (less than 3 bits-per-weight); it has a LLaMA-v2-70B model size of 23.71 GiB and a perplexity of 4.0039 (using an importance matrix derived from `wiki.train.raw`). `Q3_K_S` has increased very slightly to 27.86 GiB, but has a better perplexity of 3.6603. Based on #5005, there is a need for an intermediate step in terms of model size between the new `Q2_K` and `Q3_K_S`. This PR adds such a quantization type as `Q3_K_XS`. The following table summarizes the new situation for LLaMA-v2-70B.

The table on a graph:
![q2_3_quants](https://private-user-images.githubusercontent.com/48489457/298368708-e1a94838-7889-4ce6-a7a0-2f9aade8095f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkyNzMwMDQsIm5iZiI6MTcxOTI3MjcwNCwicGF0aCI6Ii80ODQ4OTQ1Ny8yOTgzNjg3MDgtZTFhOTQ4MzgtNzg4OS00Y2U2LWE3YTAtMmY5YWFkZTgwOTVmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI0VDIzNDUwNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg2ZTBjYWFhMWYxNzhkOTllMzVjYWY4ZDRmNDk4ZTU5OWNmNTgzYjlhMjBmMzNkNWI5NWEyNjg4NWYzOTMyMzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.blb5kk7QdV0Fq_uQVp75bBwAOwZEfT9najELeYtENow)
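As a sanity check on the "less than 3 bits-per-weight" claim, the quoted model sizes can be converted to average bits per weight. This is a minimal sketch; the parameter count used (~68.98B for LLaMA-v2-70B) is an assumption of mine and is not stated in this PR.

```python
# Convert quantized model sizes (GiB) to average bits per weight (bpw).
GIB = 1024 ** 3
N_PARAMS = 68.98e9  # approximate LLaMA-v2-70B parameter count (assumption)

def bits_per_weight(size_gib: float, n_params: float = N_PARAMS) -> float:
    """Average bits per weight for a model of the given size in GiB."""
    return size_gib * GIB * 8 / n_params

print(f"Q2_K:   {bits_per_weight(23.71):.2f} bpw")  # new Q2_K, 23.71 GiB
print(f"Q3_K_S: {bits_per_weight(27.86):.2f} bpw")  # Q3_K_S, 27.86 GiB
```

Under this assumed parameter count, the new `Q2_K` comes out just under 3 bpw, consistent with the description above, while `Q3_K_S` sits near 3.5 bpw, leaving the gap that `Q3_K_XS` is meant to fill.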