Add ability to use importance matrix for all k-quants (#4930)

Co-authored-by: Iwan Kawrakow <[email protected]>
ggerganov · Jan 14, 2024 · 467a882
1 parent bb0c139
Showing 4 changed files with 462 additions and 16 deletions.
diff --git a/examples/quantize/quantize.cpp b/examples/quantize/quantize.cpp
@@ -82,7 +82,7 @@ static void usage(const char * executable) {
  printf(" --allow-requantize: Allows requantizing tensors that have already been quantized. Warning: This can severely reduce quality compared to quantizing from 16bit or 32bit\n");
  printf(" --leave-output-tensor: Will leave output.weight un(re)quantized. Increases model size but may also increase quality, especially when requantizing\n");
  printf(" --pure: Disable k-quant mixtures and quantize all tensors to the same type\n");
- printf(" --imatrixfile_name: use data in file_name as importance matrix for quant optimizations\n");
+ printf(" --imatrix file_name: use data in file_name as importance matrix for quant optimizations\n");
  printf(" --include-weights tensor_name: use importance matrix for this/these tensor(s)\n");
  printf(" --exclude-weights tensor_name: use importance matrix for this/these tensor(s)\n");
  printf("Note: --include-weights and --exclude-weights cannot be used together\n");