[Documentation] How to quantize using ggml? #320

mahimairaja · 2023-06-28T15:51:43Z

I recently came across an awesome open-source LLM whose authors used ggml to quantize the model to 4-bit. I would love to learn more about this kind of practice. Any blog post, tutorial, or documentation would be helpful.

* Update Makefile to detect AVX512 support and add compiler flags if it's available
* Based on the existing AVX2 implementation, compute the dot product on one 32-value block of 4-bit quantized ints at a time
* Perform 8 bit -> 16 bit sign extension and multiply+add on 32 values at a time instead of 16
* Use the built-in AVX512 horizontal reduce add to get the sum at the end
* Manually unroll the inner dot product loop to reduce loop counter overhead
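To make the list above more concrete, here is a minimal scalar sketch of a dot product over one 32-value block of 4-bit quantized ints. It is not ggml's actual code or storage format: the block layout (one float scale plus 16 packed bytes, values stored with an offset of 8) and all function names (`quantize_block`, `unpack_block`, `dot_block`) are assumptions made for illustration. An AVX512 kernel like the one described would do the same arithmetic with vector instructions instead of Python loops.

```python
BLOCK = 32  # values per block, as in the list above


def quantize_block(values):
    """Quantize 32 floats to 4-bit ints in [-8, 7] sharing one scale.

    Hypothetical layout: returns (scale, packed), where packed holds
    two 4-bit values per byte, each stored as q + 8 (range 0..15).
    """
    assert len(values) == BLOCK
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    packed = bytes(
        (q[2 * i] + 8) | ((q[2 * i + 1] + 8) << 4) for i in range(BLOCK // 2)
    )
    return scale, packed


def unpack_block(packed):
    """Split each byte into its low and high nibble and undo the +8 offset."""
    out = []
    for b in packed:
        out.append((b & 0x0F) - 8)
        out.append((b >> 4) - 8)
    return out


def dot_block(a, b):
    """Dot product of two quantized blocks.

    The integer multiply-add over 32 values is the part a SIMD kernel
    would vectorize (widening to 16 bit before multiplying); the final
    sum corresponds to the horizontal reduce add.
    """
    scale_a, packed_a = a
    scale_b, packed_b = b
    qa = unpack_block(packed_a)
    qb = unpack_block(packed_b)
    acc = sum(x * y for x, y in zip(qa, qb))
    return scale_a * scale_b * acc
```

A full vector dot product would split both inputs into 32-value blocks, call `dot_block` on each pair, and sum the results. Keeping the accumulation in integers inside a block and applying the two scales once at the end is what makes the per-block work cheap.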
