This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models". It is also an efficient LLM compression tool that provides various advanced compression methods and supports multiple inference backends.
benchmark
deployment
tool
evaluation
falcon
pruning
llama
quantization
opt
post-training-quantization
awq
ptq
large-language-models
llm
smoothquant
internlm
llama2
internlm2
llama3
omniquant
Updated Jun 18, 2024 - Python