Binary Quantization for Large Language Models

This project focuses on binary quantization techniques for large language models. Binary quantization reduces the memory footprint and computational cost of a neural network by representing its weights and activations as binary values (-1 or +1) instead of floating-point numbers. Each binarized value needs only 1 bit rather than the 16 or 32 bits of a floating-point number. This efficiency has drawn significant attention in deep learning, because it can speed up inference and make it possible to deploy models on resource-constrained devices.
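The idea of mapping values to -1/+1 and storing them as single bits can be illustrated with a short sketch. It is written in plain Python for readability. It is not this repository's implementation, which uses PyTorch. The names `binarize`, `pack_bits` and `unpack_bits` are hypothetical, and mapping zero to +1 is an assumed convention.

```python
from typing import List


def binarize(weights: List[float]) -> List[int]:
    """Map each weight to -1 or +1 by its sign (zero maps to +1 by convention)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bits(signs: List[int]) -> bytes:
    """Pack -1/+1 values into bytes, 8 values per byte (+1 -> bit 1, -1 -> bit 0)."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_bits(packed: bytes, n: int) -> List[int]:
    """Recover n values of -1/+1 from the packed representation."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


weights = [0.42, -0.13, 0.0, -0.77, 0.05, 0.31, -0.02, 0.9, -0.4]
signs = binarize(weights)
packed = pack_bits(signs)
print(signs)        # [1, -1, 1, -1, 1, 1, -1, 1, -1]
print(len(packed))  # 2 bytes, versus 18 bytes for the same 9 weights in FP16
```

Binarization alone discards magnitude information. Practical methods therefore usually pair the sign bits with a floating-point scaling factor, as in the XNOR example in the Training section.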

Introduction

The rapid growth of language models has created challenges in model size, computational requirements, and energy consumption. Binary quantization offers a promising way to address them by shrinking the memory footprint of large language models and improving their inference speed. This project aims to provide an implementation of binary quantization techniques that can be applied to a variety of popular language models.
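Here is a back-of-the-envelope estimate of the memory saving for the weights of a model with a nominal 7 billion parameters (the huggyllama/llama-7b checkpoint in the list below is roughly this size). The calculation has some deliberate simplifications. It ignores per-row scaling factors, embeddings, activations, the KV cache, and any weights that a given method keeps at higher precision, so real savings will be smaller than the ideal 16x.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_weight / 8 / 2**30


params = 7e9  # nominal parameter count of a "7B" model
fp16 = weight_memory_gib(params, 16)
binary = weight_memory_gib(params, 1)
print(f"FP16:  {fp16:.2f} GiB")            # FP16:  13.04 GiB
print(f"1-bit: {binary:.2f} GiB")          # 1-bit: 0.81 GiB
print(f"ratio: {fp16 / binary:.0f}x")      # ratio: 16x
```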

Model support

Hugging Face models

facebook/opt-125m
facebook/opt-1.3b
huggyllama/llama-7b
huggyllama/llama-13b
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-13b-chat-hf
openchat/openchat_v3.2

Usage

Environment Setup

conda create -n binary_llm python=3.10 pip
pip install torch transformers lm_eval accelerate tensorboardX bitsandbytes sentencepiece

Note: Python 3.10 or later is required.
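You can confirm that an environment meets these requirements before training with a small check script. This is a sketch, and the helper names are hypothetical. It checks the Python version against the 3.10 minimum stated above. It also uses `importlib.util.find_spec` to confirm that each package from the `pip install` line can be found, without actually importing it. For every package in that line, the import name matches the name passed to pip.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Packages from the pip install line above (import names).
REQUIRED = ["torch", "transformers", "lm_eval", "accelerate",
            "tensorboardX", "bitsandbytes", "sentencepiece"]


def python_ok(version_info: Sequence[int] = sys.version_info,
              minimum: Tuple[int, int] = (3, 10)) -> bool:
    """True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names: Sequence[str]) -> List[str]:
    """Return the names that cannot be located on the current import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_environment() -> List[str]:
    """Collect human-readable problems; an empty list means the setup looks complete."""
    problems = []
    if not python_ok():
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    for name in missing_packages(REQUIRED):
        problems.append(f"missing package: {name}")
    return problems


for problem in check_environment():
    print(problem)
```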

Training

Run the script with the desired arguments:

python train_model.py --model_id <pretrained_model_id> --dataset <dataset_name> [--debug]

Arguments:

model_id: Pretrained model ID (default: "facebook/opt-350m")
dataset: Dataset name (default: "Abirate/english_quotes")
debug: Enable debug mode (optional)
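The command-line interface described above can be reproduced with Python's standard `argparse` module. This sketch mirrors only the three documented arguments and their defaults. The actual parser in the training script may define more options or different help text. The function names `build_parser` and `parse` are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented arguments of the training script."""
    parser = argparse.ArgumentParser(description="Train a binarized language model.")
    parser.add_argument("--model_id", default="facebook/opt-350m",
                        help="Pretrained model ID")
    parser.add_argument("--dataset", default="Abirate/english_quotes",
                        help="Dataset name")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug mode")
    return parser


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


args = parse(["--model_id", "facebook/opt-125m", "--debug"])
print(args.model_id, args.dataset, args.debug)
# facebook/opt-125m Abirate/english_quotes True
```

Because `--debug` uses `store_true`, it takes no value. Passing the flag turns debug mode on, and leaving it out keeps it off.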

Example: binarizing with the XNOR algorithm, guided by knowledge distillation (KD).

CUDA_VISIBLE_DEVICES=5 python bnn_train_layerwise_w_KD.py --binarization_method='xnor'
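The XNOR-style weight approximation and a layer-wise distillation signal can be sketched as follows. The sketch rests on several assumptions. It uses the XNOR-Net formulation W ≈ α·sign(W), with a per-row scale α = mean(|W|). For a fixed sign pattern, this α minimizes the squared reconstruction error. The meaning of "layerwise KD" is inferred from the script name. Here the outputs of a binarized layer (the student) are matched to the outputs of the same full-precision layer (the teacher) with a mean-squared-error loss. The repository's script may use a different loss and optimizes the layer by training, and neither detail is shown here. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def xnor_binarize(weight: Matrix) -> Matrix:
    """Approximate each row w by alpha * sign(w), with alpha = mean(|w|) for that row.

    For a fixed sign pattern b, alpha = mean(|w|) minimizes ||w - alpha * b||^2.
    """
    out = []
    for row in weight:
        alpha = sum(abs(w) for w in row) / len(row)
        out.append([alpha if w >= 0 else -alpha for w in row])
    return out


def matvec(weight: Matrix, x: List[float]) -> List[float]:
    """Output of a bias-free linear layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def layerwise_kd_loss(teacher_w: Matrix, student_w: Matrix, x: List[float]) -> float:
    """MSE between teacher (full-precision) and student (binarized) layer outputs."""
    t = matvec(teacher_w, x)
    s = matvec(student_w, x)
    return sum((ti - si) ** 2 for ti, si in zip(t, s)) / len(t)


W = [[0.5, -0.25, 0.125, -0.125],
     [-0.75, 0.5, 0.25, 0.5]]
x = [1.0, 2.0, -1.0, 0.5]
Wb = xnor_binarize(W)
print(Wb)                           # [[0.25, -0.25, 0.25, -0.25], [-0.5, 0.5, 0.5, 0.5]]
print(layerwise_kd_loss(W, Wb, x))  # 0.095703125
```

In a full pipeline, this loss would be computed on activations from calibration data and minimized layer by layer. The sketch only shows how the loss is computed.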

Repository: hahnyuan/PB-LLM

Folders and files

Latest commit

History

Repository files navigation

Binary Quantization for Large Language Models

Introduction

Model support

Usage

Enviroment Setting

Training

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages