LLaMA Efficient Tuning


👋 Join our WeChat.

Changelog

[23/06/03] Now we support quantized training and inference (aka QLoRA). Try the --quantization_bit 4/8 argument to work with quantized models. (experimental feature)

[23/05/31] Now we support training the BLOOM & BLOOMZ models in this repo. Try the --model_name_or_path bigscience/bloomz-7b1-mt argument to use the BLOOMZ model.

Supported Models

Supported Training Approaches

Provided Datasets

Please refer to data/README.md for details.

Some datasets require confirmation before use, so we recommend logging in to your Hugging Face account with the following commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirements

  • Python 3.8+ and PyTorch 1.13.1+
  • 🤗Transformers, Datasets, Accelerate, PEFT and TRL
  • protobuf, cpm_kernels and sentencepiece
  • jieba, rouge_chinese and nltk (used for evaluation)
  • gradio and mdtex2html (used in web_demo.py)

And powerful GPUs!

Getting Started

Data Preparation (optional)

Please refer to data/example_dataset for details about the format of dataset files. You can either use a single .json file or a dataset loading script with multiple files to create a custom dataset.

Note: please update data/dataset_info.json to use your custom dataset. For the format of this file, please refer to data/README.md.
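
For illustration, a dataset_info.json entry maps your file to the column names the trainer expects. The sketch below is hypothetical and is written as a Python dict so it can carry comments; the dataset name, file name, and column mapping are placeholders rather than values taken from this repo, and data/README.md remains the authoritative reference.

# Hypothetical sketch of one entry in data/dataset_info.json,
# shown as a Python dict so that comments are possible.
dataset_info_entry = {
    "my_dataset": {                      # placeholder dataset name
        "file_name": "my_dataset.json",  # file placed under data/
        "columns": {                     # map file fields to model inputs
            "prompt": "instruction",
            "query": "input",
            "response": "output",
            "history": "history"
        }
    }
}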

Dependency Installation (optional)

git clone https://github.com/hiyouga/LLaMA-Efficient-Tuning.git
conda create -n llama_etuning python=3.10
conda activate llama_etuning
cd LLaMA-Efficient-Tuning
pip install -r requirements.txt

LLaMA Weights Preparation

  1. Download the weights of the LLaMA models.
  2. Convert them to HF format using the following command.
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir path_to_llama_weights --model_size 7B --output_dir path_to_llama_model

(Continual) Pre-Training

CUDA_VISIBLE_DEVICES=0 python src/train_pt.py \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset wiki_demo \
    --finetuning_type lora \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Supervised Fine-Tuning

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_pt_checkpoint \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16

QLoRA

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /xx/model/model_weights/Ziya-LLaMA-13B \
    --do_train \
    --dataset xx \
    --finetuning_type lora \
    --output_dir /xx/output \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-3 \
    --num_train_epochs 10.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16 \
    --quantization_bit 4
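
The --quantization_bit flag loads the frozen base model in 4-bit (or 8-bit) precision while the LoRA adapters are trained in higher precision on top of it. As a rough sketch of what a 4-bit load looks like through the 🤗 Transformers / bitsandbytes integration (the NF4 settings below are common QLoRA defaults, not values read from this repo's source):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Common NF4 settings for QLoRA-style loading; this repo's internal
# defaults may differ (assumption, not taken from its source).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "path_to_llama_model",
    quantization_config=bnb_config,
    device_map="auto",  # let Accelerate place the quantized weights
)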

Reward Model Training

CUDA_VISIBLE_DEVICES=0 python src/train_rm.py \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_pt_checkpoint \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
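
The reward model is trained on pairwise comparison data such as comparison_gpt4_en, where each prompt is paired with a preferred and a rejected response. The sample below is hypothetical (the field names follow the Alpaca-style format used elsewhere in this repo; see data/README.md for the exact schema):

# Hypothetical comparison sample: "output" lists the preferred response
# first and the rejected one second (assumption; see data/README.md).
sample = {
    "instruction": "Explain what a LoRA adapter is.",
    "input": "",
    "output": [
        "A LoRA adapter adds small trainable low-rank matrices ...",  # preferred
        "It is a kind of file."                                       # rejected
    ],
}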

PPO Training (RLHF)

CUDA_VISIBLE_DEVICES=0 python src/train_ppo.py \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_pt_checkpoint,path_to_sft_checkpoint \
    --reward_model path_to_rm_checkpoint \
    --output_dir path_to_ppo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --resume_lora_training False \
    --plot_loss

Distributed Training

accelerate config # configure the environment
accelerate launch src/train_XX.py # arguments (same as above)
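
accelerate config asks a series of questions interactively and writes the answers to a YAML file (by default under ~/.cache/huggingface/accelerate/). A hypothetical result for a single machine with two GPUs and fp16 mixed precision might look like this (the values are assumptions, not output captured from the tool):

# Hypothetical default_config.yaml for 2 local GPUs (assumed values).
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0
num_machines: 1
num_processes: 2
mixed_precision: fp16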

Evaluation (BLEU and ROUGE_CHINESE)

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path path_to_llama_model \
    --do_eval \
    --dataset alpaca_gpt4_en \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_eval_result \
    --per_device_eval_batch_size 8 \
    --max_samples 50 \
    --predict_with_generate

We recommend using --per_device_eval_batch_size=1 and --max_target_length 128 for INT8 evaluation.

CLI Demo

python src/cli_demo.py \
    --model_name_or_path path_to_llama_model \
    --checkpoint_dir path_to_checkpoint

Web Demo

python src/web_demo.py \
    --model_name_or_path path_to_llama_model \
    --checkpoint_dir path_to_checkpoint

Export Model

python src/export_model.py \
    --model_name_or_path path_to_llama_model \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_export
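
The exported directory can then be loaded like any other 🤗 Transformers checkpoint. A minimal sketch, assuming path_to_export is the --output_dir placeholder from the command above:

from transformers import AutoModelForCausalLM, AutoTokenizer

# "path_to_export" is the --output_dir placeholder from the command above.
tokenizer = AutoTokenizer.from_pretrained("path_to_export")
model = AutoModelForCausalLM.from_pretrained("path_to_export")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))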

License

This repository is licensed under the Apache-2.0 License.

Please follow the Model Card to use the LLaMA models.

Please follow the RAIL License to use the BLOOM & BLOOMZ models.

Citation

If this work is helpful, please cite as:

@Misc{llama-efficient-tuning,
  title = {LLaMA Efficient Tuning},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/LLaMA-Efficient-Tuning}},
  year = {2023}
}

Acknowledgement

This repo is a sibling of ChatGLM-Efficient-Tuning. They share a similar code structure for efficient tuning of large language models.
