RLHF Training for LLMs

This repository contains implementations for Reinforcement Learning from Human Feedback (RLHF) training of Large Language Models (LLMs) using Supervised Fine-Tuning (SFT), Reward Modeling, and Proximal Policy Optimization (PPO). The goal is a modular, maintainable codebase for replicating RLHF training on LLMs such as LLaMA. The codebase targets LLaMA 2 specifically: while most components are model-agnostic, the data-related components (such as special-token formatting) must be adapted for other models.

Table of Contents

  1. Installation
  2. Usage
  3. Configuration
  4. Evaluation
  5. Inference

Installation

  1. Clone the repository:

    git clone https://github.com/lightmatmul/rlhf_training.git
    cd rlhf_training
  2. Create and activate a virtual environment:

    python -m venv env
    source env/bin/activate  # On Windows, use `env\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

Usage

Supervised Fine-Tuning

To train a model using Supervised Fine-Tuning (SFT), run the following script:

python scripts/train_sft.py
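
Conceptually, the SFT stage fine-tunes the LLaMA 2 base model on formatted prompt/response text using LoRA adapters. The sketch below is illustrative only, not the repo's exact script: the model name, placeholder data, target modules, and hyperparameters are assumptions.

    # Minimal SFT sketch (not the repo's exact script): LoRA fine-tuning of a
    # LLaMA 2 base model on already-formatted prompt/response text.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "meta-llama/Llama-2-7b-hf"           # assumed base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    lora_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],          # assumed adapter targets
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    # Placeholder data: strings already wrapped in LLaMA 2 chat formatting.
    texts = ["[INST] What does RLHF stand for? [/INST] Reinforcement learning from human feedback."]
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)  # ignore padding

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
    model.train()
    outputs = model(**batch, labels=labels)           # standard causal-LM loss
    outputs.loss.backward()
    optimizer.step()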

Reward Modeling

To train a reward model, run the following script:

python scripts/train_reward.py

Proximal Policy Optimization

To train a model using Proximal Policy Optimization (PPO), run the following script:

python scripts/train_ppo.py
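
PPO fine-tunes the SFT policy to maximize the reward model's score while penalizing divergence from the frozen SFT reference model. The sketch below illustrates only the core objective (clipped surrogate loss plus a KL-shaped reward) with toy tensors; it is a conceptual example, not the repo's training loop.

    # Conceptual PPO sketch (not the repo's training loop): clipped surrogate
    # loss on response tokens, with the reward shaped by a KL penalty against
    # the frozen SFT reference policy.
    import torch

    def ppo_policy_loss(logprobs_new, logprobs_old, advantages, clip_ratio=0.2):
        """Clipped PPO surrogate loss over sampled response tokens."""
        ratio = torch.exp(logprobs_new - logprobs_old)               # pi_new / pi_old
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * advantages
        return -torch.min(unclipped, clipped).mean()

    def kl_shaped_reward(rm_score, logprobs_policy, logprobs_reference, kl_coef=0.1):
        """Sequence reward = reward-model score minus a KL penalty to the reference."""
        kl_penalty = (logprobs_policy - logprobs_reference).sum(dim=-1)
        return rm_score - kl_coef * kl_penalty

    # Toy per-token log-probabilities for one sampled response.
    logprobs_old = torch.tensor([-1.2, -0.8, -2.0])
    logprobs_new = logprobs_old + 0.05                               # after one update
    advantages = torch.tensor([0.5, 0.1, -0.2])
    print(ppo_policy_loss(logprobs_new, logprobs_old, advantages))
    print(kl_shaped_reward(torch.tensor(1.3), logprobs_new, logprobs_old))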

Configuration

The configuration files are located in the configs/ directory. Here’s a brief description of each:

  1. lora_config.py: Contains the configuration for LoRA (Low-Rank Adaptation); a hedged example of such a config appears after this list.
  2. reward_config.py: Contains the constants and configurations specific to Reward Modeling.
  3. ppo_config.py: Contains the constants and configurations specific to PPO.
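
As a rough illustration of the kind of settings lora_config.py might hold, here is a sketch built on peft's LoraConfig; the field values are assumptions, not the repo's actual configuration.

    # Hypothetical sketch of configs/lora_config.py built on peft's LoraConfig;
    # the field values here are assumptions, not the repo's actual settings.
    from peft import LoraConfig

    lora_config = LoraConfig(
        r=8,                                  # rank of the low-rank update matrices
        lora_alpha=16,                        # scaling applied to the update
        lora_dropout=0.05,                    # dropout on the adapter layers
        target_modules=["q_proj", "v_proj"],  # LLaMA attention projections to adapt
        bias="none",
        task_type="CAUSAL_LM",
    )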

Evaluation

GPT is used as an AI evaluator to assess the impact of alignment tuning compared to the original supervised fine-tuned model:

python eval/gpt_evaluator.py
python eval/count_wins.py
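
Here, count_wins.py presumably tallies how often the aligned model is preferred by the GPT judge. As a rough illustration, a minimal tally could look like the following; the judgments file name and JSON schema are assumptions, not the repo's actual format.

    # Hypothetical tallying sketch; the judgments file name and the "winner"
    # field are assumptions about the evaluator's output, not the repo's schema.
    import json
    from collections import Counter

    with open("gpt_judgments.json") as f:       # assumed output of gpt_evaluator.py
        judgments = json.load(f)                # e.g. [{"winner": "ppo"}, {"winner": "sft"}, ...]

    counts = Counter(j["winner"] for j in judgments)
    total = sum(counts.values())
    for model, wins in counts.items():
        print(f"{model}: {wins}/{total} wins ({100 * wins / total:.1f}%)")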

Inference

To interact with the trained models, run the following script:

python scripts/inference.py
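
For context, interacting with a LoRA-tuned checkpoint typically means loading the base model, attaching the trained adapter, and generating. The sketch below assumes a hypothetical adapter path and is not the repo's exact script.

    # Illustrative inference sketch (the adapter path is hypothetical): load the
    # LLaMA 2 base model, attach the trained LoRA adapter, and generate.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_name = "meta-llama/Llama-2-7b-hf"      # assumed base model
    adapter_path = "outputs/ppo_adapter"        # hypothetical adapter directory

    tokenizer = AutoTokenizer.from_pretrained(base_name)
    base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()

    prompt = "[INST] Summarize what PPO does in one sentence. [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output[0], skip_special_tokens=True))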
