DiffPPO: Combining Diffusion Models with PPO to Improve Sample Efficiency and Exploration in Reinforcement Learning
DiffPPO is a reinforcement learning framework that integrates diffusion models with Proximal Policy Optimization (PPO) to enhance sample efficiency and exploration capabilities. This project, implemented using the robomimic framework, utilizes the D4RL dataset for experiments, demonstrating improved performance in environments with limited data.
All training datasets, pretrained models, training logs, and videos can be accessed through the following Google Drive link:
Google Drive - DiffPPO Training Artifacts
If you find this project useful for your research, please consider citing our work:
Paper: Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization
Authors: Tianci Gao, Dmitriev D. Dmitry, Neusypin A. Konstantin, Bo Yang, Shengren Rao
Year: 2024
Link: https://arxiv.org/abs/2409.01427
├── datasets/ # Directory for storing datasets
├── models/ # Pretrained models
├── scripts/ # Scripts for training, evaluation, and visualization
│ ├── train.py # Script for training the model
│ ├── evaluate.py # Script for evaluating the model
│ └── visualize_results.py # Script for visualizing results
├── notebooks/ # Jupyter Notebooks for analysis and visualization
├── configs/ # Configuration files
│ └── PPO.json # Configuration for the PPO algorithm
├── README.md # Project documentation
└── requirements.txt # Python dependencies
To get started with DiffPPO, ensure that you have the following software installed:
- Python 3.8
- Conda (optional, but recommended for managing environments)
- Clone the repository:

  git clone https://github.com/yourusername/DiffPPO.git
  cd DiffPPO

- Create and activate a Python virtual environment:

  conda create -n diffppo_env python=3.8
  conda activate diffppo_env

- Install the required dependencies:

  pip install -r requirements.txt
The project utilizes the D4RL dataset. You can download the dataset using the provided script:
bash scripts/download_dataset.sh
Alternatively, you can refer to the D4RL documentation for more details.
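As a quick sanity check after downloading, the snippet below is a minimal sketch of loading a D4RL task through Gym. The environment name `hopper-medium-v2` is only an illustrative choice and is not tied to this project's configs.

```python
import gym
import d4rl  # importing d4rl registers its environments with gym

# Any D4RL task name works here; "hopper-medium-v2" is only an example.
env = gym.make("hopper-medium-v2")

# qlearning_dataset returns transition arrays:
# observations, actions, rewards, terminals, next_observations
dataset = d4rl.qlearning_dataset(env)
print({k: v.shape for k, v in dataset.items()})
```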
To train the model, use the following command:
python scripts/train.py --config configs/PPO.json
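If you want to sweep hyperparameters rather than edit the JSON by hand, a small script like the sketch below can generate a variant config before launching training. The key names (`algo`, `learning_rate`) are assumptions for illustration only; check `configs/PPO.json` for the actual field structure.

```python
import json
from pathlib import Path

# Load the base PPO config shipped with the repo.
base = json.loads(Path("configs/PPO.json").read_text())

# Hypothetical keys -- inspect configs/PPO.json for the real field names.
base.setdefault("algo", {})["learning_rate"] = 3e-4

# Write a variant and point train.py at it:
#   python scripts/train.py --config configs/PPO_lr3e-4.json
Path("configs/PPO_lr3e-4.json").write_text(json.dumps(base, indent=2))
```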
After training, evaluate the model's performance using:
python scripts/evaluate.py --model-path models/my_trained_model.pth
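As a rough illustration of what evaluation involves, the sketch below loads a checkpoint and rolls it out in a Gym environment. The checkpoint format and the policy's call signature are assumptions; `scripts/evaluate.py` remains the authoritative entry point.

```python
import gym
import torch

# Assumption: the checkpoint stores a full policy module mapping obs -> action.
policy = torch.load("models/my_trained_model.pth", map_location="cpu")
policy.eval()

env = gym.make("hopper-medium-v2")  # example task, not fixed by the project
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    with torch.no_grad():
        action = policy(torch.as_tensor(obs, dtype=torch.float32))
    obs, reward, done, _ = env.step(action.numpy())
    total_reward += reward
print(f"episode return: {total_reward:.1f}")
```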
Visualize the training results with:
python scripts/visualize_results.py --log-dir logs/
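If you prefer a quick custom plot over the provided script, the sketch below reads per-epoch returns and draws a learning curve. The log layout (a CSV with `epoch` and `return` columns) is an assumption; adapt it to whatever `logs/` actually contains.

```python
import csv
import matplotlib.pyplot as plt

# Assumed log format: logs/returns.csv with "epoch" and "return" columns.
epochs, returns = [], []
with open("logs/returns.csv") as f:
    for row in csv.DictReader(f):
        epochs.append(int(row["epoch"]))
        returns.append(float(row["return"]))

plt.plot(epochs, returns)
plt.xlabel("epoch")
plt.ylabel("cumulative reward")
plt.title("DiffPPO learning curve")
plt.savefig("learning_curve.png", dpi=150)
```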
The experiments conducted in this project show that using diffusion models to generate synthetic trajectories significantly improves the sample efficiency and exploration capabilities of the PPO algorithm, as reflected in the cumulative rewards achieved across the benchmark tasks; the full training logs and evaluation videos are available via the Google Drive link above.
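To make the idea concrete, here is a minimal, self-contained sketch of the trajectory-augmentation step: synthetic transitions from a generative sampler are mixed into the on-policy batch at a fixed ratio. The `sample_synthetic` stub stands in for the trained diffusion model, and all names and ratios are illustrative rather than the project's actual implementation.

```python
import numpy as np

def sample_synthetic(real_obs: np.ndarray, n: int) -> np.ndarray:
    """Stand-in for the diffusion model: here it just perturbs real states.

    In DiffPPO a trained diffusion model would generate whole trajectories;
    the Gaussian perturbation below is only a placeholder.
    """
    idx = np.random.randint(0, len(real_obs), size=n)
    return real_obs[idx] + 0.01 * np.random.randn(n, real_obs.shape[1])

def augment_batch(obs: np.ndarray, synthetic_ratio: float = 0.25) -> np.ndarray:
    """Mix synthetic states into an on-policy batch at a fixed ratio."""
    n_synth = int(len(obs) * synthetic_ratio)
    synth = sample_synthetic(obs, n_synth)
    return np.concatenate([obs, synth], axis=0)

# Example: a batch of 128 11-dimensional observations grows by 25%.
batch = np.random.randn(128, 11).astype(np.float32)
print(augment_batch(batch).shape)  # (160, 11)
```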
We welcome contributions to DiffPPO. If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b new-feature).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin new-feature).
- Create a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.