This is an original PyTorch implementation of the ExORL framework from *Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning* by Denis Yarats\*, David Brandfonbrener\*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.

\*Equal contribution.
Install MuJoCo if it is not already installed:

- Download the MuJoCo binaries here.
- Unzip the downloaded archive into `~/.mujoco/`.
- Append the MuJoCo subdirectory's `bin` path (e.g. `~/.mujoco/mujoco210/bin`, depending on your MuJoCo version) to the `LD_LIBRARY_PATH` environment variable.
Install the following libraries:

```sh
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip
```
Install dependencies:

```sh
conda env create -f conda_env.yml
conda activate exorl
```
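As a quick sanity check (a hypothetical snippet, not part of this repo), you can confirm that `dm_control` can reach MuJoCo from inside the new environment; a failure here usually points back to the `LD_LIBRARY_PATH` step above:

```python
# Hypothetical sanity check, not part of this repo: confirm that dm_control
# (and therefore MuJoCo) loads inside the `exorl` conda environment.
from dm_control import suite

# Load any DeepMind Control Suite task, e.g. walker-walk.
env = suite.load(domain_name="walker", task_name="walk")
timestep = env.reset()
print(sorted(timestep.observation.keys()))
```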
We provide exploratory datasets for 6 DeepMind Control Suite domains:
| Domain | Dataset name | Available task names |
|---|---|---|
| Cartpole | `cartpole` | `cartpole_balance`, `cartpole_balance_sparse`, `cartpole_swingup`, `cartpole_swingup_sparse` |
| Cheetah | `cheetah` | `cheetah_run`, `cheetah_run_backward` |
| Jaco Arm | `jaco` | `jaco_reach_top_left`, `jaco_reach_top_right`, `jaco_reach_bottom_left`, `jaco_reach_bottom_right` |
| Point Mass Maze | `point_mass_maze` | `point_mass_maze_reach_top_left`, `point_mass_maze_reach_top_right`, `point_mass_maze_reach_bottom_left`, `point_mass_maze_reach_bottom_right` |
| Quadruped | `quadruped` | `quadruped_walk`, `quadruped_run` |
| Walker | `walker` | `walker_stand`, `walker_walk`, `walker_run` |
For each domain we collected datasets by running 9 unsupervised RL algorithms from URLB for a total of 10M steps. Here is the list of algorithms:
| Unsupervised RL method | Name | Paper |
|---|---|---|
| APS | `aps` | paper |
| APT(ICM) | `icm_apt` | paper |
| DIAYN | `diayn` | paper |
| Disagreement | `disagreement` | paper |
| ICM | `icm` | paper |
| ProtoRL | `proto` | paper |
| Random | `random` | N/A |
| RND | `rnd` | paper |
| SMM | `smm` | paper |
You can download a dataset by running `./download.sh <DOMAIN> <ALGO>`. For example, to download the ProtoRL dataset for Walker, run:

```sh
./download.sh walker proto
```

The script will download the dataset from S3 and store it under `datasets/walker/proto/`, where you can find episodes (under `buffer`) and episode videos (under `video`).
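Episodes are stored as NumPy `.npz` archives. Here is a minimal loading sketch; the file naming and array keys are assumptions based on the URLB-style replay format, so inspect a file from your download to confirm:

```python
# Minimal sketch for inspecting one downloaded episode. The directory layout
# follows the download script above; the .npz key names are an assumption
# (URLB-style replay buffers typically store observation/action/reward/discount).
from pathlib import Path

import numpy as np

buffer_dir = Path("datasets/walker/proto/buffer")
episode_file = sorted(buffer_dir.glob("*.npz"))[0]  # pick an arbitrary episode

with np.load(episode_file) as episode:
    for key, value in episode.items():
        print(key, value.shape)
```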
We also provide implementations of 5 offline RL algorithms for evaluating the datasets:
| Offline RL method | Name | Paper |
|---|---|---|
| Behavior Cloning | `bc` | paper |
| CQL | `cql` | paper |
| CRR | `crr` | paper |
| TD3+BC | `td3_bc` | paper |
| TD3 | `td3` | paper |
After downloading the required datasets, you can evaluate them using an offline RL method on a specific task. For example, to evaluate a dataset collected by ProtoRL on Walker for the walking task using TD3+BC, run:

```sh
python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk
```

Any agent name, dataset name, and task name from the tables above can be substituted for `td3_bc`, `proto`, and `walker_walk`.
Logs are stored in the `output` folder. To launch TensorBoard, run:

```sh
tensorboard --logdir output
```
If you use this repo in your research, please consider citing the paper as follows:
```
@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Denis Yarats and David Brandfonbrener and Hao Liu and Michael Laskin and Pieter Abbeel and Alessandro Lazaric and Lerrel Pinto},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}
```
The majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: code from DeepMind is licensed under the Apache 2.0 license.