# MAHALO

Source code for the ICML 2023 paper "MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations".
## Installing MuJoCo

Our repo uses MuJoCo 210. If you don't have it installed on your machine, this can be done by running the following script:

```bash
. install.sh
```
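If the script does not work on your machine, MuJoCo 210 can usually be installed manually. Below is a minimal sketch for Linux, assuming the standard mujoco-py install location (`~/.mujoco/mujoco210`); double-check the paths against your setup:

```bash
# Download the MuJoCo 2.1.0 release and unpack it into ~/.mujoco/mujoco210.
mkdir -p ~/.mujoco
wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
tar -xzf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco

# mujoco-py locates the native library through LD_LIBRARY_PATH; add this line
# to your shell profile (e.g. ~/.bashrc) to make it permanent.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
```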
## Installing this repo

```bash
conda create -n mahalo python=3.7
conda activate mahalo
pip install -e .
```
## To run MAHALO

```bash
python main.py --env {ENV} --scenario {SCENARIO} --log_dir mahalo_results/ --beta {BETA} --alpha_beta_ratio 100000 --clip_v --remove_terminals
```
Options:

- `--env`: `hopper-all-v2`, `walker2d-all-v2`, `halfcheetah-all-v2`
- `--scenario`: `il`, `ilfo`, `rl_expert`, `rlfo`, `rl_sample`
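For example, the following runs MAHALO on `hopper-all-v2` under the `rlfo` scenario (the `--beta` value here is only illustrative; see the paper for the values used in the experiments):

```bash
python main.py --env hopper-all-v2 --scenario rlfo --log_dir mahalo_results/ \
    --beta 1.0 --alpha_beta_ratio 100000 --clip_v --remove_terminals
```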
*Note: our Meta-World experiments rely on data and code that are not yet open-sourced. We will update this repo once that process is finished.*
## To run baseline algorithms

```bash
python baselines.py --env {ENV} --scenario {SCENARIO} --algo {ALGO} --log_dir baseline_results/ --beta {BETA} --clip_v --remove_terminals
```
Options:

- `--env`: `hopper-all-v2`, `walker2d-all-v2`, `halfcheetah-all-v2`
- `--scenario`: `il`, `ilfo`, `rl_expert`, `rlfo`, `rl_sample`
- `--algo`:
  - `rp`: reward prediction (RP in the paper)
  - `ap`: action prediction (AP in the paper)
  - `arp`: reward-action prediction
  - `uds`: UDS implemented by ATAC (UDS in the paper)
  - `uds-a`: variant of UDS without knowing the common transitions between dynamics and reward datasets (UDS-A in the paper)
  - `common`: running ATAC on common transitions (ATAC in the paper)
  - `oracle`: running ATAC with full action and reward information (Oracle in the paper)
*Note: BC and BCO results can be obtained from the pretraining of `common` and `ap`, respectively.*
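For example, the following runs the UDS baseline on `walker2d-all-v2` in the `ilfo` scenario (again, the `--beta` value is only illustrative):

```bash
python baselines.py --env walker2d-all-v2 --scenario ilfo --algo uds \
    --log_dir baseline_results/ --beta 1.0 --clip_v --remove_terminals
```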
If you use this repo, please cite:

```bibtex
@inproceedings{li2023mahalo,
  title={MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations},
  author={Li, Anqi and Boots, Byron and Cheng, Ching-An},
  booktitle={International Conference on Machine Learning},
  year={2023},
  organization={PMLR}
}
```