The official code of the paper "Enhancing Inverse Reinforcement Learning with Weighted Causal Entropy"
We study inverse reinforcement learning (IRL), the problem of recovering a reward function from an expert's demonstrated trajectories. We propose a way to enhance IRL by adding a weight function to the maximum causal entropy framework, with the motivation of being able to control and learn the stochasticity of the modelled policy. Our IRL framework and algorithms allow learning both a reward function and the structure of the entropy terms added to the Markov Decision Processes, thus enhancing the IRL procedure. Our numerical experiments, using human and simulated demonstrations on both discrete and continuous IRL tasks, show that our approach outperforms prior methods.
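To make the idea of a weighted entropy term concrete, here is a minimal sketch for a discrete (tabular) policy, evaluated along one visited state sequence. It assumes a hypothetical state-dependent weight `w(s) >= 0` that scales each step's policy entropy in the discounted sum; the paper's exact parameterization of the weight function may differ. With `w(s) = 1` everywhere it reduces to the ordinary discounted causal entropy.

```python
import numpy as np


def weighted_causal_entropy(policy_probs, states, weight_fn, gamma=0.99):
    """Discounted sum of w(s_t) * H(pi(. | s_t)) along one visited state sequence.

    policy_probs: array of shape (num_states, num_actions); each row is pi(. | s).
    states: visited state indices s_0, s_1, ...
    weight_fn: hypothetical non-negative weight w(s); w(s) = 1 recovers
        the ordinary discounted causal entropy.
    """
    total = 0.0
    for t, s in enumerate(states):
        p = np.asarray(policy_probs[s], dtype=float)
        nonzero = p[p > 0]  # treat 0 * log 0 as 0
        entropy = -np.sum(nonzero * np.log(nonzero))
        total += (gamma ** t) * weight_fn(s) * entropy
    return float(total)
```

A weight of zero in some states removes the entropy bonus there, which is one way to picture controlling the policy's stochasticity state by state.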
This repository has been tested with MuJoCo v1.3.1; please make sure MuJoCo is installed first. Note that a MuJoCo license is required. Please follow the instructions in mujoco-py for help.
Run `pip install -r requirements.txt` to install the required Python libraries.
We use Soft Actor-Critic (SAC) [1] to train experts and collect demonstrations:
- Run `python train_expert.py --env_id <env> --num_steps <num-steps>` to train an expert.
- Run `python collect_demo.py --env_id <env> --weight <pretrained-expert-path>` to collect demonstrations from the pretrained expert.
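For readers unfamiliar with SAC, the core of its critic update is an entropy-regularized (soft) Bellman target. The sketch below uses the widely used twin-Q formulation; it is illustrative only and is not taken from this repository's `train_expert.py`. The `gamma` and `alpha` (entropy temperature) defaults are arbitrary.

```python
import numpy as np


def sac_soft_q_target(reward, done, next_q1, next_q2, next_log_prob,
                      gamma=0.99, alpha=0.2):
    """Soft Bellman target for SAC critics (illustrative sketch).

    next_q1, next_q2: target-network Q estimates at (s', a'), with a' ~ pi(. | s').
    next_log_prob: log pi(a' | s') for the sampled next action.
    alpha: entropy temperature (illustrative default).
    """
    # Clipped double-Q: take the smaller estimate to reduce overestimation.
    next_q = np.minimum(next_q1, next_q2)
    # Soft value of s': Q minus the entropy penalty alpha * log pi.
    soft_value = next_q - alpha * next_log_prob
    # No bootstrapping past terminal transitions.
    return reward + gamma * (1.0 - done) * soft_value
```

The same function works element-wise on NumPy arrays, so it can be applied to a whole minibatch at once.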
Each expert is tested with 6 imitation algorithms based on 2 existing methods, GAIL [2] and AIRL [3]:
`python train_imitation.py --env_id <env> --test_env_id <eval-env> --seed <seed> <arguments>`
Algorithm | Arguments |
---|---|
GAIL | `--algo gail` |
Weighted GAIL | `--algo gail --weighted` |
AIRL | `--algo airl` |
Weighted AIRL | `--algo airl --weighted` |
AIRL state-only | `--algo airl --state_only` |
Weighted AIRL state-only | `--algo airl --weighted --state_only` |
See `run_mujoco.cmd`, `run_disabled_ant.cmd`, and `run_point_maze.cmd` for more details.
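The algorithms in the table differ mainly in how the discriminator output is turned into a reward. As a rough illustration (not this repository's code), the sketch shows a GAIL-style reward `-log(1 - D)` with `D = sigmoid(logit)` read as the probability that a sample came from the expert (sign conventions vary between implementations), and the AIRL reward derived from its discriminator `D = exp(f) / (exp(f) + pi(a|s))`, whose logit is `f - log pi(a|s)`. For the state-only variant, `f` is assumed to take the shaped form `g(s) + gamma * h(s') - h(s)` from the AIRL paper; masking `h(s')` at terminal steps is an implementation assumption. How `--weighted` modifies these rewards is not shown here.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def gail_reward(logit):
    """GAIL-style reward -log(1 - D), with D = sigmoid(logit) = P(expert | s, a)."""
    # log(1 - sigmoid(x)) equals log(sigmoid(-x)).
    return -log_sigmoid(-logit)


def airl_f(g_s, h_s, h_next_s, done, gamma=0.99):
    """State-only AIRL term f = g(s) + gamma * h(s') - h(s), with h(s') masked at terminals."""
    return g_s + gamma * (1.0 - done) * h_next_s - h_s


def airl_reward(f, log_pi):
    """AIRL discriminator D = exp(f) / (exp(f) + pi(a|s)).

    Its logit, log D - log(1 - D), simplifies to f - log pi(a|s),
    which is used as the policy's reward.
    """
    return f - log_pi
```

Because the AIRL reward subtracts `log pi(a|s)`, it already carries an entropy-like term, which is where a weighting of the entropy would naturally enter.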
We evaluate on MuJoCo tasks and transfer learning tasks with 8 different seeds, without tuning hyperparameters.
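The table above can be expanded into a full sweep over algorithms and seeds. The sketch below only builds and prints the `train_imitation.py` command lines; the seed values 0 to 7 and the `<env>`/`<eval-env>` placeholders are assumptions to be replaced with your own settings.

```python
import itertools
import shlex

# Algorithm name -> extra command-line arguments, copied from the table above.
ALGO_ARGS = {
    "GAIL": ["--algo", "gail"],
    "Weighted GAIL": ["--algo", "gail", "--weighted"],
    "AIRL": ["--algo", "airl"],
    "Weighted AIRL": ["--algo", "airl", "--weighted"],
    "AIRL state-only": ["--algo", "airl", "--state_only"],
    "Weighted AIRL state-only": ["--algo", "airl", "--weighted", "--state_only"],
}


def build_commands(env_id, test_env_id, seeds):
    """Return one train_imitation.py argv list per (algorithm, seed) pair."""
    commands = []
    for (_, extra_args), seed in itertools.product(ALGO_ARGS.items(), seeds):
        commands.append([
            "python", "train_imitation.py",
            "--env_id", env_id,
            "--test_env_id", test_env_id,
            "--seed", str(seed),
            *extra_args,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder IDs and assumed seeds 0..7; substitute your own.
    for cmd in build_commands("<env>", "<eval-env>", seeds=range(8)):
        print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the runs sequentially; the repository's `.cmd` scripts remain the reference for the exact settings used.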
- Run `python benchmark_score.py` to generate improvement graphs over 1M steps.
- Run `python create_heatmap.py` to show the heat map of the Point Mass-Maze environment.
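Results are reported over 8 seeds. The internals of `benchmark_score.py` are not described here, so the following is only a hypothetical sketch of one common way to summarize learning curves: the mean evaluation return across seeds at each checkpoint, with its standard error. The `(num_seeds, num_eval_points)` array layout is an assumption.

```python
import numpy as np


def aggregate_over_seeds(returns):
    """Mean and standard error of evaluation returns across seeds.

    returns: array-like of shape (num_seeds, num_eval_points); row i holds the
    evaluation returns of seed i at each checkpoint (hypothetical layout).
    """
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean(axis=0)
    # ddof=1 gives the sample standard deviation across seeds.
    stderr = returns.std(axis=0, ddof=1) / np.sqrt(returns.shape[0])
    return mean, stderr
```

Plotting `mean` with a shaded band of `mean +/- stderr` against the environment step of each checkpoint gives a typical learning-curve figure.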
- This code is based on the gail-airl-ppo.pytorch repository [4].
- The transfer environments are taken from the official AIRL repository [5].
Footnotes
1. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML.
2. Ho, J., & Ermon, S. (2016). Generative Adversarial Imitation Learning. NIPS.
3. Fu, J., Luo, K., & Levine, S. (2018). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. arXiv:1710.11248.
4. PyTorch implementation of GAIL and AIRL based on PPO. https://github.com/ku2482/gail-airl-ppo.pytorch
5. Implementations for imitation learning/IRL algorithms in RLLAB. https://github.com/justinjfu/inverse_rl/tree/master/inverse_rl/envs