WeightedIRL

Official code for the paper "Enhancing Inverse Reinforcement Learning with Weighted Causal Entropy".

Abstract

We study inverse reinforcement learning (IRL), the problem of recovering a reward function from an expert's demonstrated trajectories. We propose a way to enhance IRL by adding a weight function to the maximum causal entropy framework, motivated by the ability to control and learn the stochasticity of the modelled policy. Our IRL framework and algorithms allow learning both a reward function and the structure of the entropy terms added to the Markov decision process, thus enhancing the IRL procedure. Our numerical experiments, using human and simulated demonstrations on discrete and continuous IRL tasks, show that our approach outperforms prior methods.
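To make the role of an entropy weight concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a single decision with fixed, made-up action values `q`, and relies on the standard entropy-regularized result that maximizing `sum_a pi(a) * q(a) + weight * H(pi)` gives a policy proportional to `exp(q / weight)`. The function names and numbers are illustrative assumptions.

```python
import math


def soft_policy(q_values, weight):
    """Policy maximizing sum_a pi(a) * q(a) + weight * H(pi) for one decision."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(q_values)
    exps = [math.exp((q - top) / weight) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical action values (illustrative numbers only, not results).
q = [1.0, 0.5, 0.0]
for w in (0.1, 1.0, 10.0):
    pi = soft_policy(q, w)
    print(f"weight={w:5.1f}  entropy={entropy(pi):.3f}")
```

In this sketch a small weight yields a nearly deterministic policy and a large weight yields a nearly uniform one, which is the kind of control over stochasticity the abstract refers to.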

Setup

This repo was tested on MuJoCo v1.3.1, so please make sure to install MuJoCo first. Note that you need a MuJoCo license. Please follow the instructions in mujoco-py for help. Run `pip install -r requirements.txt` to install the Python libraries.

Train Expert

We use Soft Actor-Critic (SAC)¹ to train experts and collect demonstrations:

Run `python train_expert.py --env_id <env> --num_steps <numb-of-steps>`
Run `python collect_demo.py --env_id <env> --weight <pretrained-expert-path>`
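The two steps above can be chained from a small driver script. The following is a sketch that only builds and prints the commands, using the flags shown above. The environment id `Hopper-v3`, the step count, and the weight path are placeholder assumptions; the actual location of the saved expert depends on `train_expert.py`.

```python
import shlex


def expert_commands(env_id, num_steps, weight_path):
    """Build the train and collect commands as argv lists."""
    train = ["python", "train_expert.py",
             "--env_id", env_id, "--num_steps", str(num_steps)]
    collect = ["python", "collect_demo.py",
               "--env_id", env_id, "--weight", weight_path]
    return [train, collect]


# Placeholder values (assumptions, not taken from the repository).
for argv in expert_commands("Hopper-v3", 1000000, "path/to/expert.pth"):
    print(shlex.join(argv))
    # To actually run a step: subprocess.run(argv, check=True)
```

Printing first makes it easy to check the exact command lines before committing to a long training run.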

Train Imitation

Each expert is tested with 6 imitation algorithms based on 2 existing studies: GAIL² and AIRL³.

python train_imitation.py --env_id <env> --test_env_id <eval-env> --seed <seed> <arguments>

| Algorithm | Arguments |
| --- | --- |
| GAIL | `--algo gail` |
| Weighted GAIL | `--algo gail --weighted` |
| AIRL | `--algo airl` |
| Weighted AIRL | `--algo airl --weighted` |
| AIRL state-only | `--algo airl --state_only` |
| Weighted AIRL state-only | `--algo airl --weighted --state_only` |
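The table can be turned into a lookup that assembles the full `train_imitation.py` command for each variant. This is a sketch: the helper name `imitation_command`, the environment id `Hopper-v3`, and the choice of the same environment for training and evaluation are assumptions made for illustration.

```python
import shlex

# Arguments copied from the table above.
ALGO_ARGS = {
    "GAIL": ["--algo", "gail"],
    "Weighted GAIL": ["--algo", "gail", "--weighted"],
    "AIRL": ["--algo", "airl"],
    "Weighted AIRL": ["--algo", "airl", "--weighted"],
    "AIRL state-only": ["--algo", "airl", "--state_only"],
    "Weighted AIRL state-only": ["--algo", "airl", "--weighted", "--state_only"],
}


def imitation_command(name, env_id, test_env_id, seed):
    """Build the train_imitation.py argv list for one algorithm variant."""
    if name not in ALGO_ARGS:
        raise KeyError(f"unknown algorithm: {name}")
    return ["python", "train_imitation.py",
            "--env_id", env_id, "--test_env_id", test_env_id,
            "--seed", str(seed), *ALGO_ARGS[name]]


# Placeholder environment id and seed (assumptions).
for name in ALGO_ARGS:
    print(shlex.join(imitation_command(name, "Hopper-v3", "Hopper-v3", 0)))
```

For a transfer task, `env_id` and `test_env_id` would differ; the environment names to use are defined by the repository, not by this sketch.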

Examples

See run_mujoco.cmd, run_disabled_ant.cmd, and run_point_maze.cmd for more details.

Experimental Settings

We evaluate on MuJoCo tasks and transfer learning tasks with 8 different seeds, without tuning hyperparameters.

Visualization

Run `python benchmark_score.py` to generate improvement graphs over 1M steps.

Run `python create_heatmap.py` to show the heat map of the Point Mass-Maze environment.

Notes:

This code is based on the gail-airl-ppo.pytorch repository⁴.
The transfer environments are taken from the official AIRL repository⁵.

1. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML.
2. Ho, J., & Ermon, S. (2016). Generative Adversarial Imitation Learning. NIPS.
3. Fu, J., Luo, K., & Levine, S. (2018). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. ArXiv, abs/1710.11248.
4. PyTorch implementation of GAIL and AIRL based on PPO. https://github.com/ku2482/gail-airl-ppo.pytorch
5. Implementations for imitation learning/IRL algorithms in RLLAB. https://github.com/justinjfu/inverse_rl/tree/master/inverse_rl/envs
