
WeightedIRL

The official code of the paper "Enhancing Inverse Reinforcement Learning with Weighted Causal Entropy"

Abstract

We study inverse reinforcement learning (IRL), the problem of recovering a reward function from an expert's demonstrated trajectories. We propose a way to enhance IRL by adding a weight function to the maximum causal entropy framework, with the motivation of being able to control and learn the stochasticity of the modelled policy. Our IRL framework and algorithms allow us to learn both a reward function and the structure of the entropy terms added to the Markov decision processes, thus enhancing the IRL procedure. Our numerical experiments, using human and simulated demonstrations on both discrete and continuous IRL tasks, show that our approach outperforms prior methods.

Setup

This repo was tested with MuJoCo v1.3.1, so please install MuJoCo first. Note that you need a MuJoCo license; follow the instructions in mujoco-py for help. Then run `pip install -r requirements.txt` to install the Python dependencies.
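As a quick sanity check, the minimal sketch below builds a MuJoCo environment and takes one random step (assuming gym's MuJoCo environments are installed; the environment id is only an example and depends on your gym/MuJoCo versions):

```python
# Sanity check: build a MuJoCo environment and take one random step.
# "Hopper-v2" is only an example id; use whichever ids your gym/MuJoCo
# versions expose (older MuJoCo releases ship "-v1" environments).
import gym

env = gym.make("Hopper-v2")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("observation shape:", obs.shape, "reward:", reward)
```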

Train Expert

We use Soft Actor-Critic (SAC) [1] to train experts and collect demonstrations:

- Run `python train_expert.py --env_id <env> --num_steps <num-of-steps>`
- Run `python collect_demo.py --env_id <env> --weight <pretrained-expert-path>`
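The two steps can also be chained in a small driver script. This is only a sketch: the environment id, step budget, and checkpoint path are placeholders; check `train_expert.py` / `collect_demo.py` for the exact options and where the expert weights are actually written.

```python
# Illustrative expert pipeline: train a SAC expert, then collect demonstrations.
# The environment id, step budget, and weight path below are hypothetical.
import subprocess

env_id = "Hopper-v2"      # example environment id
num_steps = "1000000"     # example training budget

# 1) Train the SAC expert.
subprocess.run(
    ["python", "train_expert.py", "--env_id", env_id, "--num_steps", num_steps],
    check=True,
)

# 2) Collect demonstrations with the trained expert (path is a placeholder).
subprocess.run(
    ["python", "collect_demo.py", "--env_id", env_id,
     "--weight", "path/to/pretrained_expert.pth"],
    check=True,
)
```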

Train Imitation

Each expert is evaluated with 6 imitation algorithms based on 2 existing studies: GAIL [2] and AIRL [3].

- Run `python train_imitation.py --env_id <env> --test_env_id <eval-env> --seed <seed> <arguments>`

| Algorithm | Arguments |
| --- | --- |
| GAIL | `--algo gail` |
| Weighted GAIL | `--algo gail --weighted` |
| AIRL | `--algo airl` |
| Weighted AIRL | `--algo airl --weighted` |
| AIRL state-only | `--algo airl --state_only` |
| Weighted AIRL state-only | `--algo airl --weighted --state_only` |
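As an illustration, the six configurations in the table can be launched from a small Python driver. This is only a sketch: the training/evaluation environment ids and the seed are placeholders.

```python
# Launch the six imitation variants from the table above.
# Environment ids and the seed are placeholders.
import subprocess

variants = {
    "GAIL":                     ["--algo", "gail"],
    "Weighted GAIL":            ["--algo", "gail", "--weighted"],
    "AIRL":                     ["--algo", "airl"],
    "Weighted AIRL":            ["--algo", "airl", "--weighted"],
    "AIRL state-only":          ["--algo", "airl", "--state_only"],
    "Weighted AIRL state-only": ["--algo", "airl", "--weighted", "--state_only"],
}

for name, extra_args in variants.items():
    print(f"Launching {name} ...")
    subprocess.run(
        ["python", "train_imitation.py",
         "--env_id", "<env>",             # training environment (placeholder)
         "--test_env_id", "<eval-env>",   # evaluation environment (placeholder)
         "--seed", "0",
         *extra_args],
        check=True,
    )
```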

Examples

Check `run_mujoco.cmd`, `run_disabled_ant.cmd`, and `run_point_maze.cmd` for more details.

Experimental Settings

We evaluate on MuJoCo tasks and transfer-learning tasks with 8 different seeds, without tuning hyperparameters.

Visualization

- Run `python benchmark_score.py` to generate improvement graphs over 1M steps.


- Run `python create_heatmap.py` to show the heat map of the Point Mass-Maze environment.

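The improvement graphs are seed-averaged learning curves. Below is a minimal matplotlib sketch of that kind of plot; the curves are random placeholder data, not actual results, and `benchmark_score.py` reads the real training logs instead.

```python
# Minimal sketch of a seed-averaged learning curve over 1M steps.
# The curves below are random placeholders, not actual results.
import numpy as np
import matplotlib.pyplot as plt

steps = np.linspace(0, 1_000_000, 100)
runs_by_algo = {
    "AIRL": np.random.rand(8, 100).cumsum(axis=1),           # 8 seeds x 100 points
    "Weighted AIRL": np.random.rand(8, 100).cumsum(axis=1),  # 8 seeds x 100 points
}

for name, runs in runs_by_algo.items():
    mean, std = runs.mean(axis=0), runs.std(axis=0)
    plt.plot(steps, mean, label=name)
    plt.fill_between(steps, mean - std, mean + std, alpha=0.2)

plt.xlabel("environment steps")
plt.ylabel("average return")
plt.legend()
plt.savefig("improvement_graph_example.png")
```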

Notes:

- This code is based on the gail-airl-ppo.pytorch repository [4].
- The transfer environments are pulled from the official AIRL repository [5].

References

  1. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML.

  2. Ho, J., & Ermon, S. (2016). Generative Adversarial Imitation Learning. NIPS.

  3. Fu, J., Luo, K., & Levine, S. (2018). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. arXiv:1710.11248.

  4. PyTorch implementation of GAIL and AIRL based on PPO: https://github.com/ku2482/gail-airl-ppo.pytorch

  5. Implementations of imitation learning / IRL algorithms in RLLAB: https://github.com/justinjfu/inverse_rl/tree/master/inverse_rl/envs
