Efficient Risk-Averse Reinforcement Learning

This repo by Ido Greenberg implements the Cross-entropy Soft-Risk optimization algorithm (CeSoR) from the paper Efficient Risk-Averse Reinforcement Learning by Greenberg, Chow, Ghavamzadeh and Mannor.

Also see the related cross-entropy-method PyPI package.

Summary of the results of 3 agents (risk-neutral PG, standard risk-averse GCVaR, and our CeSoR) over 3 benchmarks. Top: the lower quantiles of the agent scores. Bottom: sample episodes.
A sample episode of CeSoR in the Driving Game. The goal is to follow the leader as closely as possible without colliding.

Installation

git clone https://github.com/ido90/CeSoR.git
cd CeSoR
pip install -e .

Quick start - examples

  • CEM_Example.ipynb: A minimal, annotated example of interacting with the Cross Entropy module directly: implementing a new family of distributions, running a sampling process, and analyzing the results (a rough sketch of the idea follows this list).
  • GuardedMazeExample.ipynb, DrivingExample.ipynb, ServersExample.ipynb: End-to-end examples of CeSoR on the 3 benchmarks: presenting the benchmark, training and testing (or loading existing results), and analyzing and visualizing the behavior.
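
For orientation, here is a generic, self-contained sketch of what a cross-entropy sampler of this kind does (this is not the actual API of the cross-entropy-method package or of this repo; the class and method names are purely illustrative): sample candidate environment parameters from a parametric family, keep the ones whose returns fall in the lowest quantile, and refit the family to them.

```python
# Generic cross-entropy-method sketch over a Gaussian family.
# Illustrative only - not the API of the cross-entropy-method package.
import numpy as np

class GaussianCEM:
    def __init__(self, mu=0.0, sigma=1.0, quantile=0.2):
        self.mu, self.sigma = mu, sigma
        self.quantile = quantile  # fraction of worst samples used for each update

    def sample(self, n):
        # Draw n candidate environment parameters from the current distribution.
        return np.random.normal(self.mu, self.sigma, size=n)

    def update(self, samples, returns):
        # Keep the samples whose returns fall in the lowest `quantile`
        # (the "hard" conditions), then refit the Gaussian to them.
        cutoff = np.quantile(returns, self.quantile)
        elite = samples[returns <= cutoff]
        self.mu, self.sigma = elite.mean(), elite.std() + 1e-8

# Toy usage: returns are lowest when the parameter is near 3.0,
# so the sampler should concentrate around 3.0.
cem = GaussianCEM()
for _ in range(20):
    params = cem.sample(100)
    returns = np.abs(params - 3.0)  # stand-in for per-episode agent returns
    cem.update(params, returns)
print(round(float(cem.mu), 2))  # close to 3.0
```

CEM_Example.ipynb shows the real interface, including how to plug in a new family of distributions.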

Background

In risk-averse Reinforcement Learning (RL), the goal is to optimize some risk measure of the returns, which inherently focuses on the lower quantiles of the returns distribution. This poses two difficulties: first, by focusing on certain quantiles we ignore part of the agent's experience and thus reduce sample efficiency; second, ignoring the higher quantiles leads to blindness to success, since the optimizer is never exposed to the agent's beneficial behaviors.

To overcome these challenges, we present CeSoR, the Cross-entropy Soft-Risk optimization algorithm. CeSoR leverages the Cross Entropy method to sample the lower quantiles over the environment conditions (minimizing over the epistemic uncertainty), while using soft risk-level scheduling to expose the optimizer to the higher quantiles of the agent's performance (maximizing over the aleatoric uncertainty). CeSoR can be applied to various models (e.g., neural networks), on top of any policy gradient algorithm.

On benchmarks of maze navigation, autonomous driving, and computational resource allocation, CeSoR achieves better risk measures than both risk-neutral and standard risk-averse policy gradient methods, and sometimes succeeds where the standard risk-averse policy gradient fails completely.
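
To make the "blindness to success" point concrete, here is a small illustrative snippet (not code from this repo) of the per-episode weighting behind a plain CVaR policy-gradient step such as GCVaR: only episodes whose return falls at or below the batch's α-quantile receive non-zero weight, so at α = 0.05 roughly 95% of the collected experience contributes nothing and the best episodes are never seen by the optimizer.

```python
# Illustrative CVaR_alpha weighting of a batch of episode returns
# (a sketch of the idea behind GCVaR, not code from this repo).
import numpy as np

def cvar_weights(returns, alpha):
    var = np.quantile(returns, alpha)     # the alpha-quantile of the batch (VaR)
    w = (returns <= var).astype(float)    # keep only the alpha-tail episodes
    return w / max(w.sum(), 1.0)          # normalize the weights over the batch

returns = np.random.randn(100)            # stand-in episode returns
w = cvar_weights(returns, alpha=0.05)
print(int((w > 0).sum()))                 # only ~5 of 100 episodes get any weight
```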

Algorithm
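
The precise algorithm and its guarantees are given in the paper; the sketch below is a simplified, illustrative view of the training loop, with the CEM sampler, episode runner, and policy-gradient step passed in as caller-supplied callables (none of these names come from the repo, and the linear risk schedule is just one possible choice).

```python
# Simplified CeSoR training loop (an illustrative sketch, not the repo's API).
# sample_conditions / update_sampler stand in for a CEM sampler over environment
# conditions; run_episode and pg_step stand in for the environment rollout and
# the underlying policy-gradient algorithm.
import numpy as np

def cesor_train(sample_conditions, update_sampler, run_episode, pg_step,
                n_iters=500, batch=64, alpha_target=0.05, warmup=0.5):
    for it in range(n_iters):
        # Soft risk scheduling: start risk-neutral (alpha = 1) and anneal
        # (linearly, for simplicity) toward the target risk level.
        progress = min(1.0, it / (warmup * n_iters))
        alpha = 1.0 + progress * (alpha_target - 1.0)

        # Cross-entropy sampling of environment conditions, biased toward
        # conditions that have produced low returns so far.
        conditions = sample_conditions(batch)
        episodes = [run_episode(c) for c in conditions]
        returns = np.array([ep["return"] for ep in episodes])

        # CVaR_alpha weights: only episodes in the (soft) alpha-tail of the
        # batch contribute to the policy-gradient step.
        var = np.quantile(returns, alpha)
        weights = (returns <= var).astype(float)
        weights /= max(weights.sum(), 1.0)

        # Update the CEM sampler toward "hard" conditions, then take a
        # weighted policy-gradient step with any PG algorithm.
        update_sampler(conditions, returns)
        pg_step(episodes, weights)
```

The benchmark notebooks listed above show the actual training and evaluation flow.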
