Deep-RL-Toolkit

Overview

Deep RL Toolkit is a flexible and highly efficient reinforcement learning framework. RLToolkit is developed for practitioners and offers the following advantages:

Reproducible. We provide implementations that stably reproduce the results of many influential reinforcement learning algorithms.
Extensible. Build new algorithms quickly by inheriting from the abstract class in the framework.
Reusable. Algorithms provided in the repository can be adapted directly to a new task by defining a forward network; the training mechanism is built automatically.
Elastic. Computing resources on the cloud can be allocated elastically and automatically.
Lightweight. The core code is under 1,000 lines (check Demo).
Stable. Much more stable than Stable Baselines 3, achieved by using various ensemble methods.
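
The "Extensible" point above can be illustrated with a minimal sketch. The names `BaseAgent`, `RandomAgent`, `select_action`, and `learn` are hypothetical and chosen only for illustration; they are not taken from the toolkit, whose actual abstract class may look different.

```python
from abc import ABC, abstractmethod
import random


class BaseAgent(ABC):
    """Hypothetical base class; NOT the toolkit's actual interface."""

    @abstractmethod
    def select_action(self, observation):
        """Return an action for the given observation."""

    @abstractmethod
    def learn(self, transition):
        """Update from one (obs, action, reward, next_obs, done) tuple."""


class RandomAgent(BaseAgent):
    """Trivial subclass: picks uniformly random actions and only counts updates."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.rng = random.Random(seed)
        self.updates = 0

    def select_action(self, observation):
        # The observation is ignored; a real agent would feed it to a network.
        return self.rng.randrange(self.num_actions)

    def learn(self, transition):
        # Placeholder for a gradient step; here we only count calls.
        self.updates += 1
        return self.updates
```

A new algorithm in this style only has to fill in the two abstract methods; instantiating `BaseAgent` directly raises `TypeError`, which catches incomplete subclasses early.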

Supported Algorithms

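RLToolkit implements model-free deep reinforcement learning (DRL) algorithms. The algorithms that can be launched from the Quick Start commands are DQN, DDQN, Dueling DQN, Dueling DDQN, C51, DDPG, and PPO.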

Supported Envs

OpenAI Gym
Atari
MuJoCo
PyBullet

For details on the DRL algorithms, see the educational webpage OpenAI Spinning Up.
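
The environments listed in this section are typically driven through a Gym-style `reset`/`step` loop. The sketch below shows that loop against a toy stand-in so it runs without any dependencies. `ToyCorridorEnv` and `run_episode` are hypothetical names, and the sketch assumes the classic Gym signature in which `step` returns four values; Gym 0.26+ and Gymnasium return `(obs, reward, terminated, truncated, info)` from `step` and `(obs, info)` from `reset`.

```python
class ToyCorridorEnv:
    """Hypothetical stand-in with a classic Gym-style interface (not a real Gym env).

    The agent starts at position 0 and must reach position `length`.
    Action 1 moves right; any other action moves left (never below 0).
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        reached_goal = self.position >= self.length
        # The episode ends on success or when the step budget runs out.
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return (total_reward, number_of_steps)."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps
```

Any object exposing the same two methods can be passed to `run_episode`, which is what makes the same training loop reusable across environment suites.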

Examples

If you want to learn more about deep reinforcement learning, please read the deep-rl-class and run the examples.

Quick Start

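git clone https://github.com/jianzhnie/deep-rl-toolkit.git
cd deep-rl-toolkit

# Install the dependencies listed in requirements.txt
pip install -r requirements.txt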

# Run DQN and its variants (Double, Dueling, Dueling Double) on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ddqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_ddqn

# Run the C51 algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo c51

# Run the DDPG algorithm on the Pendulum-v0 environment (newer Gym releases register it as Pendulum-v1)
python examples/cleanrl/cleanrl_runner.py --env Pendulum-v0 --algo ddpg

# Run the PPO algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ppo
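
The commands above all pass the same two flags, `--env` and `--algo`. As a rough sketch of how a runner script might parse them, the snippet below uses `argparse` from the standard library. It is an assumption about the general shape, not the contents of `cleanrl_runner.py`: the function names, the default values, and the restriction of `--algo` to the names seen in the commands are all hypothetical.

```python
import argparse

# Algorithm names as they appear in the Quick Start commands.
ALGOS = ("dqn", "ddqn", "dueling_dqn", "dueling_ddqn", "c51", "ddpg", "ppo")


def build_parser():
    """Build a parser for the two flags used in the commands (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical runner sketch")
    parser.add_argument("--env", default="CartPole-v0", help="environment id")
    parser.add_argument("--algo", default="dqn", choices=ALGOS, help="algorithm name")
    return parser


def describe_run(argv):
    """Parse argv and return a one-line summary of the requested run."""
    args = build_parser().parse_args(argv)
    return f"training {args.algo} on {args.env}"
```

Calling `describe_run(["--env", "Pendulum-v0", "--algo", "ddpg"])` returns `"training ddpg on Pendulum-v0"`; an unknown algorithm name makes `argparse` exit with a usage message.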

References

Reference Papers

Deep Q-Network (DQN) (V. Mnih et al., 2015)
Double DQN (DDQN) (H. Van Hasselt et al., 2015)
Advantage Actor Critic (A2C)
Vanilla Policy Gradient (VPG)
Natural Policy Gradient (NPG) (S. Kakade et al., 2002)
Trust Region Policy Optimization (TRPO) (J. Schulman et al., 2015)
Proximal Policy Optimization (PPO) (J. Schulman et al., 2017)
Deep Deterministic Policy Gradient (DDPG) (T. Lillicrap et al., 2015)
Twin Delayed DDPG (TD3) (S. Fujimoto et al., 2018)
Soft Actor-Critic (SAC) (T. Haarnoja et al., 2018)
SAC with automatic entropy adjustment (SAC-AEA) (T. Haarnoja et al., 2018)
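
The first two entries in the list above, DQN and Double DQN, differ mainly in how the bootstrap target is computed. The sketch below contrasts the two targets on plain Python lists of Q-values. The function names and the `gamma` default are illustrative choices, not the toolkit's code.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN target: bootstrap from the target network's maximum Q-value."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

When the two networks disagree about the best next action, the Double DQN target is never larger than the DQN target, which is how it reduces the overestimation bias of the plain max operator.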
