Simple code demonstrating Deep Reinforcement Learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch
This project uses PyTorch and TensorFlow 2 as its deep learning frameworks and Gym for its reinforcement learning environments.
Although it's not required, I recommend running this project on a PC with a GPU and 8 GB of RAM.
Make sure you have installed PyTorch and Gym.
- Click here to install gym
You can use either PyTorch or TensorFlow 2.
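If you use pip, all three can usually be installed with the standard package names (this assumes a recent Python; check each project's own install guide for GPU builds):

pip install torch
pip install tensorflow
pip install gym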
Just clone this project into your working folder:
git clone https://github.com/wisnunugroho21/reinforcement_learning_ppo_rnd.git
After you clone the project, run the following commands in cmd/terminal:
PyTorch version:

cd reinforcement_learning_ppo_rnd/PPO_RND/pytorch
python ppo_rnd_2_taxi_final_blow.py

TensorFlow 2 version:

cd reinforcement_learning_ppo_rnd/PPO_RND/'tensorflow 2'
python ppo_taxi_final_blow_tensorflow.py
PPO is motivated by the same question as TRPO: how can we take the biggest possible improvement step on a policy using the data we currently have, without stepping so far that we accidentally cause performance collapse? Where TRPO tries to solve this problem with a complex second-order method, PPO is a family of first-order methods that use a few other tricks to keep new policies close to old. PPO methods are significantly simpler to implement, and empirically seem to perform at least as well as TRPO.
There are two primary variants of PPO: PPO-Penalty and PPO-Clip.
- PPO-Penalty approximately solves a KL-constrained update like TRPO, but penalizes the KL-divergence in the objective function instead of making it a hard constraint, and automatically adjusts the penalty coefficient over the course of training so that it's scaled appropriately (a minimal sketch follows this list).
- PPO-Clip doesn't have a KL-divergence term in the objective and doesn't have a constraint at all. Instead, it relies on specialized clipping in the objective function to remove incentives for the new policy to get far from the old policy.
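For comparison only (this variant is not used in this repository), a PPO-Penalty update can be sketched in PyTorch as follows. The function name, `beta`, and the sample-based KL approximation are illustrative, not taken from this repo's code:

```python
import torch

def ppo_penalty_loss(new_log_probs, old_log_probs, advantages, beta=1.0):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Rough sample-based approximation of KL(old || new)
    approx_kl = (old_log_probs - new_log_probs).mean()
    # Maximize the surrogate objective minus the KL penalty
    # (beta is typically adapted during training to keep KL near a target)
    return -((ratio * advantages).mean() - beta * approx_kl)
```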
This repository uses PPO-Clip.
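As an illustration, here is a minimal sketch of the PPO-Clip surrogate loss in PyTorch. It is a generic sketch, not this repository's exact code; names like `clip_eps` are illustrative:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum removes the incentive to push
    # the ratio outside [1 - eps, 1 + eps]
    return -torch.min(surr1, surr2).mean()
```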
You can read the full details of PPO here.
Random Network Distillation (RND) is a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, and it is the first method to exceed average human performance on Montezuma's Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms, and solves the first level without using demonstrations or having access to the underlying state of the game.
RND incentivizes visiting unfamiliar states by measuring how hard it is to predict the output of a fixed random neural network on visited states. In unfamiliar states it’s hard to guess the output, and hence the reward is high. It can be applied to any reinforcement learning algorithm, is simple to implement and efficient to scale.
You can read the full details of RND here.
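As a sketch under the assumptions above (a frozen, randomly initialized target network and a trained predictor network), RND's intrinsic reward can be written in PyTorch like this. The class name, layer sizes, and `embed_dim` are illustrative, not this repository's exact implementation:

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    def __init__(self, state_dim, embed_dim=64):
        super().__init__()
        # Fixed, randomly initialized target network: never trained
        self.target = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        for p in self.target.parameters():
            p.requires_grad = False
        # Predictor network: trained to match the target's outputs
        self.predictor = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, state):
        # Prediction error is large on unfamiliar states, so it serves as
        # the curiosity (intrinsic) reward and, averaged over a batch,
        # as the predictor's training loss
        with torch.no_grad():
            target_out = self.target(state)
        return (self.predictor(state) - target_out).pow(2).mean(dim=-1)
```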
[Result GIFs and reward progress graphs for each environment]
For now, I am focusing on implementing this project in more difficult environments (Atari games, MuJoCo, etc.).
This project is far from finished and will be improved over time. Any fixes, contributions, or ideas would be very much appreciated.