PPO-Dash

Code for reproducing the results found in PPO Dash: Improving Generalization in Deep Reinforcement Learning

About

PPO-Dash is a modified version of the PPO algorithm that incorporates the following optimizations and best practices (a sketch of the action space reduction follows the list):

  • Action Space Reduction
  • Frame Stack Reduction
  • Large Scale Hyperparameters
  • Vector Observations
  • Normalized Observations
  • Reward Hacking
  • Recurrent Memory
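
As a concrete illustration of the action space reduction, the sketch below wraps the Obstacle Tower environment, whose public action space is MultiDiscrete([3, 3, 2, 3]) (movement, camera rotation, jump, strafe), and exposes a small Discrete set instead. The wrapper name and the specific eight actions are illustrative assumptions, not necessarily the exact subset used in the paper.

```python
import gym
import numpy as np


class ReducedActionWrapper(gym.ActionWrapper):
    """Map a small Discrete action set onto Obstacle Tower's
    MultiDiscrete([3, 3, 2, 3]) action space
    (movement, camera rotation, jump, strafe).

    The particular reduced set below is illustrative; see the
    technical report for the exact subset used by PPO-Dash.
    """

    # Each row is [movement, camera, jump, strafe].
    _ACTIONS = np.array([
        [0, 0, 0, 0],  # no-op
        [1, 0, 0, 0],  # forward
        [1, 0, 1, 0],  # forward + jump
        [1, 1, 0, 0],  # forward + rotate camera left
        [1, 2, 0, 0],  # forward + rotate camera right
        [0, 1, 0, 0],  # rotate camera left
        [0, 2, 0, 0],  # rotate camera right
        [0, 0, 1, 0],  # jump in place
    ])

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(self._ACTIONS))

    def action(self, act):
        # Translate the discrete index chosen by the agent into the
        # underlying multi-discrete action vector.
        return self._ACTIONS[act]
```

In a training script this wrapper would typically be applied directly around the environment, e.g. env = ReducedActionWrapper(ObstacleTowerEnv(...)), before any frame-stacking wrappers.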

PPO-Dash was able to solve the first 10 levels of the Obstacle Tower Environment without the need for demonstrations or curiosity-based algorithmic enhancements.

The version of PPO-Dash described in the technical paper placed 2nd in Round One of the Obstacle Tower Challenge with an average score of 10. We reproduced this score in Round Two of the challenge with a minor modification (randomizing the themes during training). With the addition of demonstrations, we placed 4th overall with a score of 10.8.

Reproducing Results

To reproduce the results listed in the paper and for Round One of the competition, see ReproduceRound1.

Acknowledgements

This codebase derives from pytorch-a2c-ppo-acktr (commit 8258f95).

Citation

If you use PPO-Dash in your research, please cite the technical report.
