Version 2 #6

Merged
merged 14 commits on Sep 13, 2020
Update README.md
wisnunugroho21 committed Sep 13, 2020
commit d3a8eb9b19d0c5e71587c3887777ad3f9d587f1f
5 changes: 1 addition & 4 deletions README.md
@@ -3,7 +3,7 @@
Simple code demonstrating Deep Reinforcement Learning with Proximal Policy Optimization and Random Network Distillation in Tensorflow 2 and Pytorch

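For context, the intrinsic-reward idea behind Random Network Distillation can be sketched in a few lines of PyTorch. This is a minimal illustration only; the class name, layer sizes, and `feat_dim` parameter are assumptions for the sketch, not this repository's actual code.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation sketch: the predictor's error against a
    fixed random target network is used as an exploration (intrinsic) reward."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        # Fixed, randomly initialized target network; it is never trained.
        self.target = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )
        for p in self.target.parameters():
            p.requires_grad = False
        # Predictor network, trained to imitate the target's features.
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        # Error is large on rarely visited states, so it acts as a novelty bonus.
        return (pred_feat - target_feat).pow(2).mean(dim=-1)
```

The same squared error doubles as the predictor's training loss, and the reward fed to PPO is the extrinsic reward plus this bonus.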
## Version 2 and Other Progress
Version 2 will bring improvements in code quality and performance. I refactored the code to follow the PPO implementation in OpenAI's Baselines. I also use a newer variant of PPO called Truly PPO, which has better sample efficiency and performance than OpenAI's PPO. Currently, I am focused on implementing this project in more difficult environments (Atari games, MuJoCo, etc.).

- [x] Use Pytorch and Tensorflow 2
- [x] Clean up the code
@@ -103,8 +103,5 @@ You can read the full details of Truly PPO [here](https://arxiv.org/abs/1903.07940)
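As a rough illustration of how this objective differs from vanilla PPO's flat clipping, below is a minimal PyTorch sketch of a Truly-PPO-style policy loss. The function name and the `kl_range` / `rollback_alpha` values are assumptions for the sketch, not this repository's actual identifiers.

```python
import torch
from torch.distributions import kl_divergence

def truly_ppo_policy_loss(dist, old_dist, actions, advantages,
                          kl_range=0.03, rollback_alpha=5.0):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s).
    ratios = (dist.log_prob(actions) - old_dist.log_prob(actions)).exp()
    kl = kl_divergence(old_dist, dist)
    surrogate = ratios * advantages  # the usual r * A objective
    # Rollback: when the KL leaves the trust region while the surrogate is
    # still being pushed past the old policy's value (r * A > A), penalize
    # the KL instead of flat-clipping the ratio as vanilla PPO does.
    objective = torch.where(
        (kl >= kl_range) & (surrogate > advantages),
        surrogate - rollback_alpha * kl,
        surrogate,
    )
    return -objective.mean()  # negated, since optimizers minimize
```

Unlike a flat clip, which merely zeroes the gradient outside the clip range, the KL penalty actively pushes the policy back toward the trust region when an update overshoots.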
| ------------- |
| ![Result Gif](https://github.com/wisnunugroho21/reinforcement_learning_ppo_rnd/blob/master/Result/pong.gif) |

## Future Development
For now, I am focusing on how to apply this project to more difficult environments (Atari games, MuJoCo, etc.).

## Contributing
This project is far from finished and will be improved over time. Any fixes, contributions, or ideas would be very much appreciated.