Deep-Reinforcement-Learning-Algorithms/CartPole-Policy-Gradient-Reinforce at master · Rafael1s/Deep-Reinforcement-Learning-Algorithms

CartPole - known also as an Inverted Pendulum

Policy-Gradient Method - REINFORCE

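The REINFORCE algorithm searches for a local maximum of a function (here, the expected return as a function of the policy parameters)
using a procedure known as gradient ascent: the parameters are repeatedly moved a small step in the direction of the gradient.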
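
To make the idea of gradient ascent concrete, here is a minimal sketch on a toy one-dimensional function. The objective, learning rate, and step count are illustrative assumptions and are not taken from the notebook; in REINFORCE the same update is applied to the policy parameters.

```python
def gradient_ascent(grad, x0, lr=0.1, steps=200):
    """Climb toward a local maximum by repeatedly stepping along the gradient."""
    x = x0
    for _ in range(steps):
        # Ascent adds the gradient; gradient descent would subtract it.
        x = x + lr * grad(x)
    return x


# Toy objective (assumption): f(x) = -(x - 3)^2, whose maximum is at x = 3.
# Its gradient is f'(x) = -2 * (x - 3).
x_star = gradient_ascent(lambda x: -2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

With these settings the iterate contracts toward 3 by a factor of 0.8 per step, so the printed value is very close to 3.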

Other CartPole projects

Class Policy

This class implements a simple neural network model containing only two fully connected layers. In this model, the function reinforce() approximates the return (the discounted sum of all rewards). The environment is solved in 791 episodes!
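
The description above can be sketched in plain Python as follows. This is not the notebook's code: the hidden size, weight initialisation, the omission of biases, and the helper names discounted_return and reinforce_loss are assumptions made for illustration. Only the overall shape (two fully connected layers, a discounted sum of rewards) comes from the text.

```python
import math
import random


class Policy:
    """Two fully connected layers: state -> hidden (ReLU) -> action probabilities."""

    def __init__(self, state_size=4, hidden_size=16, action_size=2, seed=0):
        # CartPole has 4 state variables and 2 actions; hidden_size is an assumption.
        self.rng = random.Random(seed)
        self.w1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(state_size)]
                   for _ in range(hidden_size)]
        self.w2 = [[self.rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
                   for _ in range(action_size)]

    def forward(self, state):
        # First layer with ReLU activation (biases omitted for brevity).
        hidden = [max(0.0, sum(w * s for w, s in zip(row, state))) for row in self.w1]
        # Second layer followed by a numerically stable softmax.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state):
        # Sample an action and return it with its log-probability.
        probs = self.forward(state)
        action = self.rng.choices(range(len(probs)), weights=probs)[0]
        return action, math.log(probs[action])


def discounted_return(rewards, gamma):
    """Sum of rewards, where the reward at step t is weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def reinforce_loss(log_probs, rewards, gamma):
    """Negative log-probabilities weighted by the episode's discounted return."""
    ret = discounted_return(rewards, gamma)
    return sum(-lp * ret for lp in log_probs)


policy = Policy()
action, log_prob = policy.act([0.01, -0.02, 0.03, 0.04])
print(action, log_prob, discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

Minimising reinforce_loss with respect to the weights is the same as gradient ascent on the expected return; a framework with automatic differentiation would normally supply the gradient.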

Training log

Episode 100 Average Score: 34.47
Episode 200 Average Score: 66.26
Episode 300 Average Score: 87.82
Episode 400 Average Score: 72.83
Episode 500 Average Score: 172.00
Episode 600 Average Score: 160.65
Episode 700 Average Score: 167.15

Environment solved in 791 episodes! Average Score: 196.69
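
The log reports average scores rather than single-episode scores. A rolling average with a "solved" check might be computed as in the sketch below. The 100-episode window and the threshold of 195 are assumptions (the README does not state them), and first_solved_episode is a hypothetical helper name.

```python
from collections import deque


def first_solved_episode(scores, window=100, threshold=195.0):
    """Return the 1-based episode at which the rolling average first reaches
    the threshold, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None


# Synthetic scores (not the notebook's data): 50 poor episodes, then perfect ones.
print(first_solved_episode([0.0] * 50 + [200.0] * 100))
```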

Credit

Most of the code is based on the Udacity code for the REINFORCE algorithm applied to CartPole.
