CartPole-Policy-Gradient-Reinforce

CartPole, also known as an Inverted Pendulum

Policy Gradient Method: REINFORCE

The REINFORCE algorithm searches for a local maximum of the expected return, viewed as a function
of the policy's parameters, using a procedure known as gradient ascent.
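For reference, the standard REINFORCE estimate can be written as below. The symbols are notation introduced here, not taken from the code: a policy π_θ with parameters θ, discount factor γ, learning rate α, and an episode τ of length T. R(τ) is the discounted return, the gradient line is the Monte Carlo estimate of the gradient of the expected return J(θ), and the last line is the gradient-ascent step.

```latex
R(\tau) = \sum_{t=0}^{T-1} \gamma^{t} \, r_{t+1}

\nabla_\theta J(\theta) \approx \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, R(\tau)

\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
```

In practice, libraries minimize a loss, so the quantity usually implemented is the negation, -\sum_t \log \pi_\theta(a_t \mid s_t) R(\tau), and a descent step on it is the same as the ascent step above.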

Other CartPole projects

Class Policy

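This class implements a simple neural network containing only two fully connected layers (a plain feed-forward network, not a convolutional one) that maps a CartPole state to probabilities over the available actions. The function reinforce() trains this policy: for each episode it computes the return (the sum of all rewards with discounts) and uses it to weight the log-probabilities of the actions that were taken. The environment is solved in 791 episodes!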
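The policy and the per-episode loss described above might look roughly like the following pure-Python sketch. It is not the repository's code. The 4 inputs and 2 outputs match the standard CartPole observation and action spaces, but the hidden size of 16, the ReLU and softmax choices, the weight initialization, and the helper names (linear, relu, softmax, discounted_return, reinforce_loss) are hypothetical. Gradients are not computed here; a framework with automatic differentiation would differentiate reinforce_loss with respect to the weights.

```python
import math
import random


def linear(x, weights, bias):
    # One fully connected layer: y = W x + b
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    # Subtract the max for numerical stability before exponentiating
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


class Policy:
    """Hypothetical two-layer policy: state -> hidden (ReLU) -> action probabilities."""

    def __init__(self, state_size=4, hidden_size=16, action_size=2, seed=0):
        rng = random.Random(seed)
        # Small random weights, zero biases (an assumed initialization)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(state_size)]
                   for _ in range(hidden_size)]
        self.b1 = [0.0] * hidden_size
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)]
                   for _ in range(action_size)]
        self.b2 = [0.0] * action_size

    def forward(self, state):
        hidden = relu(linear(state, self.w1, self.b1))
        return softmax(linear(hidden, self.w2, self.b2))

    def act(self, state, rng):
        # Sample an action from the policy; return it with its log-probability
        probs = self.forward(state)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        return action, math.log(probs[action])


def discounted_return(rewards, gamma=1.0):
    # R = sum over t of gamma^t * r_{t+1}
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def reinforce_loss(log_probs, episode_return):
    # Negated so that minimizing the loss performs gradient ascent on the return
    return -sum(lp * episode_return for lp in log_probs)
```

During an episode, the log-probabilities returned by act() would be collected, and at the end reinforce_loss(log_probs, discounted_return(rewards, gamma)) gives the scalar that an autodiff framework would backpropagate through.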

Training log

Episode 100 Average Score: 34.47
Episode 200 Average Score: 66.26
Episode 300 Average Score: 87.82
Episode 400 Average Score: 72.83
Episode 500 Average Score: 172.00
Episode 600 Average Score: 160.65
Episode 700 Average Score: 167.15

Environment solved in 791 episodes! Average Score: 196.69
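The "Average Score" lines and the stopping rule can be sketched as a trailing mean over the last 100 episodes. This assumes the CartPole-v0 criterion (an average of at least 195.0 over 100 consecutive episodes), which the final average of 196.69 is consistent with; the function name first_solved_episode and its parameters are hypothetical.

```python
from collections import deque


def first_solved_episode(scores, window=100, target=195.0, print_every=100):
    """Return (episode, average) when the trailing `window` mean first reaches `target`, else None."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        average = sum(recent) / len(recent)
        if episode % print_every == 0:
            print(f"Episode {episode}\tAverage Score: {average:.2f}")
        # Require a full window so the average covers 100 consecutive episodes
        if len(recent) == window and average >= target:
            return episode, average
    return None
```

Using a bounded deque keeps memory constant and makes the trailing average a direct reading of the last 100 scores.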

Credit

Most of the code is based on the Udacity code for the REINFORCE algorithm applied to CartPole.