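The REINFORCE algorithm is based on finding a local maximum of a function
using a procedure known as gradient ascent. Here the function is the expected
return of the policy, viewed as a function of the policy parameters.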
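As a rough illustration of gradient ascent itself (not of the code in this repository), the sketch below climbs a toy one-dimensional function. The names `gradient_ascent` and `toy_grad`, the learning rate, and the step count are all assumptions chosen for the example.

```python
def gradient_ascent(grad, x0, lr=0.1, steps=200):
    """Repeatedly step along the gradient to climb toward a local maximum."""
    x = x0
    for _ in range(steps):
        # Ascent adds the gradient; gradient descent would subtract it.
        x = x + lr * grad(x)
    return x


def toy_grad(x):
    """Gradient of the toy objective f(x) = -(x - 3)^2, which peaks at x = 3."""
    return -2.0 * (x - 3.0)


x_max = gradient_ascent(toy_grad, x0=0.0)
print(round(x_max, 4))
```

With these settings the iterate moves toward x = 3, the maximum of the toy function. In REINFORCE the same kind of update is applied to the policy's weights, with the gradient estimated from sampled episodes rather than computed exactly.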
- CartPole-Policy-Based-Hill-Climbing, or
- CartPole-Policy-Deep-Q_Learning, or
- Cartpole with Double Deep Q-Learning
This class implements a simple feed-forward neural network model containing only 2 fully connected layers. In this model, the function reinforce() estimates the return (the discounted sum of all rewards). The environment is solved in 791 episodes!
Episode 100 Average Score: 34.47
Episode 200 Average Score: 66.26
Episode 300 Average Score: 87.82
Episode 400 Average Score: 72.83
Episode 500 Average Score: 172.00
Episode 600 Average Score: 160.65
Episode 700 Average Score: 167.15
Environment solved in 791 episodes! Average Score: 196.69
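The return mentioned above (the discounted sum of all rewards) can be sketched in a few lines. This is a minimal dependency-free sketch, not the repository's code: `discounted_return` and `policy_loss` are hypothetical names, the discount factor is an assumed value, and the loss shown is the common REINFORCE surrogate in which every log-probability of the episode is weighted by the same return.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def policy_loss(log_probs, episode_return):
    """Surrogate loss: minimizing it performs gradient ascent on the return.

    log_probs holds log pi(a_t | s_t) for each action taken in the episode.
    """
    return sum(-lp * episode_return for lp in log_probs)


# Three steps with reward 1 each and an assumed discount factor of 0.5:
# 1 + 0.5 + 0.25
R = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
loss = policy_loss([-0.5, -1.0], R)
print(R, loss)
```

In a real training loop the log-probabilities would be tensors produced by the network, so that the loss can be backpropagated. Plain floats are used here only to keep the arithmetic visible.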
Most of the code is based on the Udacity code for the REINFORCE algorithm applied to CartPole.