
Cartpole-Double-Deep-Q-Learning

Project - Cartpole with Double Deep Q-Network

Environment

Solving the environment requires an average total reward of over 195 for Cartpole-v0 and 475 for Cartpole-v1
over 100 consecutive episodes. A pole is attached by an un-actuated joint to a cart, which moves along a track.
The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and
the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright.
The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
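A minimal sketch of interacting with this environment, assuming the classic OpenAI Gym API (the random-action loop below is only an illustration, not this project's agent):

```python
import gym

env = gym.make('CartPole-v0')
state = env.reset()
total_reward = 0
done = False
while not done:
    action = env.action_space.sample()  # random action: 0 = push left, 1 = push right
    state, reward, done, info = env.step(action)  # reward is +1 per timestep the pole stays up
    total_reward += reward
print('episode return:', total_reward)
```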

Other CartPole projects

Four important tensors

The Deep Q-Learning agent uses the following four tensors (see the learn() method), constructed from the two neural networks q_local and q_target; a typical construction is sketched below.
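The list of tensors itself did not survive extraction, so the following is only a sketch of how a Double DQN learn() step typically builds four such tensors in PyTorch. The networks q_local and q_target come from the text; the tensor names, the discount factor, and the loss are assumptions that may differ from this project's code:

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # assumed discount factor

def learn(q_local, q_target, optimizer, experiences):
    states, actions, rewards, next_states, dones = experiences

    # 1. q_local selects the best action in each next state (the "double" step).
    best_actions = q_local(next_states).detach().argmax(dim=1, keepdim=True)
    # 2. q_target evaluates those actions, decoupling selection from evaluation.
    Q_targets_next = q_target(next_states).detach().gather(1, best_actions)
    # 3. One-step TD target, zeroed where the episode ended.
    Q_targets = rewards + GAMMA * Q_targets_next * (1 - dones)
    # 4. Current estimates from q_local for the actions actually taken.
    Q_expected = q_local(states).gather(1, actions)

    loss = F.mse_loss(Q_expected, Q_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Using q_local to choose the next action but q_target to score it is what distinguishes Double DQN from vanilla DQN and reduces the overestimation of Q-values.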

Training History

  1. For Cartpole-v0: the score of 195 is achieved in 239 episodes

  2. For Cartpole-v0: the score of 195 is achieved in 612 episodes

  3. For Cartpole-v1: the score of 475 is achieved in 1030 episodes

Note that reaching the threshold of 195 so quickly (239 episodes) is a very rare case; the second run (612 episodes) is a much more typical result.

Watch the Trained Agent

For both neural networks, q_local and q_target, we save the trained weights into checkpoint files
with the .pth extension. The corresponding files are saved into the directory dir_chk_V0_ddqn for Cartpole-v0
and the directory dir_chk_V1_ddqn for Cartpole-v1. Using this notebook, we load the trained weights and replay the agent; a sketch of such a replay loop follows.
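A hedged sketch of loading a checkpoint and replaying it greedily. Only the directory name dir_chk_V0_ddqn comes from the text; the QNetwork class, its layer sizes, and the file name checkpoint_local.pth are hypothetical, and the architecture must match whatever was actually saved:

```python
import gym
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical architecture; must match the saved checkpoint to load."""
    def __init__(self, state_size=4, action_size=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size))

    def forward(self, x):
        return self.net(x)

env = gym.make('CartPole-v0')
q_local = QNetwork()
q_local.load_state_dict(torch.load('dir_chk_V0_ddqn/checkpoint_local.pth'))  # assumed file name
q_local.eval()

state = env.reset()
done = False
while not done:
    env.render()
    with torch.no_grad():
        # Act greedily with respect to the learned Q-values.
        action = q_local(torch.from_numpy(state).float().unsqueeze(0)).argmax(dim=1).item()
    state, _, done, _ = env.step(action)
env.close()
```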

Paper

A pair of interrelated neural networks in Deep Q-Network