In this notebook, we implement an agent that learns to play Pong with
the PPO (Proximal Policy Optimization) algorithm.
As with the REINFORCE version,
the model learns from pixels.
I. Collect trajectories based on the policy \pi_{\theta'};
initialize \theta' = \theta.
II. Compute the gradient of the clipped surrogate function.
III. Gradient ascent, update \theta': \theta' \leftarrow \theta' + \alpha \nabla_{\theta'} L^{CLIP}(\theta').
IV. The inner loop of PPO training: Steps II and III are repeated k times,
i.e., every trajectory is used k times before it is thrown away. In our case, k = 4;
for REINFORCE, k = 1. In the code, k = SGD_epoch; see the function clipped_surrogate
in the file pong_utils.py (a sketch of this loop follows the list).
V. Outer loop: back to Step I. Set \theta = \theta' and go to new episodes with new trajectories.
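To make Steps II-V concrete, here is a minimal PyTorch sketch of the clipped-surrogate update with the inner (k = SGD_epoch) and outer loops. It is not the code from pong_utils.py: the tiny network, tensor names, and hyperparameters (epsilon, learning rate, batch size) are illustrative stand-ins.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def clipped_surrogate_sketch(policy, old_probs, states, actions, rewards, epsilon=0.1):
    # r_t(theta') = pi_theta'(a_t | s_t) / pi_theta(a_t | s_t)
    new_probs = policy(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    ratio = new_probs / old_probs
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # L^CLIP = mean over t of min(r_t * R_t, clip(r_t, 1-eps, 1+eps) * R_t)
    return torch.min(ratio * rewards, clipped * rewards).mean()

# Toy stand-ins for one batch of collected trajectories (Step I).
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(),
                       nn.Linear(32, 2), nn.Softmax(dim=-1))
optimizer = optim.Adam(policy.parameters(), lr=1e-4)
states = torch.randn(64, 4)            # for Pong these would be preprocessed pixel frames
actions = torch.randint(0, 2, (64,))
rewards = torch.randn(64)              # normalized discounted future rewards
old_probs = policy(states).gather(1, actions.unsqueeze(1)).squeeze(1).detach()

SGD_epoch = 4                          # k: each trajectory is reused 4 times
for _ in range(SGD_epoch):             # Steps II-III, the inner loop
    loss = -clipped_surrogate_sketch(policy, old_probs, states, actions, rewards)
    optimizer.zero_grad()
    loss.backward()                    # gradient of the clipped surrogate
    optimizer.step()                   # one step of gradient ascent on L^CLIP
# Step V, the outer loop: set theta = theta' and collect fresh trajectories
```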
RL uses the idea of rewards to determine which actions to perform.
Here the reward is simply +1 for every round the agent wins and -1 for every round the CPU opponent wins.
For more complex games, rewards can be tied to score increments. In real-life applications, computing rewards
can be trickier, especially when there is no obvious single score or objective to optimize.
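As a small illustration of how these sparse +1/-1 round outcomes can be turned into a per-step learning signal, the snippet below computes discounted future rewards and normalizes them. The discount factor and the normalization are common choices in PPO/REINFORCE Pong code, assumptions rather than details stated above.

```python
import numpy as np

def discounted_future_rewards(rewards, gamma=0.995):
    """R_t = r_t + gamma * R_{t+1}, then normalized across the episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # normalize so early and late frames contribute on a comparable scale
    return (returns - returns.mean()) / (returns.std() + 1e-10)

# Example: the agent loses a round at step 3 (-1) and wins one at step 7 (+1).
print(discounted_future_rewards([0, 0, 0, -1, 0, 0, 0, 1]))
```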
The environment was solved:
Training is performed by 8 parallel agents. The agents run in
8 independent environments and update the same neural network.
envs = parallelEnv('PongDeterministic-v4', n=8)
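As a rough stand-in for what this vectorized environment does (not the actual parallelEnv implementation), the sketch below steps several independent gym Pong environments in lockstep, using the classic gym API; the random actions and loop length are purely illustrative.

```python
import gym

n = 8
envs = [gym.make('PongDeterministic-v4') for _ in range(n)]   # 8 independent environments
frames = [env.reset() for env in envs]                        # classic gym API: reset() -> obs

for _ in range(100):
    # one shared policy would map the 8 current frames to 8 actions;
    # random actions keep this sketch self-contained
    actions = [env.action_space.sample() for env in envs]
    steps = [env.step(a) for env, a in zip(envs, actions)]
    frames = [obs for obs, reward, done, info in steps]
```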
- CarRacing, single agent, learning from pixels
- Crawler, 12 parallel agents
- BipedalWalker, 16 parallel agents
The implementation of the PPO algorithm is based on Udacity's code.