
# A2C

PyTorch implementation of Advantage Actor-Critic (A2C)

## Usage

Example command-line usage:

```shell
python main.py BreakoutDeterministic-v3 --num-workers 8 --render
```

This trains the agent on `BreakoutDeterministic-v3` with 8 parallel environments and renders each environment.
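The A2C objective combines a policy-gradient term weighted by the advantage, a value-function regression term, and an entropy bonus. A minimal sketch of that combined loss (the function name and coefficients here are illustrative assumptions, not this repo's exact code):

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Hypothetical A2C loss over one batch of transitions.

    logits:  (N, num_actions) policy logits
    values:  (N,) critic estimates V(s)
    actions: (N,) actions taken
    returns: (N,) bootstrapped n-step returns
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Advantage = return - baseline; detach the baseline so the
    # policy gradient does not flow into the critic.
    advantages = returns - values.detach()
    chosen_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen_log_probs * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    # Entropy is subtracted: maximizing it discourages premature
    # collapse to a deterministic policy.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In A2C the batch is formed synchronously from all parallel workers at each update step, which is the main difference from the asynchronous updates of A3C.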

Example training curve for `PongDeterministic-v3`:

(figure: training curve)

## References

### Code

This code uses Gym environment utilities from these repos:

- openai/baselines
- openai/universe-starter-agent
- ikostrikov/pytorch-a3c

### Literature

- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Asynchronous Methods for Deep Reinforcement Learning
- OpenAI Baselines: ACKTR & A2C
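The GAE paper listed above estimates advantages as an exponentially weighted sum of one-step TD residuals, controlled by a decay parameter λ. A minimal sketch of that recurrence (plain Python, independent of this repo's implementation):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards: list of T rewards
    values:  list of T+1 value estimates (last entry is the
             bootstrap value of the final state)
    Returns a list of T advantage estimates.
    """
    advantages = [0.0] * len(rewards)
    acc = 0.0
    # Walk the rollout backwards, accumulating discounted TD residuals:
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    # A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        acc = delta + gamma * lam * acc
        advantages[t] = acc
    return advantages
```

Setting `lam=0` recovers the one-step TD advantage estimate, while `lam=1` recovers the full Monte Carlo return minus the baseline.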