PyTorch implementation of Advantage Actor-Critic (A2C)
Example command line usage:
python main.py BreakoutDeterministic-v3 --num-workers 8 --render
This trains the agent on BreakoutDeterministic-v3 with 8 parallel environments (--num-workers 8) and renders each of them (--render).
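A2C is the synchronous variant of actor-critic. Its parallel environments are typically stepped in lockstep, and their transitions are batched into a single update. The sketch below illustrates that pattern in plain Python. CounterEnv, step_all and collect_rollout are hypothetical stand-ins, not this repository's code or the Gym API, and a random placeholder policy replaces the actor network.

```python
import random
from typing import List, Tuple


class CounterEnv:
    """Hypothetical toy environment (not a Gym env): the episode ends after
    `horizon` steps, and the reward is 1.0 when the action matches step parity."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        reward = 1.0 if action == self.t % 2 else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def step_all(envs: List[CounterEnv], actions: List[int]):
    """Advance every environment by one step and reset any that finished."""
    obs, rewards, dones = [], [], []
    for env, a in zip(envs, actions):
        o, r, d = env.step(a)
        if d:
            o = env.reset()  # auto-reset so the batch always holds len(envs) states
        obs.append(o)
        rewards.append(r)
        dones.append(d)
    return obs, rewards, dones


def collect_rollout(num_workers: int = 8, num_steps: int = 5, seed: int = 0):
    """Collect num_steps synchronized steps across num_workers environments."""
    rng = random.Random(seed)
    envs = [CounterEnv() for _ in range(num_workers)]
    obs = [env.reset() for env in envs]
    rollout = []
    for _ in range(num_steps):
        # Placeholder policy: a real agent would sample actions from the actor.
        actions = [rng.randint(0, 1) for _ in obs]
        next_obs, rewards, dones = step_all(envs, actions)
        rollout.append((obs, actions, rewards, dones))
        obs = next_obs
    return rollout
```

Each entry of the returned list holds one step taken by all workers at once, so a batch of num_steps x num_workers transitions is ready for one gradient update.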
(Figure: example training curve for PongDeterministic-v3)
This code uses Gym environment utilities from these repos:
Reference: High-Dimensional Continuous Control Using Generalized Advantage Estimation
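The README cites the Generalized Advantage Estimation (GAE) paper by Schulman et al. but does not show how advantages are computed here. The following is a minimal sketch of GAE as that paper defines it, for a single environment's rollout. It assumes per-step rewards, value estimates and done flags, plus a bootstrap value for the state after the last step. The function name and the gamma/lam defaults are illustrative choices and are not taken from this repository.

```python
from typing import List


def compute_gae(rewards: List[float], values: List[float], dones: List[bool],
                last_value: float, gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for one environment's rollout.

    rewards[t], values[t] and dones[t] describe step t. last_value is V(s_T),
    used to bootstrap after the final step. dones[t] being True means the
    episode ended after step t, so nothing is bootstrapped across that boundary.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * mask - values[t]
        # Exponentially weighted (gamma * lam) sum of residuals, reset at episode ends
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Setting lam=0 reduces each advantage to the one-step TD residual, and lam=1 gives the full discounted return minus the value baseline. Many actor-critic implementations then use advantages[t] + values[t] as the critic's target, but the README does not say whether this repository does so.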