My attempt to reproduce a watered-down version of PBT (population-based training) for MARL (multi-agent reinforcement learning) using DDPPO (decentralized & distributed proximal policy optimization) from ray[rllib].
ray
pbt
population-based-training
self-play
multi-agent-reinforcement-learning
rllib
marl
pbt-marl
ddppo
Updated Aug 25, 2020 - Jupyter Notebook