Open API Practice
Date: 2021-11-07 14:14:12

MLP, discrete space
MLP, continuous space
CNN implementation
Replay buffer
OpenAI Gym custom/modifications

Notes

PPO can learn from short trajectories.
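
One way to see why short trajectories are enough for PPO: advantages can be estimated on a fixed-length segment that stops before the episode ends, with the critic's value of the last observed state standing in for the rewards that were never collected. The sketch below is a minimal, hedged illustration using generalized advantage estimation (GAE), not code from these notes. The function name `truncated_gae`, its parameters, and the default `gamma`/`lam` values are assumptions made for the example, and it assumes the segment contains no episode boundary in the middle.

```python
def truncated_gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Estimate advantages for one short trajectory segment of length T.

    rewards:         rewards r_0 .. r_{T-1} collected in the segment
    values:          critic estimates V(s_0) .. V(s_{T-1})
    bootstrap_value: critic estimate V(s_T) for the state after the last step;
                     pass 0.0 if the segment ended in a terminal state
    (All names here are hypothetical, chosen for this sketch.)
    """
    advantages = [0.0] * len(rewards)
    next_value = bootstrap_value
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the step was than the critic expected.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    # Value targets for the critic are advantage plus the old value estimate.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# A 3-step segment cut off mid-episode; gamma = lam = 1 keeps the arithmetic simple.
adv, ret = truncated_gae([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], bootstrap_value=0.5,
                         gamma=1.0, lam=1.0)
print(adv)  # [3.0, 2.0, 1.0]
print(ret)  # [3.5, 2.5, 1.5]
```

Because the bootstrap value replaces the unseen tail of the episode, the segment length can be much shorter than the episode. The trade-off is that the estimates lean more heavily on the accuracy of the critic.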