
[Question] How best to implement self-play/multiple agents in the same environment? #181

Open
brokenloop opened this issue Jan 31, 2019 · 5 comments
Labels
question Further information is requested

Comments

@brokenloop

I'm trying to train a model using self play, and really love the work that has been done here so far. I was wondering whether anyone might have some advice about how I might adapt PPO2 to allow for multiple models to play against each other in the same environment.

The overall strategy would be to:

  • Store N models in a list
  • Generate an action from each of these models using a single observation
  • Generate a list of rewards for each of these actions from an environment
  • Update the models based on these rewards

I have written a custom environment that can take an array of actions, update the game state, and then return a list of rewards, one per agent. My main issue is prying the actual model apart from its interactions with the gym environment. I have been trying to decouple the model from the runner, but the two seem quite tightly intertwined and I'm having a difficult time. Has anyone else played around with this idea before, or could anyone point me in the right direction?
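For concreteness, the four steps above could be sketched roughly like this. This is a minimal sketch, not working PPO2 code: `DummyModel` and `DummyMultiAgentEnv` are placeholders I made up to stand in for the PPO2 models and the custom environment described above (an env whose `step()` takes one action per agent and returns a shared observation plus a list of rewards).

```python
import random

class DummyMultiAgentEnv:
    """Toy stand-in for the custom environment described above:
    step() takes one action per agent and returns a shared
    observation plus one reward per agent."""
    def __init__(self, n_agents):
        self.n_agents = n_agents

    def reset(self):
        return 0.0  # shared observation seen by all agents

    def step(self, actions):
        assert len(actions) == self.n_agents
        obs = random.random()                    # next shared observation
        rewards = [float(a) for a in actions]    # placeholder reward per agent
        done = False
        return obs, rewards, done

class DummyModel:
    """Placeholder for a PPO2 model: predict() maps obs -> action."""
    def predict(self, obs):
        return random.choice([0, 1])

    def update(self, obs, action, reward):
        pass  # a real PPO update step would go here

n_agents = 3
models = [DummyModel() for _ in range(n_agents)]    # step 1: store N models in a list
env = DummyMultiAgentEnv(n_agents)

obs = env.reset()
for _ in range(10):
    actions = [m.predict(obs) for m in models]      # step 2: one action per model
    next_obs, rewards, done = env.step(actions)     # step 3: one reward per agent
    for m, a, r in zip(models, actions, rewards):   # step 4: update each model
        m.update(obs, a, r)
    obs = next_obs
```

The hard part this sketch glosses over is step 4: in stable-baselines the rollout collection (the runner) and the gradient update are bundled inside `model.learn()`, so driving several models from one external loop like this requires pulling that logic apart, which is exactly the decoupling issue described above.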

@araffin araffin added the question Further information is requested label Jan 31, 2019
@araffin
Collaborator

araffin commented Jun 15, 2019

Hello,

I think @AdamGleave tackled that problem in the Adversarial policies repo, you should take a look ;)

@AdamGleave
Collaborator

I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

@stefanbschneider

stefanbschneider commented May 19, 2020

> I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

@AdamGleave I can't access the page. Is there still an available/public version of it?

@AdamGleave
Collaborator

Yeah, it's still in the commit history:

https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents

@moliqingwa

Here is an example for your reference.
https://github.com/hardmaru/slimevolleygym
