
[Question] How best to implement self-play/multiple agents in the same environment? #181

Open
brokenloop opened this issue Jan 31, 2019 · 5 comments
Labels
question Further information is requested

Comments

@brokenloop

I'm trying to train a model using self play, and really love the work that has been done here so far. I was wondering whether anyone might have some advice about how I might adapt PPO2 to allow for multiple models to play against each other in the same environment.

The overall strategy would be to:

  • Store N models in a list
  • Generate an action from each of these models using a single observation
  • Generate a list of rewards for each of these actions from an environment
  • Update the models based on these rewards

I have written a custom environment that can take an array of actions, update the game state, and then return a list of rewards, one per agent. My main issue is prying the actual model apart from its interactions with the gym environment. I have been trying to decouple the model from the runner, but the two seem quite tightly intertwined and I'm having a difficult time. Has anyone else played around with this idea before, or could anyone point me in the right direction?
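For concreteness, the four steps above could be sketched roughly like this. This is a minimal sketch, not working PPO2 code: `DummyModel` and `DummyMultiAgentEnv` are placeholders I made up to stand in for the PPO2 models and the custom environment described above (an env whose `step()` takes one action per agent and returns a shared observation plus a list of rewards).

```python
import random

class DummyMultiAgentEnv:
    """Toy stand-in for the custom environment described above:
    step() takes one action per agent and returns a shared
    observation plus one reward per agent."""
    def __init__(self, n_agents):
        self.n_agents = n_agents

    def reset(self):
        return 0.0  # shared observation seen by all agents

    def step(self, actions):
        assert len(actions) == self.n_agents
        obs = random.random()                    # next shared observation
        rewards = [float(a) for a in actions]    # placeholder reward per agent
        done = False
        return obs, rewards, done

class DummyModel:
    """Placeholder for a PPO2 model: predict() maps obs -> action."""
    def predict(self, obs):
        return random.choice([0, 1])

    def update(self, obs, action, reward):
        pass  # a real PPO update step would go here

n_agents = 3
models = [DummyModel() for _ in range(n_agents)]    # step 1: store N models in a list
env = DummyMultiAgentEnv(n_agents)

obs = env.reset()
for _ in range(10):
    actions = [m.predict(obs) for m in models]      # step 2: one action per model
    next_obs, rewards, done = env.step(actions)     # step 3: one reward per agent
    for m, a, r in zip(models, actions, rewards):   # step 4: update each model
        m.update(obs, a, r)
    obs = next_obs
```

The hard part this sketch glosses over is step 4: in stable-baselines the rollout collection (the runner) and the gradient update are bundled inside `model.learn()`, so driving several models from one external loop like this requires pulling that logic apart, which is exactly the decoupling issue described above.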

@araffin araffin added the question Further information is requested label Jan 31, 2019
@araffin
Collaborator

araffin commented Jun 15, 2019

Hello,

I think @AdamGleave tackled that problem in the Adversarial policies repo, you should take a look ;)

@AdamGleave
Collaborator

I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

@stefanbschneider

stefanbschneider commented May 19, 2020

> I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

@AdamGleave I can't access the page. Is there still an available/public version of it?

@AdamGleave
Collaborator

Yeah, it's still in the commit history:

https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents

@moliqingwa

Here is an example for your reference.
https://github.com/hardmaru/slimevolleygym
