-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] MARL examples updated to support RLModule #31169
Conversation
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
1. The test now runs on PPO by default instead of APPO 2. Tests pass when enable_rl_module_api Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
… need to investigate Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor thing.
one random suggestion, if you want to make this job easier going forward, we can let AlgorithmConfiig.validate() turn on enable_connectors and _enable_rl_module_api based on e.g. an env variable.
and you can add a new pipeline in pipeline.ml.yml say "rl_module".
Then, if you want to test any example or tuned_example in rl_module mode, you can simply add a build rule with "rl_module" tag without updating any of the files.
You can also have a separate CI group for RLModule :)
ray.init(num_cpus=args.num_cpus or None, include_dashboard=False) | ||
|
||
args = get_cli_args() | ||
ray.init(num_cpus=args.num_cpus or None, include_dashboard=False, local_mode=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert local_mode
Very good idea, I'll set that up here in this PR since I have to revert local_mode anyway. |
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
.environment("multi_agent_cartpole") | ||
.framework(args.framework) | ||
.multi_agent( | ||
# The multiagent Policy map. | ||
policies={ | ||
# The Policy we are actually learning. | ||
"pg_policy": PolicySpec( | ||
config=PGConfig.overrides(framework_str=args.framework) | ||
"learnable_policy": PolicySpec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just using better names :)
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
@@ -3182,6 +3182,15 @@ py_test( | |||
args = ["--as-test", "--framework=torch", "--stop-reward=70.0", "--num-cpus=4"] | |||
) | |||
|
|||
py_test( | |||
name = "examples/multi_agent_cartpole_torch_w_rlm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rlm" -> "rl_module"
Also: see my comment below: Can we use a command line arg here (--use-rl-module
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered below.
@@ -3200,6 +3209,15 @@ py_test( | |||
args = ["--as-test", "--framework=torch", "--stop-reward=80"] | |||
) | |||
|
|||
py_test( | |||
name = "examples/multi_agent_custom_policy_torch_w_rlm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rlm" -> "rl_module"
Also: see my comment below: Can we use a command line arg here (--use-rl-module
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered below.
@@ -3218,6 +3236,16 @@ py_test( | |||
args = ["--stop-iters=4", "--framework=torch"] | |||
) | |||
|
|||
py_test( | |||
name = "examples/multi_agent_different_spaces_for_agents_torch_w_rlm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rlm" -> "rl_module"
Also: see my comment below: Can we use a command line arg here (--use-rl-module
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered below.
@@ -3538,6 +3566,15 @@ py_test( | |||
args = ["--framework=torch", "--env=connect_four", "--win-rate-threshold=0.6", "--stop-iters=2", "--num-episodes-human-play=0"] | |||
) | |||
|
|||
py_test( | |||
name = "examples/self_play_with_open_spiel_connect_4_torch_w_rlm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rlm" -> "rl_module"
Also: see my comment below: Can we use a command line arg here (--use-rl-module
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered below
@@ -3556,6 +3593,15 @@ py_test( | |||
args = ["--framework=torch", "--env=markov_soccer", "--win-rate-threshold=0.6", "--stop-iters=2", "--num-episodes-human-play=0"] | |||
) | |||
|
|||
py_test( | |||
name = "examples/self_play_league_based_with_open_spiel_markov_soccer_torch_w_rlm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rlm" -> "rl_module"
Also: see my comment below: Can we use a command line arg here (--use-rl-module
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will show up in CI under RL Module tests, It will be super clear in the CI UI. I'd like to keep it concise here. Also, it's a transitional solution. At some point it will become the default and we have to get rid of these stuff anyways.
@@ -765,6 +766,12 @@ def validate(self) -> None: | |||
"`config.rollouts(enable_connectors=True)`." | |||
) | |||
|
|||
if bool(os.environ.get("RLLIB_ENABLE_RL_MODULE", False)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use command line arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I had it like what you said and changed it to this way. It will be easier for us to flip it by default later on. Also adding that flag to every other unittest down the line will create a lot of duplicates and is not very clean for a temporary transition. This is a temporary solution until RLModule is proven out across the board and fully rolled out.
gamma=random.choice([0.95, 0.99]), | ||
) | ||
|
||
if bool(os.environ.get("RLLIB_ENABLE_RL_MODULE", False)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead use a command line arg here, like we do for all other settings?
I know we have the gpu env setting, but te reason for that is to be able to set this globally so we can more gracefully handle this case in our buildkite/pipeline.ml.yml
file. But for this case here, I don't see that need. Thx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to disagree due to the reason above. I went through both solutions and found this one to be more scalable for our transition.
@@ -120,31 +120,31 @@ def get_cli_args(): | |||
"episode_reward_mean": args.stop_reward, | |||
} | |||
|
|||
config = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please no more config dicts.
Use AlgorithmConfig
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you just moved this from below, but either way :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah. Done.
"policy_mapping_fn": ( | ||
lambda agent_id, episode, worker, **kwargs: agent_id | ||
), | ||
"policies": env.get_agent_ids(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, same here. Let's translate to AlgorithmConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
) | ||
args = parser.parse_args() | ||
|
||
def get_cli_args(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
) | ||
args = parser.parse_args() | ||
|
||
def get_cli_args(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nice! :)
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the explanation on the command line arg question. Yes, sounds reasonable.
Signed-off-by: Capiru <[email protected]>
Signed-off-by: tmynn <[email protected]>
Signed-off-by: tmynn <[email protected]>
Signed-off-by: tmynn <[email protected]>
Why are these changes needed?
Added unittests to cover PPORLModule examples for MARL.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.