[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. #19100
Conversation
@@ -14,7 +15,23 @@ def check_support_multiagent(alg, config):
                 lambda _: MultiAgentMountainCar({"num_agents": 2}))
    register_env("multi_agent_cartpole",
                 lambda _: MultiAgentCartPole({"num_agents": 2}))
    config["log_level"] = "ERROR"
Before this PR, we would use the default multiagent setup, which doesn't cover the case of having 2 policies. The default case is simply one policy ("default_policy"), with all agents mapping to it.
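For reference, here is a minimal sketch of such a 2-policy setup in RLlib's dict config. The policy names `p0`/`p1` and the exact `policy_mapping_fn` signature are illustrative, not taken from this PR:

```python
from ray.tune.registry import register_env
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

register_env("multi_agent_cartpole",
             lambda _: MultiAgentCartPole({"num_agents": 2}))

config = {
    "env": "multi_agent_cartpole",
    "multiagent": {
        # Two distinct policies instead of the single "default_policy".
        "policies": {"p0", "p1"},
        # Map agent i -> policy "p{i}"; the extra args (episode, worker)
        # vary across RLlib versions, hence *args/**kwargs.
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: f"p{agent_id}",
    },
}
```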
@@ -104,18 +103,14 @@ def __init__(
        self.train_batch_size = train_batch_size

        # TODO: (sven) Allow multi-GPU to work for multi-agent as well.
        self.policy = self.local_worker.policy_map[DEFAULT_POLICY_ID]
        self.policy_map = self.local_worker.policy_map
The new changes in this file make it possible to run learner threads (e.g. IMPALA) with multi-agent setups as well. Before, the somewhat obsolete "simple_optimizer" had to be used to make this work.
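To illustrate: holding the whole `policy_map` lets a learner thread update each policy from its own sub-batch of a `MultiAgentBatch`. A simplified sketch of that per-policy loop (`learn_on_multi_agent_batch` is a hypothetical helper, not the actual `LearnerThread` code):

```python
def learn_on_multi_agent_batch(policy_map, batch):
    # `batch.policy_batches` maps policy IDs to per-policy
    # SampleBatches; update each policy from its own sub-batch.
    results = {}
    for pid, sub_batch in batch.policy_batches.items():
        results[pid] = policy_map[pid].learn_on_batch(sub_batch)
    return results
```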
            pid: self.policy_map[pid]._build_apply_gradients(
                builder, grad)
            for pid, grad in grads.items()
        if self.policy_config.get("framework") == "tf":
This was buggy code: Each policy nowadays has its own tf-graph (instead of the workers/trainer carrying the graph), so we have to loop through each policy here.
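A sketch of the corrected pattern (assuming `TFPolicy.get_session()` and RLlib's `TFRunBuilder`; `apply_gradients_per_policy` is a hypothetical helper, not the exact diff):

```python
from ray.rllib.utils.tf_run_builder import TFRunBuilder

def apply_gradients_per_policy(policy_map, grads):
    # Each policy now owns its own tf-graph/session, so the run
    # builder must be created per policy, not once per worker.
    outputs = {}
    for pid, grad in grads.items():
        policy = policy_map[pid]
        builder = TFRunBuilder(policy.get_session(), "apply_gradients")
        outputs[pid] = policy._build_apply_gradients(builder, grad)
    return outputs
```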
@@ -860,14 +860,15 @@ def compute_gradients(
                summarize(samples)))
        if isinstance(samples, MultiAgentBatch):
            grad_out, info_out = {}, {}
            if self.tf_sess is not None:
                builder = TFRunBuilder(self.tf_sess, "compute_gradients")
            if self.policy_config.get("framework") == "tf":
Same fix as above: each policy nowadays has its own tf-graph (instead of the workers/trainer carrying the graph), so we have to loop through each policy here as well.
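And the analogous sketch for the gradient-computation side (`compute_gradients_per_policy` is hypothetical; `Policy.compute_gradients()` returns a `(grads, info)` tuple):

```python
from ray.rllib.policy.sample_batch import MultiAgentBatch

def compute_gradients_per_policy(policy_map, samples):
    # With per-policy graphs, gradients are computed by looping
    # over each policy's sub-batch of the MultiAgentBatch.
    grad_out, info_out = {}, {}
    if isinstance(samples, MultiAgentBatch):
        for pid, batch in samples.policy_batches.items():
            grad_out[pid], info_out[pid] = \
                policy_map[pid].compute_gradients(batch)
    return grad_out, info_out
```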
This all looks fine to me -- I have some questions that I left as inline review comments. If you could PTAL and answer them that would help me a lot.
            builders = {}
            outputs = {}
            for pid, grad in grads.items():
                if pid not in self.policies_to_train:
Is there a reason we would ever compute grads on a non-trainable policy?
Couldn't we just not compute grads on it in the first place?
Good point. The thing is, we don't call `learn_on_batch`/`learn_on_loaded_batch` on non-trainable policies. But what I think could happen is that the grads were calculated while some policy A was still trainable, and we are now trying to apply them to A, which may have become untrainable in the meantime due to a user-induced change in the `policy_map` (and in the worker's `trainable_policies` list).
@@ -509,6 +509,8 @@ def check_train_results(train_results):
        f"train_results['infos']['learner'] ({learner_info})!"

    for pid, policy_stats in learner_info.items():
        if pid == "batch_count":
Can you explain this addition? Why would a policy ID ever be the string "batch_count"? Is this a hack/workaround?
Sorry, yeah, this is a hack. Not sure where this key comes from, but it is currently in this learner_info dict (alongside the policy IDs), so I didn't want to touch it here. We can fix this in a follow-up PR. No additional harm done by this PR ;)
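For clarity, a sketch of the check in question; the per-policy validation body is elided:

```python
for pid, policy_stats in learner_info.items():
    if pid == "batch_count":
        # Not a policy ID: a stray key currently present in
        # learner_info alongside the policy IDs (to be cleaned
        # up in a follow-up PR).
        continue
    # ... per-policy stat checks go here ...
```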
        # TODO: (sven) Allow multi-GPU to work for multi-agent as well.
        self.policy = self.local_worker.policy_map[DEFAULT_POLICY_ID]
        self.policy_map = self.local_worker.policy_map
        self.devices = next(iter(self.policy_map.values())).devices
Does this line unblock the following comment that you had deleted?
# TODO: (sven) Allow multi-GPU to work for multi-agent as well.
Correct, the whole PR makes multi-agent actually "runnable" by the MultiGPULearnerThread (which was not possible before).
Whether multi-agent (with >1 policies) should use multi-GPU is a different question; before this fix, multi-agent would not even run here in the case where only a single policy is used.
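A hypothetical repro for this case (standard RLlib/IMPALA config keys; that `num_gpus > 1` alone routes learning through the `MultiGPULearnerThread` is an assumption):

```python
import ray
from ray import tune

ray.init()
tune.run(
    "IMPALA",
    stop={"timesteps_total": 50000},
    config={
        "env": "multi_agent_cartpole",  # registered as above
        "framework": "tf",
        "num_workers": 1,
        # Multiple GPUs exercise the MultiGPULearnerThread, which
        # previously failed for multi-agent (even with one policy).
        "num_gpus": 2,
    },
)
```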
Issue 18280: A3C/IMPALA multi-agent not working.
Why are these changes needed?
Issue #18280
Related issue number
Closes #18280
Checks
I've run scripts/format.sh to lint the changes in this PR.