[RLlib] APPO+new-stack (Atari benchmark) - Preparatory PR 01 #34743
Conversation
Signed-off-by: sven1977 <[email protected]>
@@ -171,6 +171,16 @@ py_test(
    args = ["--dir=tuned_examples/appo"]
)

py_test(
Added this learning test for the new stack.
@@ -13,19 +13,6 @@
torch, nn = try_import_torch()

def get_ppo_loss(fwd_in, fwd_out):
Not used anywhere in the code anymore: Removed! :)
@@ -522,14 +522,14 @@ def compile_results(

# We put the stats for all modules under the ALL_MODULES key. e.g. average of
# the gradients across all modules will go here.
mean_grads = [
    np.mean(grad)
mean_abs_grads = [
MEAN(ABS(grads)) is a much better metric than MEAN(grads), as the latter is always very close to zero regardless of the nature of the gradients (positive and negative components cancel each other out).
What we should probably do here is compute gradient norms. We can stick with this for now, as it is similar, but in the future the norm is probably the way to go.
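A small numeric sketch of why this matters (toy gradient arrays, not RLlib code): the plain mean cancels out, while the mean absolute value and the L2 norm both reflect gradient magnitude.

```python
import numpy as np

# Toy gradients: large magnitudes, but roughly symmetric around zero.
grads = [np.array([5.0, -5.0, 4.0, -4.0]), np.array([3.0, -3.0])]

mean_grads = [float(np.mean(g)) for g in grads]              # ~0 regardless of scale
mean_abs_grads = [float(np.mean(np.abs(g))) for g in grads]  # reflects magnitude
grad_norms = [float(np.linalg.norm(g)) for g in grads]       # L2-norm alternative
```

Here `mean_grads` is `[0.0, 0.0]` for both arrays even though the gradients are large, while `mean_abs_grads` gives `[4.5, 3.0]`.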
@@ -864,8 +864,6 @@ def _compute_actions_helper(
    dist_inputs = None

elif is_overridden(self.action_sampler_fn):
    dist_inputs = None
These vars were not used. Not sure how this survived our linter for so long.
for target in self.target_models.values():
    target.load_state_dict(model_state_dict)
if self.config.get("_enable_rl_module_api", False):
Preparation for the upcoming move of APPO (torch) to the new API stack.
@@ -1,31 +0,0 @@
cartpole-appo-learner:
Gave this file a better, more descriptive/accurate name.
@@ -232,8 +232,8 @@ class _ActorState:
def __init__(
    self,
    actors: Optional[List[ActorHandle]] = None,
    max_remote_requests_in_flight_per_actor: Optional[int] = 2,
These are not Optional. They must be ints. Yes, they have default (int) values, but they cannot be None.
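A minimal sketch of the typing point being made (hypothetical function names, not the actual `_ActorState` signature): a parameter with a default value is not automatically `Optional`; `Optional[int]` specifically claims that `None` is a valid value.

```python
from typing import Optional

# Misleading annotation: Optional[int] says None is a legal value,
# which would crash any downstream arithmetic on this parameter.
def poll_actors_bad(max_in_flight: Optional[int] = 2) -> int:
    return max_in_flight

# Correct annotation: "defaults to 2, but must always be an int".
def poll_actors_good(max_in_flight: int = 2) -> int:
    return max_in_flight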
Some questions; thanks for this cleanup.
rllib/core/learner/learner.py

Returns:
    The constructed module.
    The constructed MultiAgentRLModule.
Nit (not necessary to address): "A constructed MultiAgentRLModule"
rllib/core/learner/scaling_config.py

@@ -13,7 +14,8 @@ class LearnerGroupScalingConfig:
    training will run on a single CPU.
num_gpus_per_worker: The number of GPUs to allocate per worker. If
    num_workers=0, any number greater than 0 will run the training on a single
    GPU. A value of zero will run the training on a single CPU.
    GPU. A value of zero will run the training on a single CPU. Fractional
Fractional values aren't really supported. Fractional GPU usage will cause CUDA async errors, AFAICT.
Also, a value of zero will run the training on num_cpus_per_worker CPUs, not a single CPU.
Thanks, will fix. ...
done
Actually, there is one test where we do set num_gpus_per_worker to 0.5, in test_learner_group.py:

LOCAL_SCALING_CONFIGS = {
    "local-cpu": LearnerGroupScalingConfig(num_workers=0, num_gpus_per_worker=0),
    "local-gpu": LearnerGroupScalingConfig(num_workers=0, num_gpus_per_worker=0.5),
}
I think that's why I fixed the typehint, but yeah, it makes sense that torch/tf DDP wouldn't like it. I fixed the comment and said it's not allowed due to xyz.
vtrace: True
use_kl_loss: False
full_action_space: false
repeat_action_probability: 0.0  # deterministic
Hmm, what is this? What is the repeat action probability by default, and how much does it matter?
The repeat action prob by default is 0.25, meaning that in 25% of all env.step(action) calls, the env doesn't actually apply the given action but repeats the previous one. This means that, by default, Atari envs are stochastic, but our benchmarks have always used the deterministic setting.
See here for more details: https://gymnasium.farama.org/environments/atari/#version-history-and-naming-schemes
We used to run against "PongNoFrameskip-v4", where the v4 indicates repeat_action_probability=0.0 and the NoFrameskip means frameskip=1 (no frame skipping).
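The "sticky actions" behavior described above can be sketched like this (an illustrative toy, not ALE's actual implementation):

```python
import random

def sticky_action(requested, previous, repeat_action_probability=0.25, rng=random):
    # With probability p, the env ignores the requested action and
    # repeats the previous one, making transitions stochastic.
    # p=0.0 (the setting used in these benchmarks) is fully deterministic.
    if rng.random() < repeat_action_probability:
        return previous
    return requested
```

With repeat_action_probability=0.0 the requested action is always applied; with 1.0 the previous action is always repeated.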
full_action_space: false
repeat_action_probability: 0.0  # deterministic
vtrace: true
use_kl_loss: false
do we need to disable the kl loss? Is this something that we were doing in the old stack?
I noticed in the new benchmarks that the KL loss might destabilize the run, but this observation was made while several other issues had not yet been fixed, so it could have been a red herring. Either way, we don't currently use the KL loss in our old-stack benchmarks, so I would like to keep it switched off until we have fully investigated a possible benefit of using this term.
env_config:
  frameskip: 1  # no frameskip
  frameskip: 1
  full_action_space: false
what is the full action space? Does disabling it blow up the action space significantly?
See here: https://gymnasium.farama.org/environments/atari/#action-space
The default is False anyway, so there is no real need to set this here, but I added it for explicitness.
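For reference, the sizes involved (numbers from the ALE/Gymnasium docs linked above; the Pong action list is illustrative):

```python
# Every Atari game shares the same 18-action "full" space; with
# full_action_space=False (the default), each game exposes only its
# minimal legal action set, e.g. 6 actions for Pong.
FULL_ACTION_SPACE_SIZE = 18
PONG_MINIMAL_ACTIONS = ["NOOP", "FIRE", "RIGHT", "LEFT", "RIGHTFIRE", "LEFTFIRE"]
```

So enabling full_action_space would triple Pong's action space, mostly with no-op-equivalent actions, which slows exploration for no benefit.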
num_gpus_per_learner_worker: 0
num_cpus_per_learner_worker: 1

_enable_rl_module_api: true
super nit: can you put _enable_xxx flags close to each other?
done
# Run with Learner API.
_enable_learner_api: true
grad_clip_by_global_norm: 10.0
# Use a single Learner worker on the GPU.
the comment doesn't match the code.
done
APPO+new-stack (Atari benchmark) - Preparatory PR 01
This is a breakdown PR (1st of N) from this one here, which is too large to merge: #34363
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.