
[RLlib] APPO+new-stack (Atari benchmark) - Preparatory PR 01 #34743

Merged

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Apr 25, 2023

APPO+new-stack (Atari benchmark) - Preparatory PR 01

This PR is the first of N smaller PRs broken out of #34363, which is too large to merge as a whole.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
@@ -171,6 +171,16 @@ py_test(
args = ["--dir=tuned_examples/appo"]
)

py_test(
Contributor Author:

Added this learning test for the new stack.
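For context, a new-stack learning test in RLlib's BUILD file looks roughly like the sketch below. The target name, tags, and yaml path are illustrative placeholders, not necessarily the exact ones added in this PR; only the --dir argument is taken from the diff above.

py_test(
    name = "learning_tests_cartpole_appo_w_rl_modules_and_learner",  # hypothetical name
    main = "tests/run_regression_tests.py",
    tags = ["team:rllib", "learning_tests"],  # illustrative tags
    size = "large",
    srcs = ["tests/run_regression_tests.py"],
    # Points the regression-test runner at the tuned APPO examples, as in the diff above.
    args = ["--dir=tuned_examples/appo"],
)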

@@ -13,19 +13,6 @@
torch, nn = try_import_torch()


def get_ppo_loss(fwd_in, fwd_out):
Contributor Author:

Not used anywhere in the code anymore: Removed! :)

@@ -522,14 +522,14 @@ def compile_results(

# We put the stats for all modules under the ALL_MODULES key. e.g. average of
# the gradients across all modules will go here.
mean_grads = [
np.mean(grad)
mean_abs_grads = [
Contributor Author:

MEAN(ABS(grads)) is a much better metric than MEAN(grads): positive and negative gradient entries cancel each other out, so the plain mean is always very close to zero, regardless of the actual magnitude of the gradients.
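A minimal numpy sketch of the cancellation effect described above (toy gradient values, purely illustrative):

import numpy as np

# Toy gradient tensor: large positive and negative entries that nearly cancel.
grad = np.array([0.9, -1.1, 0.8, -0.7])

print(np.mean(grad))          # -0.025: looks like "almost no gradient"
print(np.mean(np.abs(grad)))  # 0.875: reflects the actual update magnitude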

Member:

what we should actually probably do here is the gradient norms. We can stick with this for now as it is similar, but in the future, norm is probably the way to go.
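For reference, a global gradient norm as suggested here could be computed along these lines (a numpy sketch, not what the PR actually reports):

import numpy as np

def global_grad_norm(grads):
    # L2 norm over all gradient tensors taken together ("global norm").
    return float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))

grads = [np.array([0.9, -1.1]), np.array([0.8, -0.7])]
print(global_grad_norm(grads))  # ~1.775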

@@ -864,8 +864,6 @@ def _compute_actions_helper(
dist_inputs = None

elif is_overridden(self.action_sampler_fn):
dist_inputs = None
Contributor Author:

These vars were not used. Not sure how this survived our linter for so long.


for target in self.target_models.values():
target.load_state_dict(model_state_dict)
if self.config.get("_enable_rl_module_api", False):
Contributor Author:

Preparation for the upcoming move of APPO (torch) to the new API stack.

@@ -1,31 +0,0 @@
cartpole-appo-learner:
Contributor Author:

Gave this file a better, more descriptive/accurate name.

@@ -232,8 +232,8 @@ class _ActorState:
def __init__(
self,
actors: Optional[List[ActorHandle]] = None,
max_remote_requests_in_flight_per_actor: Optional[int] = 2,
Contributor Author:

These are not optional. They must be ints. Yes, they have default (int) values, but cannot be None.
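In other words, the gist of the change is to drop the Optional wrapper while keeping the default, roughly as in this sketch (the class name is a stand-in, not the actual one from the diff):

from typing import List, Optional
from ray.actor import ActorHandle

class SomeActorManager:  # stand-in class for illustration only
    def __init__(
        self,
        actors: Optional[List[ActorHandle]] = None,
        # Previously annotated as Optional[int]; it has a default value of 2,
        # but passing None was never valid, so the annotation is now plain int.
        max_remote_requests_in_flight_per_actor: int = 2,
    ):
        ...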

@avnishn avnishn (Member) left a comment:

some questions, ty for this cleanup


Returns:
The constructed module.
The constructed MultiAgentRLModule.
Member:

nit not necessary to address: A constructed MultiAgentRLModule

@@ -13,7 +14,8 @@ class LearnerGroupScalingConfig:
training will run on a single CPU.
num_gpus_per_worker: The number of GPUs to allocate per worker. If
num_workers=0, any number greater than 0 will run the training on a single
GPU. A value of zero will run the training on a single CPU.
GPU. A value of zero will run the training on a single CPU. Fractional
Member:

Fractional values aren't really supported. Fractional GPU usage will cause CUDA async errors, AFAICT.

Member:

A value of zero will run the training on num_cpus_per_worker CPUs, not a single CPU.

Contributor Author:

Thanks, will fix. ...

Contributor Author:

done

Contributor Author:

Actually, there is one test, where we do set num_gpus_per_worker to 0.5:

In test_learner_group.py:

LOCAL_SCALING_CONFIGS = {
    "local-cpu": LearnerGroupScalingConfig(num_workers=0, num_gpus_per_worker=0),
    "local-gpu": LearnerGroupScalingConfig(num_workers=0, num_gpus_per_worker=0.5),
}

Contributor Author:

I think that's why I fixed the typehint, but yeah, it makes sense that torch/tf DDP wouldn't like it. I fixed the comment and said it's not allowed due to xyz.

vtrace: True
use_kl_loss: False
full_action_space: false
repeat_action_probability: 0.0 # deterministic
Member:

Hmm, what is this? What is the repeat action probability by default, and how much does it matter?

Contributor Author:

The repeat action probability is 0.25 by default, meaning that in 25% of all Env.step(action) calls, the env doesn't actually apply the given action but repeats the previous one. This means that by default, Atari envs are stochastic, but our benchmarks have always used the deterministic setting.
See here for more details: https://gymnasium.farama.org/environments/atari/#version-history-and-naming-schemes

We used to run against "PongNoFrameskip-v4", where the v4 indicates repeat_action_probability=0.0 and NoFrameskip means frameskip=1 (no frame skipping).
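As a hedged illustration (assuming gymnasium with the Atari/ALE extras installed), the old-style env ID and the explicit v5 kwargs set up the same deterministic behavior:

import gymnasium as gym

# Old-style deterministic env: v4 => repeat_action_probability=0.0,
# NoFrameskip => frameskip=1 (every frame is stepped, nothing skipped).
env_old = gym.make("PongNoFrameskip-v4")

# v5 equivalent: sticky actions default to 0.25, so disable them explicitly.
env_new = gym.make(
    "ALE/Pong-v5",
    frameskip=1,                    # no frame skipping
    repeat_action_probability=0.0,  # deterministic (no sticky actions)
)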

full_action_space: false
repeat_action_probability: 0.0 # deterministic
vtrace: true
use_kl_loss: false
Member:

do we need to disable the kl loss? Is this something that we were doing in the old stack?

Contributor Author:

I noticed in the new benchmarks that the KL loss might destabilize the run, but this observation was made while several other issues had not been fixed yet, so it could have been a red herring. Either way, we don't currently use the KL loss in our old-stack benchmark, so I would like to keep it switched off until we have fully investigated a possible benefit of using this term.

env_config:
frameskip: 1 # no frameskip
frameskip: 1
full_action_space: false
Member:

what is the full action space? Does disabling it blow up the action space significantly?

Contributor Author:

See here: https://gymnasium.farama.org/environments/atari/#action-space
The default is False anyway, so there's no real need to set it here, but I added it as well for explicitness.
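A quick sketch of what the flag does (assuming gymnasium with ale-py installed; per the linked docs, Pong exposes 6 actions in its minimal set vs. 18 in the full ALE set):

import gymnasium as gym

env_min = gym.make("ALE/Pong-v5", full_action_space=False)   # default
env_full = gym.make("ALE/Pong-v5", full_action_space=True)

print(env_min.action_space)   # Discrete(6):  minimal, game-specific action set
print(env_full.action_space)  # Discrete(18): full legal ALE action set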

num_gpus_per_learner_worker: 0
num_cpus_per_learner_worker: 1

_enable_rl_module_api: true
Contributor:

super nit: can you put _enable_xxx flags close to each other?

Contributor Author:

done

# Run with Learner API.
_enable_learner_api: true
grad_clip_by_global_norm: 10.0
# Use a single Learner worker on the GPU.
Contributor:

the comment doesn't match the code.

Contributor Author:

done

Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 merged commit 5a3954e into ray-project:master Apr 26, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023