[RLlib] Learner API: Fix and unify grad-clipping configs and behaviors. #34464
Conversation
Signed-off-by: sven1977 <[email protected]>
Thanks for sharing this. However, I am overall confused by this PR. Is it incomplete?
rllib/core/learner/tf/tf_learner.py (outdated)
        ) for k, v in gradients_dict.items()
    }
    # Clip by L2-norm (per gradient tensor).
Are we going to allow users to clip gradients by value and then by norm? That doesn't sound right to me, but that is the behavior that has been enabled here.
Great point. I was thinking about this myself. Maybe we should just do: grad_clip: [some value] and then grad_clip_by: [value|norm|global_norm].
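As a sketch, the two settings proposed in the comment above might look like this (a hypothetical config fragment; the names come from the discussion, and the final RLlib API may differ):

```python
# Hypothetical shape of the two proposed settings (names from the comment
# above; the final RLlib API may differ):
proposed_config = {
    "grad_clip": 40.0,              # single clipping threshold
    "grad_clip_by": "global_norm",  # one of: "value", "norm", "global_norm"
}
```

Splitting the threshold (`grad_clip`) from the mode (`grad_clip_by`) avoids the "clip by value and then by norm" double-clipping concern raised above, since exactly one mode applies per config.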
rllib/algorithms/qmix/qmix.py (outdated)
    # TODO (sven): Deprecate grad_clip setting once all-in on new Learner API.
    self.grad_clip = 10.0
    self.grad_clip_by_global_norm = 10.0
How is this going to be used downstream by the QMIX policy / multi_gpu_train_one_batch?
Fixed by replacing the new settings with backward-compatible ones.
Hey @avnishn , sorry about the confusion and thanks for taking a look! I went through the PR once more and your comments and addressed all of them.
@@ -422,16 +425,6 @@ def validate(self) -> None:
            self.vtrace_clip_pg_rho_threshold
        )

    @override(AlgorithmConfig)
Not needed anymore with this PR: grad clipping has been universally moved into Learner.postprocess_gradients().
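To illustrate the idea of centralizing clipping in one hook, here is a toy stand-in (an assumed shape only; the real Learner.postprocess_gradients() signature and internals may differ, and this scalar version skips the norm-based modes):

```python
class LearnerSketch:
    """Toy stand-in for the centralized clipping hook described above
    (assumed shape; the real Learner.postprocess_gradients() differs)."""

    def __init__(self, grad_clip=None, grad_clip_by="global_norm"):
        self.grad_clip = grad_clip
        self.grad_clip_by = grad_clip_by

    def postprocess_gradients(self, gradients_dict):
        # With clipping centralized here, per-algorithm code (e.g. the
        # QMIX override above) no longer needs its own clipping path.
        if self.grad_clip is None:
            return gradients_dict
        if self.grad_clip_by == "value":
            # Clamp each (scalar, for this toy) gradient into the interval.
            return {
                k: max(-self.grad_clip, min(self.grad_clip, g))
                for k, g in gradients_dict.items()
            }
        # "norm" / "global_norm" omitted in this toy scalar version.
        raise NotImplementedError(self.grad_clip_by)
```

Because every framework-specific Learner inherits this one hook, all algorithms get identical clipping behavior regardless of backend.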
@@ -79,6 +79,22 @@ def compute_gradients(

        return grads

    @override(Learner)
This was missing in torch thus far.
@@ -27,6 +27,48 @@
 tf1, tf, tfv = try_import_tf()


 @PublicAPI
New grad-clip utilities, replacing the old, messy ones (some of the old ones clip by value, some by norm (not global norm), some have additional optimizer-update logic in them, etc.).
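The three clipping modes such utilities have to cover can be sketched as follows (a NumPy illustration, not RLlib's actual utility code; `clip_gradients` and its signature are assumptions for this example):

```python
import numpy as np

def clip_gradients(gradients_dict, grad_clip, grad_clip_by="global_norm"):
    """Sketch of the three clipping modes (not RLlib's actual utility)."""
    if grad_clip_by == "value":
        # Clip each element into [-grad_clip, +grad_clip].
        return {k: np.clip(g, -grad_clip, grad_clip)
                for k, g in gradients_dict.items()}
    elif grad_clip_by == "norm":
        # Scale each tensor so its own L2 norm is at most grad_clip.
        out = {}
        for k, g in gradients_dict.items():
            norm = np.linalg.norm(g)
            out[k] = g * (grad_clip / norm) if norm > grad_clip else g
        return out
    elif grad_clip_by == "global_norm":
        # Scale all tensors by one factor based on their joint L2 norm.
        global_norm = np.sqrt(
            sum(np.sum(g ** 2) for g in gradients_dict.values())
        )
        if global_norm > grad_clip:
            scale = grad_clip / global_norm
            return {k: g * scale for k, g in gradients_dict.items()}
        return dict(gradients_dict)
    raise ValueError(f"Unknown grad_clip_by={grad_clip_by!r}")
```

Note how "norm" treats each tensor independently while "global_norm" preserves the relative scale between tensors, which is the distinction the discussion above turns on.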
Learner API: Fix and unify grad-clipping configs and behaviors.

This PR introduces a new setting, grad_clip_by, which can be set to "value", "norm", or "global_norm" and determines the mode of clipping. It also makes grad_clip a generic AlgorithmConfig property (it was only supported by some algos before). However, this setting is only used if _enable_learner_api=True.

Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.