
Add CrossQ #28

Merged
araffin merged 41 commits into master from feat/crossq on Apr 3, 2024
Conversation

araffin (Owner) commented Feb 8, 2024

Description

Implementing https://openreview.net/forum?id=PczQtTsTIX
on top of #21

Discussion in #36

perf report:
https://wandb.ai/openrlbenchmark/sbx/reports/CrossQ-SBX-Perf-Report--Vmlldzo3MzQxOTAw
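
For reference, a minimal usage sketch (assuming CrossQ is exported at the package level like the other sbx algorithms; env and timesteps are for illustration only):

import gymnasium as gym

from sbx import CrossQ

env = gym.make("HalfCheetah-v4")
# CrossQ, as described in the paper, drops the target network and
# relies on BatchRenorm in the critic instead.
model = CrossQ("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)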

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

araffin mentioned this pull request Mar 29, 2024
araffin changed the title from "Feat/crossq" to "Add CrossQ" Mar 29, 2024
araffin marked this pull request as ready for review March 29, 2024 15:35
araffin (Owner, Author) commented Mar 29, 2024

@danielpalen after reading the paper, I'm wondering if you have the learning curves for relu6?
or is it similar to SAC - TN + tanh?

sbx/common/jax_layers.py (outdated review thread, resolved)
Comment on lines 155 to 161
if optimizer_kwargs is None:
    # Note: the default value for b1 is 0.9 in Adam.
    # b1=0.5 is used in the original CrossQ implementation
    # but shows only little overall improvement.
    optimizer_kwargs = {}
    if optimizer_class in [optax.adam, optax.adamw]:
        optimizer_kwargs["b1"] = 0.5
Collaborator:

Here, the default value of b1 is only set to 0.5 when no optimizer_kwargs are passed at all; as soon as anything is passed (even an empty dict), b1 falls back to Adam's default of 0.9. It would be cleaner to set the default value to 0.5 regardless of the other optimizer parameters.

Suggested change
if optimizer_kwargs is None:
    optimizer_kwargs = {}
if optimizer_class in [optax.adam, optax.adamw] and "b1" not in optimizer_kwargs:
    # Note: the default value for b1 is 0.9 in Adam.
    # b1=0.5 is used in the original CrossQ implementation but shows only little overall improvement.
    optimizer_kwargs["b1"] = 0.5

araffin (Owner, Author) replied:

I would keep it as is to be consistent with what is done in the rest of SB3.
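
For reference, with the current behavior, anyone passing their own optimizer_kwargs has to set b1 explicitly to keep the CrossQ default. A hedged sketch, assuming the usual SB3-style policy_kwargs plumbing (the concrete values are for illustration only):

import optax

from sbx import CrossQ

# Passing any optimizer_kwargs (even an empty dict) skips the automatic b1=0.5,
# so it has to be specified together with the other overrides.
model = CrossQ(
    "MlpPolicy",
    "HalfCheetah-v4",
    policy_kwargs=dict(
        optimizer_class=optax.adamw,
        optimizer_kwargs=dict(b1=0.5, weight_decay=1e-4),
    ),
)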

Comment on lines +9 to +13
PRNGKey = Any
Array = Any
Shape = Tuple[int, ...]
Dtype = Any # this could be a real type?
Axes = Union[int, Sequence[int]]
Collaborator:

Flax v0.8.1 introduced flax.typing, which we could use here for more descriptive type hints, similar to the current version of flax.linen.normalization. However, we should probably wait a bit here since this would require a relatively recent flax version.
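
For reference, the switch would roughly look like this (a sketch only; the exact names exported by flax.typing should be checked against whichever flax version we end up requiring):

# Requires flax >= 0.8.1; would replace the local aliases above.
from flax.typing import Array, Axes, Dtype, PRNGKey, Shape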

araffin (Owner, Author) commented Apr 1, 2024

@danielpalen Some early results of DroQ + CrossQ (only 2 random seeds on 3 pybullet envs, need more runs): https://wandb.ai/openrlbenchmark/sbx/reports/DroQ-CrossQ-SBX-Perf-Report--Vmlldzo3MzcxNDUy

I also quickly checked the warmup steps and could see an impact on AntBulletEnv-v0 only when it was too small.

danielpalen (Contributor):
> @danielpalen after reading the paper, I'm wondering if you have the learning curves for relu6? or is it similar to SAC - TN + tanh?

I quickly checked and it looked pretty similar.

danielpalen (Contributor):
> @danielpalen Some early results of DroQ + CrossQ (only 2 random seeds on 3 pybullet envs, need more runs): https://wandb.ai/openrlbenchmark/sbx/reports/DroQ-CrossQ-SBX-Perf-Report--Vmlldzo3MzcxNDUy

I have also played around with REDQ/DroQ + CrossQ on MuJoCo but from what I remember, the results were not really consistent, sometimes better, sometimes worse.

> I also quickly checked the warmup steps and could see an impact on AntBulletEnv-v0 only when it was too small.

That makes sense. If you go too low, you don't have a good estimate for the running statistics yet, so you need to give them enough time to warm up. But the exact time will be environment-specific, I guess.
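
To illustrate the intuition (a simplified sketch, not the actual sbx BatchRenorm code, and without the r/d clipping used in batch renormalization):

import jax.numpy as jnp

def normalize(x, batch_mean, batch_var, running_mean, running_var, step, warmup_steps, eps=1e-5):
    # During warm-up, behave like plain BatchNorm and use batch statistics only,
    # because the running statistics are not reliable yet.
    if step < warmup_steps:
        return (x - batch_mean) / jnp.sqrt(batch_var + eps)
    # Afterwards, correct towards the running statistics (BatchRenorm-style).
    r = jnp.sqrt(batch_var + eps) / jnp.sqrt(running_var + eps)
    d = (batch_mean - running_mean) / jnp.sqrt(running_var + eps)
    return (x - batch_mean) / jnp.sqrt(batch_var + eps) * r + d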

araffin (Owner, Author) commented Apr 2, 2024

> I have also played around with REDQ/DroQ + CrossQ on MuJoCo but from what I remember, the results were not really consistent, sometimes better, sometimes worse.

So far, it has always improved the results in my case (need more seeds to confirm; I have tried different pybullet and mujoco envs), or at least made it quicker to reach a "good enough" solution (using up to 2x fewer samples than CrossQ).

One last point in case you missed it (from #36 (comment)):
@danielpalen would you be interested in providing a PyTorch implementation for SB3 contrib? (https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

danielpalen (Contributor):
> One last point in case you missed it (from #36 (comment)): @danielpalen would you be interested in providing a PyTorch implementation for SB3 contrib? (https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

Yes, absolutely :) I've put it on my todo list, but I don't think I'll be able to get to it right away.

araffin merged commit c8db73f into master Apr 3, 2024
araffin deleted the feat/crossq branch April 3, 2024 10:17