
[Feature]: Dreamer support #341

Merged: 609 commits merged into pytorch:main on Oct 20, 2022

Conversation

@nicolas-dufour (Contributor) commented Aug 5, 2022

Description

In this PR we add Dreamer, a model-based RL method.

Implemented objects

To retrieve the Dreamer objects, call make_dreamer:

from torchrl.trainers.helpers.models import make_dreamer

world_model, model_based_env, actor_model, value_model, policy = make_dreamer(
    proof_environment=proof_env,
    cfg=cfg,
    device=device,
    use_decoder_in_env=True,
    action_key="action",
    value_key="predicted_value",
)

Here proof_env is the environment we will train on afterwards, and cfg is the config (see DreamerConfig in torchrl.trainers.helpers.models).

  • world_model is the world model of Dreamer. From a given observation, it predicts a reward, a latent world state and a reconstruction of the observation.
  • model_based_env is an env that operates on the latent world state (no observation). It can generate new latent world states from an initial latent world state. If use_decoder_in_env is set, the env can decode the generated states with its decode_obs method.
  • actor_model is the associated Dreamer actor model.
  • value_model predicts a value.
  • policy combines world_model and actor_model so that, from a given observation, we predict the action to take.

We also provide 3 loss models:

world_model_loss = DreamerModelLoss(world_model, cfg).to(device)
actor_loss = DreamerActorLoss(
    actor_model, value_model, model_based_env, cfg
).to(device)
value_loss = DreamerValueLoss(value_model).to(device)

These three loss modules together enable training the five models above; a minimal training-step sketch is shown below.
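A minimal sketch of such a training step, assuming each loss module returns a (loss_tensordict, tensordict) pair carrying the corresponding "loss_*" entry; the optimizers here are illustrative only (the actual example script adds autocast and gradient scaling, discussed further down in this PR):

import torch

# Illustrative optimizers; learning rates are placeholders, not the PR defaults.
world_model_opt = torch.optim.Adam(world_model.parameters(), lr=1e-4)
actor_opt = torch.optim.Adam(actor_model.parameters(), lr=1e-4)
value_opt = torch.optim.Adam(value_model.parameters(), lr=1e-4)

def training_step(sampled_tensordict):
    # 1. World model: reconstruction / reward / KL terms.
    model_loss_td, sampled_tensordict = world_model_loss(sampled_tensordict)
    world_model_opt.zero_grad()
    model_loss_td["loss_world_model"].backward()
    world_model_opt.step()

    # 2. Actor: trained on rollouts imagined with model_based_env.
    actor_loss_td, sampled_tensordict = actor_loss(sampled_tensordict)
    actor_opt.zero_grad()
    actor_loss_td["loss_actor"].backward()
    actor_opt.step()

    # 3. Value model: fitted on the same imagined trajectories.
    value_loss_td, sampled_tensordict = value_loss(sampled_tensordict)
    value_opt.zero_grad()
    value_loss_td["loss_value"].backward()
    value_opt.step()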

Use case of model_based_env

Our env allows us to generate new data.
To do so, we can call:

td  = model_based_env.rollout(
    max_steps=self.cfg.imagination_horizon,
    policy=self.actor_model,
)

By default, the env will be reset to default states of zeros. A better way to sample from it is to start from a previous world state (let's call it td):

td  = model_based_env.rollout(
    max_steps=self.cfg.imagination_horizon,
    policy=self.actor_model,
    auto_reset=False,
    tensordict=td
)
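One possible way to obtain such a previous world state (a sketch only: the key names and the [:, 0] slicing follow the example script discussed later in this PR and may not match the final API exactly):

# Run the world-model loss on real data first so the latent keys are filled in,
# then use the first time step of the batch as the starting point for imagination.
_, sampled_tensordict = world_model_loss(sampled_tensordict)
td = sampled_tensordict.select("posterior_states", "next_belief")[:, 0]

td = model_based_env.rollout(
    max_steps=cfg.imagination_horizon,
    policy=actor_model,
    auto_reset=False,
    tensordict=td,
)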

Motivation and Context

Dreamer is very data-efficient and will make model-based methods easy to use.

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • New feature (non-breaking change which adds core functionality)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot added the CLA Signed label Aug 5, 2022
@vmoens (Contributor) left a comment

Some high-level comments.
I would rather have a DreamerLoss module containing everything that is not needed for inference than one big env that does everything. Do you agree?
Have a look at how we do it for SAC and REDQ, for instance.

@nicolas-dufour nicolas-dufour marked this pull request as draft August 10, 2022 10:55
@vmoens (Contributor) left a comment

Let's use the TorchRL primitives instead.
Also, let's call LSTM(batched_input) with batched_input of size [B, T] instead of looping over the LSTM; that path uses cuDNN and is much faster.
Let's avoid building distributions unless it's really necessary.
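For reference, a small self-contained comparison of the two patterns (this is not the PR's RSSM code, just an illustration of why the single batched call is preferred):

import torch
from torch import nn

B, T, in_dim, hidden = 8, 50, 32, 200
x = torch.randn(B, T, in_dim)

# Slow pattern: stepping an LSTMCell in Python over the time dimension.
cell = nn.LSTMCell(in_dim, hidden)
h = torch.zeros(B, hidden)
c = torch.zeros(B, hidden)
outs = []
for t in range(T):
    h, c = cell(x[:, t], (h, c))
    outs.append(h)
looped = torch.stack(outs, dim=1)    # [B, T, hidden]

# Fast pattern: one call over the whole [B, T, ...] batch; on GPU this dispatches
# to the fused cuDNN kernel instead of T separate kernel launches.
lstm = nn.LSTM(in_dim, hidden, batch_first=True)
batched, _ = lstm(x)                 # [B, T, hidden]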

@vmoens (Contributor) left a comment

Some comments to improve efficiency

)
actor_loss = -lambda_target.mean()
with torch.no_grad():
    value_td = tensordict.clone().detach()
Contributor

Detach under no_grad?

Contributor Author

If we don't detach, the computation graphs of the different optimizations overlap.

obs_decoded = obs_decoded.reshape(*batch_sizes, C, H, W)
return obs_decoded

class RSSMPriorRollout(nn.Module):
Contributor

To me this would work perfectly with TensorDictModule.
That would allow us to preallocate the tensors of the rollout, which should be more efficient than stacking the outputs.

Contributor Author

You mean not having RSSMPriorRollout as an nn.Module but doing the loop over a TDModule? But then how would you integrate this with a TDSequence?

class RSSMPrior(nn.Module):
    def __init__(self, hidden_dim=200, rnn_hidden_dim=200, state_dim=20):
        super().__init__()
        self.min_std = 0.1
Contributor

Let's make this a hyperparam

return (
    TensorDict(
        {
            "loss_world_model": loss,
Contributor

Here you are returning four losses, one of them being the sum of the others. To make sure we don't do anything silly like re-summing the losses (which is what the trainer will do), you should either return one loss only (but then we won't be able to log each of them individually) or return only the decomposed losses.

Contributor Author

From what I've seen, the trainer doesn't retrieve losses unless their key starts with "loss_". So renaming loss_kl to kl and so on would do the trick, no?
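If the trainer really only aggregates keys starting with "loss_", one hedged way to return decomposed losses without double counting is to prefix only the individual terms and keep raw diagnostics unprefixed (the key names below are illustrative, not the final API):

# Import path is version-dependent; at the time of this PR TensorDict shipped
# inside torchrl rather than as the separate tensordict package.
from torchrl.data import TensorDict

def loss_output(kl, reco_loss, reward_loss, batch_size):
    return TensorDict(
        {
            # Decomposed terms: what a "sum every loss_* key" trainer adds up.
            "loss_model_kl": kl,
            "loss_model_reco": reco_loss,
            "loss_model_reward": reward_loss,
            # Unprefixed diagnostic: logged but never re-summed into the objective.
            "kl_raw": kl.detach(),
        },
        batch_size=batch_size,
    )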

@nicolas-dufour nicolas-dufour changed the title [Feature]: Dreamer ModelBased env [Feature]: Dreamer support Aug 16, 2022
scaler.unscale_(value_opt)
clip_grad_norm_(value_model.parameters(), cfg.grad_clip)

scaler.step(world_model_opt)
Contributor

I think we should do

loss1 = ...
optim1.step()

loss2 = ...
optim2.step()
etc.

That way we allow the GPU to free memory when calling backward.

Contributor Author

I was doing it like that before; however, the problem is that autocast struggles in this context. According to the PyTorch docs, only a single scaler can be created and you cannot scale again after unscale_. I felt that 16-bit precision was worth keeping.

Contributor

Can't you scale / unscale multiple times? What would be the difference between that and scaling/unscaling through a loop?
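For context, a sketch of the single-scaler pattern from the PyTorch AMP docs for several losses and optimizers; it assumes all backward passes can run before any optimizer step, which does not quite match Dreamer's sequential world-model -> actor -> value update and is presumably why separate scalers were used below (the loss and optimizer names here are hypothetical):

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()

with autocast(dtype=torch.float16):
    loss0 = compute_loss0(batch)    # hypothetical loss computations
    loss1 = compute_loss1(batch)

# The same scaler can scale several losses...
scaler.scale(loss0).backward(retain_graph=True)
scaler.scale(loss1).backward()

# ...but unscale_ may only be called once per optimizer per step.
scaler.unscale_(optimizer0)
scaler.step(optimizer0)
scaler.step(optimizer1)    # step() unscales internally when unscale_ was not called
scaler.update()            # a single update() per iteration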

Contributor

                with autocast(dtype=torch.float16):
                    model_loss_td, sampled_tensordict = world_model_loss(
                        sampled_tensordict
                    )
                    if cfg.record_video:
                        world_model_td = sampled_tensordict.clone().select(
                            "pixels", "reco_pixels", "posterior_states", "next_belief"
                        )[:4].detach()
                scaler1.scale(model_loss_td["loss_world_model"]).backward()
                scaler1.unscale_(world_model_opt)
                clip_grad_norm_(world_model.parameters(), cfg.grad_clip)
                scaler1.step(world_model_opt)
                world_model_opt.zero_grad()
                scaler1.update()

                with autocast(dtype=torch.float16):
                    actor_loss_td, sampled_tensordict = actor_loss(sampled_tensordict)
                scaler2.scale(actor_loss_td["loss_actor"]).backward()
                scaler2.unscale_(actor_opt)
                clip_grad_norm_(actor_model.parameters(), cfg.grad_clip)
                scaler2.step(actor_opt)
                actor_opt.zero_grad()
                scaler2.update()

                with autocast(dtype=torch.float16):
                    value_loss_td, sampled_tensordict = value_loss(sampled_tensordict)
                scaler3.scale(value_loss_td["loss_value"]).backward()
                scaler3.unscale_(value_opt)
                clip_grad_norm_(value_model.parameters(), cfg.grad_clip)
                scaler3.step(value_opt)
                value_opt.zero_grad()
                scaler3.update()

        batch_size=None,
    ):
        super(DummyModelBasedEnv, self).__init__(
            WorldModelWrapper(
Contributor

Let's avoid building things inside a caller; it's just syntax, but it feels messy.

Contributor Author
@nicolas-dufour Aug 18, 2022

I'll change that in the MBEnv PR instead

observation, start_dim=0, end_dim=end_dim
)
obs_encoded = self.encoder(observation)
latent = obs_encoded.reshape(*batch_sizes, -1)
Contributor

Do we need reshape or does view work?

Contributor Author

View does not work in this case
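For reference, a minimal illustration of the difference, independent of the encoder code above: view requires a compatible memory layout, while reshape falls back to a copy when the layout is not compatible.

import torch

x = torch.randn(4, 5, 3, 64, 64)    # e.g. [B, T, C, H, W]
flat = x.flatten(0, 1)              # [B*T, C, H, W]; still contiguous, so...
flat.view(4, 5, 3, 64, 64)          # ...view works here

y = x.permute(0, 1, 3, 4, 2)        # non-contiguous layout
try:
    y.view(4, 5, -1)                # raises: size/stride incompatible with a view
except RuntimeError as err:
    print(err)
z = y.reshape(4, 5, -1)             # works; copies the data when needed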

@@ -0,0 +1,181 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.

Seems like some of the changes in this PR belong in #333? It would be cleaner for code review and log history to have this PR only include the Dreamer-specific implementations of the abstractions provided in #333.

    policy=actor_model,
    auto_reset=False,
    tensordict=world_model_td[:, 0],
).detach()
Contributor

this is already under a no_grad(), no need to detach

Contributor Author
@nicolas-dufour Aug 22, 2022

I've seen memory explosions without the detach. Tell me if I'm wrong, but no_grad makes sure not to record gradients for new operations; it doesn't detach elements that are already in the graph, no?
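A minimal illustration of the semantics in question (not the PR code): no_grad stops new operations from being recorded, but a tensor that already has a grad_fn keeps its graph alive if it is stored as-is; detach() is what severs that link.

import torch

w = torch.randn(3, requires_grad=True)
y = (w * 2).sum()        # y.grad_fn references the graph back to w

with torch.no_grad():
    z = y * 1.0          # new op under no_grad: no grad_fn is recorded
    kept = y             # storing y itself still references y.grad_fn
    cut = y.detach()     # a new tensor with no reference to the graph

print(z.grad_fn)         # None
print(kept.grad_fn)      # <SumBackward0 ...> -> keeps the graph (and its memory) alive
print(cut.grad_fn)       # None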

imagine_pxls = recover_pixels(model_based_env.decode_obs(world_model_td)["reco_pixels"], stats)

stacked_pixels = torch.cat([true_pixels, reco_pixels, imagine_pxls], dim=-1)
logger.log_video(
Contributor

Why does it appear in the log that we have way more reconstructions than actual pixels? Storing pixels is heavy and can quickly overload the disk.

Contributor Author

"way more reconstructions than actual pixels?" -> What do you mean by this?

    sampled_tensordict
)
if cfg.record_video:
    world_model_td = sampled_tensordict.clone().select(
Contributor
@vmoens Aug 20, 2022

Careful here: you clone the whole thing and then select; I would do the opposite (select -> clone).
Not even sure the clone is needed (select is not done in-place unless specified).

Contributor Author

I was doing the opposite before, but it was changing the original tensordict: a lot of keys were missing for the actor part afterwards, so that's why I reverted it.

Contributor

No, select does not change the original tensordict! Otherwise there is a serious bug!
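A sketch of the suggested ordering, assuming the usual TensorDict semantics (select returns a new tensordict restricted to the given keys and leaves the original untouched unless inplace=True):

# Restrict to the logged keys and the first few samples first, then copy only
# that small tensordict; sampled_tensordict keeps all of its keys.
world_model_td = (
    sampled_tensordict
    .select("pixels", "reco_pixels", "posterior_states", "next_belief")[:4]
    .clone()
    .detach()
)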

# update weights of the inference policy
collector.update_policy_weights_()

if r0 is None:
Contributor

we should log the training rewards somewhere

scaler.update()

with torch.no_grad(), set_exploration_mode("mode"):
td_record = record(None)
Contributor

On my end this step does not log any video. Perhaps the way the global_step is indicated in TensorBoard conflicts with the wandb API?


scaler.update()

with torch.no_grad(), set_exploration_mode("mode"):
Contributor

I personally prefer to have this out of the inner training loop. The reason is that it's easier to control the number of collection steps than the number of training steps, and small changes in the config can have a large impact on the number of evaluation data collections, which carries a significant memory and compute cost.

else:
current_frames = tensordict.numel()
collected_frames += current_frames
tensordict = tensordict.reshape(-1, cfg.batch_length)
Contributor

Why do we need this? Doesn't the tensordict already have this size?
This will break if there is a "mask" key in the tensordict (see my comment above).
Also, shouldn't reshape be replaced by 'view'?

Contributor Author

No, the tensordict has the size of the collected batch, which is n_workers x max_frames_per_traj with max_frames_per_traj=1000, but we then want to use tensors of size B x batch_length, with batch_length=50 by default in Dreamer.

Contributor

Are we sure that the resulting tensordict will always be properly shaped? That might break if batch_length does not divide the size of the collected data, no? E.g. what happens if max_frames_per_traj=789 and batch_length=25?
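For what it's worth, the reshape only succeeds when the total number of collected frames is divisible by batch_length, which is exactly the failure mode raised above (a toy illustration on a plain tensor, not the collector code):

import torch

n_workers, max_frames_per_traj, batch_length = 2, 1000, 50
batch = torch.zeros(n_workers, max_frames_per_traj, 3)    # stand-in for the collected batch
chunks = batch.reshape(-1, batch_length, 3)               # [40, 50, 3]: 2 * 1000 divides evenly by 50

odd_batch = torch.zeros(2, 789, 3)
# odd_batch.reshape(-1, 25, 3) raises a RuntimeError: 2 * 789 = 1578 frames
# cannot be split into whole chunks of length 25.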

)
from torchrl.trainers.helpers.models import (
    make_dreamer,
    DreamerConfig,
Contributor

Why import this when we redefine it afterwards?

@vmoens added the new algo (New algorithm request or PR) label Oct 19, 2022
@vmoens (Contributor) left a comment

LGTM


@vmoens vmoens merged commit e1fbf86 into pytorch:main Oct 20, 2022
Labels: CLA Signed, enhancement (New feature or request), new algo (New algorithm request or PR)
4 participants