25 Apr 20:14

New Features:

Better video rendering
- [Feature] A PixelRenderTransform by @vmoens in #2099
- [Feature] Video recording in SOTA examples by @vmoens in #2070
- [Feature] VideoRecorder for datasets and replay buffers by @vmoens in #2069
Replay buffer: sampling trajectories is now much easier, cleaner and faster
- [Benchmark] Benchmark slice sampler by @vmoens in #1992
- [Feature] Add PrioritizedSliceSampler by @Cadene in #1875
- [Feature] Span slice indices on the left and on the right by @vmoens in #2107
- [Feature] batched trajectories - SliceSampler compatibility by @vmoens in #1775
- [Performance] Faster slice sampler by @vmoens in #2031
Datasets: allow preprocessing datasets after download
- [Feature] Preproc for datasets by @vmoens in #1989
Losses: reduction parameters and non-functional execution
- [Feature] Add reduction parameter to On-Policy losses. by @albertbou92 in #1890
- [Feature] Adds value clipping in ClipPPOLoss loss by @albertbou92 in #2005
- [Feature] Offline objectives reduction parameter by @albertbou92 in #1984
Environment API: support "fork" start method in ParallelEnv, better handling of auto-resetting envs.
- [Feature] Use non-default mp start method in ParallelEnv by @vmoens in #1966
- [Feature] Auto-resetting envs by @vmoens in #2073
Transforms
- [Feature] Allow any callable to be used as transform by @vmoens in #2027
- [Feature] invert transforms appended to a RB by @vmoens in #2111
- [Feature] Extend TensorDictPrimer default_value options by @albertbou92 in #2071
- [Feature] Fine grained DeviceCastTransform by @vmoens in #2041
- [Feature] BatchSizeTransform by @vmoens in #2030
- [Feature] Allow non-sorted keys in CatFrames by @vmoens in #1913
- [Feature] env.append_transform by @vmoens in #2040
New environments and improvements:
- [Environment] Meltingpot by @matteobettini in #2054
- [Feature] Return depth from RoboHiveEnv by @sriramsk1999 in #2058
- [Feature] PettingZoo possibility to choose reset strategy by @matteobettini in #2048

Other features

[Feature] Add time_dim arg in value modules by @vmoens in #1946
[Feature] Batched actions wrapper by @vmoens in #2018
[Feature] Better repr of RBs by @vmoens in #1991
[Feature] Execute rollouts with regular nn.Module instances by @vmoens in #1947
[Feature] Logger by @vmoens in #1858
[Feature] Passing lists of keyword arguments in reset for batched envs by @vmoens in #2076
[Feature] RB MultiStep transform by @vmoens in #2008
[Feature] Replace RewardClipping with SignTransform in Atari examples by @albertbou92 in #1870
[Feature] reset_parameters for multiagent nets by @matteobettini in #1970
[Feature] optionally set truncated = True at the end of rollouts by @vmoens in #2042

Miscellaneous

Fix onw typo by @kit1980 in #1917
Rename SOTA-IMPLEMENTATIONS.md to README.md by @matteobettini in #2093
Revert "[BugFix] Fix Isaac" by @vmoens in #2118
Update getting-started-5.py by @vmoens in #1894
[BugFix, Performance] Fewer imports at root by @vmoens in #1930
[BugFix,CI] Fix Windows CI by @vmoens in #1983
[BugFix,CI] Fix sporadically failing tests in CI by @vmoens in #2098
[BugFix,Refactor] Dreamer refactor by @BY571 in #1918
[BugFix] Adaptable non-blocking for mps and non cuda device in batched-envs by @vmoens in #1900
[BugFix] Call contiguous on rollout results in TestMultiStepTransform by @vmoens in #2025
[BugFix] Dedicated tests for on policy losses reduction parameter by @albertbou92 in #1974
[BugFix] Extend with a list of tensordicts by @vmoens in #2032
[BugFix] Fix Atari DQN ensembling by @vmoens in #1981
[BugFix] Fix CQL/IQL pbar update by @vmoens in #2020
[BugFix] Fix Exclude / Double2Float transforms by @vmoens in #2101
[BugFix] Fix Isaac by @vmoens in #2072
[BugFix] Fix KLPENPPOLoss KL computation by @vmoens in #1922
[BugFix] Fix MPS sync in device transform by @vmoens in #2061
[BugFix] Fix OOB TruncatedNormal LP by @vmoens in #1924
[BugFix] Fix R2Go once more by @vmoens in #2089
[BugFix] Fix Ray collector example error by @albertbou92 in #1908
[BugFix] Fix Ray collector on Python > 3.8 by @albertbou92 in #2015
[BugFix] Fix RoboHiveEnv tests by @sriramsk1999 in #2062
[BugFix] Fix _reset data passing in parallel env by @vmoens in #1880
[BugFix] Fix a bug in SliceSampler, indexes outside sampler lengths were produced by @vladisai in #1874
[BugFix] Fix args/kwargs passing in advantages by @vmoens in #2001
[BugFix] Fix batch-size expansion in functionalization by @vmoens in #1959
[BugFix] Fix broken gym tests by @vmoens in #1980
[BugFix] Fix clip_fraction in PO losses by @vmoens in #2021
[BugFix] Fix colab in tutos by @vmoens in #2113
[BugFix] Fix env.shape regex matches by @vmoens in #1940
[BugFix] Fix examples by @vmoens in #1945
[BugFix] Fix exploration in losses by @vmoens in #1898
[BugFix] Fix flaky rb tests by @vmoens in #1901
[BugFix] Fix habitat by @vmoens in #1941
[BugFix] Fix jumanji by @vmoens in #2064
[BugFix] Fix load_state_dict and is_empty td bugfix impact by @vmoens in #1869
[BugFix] Fix mp_start_method for ParallelEnv with single_for_serial by @vmoens in #2007
[BugFix] Fix multiple context syntax in multiagent examples by @matteobettini in #1943
[BugFix] Fix offline CatFrames by @vmoens in #1953
[BugFix] Fix offline CatFrames for pixels by @vmoens in #1964
[BugFix] Fix prints of size error when no file is associated with memmap by @vmoens in #2090
[BugFix] Fix replay buffer extension with lists by @vmoens in #1937
[BugFix] Fix reward2go for nd tensors by @vmoens in #2087
[BugFix] Fix robohive by @vmoens in #2080
[BugFix] Fix sampling without replacement with ndim storages by @vmoens in #1999
[BugFix] Fix slice sampler compatibility with split_trajs and MultiStep by @vmoens in #1961
[BugFix] Fix slicesampler terminated/truncated signaling by @vmoens in #2044
[BugFix] Fix strict-length for spanning trajectories by @vmoens in #1982
[BugFix] Fix strict_length=True in SliceSampler by @vmoens in #2037
[BugFix] Fix unwanted lazy stacks by @vmoens in #2102
[BugFix] Fix update in serial / parallel env by @vmoens in #1866
[BugFix] Fix vmas stacks by @vmoens in #2105
[BugFix] Fixed import for importlib by @DanilBaibak in #1914
[BugFix] Make KL-controllers independent of the model by @vmoens in #1903
[BugFix] Make sure ParallelEnv does not overflow mem when policy requires grad by @vmoens in #1909
[BugFix] More robust _StepMDP and multi-purpose envs by @vmoens in #2038
[BugFix] No grad on collector reset by @matteobettini in #1927
[BugFix] Non exclusive terminated and truncated by @vmoens in #1911
[BugFix] Refactor ...

Contributors

kit1980, vladisai, and 11 other contributors

Assets 15

torchrl-0.4.0-cp310-cp310-macosx_11_0_arm64.whl (1.18 MB, 2024-04-25T20:13:12Z)
torchrl-0.4.0-cp310-cp310-manylinux1_x86_64.whl (5.67 MB, 2024-04-25T20:13:12Z)
torchrl-0.4.0-cp310-cp310-win_amd64.whl (874 KB, 2024-04-25T20:13:13Z)
torchrl-0.4.0-cp311-cp311-macosx_11_0_arm64.whl (1.18 MB, 2024-04-25T20:13:13Z)
torchrl-0.4.0-cp311-cp311-manylinux1_x86_64.whl (5.68 MB, 2024-04-25T20:13:13Z)
torchrl-0.4.0-cp311-cp311-win_amd64.whl (872 KB, 2024-04-25T20:13:14Z)
torchrl-0.4.0-cp312-cp312-macosx_11_0_arm64.whl (1.18 MB, 2024-04-25T20:13:15Z)
torchrl-0.4.0-cp38-cp38-macosx_11_0_arm64.whl (1.2 MB, 2024-04-25T20:13:07Z)
torchrl-0.4.0-cp38-cp38-manylinux1_x86_64.whl (5.67 MB, 2024-04-25T20:13:08Z)
torchrl-0.4.0-cp38-cp38-win_amd64.whl (874 KB, 2024-04-25T20:13:09Z)
Source code (zip) (2024-05-02T13:17:39Z)
Source code (tar.gz) (2024-05-02T13:17:39Z)

01 Mar 22:41

vmoens

v0.3.1

146341b

v0.3.1

This release provides a bunch of bug fixes and speedups.

What's Changed

[BugFix] Fix broken gym tests (#1980)
[BugFix,CI] Fix Windows CI (#1983)
[Minor] Cleanup
[CI] Install stable torch and tensordict for release tests (#1978)
[Refactor] Remove remnant legacy functional calls (#1973)
[Minor] Use the main branch for the M1 build wheels (#1965)
[BugFix] Fixed import for importlib (#1914)
[BugFix] Fix offline CatFrames for pixels (#1964)
[BugFix] Fix offline CatFrames (#1953)
[BugFix] Fix batch-size expansion in functionalization (#1959)
[BugFix] Update iql docstring example (#1950)
[BugFix] Update cql docstring example (#1951)
[BugFix] Fix examples (#1945)
[BugFix] Remove reset on last step of a rollout (#1936)
[BugFix] Vmap randomness for value estimator (#1942)
[BugFix] Fix multiple context syntax in multiagent examples (#1943)
[BugFix] Fix habitat (#1941)
[BugFix] Fix env.shape regex matches (#1940)
[Minor] Add env.shape attribute (#1938)
[BugFix] Fix replay buffer extension with lists (#1937)
[BugFix] No grad on collector reset (#1927)
[BugFix] fix trunc normal device (#1931)
[BugFix, Performance] Fewer imports at root (#1930)
[BugFix] Fix OOB TruncatedNormal LP (#1924)
[BugFix] Fix KLPENPPOLoss KL computation (#1922)
[Doc] Fix onw typo (#1917)
[BugFix] Make sure ParallelEnv does not overflow mem when policy requires grad (#1909)
[BugFix] Non exclusive terminated and truncated (#1911)
[BugFix] Use setdefault in _cache_values (#1910)
[BugFix] Fix Ray collector example error (#1908)
[BugFix] Make KL-controllers independent of the model (#1903)
[Minor] Remove warnings in test_cost (#1902)
[BugFix] Adaptable non-blocking for mps and non cuda device in batched-envs (#1900)
[BugFix] Fix flaky rb tests (#1901)
[BugFix] Fix exploration in losses (#1898)
[BugFix] Solve recursion issue in losses hook (#1897)
[Doc] Update getting-started-5.py (#1894)
[Doc] Getting started tutos (#1886)
[BugFix] Use traj_terminated in SliceSampler (#1884)
[Doc] Improve PrioritizedSampler doc and get rid of np dependency as much as possible (#1881)
[BugFix] Fix _reset data passing in parallel env (#1880)
[BugFix] state typo in RNG control module (#1878)
[BugFix] Fix a bug in SliceSampler, indexes outside sampler lengths were produced (#1874)
[BugFix] check_env_specs seeding logic (#1872)
[BugFix] Fix update in serial / parallel env (#1866)
[Doc] Installation instructions in API ref (#1871)
[BugFix] better device consistency in EGreedy (#1867)
[BugFix] Fix load_state_dict and is_empty td bugfix impact (#1869)
[Doc] Fix tutos (#1863)

Full Changelog: v0.3.0...v0.3.1

Assets 19

31 Jan 21:40

vmoens

v0.3.0

520e65f

v0.3.0: Data hub, universal env converter and more!

In this release, we focused on building a Data Hub for offline RL, providing a universal 2gym conversion tool (#1795) and improving the doc.

TorchRL Data Hub

TorchRL now offers many offline datasets in robotics, control and gaming, all under a single data format (TED, for TorchRL Episode Data Format). Every dataset is one step away from being downloaded: dataset = <Name>ExperienceReplay(dataset_id, root="/path/to/storage", download=True) is all you need to get started.
This means that you can now download OpenX #1751 or Roboset #1743 datasets and combine them in a single replay buffer #1768, or swap one for another in no time and with no extra code.
We allow many new sampling techniques, such as sampling slices of trajectories with or without repetition.
As always, you can append your favourite transform to these datasets.
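
To make the idea of sampling slices of trajectories concrete, here is a minimal plain-Python sketch. It is not TorchRL's sampler: the function name `sample_slices` and its parameters are hypothetical, and it assumes the buffer holds a flat sequence of transitions tagged with a trajectory id, with each trajectory stored contiguously.

```python
import random
from itertools import groupby


def sample_slices(traj_ids, slice_len, num_slices, rng=None):
    """Sample contiguous index windows that never cross a trajectory boundary.

    traj_ids: one trajectory id per stored transition, in storage order.
    Returns a list of index lists, each of length `slice_len`.
    """
    rng = rng or random.Random()
    # Find the (start, stop) span of each contiguous trajectory.
    spans, start = [], 0
    for _, group in groupby(traj_ids):
        length = len(list(group))
        spans.append((start, start + length))
        start += length
    # Only trajectories long enough to hold a full slice are eligible.
    eligible = [(a, b) for a, b in spans if b - a >= slice_len]
    if not eligible:
        raise ValueError("no trajectory is at least slice_len steps long")
    slices = []
    for _ in range(num_slices):
        a, b = rng.choice(eligible)  # trajectories are drawn with repetition
        first = rng.randint(a, b - slice_len)  # randint is inclusive on both ends
        slices.append(list(range(first, first + slice_len)))
    return slices


traj_ids = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
batch = sample_slices(traj_ids, slice_len=3, num_slices=4, rng=random.Random(0))
```

Each element of `batch` is a window of three consecutive indices that all belong to the same trajectory; trajectory 1 is never picked because it is shorter than the requested slice length.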

TorchRL2Gym universal converter

#1795 introduces a new universal converter for simulation libraries to gym.
As an RL practitioner, it is sometimes difficult to accommodate the many different environment APIs that exist. TorchRL now provides a way of registering any env in gym(nasium). This lets users build their dataset in torchrl and integrate it into their code base with no effort if they are already using gym as a backend. It also makes it possible to convert DMControl or Brax envs (among others) to gym without the need for an extra library.

PPO and A2C compatibility with distributed models

Functional calls can now be turned off for PPO and A2C loss modules, allowing users to run RLHF training loops at scale! #1804

TensorDict-free replay buffers

You can now use TorchRL's replay buffer with ANY tensor-based structure, whether it involves dict, tuples or lists. In principle, storing data contiguously on disk given any gym environment is as simple as

from torchrl.data import LazyMemmapStorage, ReplayBuffer

rb = ReplayBuffer(storage=LazyMemmapStorage(capacity))
obs_, reward, terminal, truncated, info = env.step(action)
rb.add((obs, obs_, reward, terminal, truncated, info, action))

# sampling returns a tuple with the same layout as the stored one
obs, obs_, reward, terminal, truncated, info, action = rb.sample()

This is independent of TensorDict and it supports many components of our replay buffers as well as transforms. See the replay buffer documentation for details.
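
The snippet above relies on TorchRL itself. To illustrate the underlying idea without any dependency, the following is a toy, plain-Python ring buffer that stores arbitrary tuples and returns samples with the same tuple layout. The class name `TupleRingBuffer` is hypothetical, and this is not how TorchRL implements its storages.

```python
import random


class TupleRingBuffer:
    """Toy fixed-capacity buffer; the oldest items are overwritten first."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.cursor = 0  # next write position
        self.rng = random.Random(seed)

    def add(self, item):
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            self.items[self.cursor] = item
        self.cursor = (self.cursor + 1) % self.capacity

    def sample(self, batch_size):
        # Draw with replacement, then transpose so that each field of the
        # stored tuples becomes one batched list.
        picks = [self.rng.choice(self.items) for _ in range(batch_size)]
        return tuple(list(field) for field in zip(*picks))


rb = TupleRingBuffer(capacity=3, seed=0)
for t in range(5):
    # (obs, next_obs, reward, terminated, truncated, action)
    rb.add((t, t + 1, 1.0, False, t == 4, 0))

obs, next_obs, reward, terminated, truncated, action = rb.sample(batch_size=4)
```

After five writes into a buffer of capacity three, only the three most recent transitions remain, and every sampled field comes back as a list of length `batch_size`.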

Multiprocessed replay buffers

TorchRL's replay buffers can now be shared across processes. Multiprocessed RBs can not only be read from but also extended on different workers. #1724

SOTA checks

We introduce a set of scripts that check that our training scripts run correctly before each release: #1822

Throughput of Gym and DMControl

We removed loads of checks in GymLikeEnv if some basic conditions are met, which improves the throughput significantly for simple envs. #1803

Algorithms

We introduce discrete CQL #1666, discrete IQL #1793 and Impala #1506.

What's Changed: PR description

[BugFix] Fix incorrect deprecation warning by @mikemykhaylov in #1655
[Bug] TensorDictMaxValueWriter raises error when no sample in a batch is accepted by @albertbou92 in #1664
[BugFix] Fix "done" instead of "terminated" mistakes by @MarCnu in #1661
[Feature] CatFrames constant padding by @albertbou92 in #1663
doc(README): remove typo by @Deep145757 in #1665
[Docs] Update README.md by @vaibhav-009 in #1667
[Minor] Update dreamer example tests by @vmoens in #1668
[Feature] Introduce grouping in VMAS by @matteobettini in #1658
[BugFix] assertion error message, envs/util.py by @laszloKopits in #1669
[Doc] Set action_spec instead of input_spec by @FrankTianTT in #1657
[BugFix] Fix submitit IP address/node name retrieval by @vmoens in #1672
[Doc] Document (and test) compound actor by @vmoens in #1673
[Doc] Update rollout_recurrent.png to account for terminal by @vmoens in #1677
[Doc] Add EGreedyWrapper back in the doc by @vmoens in #1680
[Doc] Fix TanhDelta docstring by @matteobettini in #1683
[Doc] Add discord badge on README by @vmoens in #1686
[CI] Downgrade RAY to fix CI by @vmoens in #1687
[BugFix] MaxValueWriter cuda compatibility by @albertbou92 in #1689
Upload docs for preview on HUD by @DanilBaibak in #1682
[Doc] Update pendulum and rnn tutos by @vmoens in #1691
[Algorithm] Discrete CQL by @BY571 in #1666
[BugFix] Minor fix in the logging of PPO and A2C examples by @albertbou92 in #1693
[CI] Enable retry mechanism by @DanilBaibak in #1681
[Refactor] Minor changes in prep of pytorch/tensordict#541 by @vmoens in #1696
[BugFix] fix dreamer actor by @FrankTianTT in #1697
[Refactor] Deprecate direct usage of memmap tensors by @vmoens in #1684
Revert "[Refactor] Deprecate direct usage of memmap tensors" by @vmoens in #1698
[Refactor] Deprecate direct usage of memmap tensors by @vmoens in #1699
[Doc] Fix discord link by @vmoens in #1701
[BugFix] make sure the params of exploration-wrapper is float by @FrankTianTT in #1700
[Fix] EndOfLifeTransform fix in end of life detection by @albertbou92 in #1705
[CI] Fix benchmark on gpu by @vmoens in #1706
[Algorithm] IMPALA and VTrace module by @albertbou92 in #1506
[Doc] Fix discord link by @vmoens in #1712
[Refactor] Refactor functional calls in losses by @vmoens in #1707
[CI] Fix CI by @vmoens in #1711
[BugFix] Make casting to 'meta' device uniform across cost modules by @vmoens in #1715
[BugFix] Change ppo mujoco example to match paper results by @albertbou92 in #1714
[Minor] Hide params in ddpg actor-critic by @vmoens in #1716
[BugFix] Fix hold_out_net by @vmoens in #1719
[BugFix] RewardSum key check by @matteobettini in #1718
[Feature] Allow usage of a different device on main and sub-envs in ParallelEnv and SerialEnv by @vmoens in #1626
[Refactor] Better weight update in collectors by @vmoens in #1723
[Feature] Shared replay buffers by @vmoens in #1724
[CI] FIx nightly builds on osx by @vmoens in #1726
[BugFix] _call_actor_net does not handle multiple inputs by @albertbou92 in #1728
[Feature] Python-based RNN Modules by @albertbou92 in #1720
[BugFix, Test] Fix flaky gym vecenvs tests by @vmoens in #1727
[BugFix] Fix non-full TensorStorage indexing by @vmoens in #1730
[Feature] Minari datasets by @vmoens in #1721
[Feature] All VMAS scenarios available by @matteobettini in #1731
[Feature] pickle-free RB checkpointing by @vmoens in #1733
[CI] Fix doc upload by @vmoens in #1738
[BugFix] Fix RNNs trajectory split in VMAP calls by @vmoens in #1736
[CI] Fix doc upload by @vmoens in #1739
[BugFix, Feature] Fix DDQN implementation by @vmoens in #1737
[Algorithm] Update DQN example by @albertbou92 in #1512
[BugFix] Use rsync in doc workflow by @vmoens in #1741
[BugFix] Fix compat with new memmap API by @vmoens in #1744
[Feature] Roboset datasets by @vmoens in #1743
[Algorithm] Simpler IQL example by @BY571 in #998
[Performance] Faster RNNs by @vmoens in #1732
[BugFix, Test] Fix torch.vmap call in RNN tests by @vmoens in #1749
[BugFix] Fix discrete SAC log-prob by @vmoens in #1750
[Minor] Remove dead code in RolloutFromModel by @ianbarber in #1752
[Minor] Fix runnability of RLHF example in examples/rlhf by @ianbarber in #1753
[Feature] SliceSampler by @vmoens in #1748
[CI] Fix windows CI by @vmoens in #1746
[CI] Fix CI for optional dependencies by @vmoens in https://github.com/pytorch/rl/pull...

Contributors

ianbarber, DanilBaibak, and 13 other contributors

Assets 19

25 Oct 17:24

vmoens

v0.2.1

1bb192e

v0.2.1: Faster parallel envs, fixes in transforms and M1 wheel fix

What's Changed

[Feature] Warning for init_random_frames rounding in collectors by @matteobettini in #1616
[Feature] Add support of non-pickable gym env by @duburcqa in #1615
[BugFix] Add keys to GAE in PPO/A2C by @vmoens in #1618
[BugFix] Fix gym benchmark by @vmoens in #1619
[BugFix] Fix shape setting in CompositeSpec by @vmoens in #1620
[Deprecation] Deprecate ambiguous device for memmap replay buffer by @vmoens in #1624
[CI] Fix CI (python and cuda versions) by @vmoens in #1621
[Feature] Max Value Writer by @albertbou92 in #1622
[CI] Cython<3 for d4rl by @vmoens in #1634
[BugFix] make cursor a torch.long tensor by @vmoens in #1639
[BugFix] Gracefully handle C++ import error in TorchRL by @vmoens in #1640
[Feature] step_and_maybe_reset in env by @vmoens in #1611
[BugFix] Avoid overlapping temporary dirs during training by @vmoens in #1635
[Feature] Exclude all private keys in collectors by @vmoens in #1644
[BugFix] Fix tutos by @vmoens in #1648
[Feature] Lazy imports for implement_for during torchrl import by @vmoens in #1646
[Refactor] Put all buffers on CPU in examples by @vmoens in #1645
[BugFix] Fix storage device by @vmoens in #1650
[BugFix] Fix EXAMPLES.md by @vmoens in #1649
[Release] 0.2.1 by @vmoens in #1642

New Contributors

@duburcqa made their first contribution in #1615

Full Changelog: v0.2.0...v0.2.1

Contributors

duburcqa, albertbou92, and 2 other contributors

Assets 18

05 Oct 16:45

vmoens

v0.2.0

bf264e0

0.2.0: Faster collection, MARL compatibility and RLHF prototype

TorchRL 0.2.0

This release provides many new features and bug fixes.

TorchRL now publishes Apple Silicon compatible wheels.
We drop support for Python 3.7 in favour of 3.11.

New and updated algorithms

Most algorithms have been cleaned and designed to reach (at least) SOTA results.

Compatibility with MARL settings has been drastically improved, and we provide a good number of MARL examples within the library.

A prototype RLHF training script is also proposed (#1597)

A whole new category of offline RL algorithms has been integrated: Decision transformers.

[Algorithm] Update offpolicy examples by @BY571 in #1206
[Algorithm] Online Decision transformer by @BY571 in #1149
[Algorithm] QMixer loss and multiagent models by @matteobettini in #1378
[Algorithm] RLHF end-to-end, clean by @vmoens in #1597
[Algorithm] Update A2C examples by @albertbou92 in #1521
[Algorithm] Update DDPG Example by @BY571 in #1525
[Algorithm] Update DT by @BY571 in #1560
[Algorithm] Update PPO examples by @albertbou92 in #1495
[Algorithm] Update SAC Example by @BY571 in #1524
[Algorithm] Update TD3 Example by @BY571 in #1523

New features

One of the major new features of the library is the introduction of the terminated / truncated / done distinction at no cost within the library. All third-party and primary environments are now compatible with this, as well as losses and data collection primitives (collectors etc.). This feature is also compatible with complex data structures, such as those found in MARL training pipelines.
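
The distinction matters most when computing bootstrapped targets. The sketch below is plain Python, not TorchRL code, and the function name `td_target` is hypothetical. It follows the common convention that `done = terminated or truncated` marks the end of a trajectory, while only `terminated` switches off bootstrapping from the next state's value.

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target and end-of-trajectory flag for a single transition."""
    done = terminated or truncated  # the trajectory stops in both cases
    # A terminal state has no future return, so we do not bootstrap.
    # A truncated episode was merely cut short (e.g. by a time limit), so the
    # next state's value estimate still stands in for the future return.
    bootstrap = 0.0 if terminated else next_value
    return reward + gamma * bootstrap, done


# Same reward and value estimate, three different episode endings.
running = td_target(1.0, 10.0, terminated=False, truncated=False, gamma=0.5)
timeout = td_target(1.0, 10.0, terminated=False, truncated=True, gamma=0.5)
terminal = td_target(1.0, 10.0, terminated=True, truncated=False, gamma=0.5)
```

Treating a time-limit truncation as a true termination would give the `timeout` case the same target as `terminal`, which underestimates the value of states near the time limit.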

All losses are now compatible with tensordict-free inputs, for a more generic deployment.

New transforms

Atari games can now benefit from an EndOfLifeTransform, which allows using the end-of-life signal as a done state in the loss (#1605).

We provide a KL transform to add a KL factor to the reward in RLHF settings.
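
As an illustration of what such a KL factor typically looks like, the sketch below penalizes the reward by the log-probability gap between the trained policy and a frozen reference policy. It is plain Python written under assumptions: the function name `kl_penalized_reward`, the coefficient `coef`, and the sign convention are illustrative and may differ from TorchRL's transform.

```python
import math


def kl_penalized_reward(reward, logp_policy, logp_ref, coef=0.1):
    """Return reward - coef * (log pi(a|s) - log pi_ref(a|s)).

    When the action was drawn from pi, the log-ratio is a single-sample
    estimate of KL(pi || pi_ref).
    """
    return reward - coef * (logp_policy - logp_ref)


# The policy assigns probability 0.8 to the sampled token, the reference 0.4,
# so the policy is penalized for drifting away from the reference.
r = kl_penalized_reward(1.0, math.log(0.8), math.log(0.4), coef=0.5)
```

With identical log-probabilities the penalty vanishes and the reward is returned unchanged.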

Action masking is made possible through the ActionMask transform (#1421)
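
The idea behind action masking can be sketched in a few lines of plain Python. This is not the ActionMask transform itself: `masked_greedy` and `masked_random` are hypothetical helpers, and the mask is assumed to be a list of booleans in which `True` marks a legal action.

```python
import random


def _legal_actions(mask):
    legal = [i for i, ok in enumerate(mask) if ok]
    if not legal:
        raise ValueError("the mask forbids every action")
    return legal


def masked_greedy(q_values, mask):
    """Index of the highest-valued action among those allowed by the mask."""
    return max(_legal_actions(mask), key=lambda i: q_values[i])


def masked_random(mask, rng=random):
    """Uniformly random action among those allowed by the mask."""
    return rng.choice(_legal_actions(mask))


q_values = [0.2, 0.9, 0.5, 0.7]
mask = [True, False, True, True]  # action 1 is illegal in this state
best = masked_greedy(q_values, mask)
explore = masked_random(mask, rng=random.Random(0))
```

Although action 1 has the highest value, the greedy choice falls back to the best legal action, and the random exploration step can only ever return a legal index.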

VC1 is also integrated for better image embedding.

[Feature] Allow sequential transforms to work offline by @vmoens in #1136
[Feature] ClipTransform + rename min/maximum -> low/high by @vmoens in #1500
[Feature] End-of-life transform by @vmoens in #1605
[Feature] KL Transform for RLHF by @vmoens in #1196
[Features] Conv3dNet and PermuteTransform by @xmaples in #1398
[Feature, Refactor] Scale in ToTensorImage based on the dtype and new from_int parameter by @hyerra in #1208
[Feature] CatFrames used as inverse by @BY571 in #1321
[Feature] Masking actions by @vmoens in #1421
[Feature] VC1 integration by @vmoens in #1211

New models

We provide GRU alongside LSTM for POMDP training.

MARL model coverage is now richer, with a MultiAgentMLP and a MultiAgentCNN! Other improvements for MARL include coverage for nested keys in most places of the library (losses, data collection, environments...).

[Feature] Support for GRU by @vmoens in #1586
[Feature] TanhModule by @vmoens in #1213
[Features] Conv3dNet and PermuteTransform by @xmaples in #1398
[Feature] CNN version of MultiAgentMLP by @MarkHaoxiang in #1479

Other features (misc)

[Feature] RLHF Rollouts (reopened) by @vmoens in #1329
[Feature] Add CQL by @BY571 in #1239
[Feature] Allow multiple (nested) action, reward, done keys in env,vec_env and collectors by @matteobettini in #1462
[Feature] Auto-DoubleToFloat by @vmoens in #1442
[Feature] CompositeSpec.lock by @vmoens in #1143
[Feature] Device transform by @vmoens in #1472
[Feature] Dispatch DiscreteSAC loss module by @Blonck in #1248
[Feature] Dispatch PPO loss module by @Blonck in #1249
[Feature] Dispatch REDQ loss module by @Blonck in #1251
[Feature] Dispatch SAC loss module by @Blonck in #1244
[Feature] Dispatch TD3 loss module by @Blonck in #1254
[Feature] Dispatch for DDPG loss module by @Blonck in #1215
[Feature] Dispatch for SAC loss module by @Blonck in #1223
[Feature] Dispatch reinforce loss module by @Blonck in #1252
[Feature] Distpatch IQL loss module by @Blonck in #1230
[Feature] Fix DType casting lazy init by @vmoens in #1589
[Feature] Heterogeneous Environments compatibility by @matteobettini in #1411
[Feature] Log hparams from python dict by @matteobettini in #1517
[Feature] MARL exploration e-greedy compatibility by @matteobettini in #1277
[Feature] Make advantages compatible with Terminated, Truncated, Done by @vmoens in #1581
[Feature] Make losses inherit from TDMBase by @vmoens in #1246
[Feature] Making action masks compatible with q value modules and e-greedy by @matteobettini in #1499
[Feature] Nested keys in OrnsteinUhlenbeckProcess by @matteobettini in #1305
[Feature] Optional mapping of "state" in gym specs by @matteobettini in #1431
[Feature] Parallel environments lazy heterogenous data compatibility by @matteobettini in #1436
[Feature] Pettingzoo: add multiagent dimension to single agent groups by @matteobettini in #1550
[Feature] RLHF Reward Model (reopened) by @vmoens in #1328
[Feature] RLHF dataloading by @vmoens in #1309
[Feature] RLHF networks by @apbard in #1319
[Feature] Refactor categorical dists: Masked one-hot and pass-through gradients by @vmoens in #1488
[Feature] ReplayBuffer.empty by @vmoens in #1238
[Feature] Separate losses by @MateuszGuzek in #1240
[Feature] Single call to value network in advantages [bis] by @vmoens in #1263
[Feature] Single call to value network in advantages by @vmoens in #1256
[Feature] TensorStorage by @vmoens in #1310
[Feature] Threaded collection and parallel envs by @vmoens in #1559
[Feature] Unbind specs by @vmoens in #1555
[Feature] VMAS obs dict by @matteobettini in #1419
[Feature] VMAS: choose between categorical or one-hot actions by @matteobettini in #1484
[Feature] dispatch for DQNLoss by @vmoens in #1194
[Feature] log histograms by @vmoens in #1306
[Feature] make csv logger exist_ok on logging folder by @matteobettini in #1561
[Feature] shifted for all adv by @vmoens in #1276

New environments and third-party improvements

We now cover SMAC-v2, PettingZoo, IsaacGymEnvs (prototype) and RoboHive. The D4RL dataset can now be used without the eponymous library, which permits training with more recent or older versions of gym.

[Environment, Docs] SMACv2 and docs on action masking by @matteobettini in #1466
[Environment] Petting zoo by @matteobettini in #1471
[Feature] D4rl direct download by @MateuszGuzek in #1430
[Feature] Gym 'vectorized' envs compatibility by @vmoens in #1519
[Feature] Gym compatibility: Terminal and truncated by @vmoens in #1539
[Feature] IsaacGymEnvs integration by @vmoens in #1443
[Feature] RoboHive integration by @vmoens in #1119

Performance improvements

We provide several speed improvements, in particular for data collection.

[Performance] Accelerate GAE by @Blonck in #1142
[Performance] Accelerate TD lambda return estimate by @Blonck in #1158
[Performance] Accelerate _split_and_pad_sequence by @Blonck in https://github.com/pytorch/rl/pu...

Contributors

kit1980, smorad, and 19 other contributors

Assets 18

06 May 21:34

vmoens

v0.1.1

6d030c9

v0.1.1

What's Changed

[Feature] Stacking specs by @vmoens in #892
[Feature] Multicollector interruptor by @albertbou92 in #963
[BugFix] VMAS api fix by @matteobettini in #978
[CI] Fix D4RL tests in CI by @vmoens in #976
[CI] Fix CI by @vmoens in #982
[Refactor] Binary spec inherits from discrete spec by @matteobettini in #984
[Feature] _DataCollector -> DataCollectorBase by @vmoens in #985
[Feature] Discrete SAC by @BY571 in #882
[Refactor, Doc] Refactor refs to SafeModule to TensorDictModule unless necessary by @vmoens in #986
[BugFix] Quickfix by @vmoens in #991
[Feature] Add Dropout to MLP module by @BY571 in #988
[Feature] Warn when collectors collect more frames than requested by @matteobettini in #989
[BugFix] make "_reset", "step_count", and other done_based keys follow done_spec by @matteobettini in #981
[Feature] Bandit datasets by @vmoens in #912
[BugFix] Fix sampling in PPO tutorial by @vmoens in #996
[Refactor] Refactor losses (value function, doc, input batch size) by @vmoens in #987
[BugFix,Feature,Doc] Fix replay buffers sampling info, docstrings and iteration by @vmoens in #1003
[Feature] Replace ValueError by warning in collectors when total_frames is not an exact multiple of frames_per_batch by @albertbou92 in #999
[BugFix] Only call replay buffer transforms when there are by @vmoens in #1008
[BugFix] Patch tests in 1008 by @vmoens in #1009
[Feature] Multidim value functions by @vmoens in #1007
[BugFix] Fix exploration (OU and Gaussian) by @vmoens in #1006
[CI] Fix python version in habitat by @vmoens in #1010
Advantages pass time_dim and docfix by @matteobettini in #1014
[Refactor] Faster transformed distributions by @vmoens in #1017
[WIP, CI] Upgrade cuda channel by @vmoens in #1019
[BugFix] Fix collector reset with truncation by @vmoens in #1021
[Refactor] Improve collector performance by @matteobettini in #1020
[BugFix] Fix params and buffer casting for policies by @vmoens in #1022
[Feature] PPO allow entropy logging when entropy_coeff is 0 by @matteobettini in #1025
[Feature] Distributed data collector (ray) by @albertbou92 in #930
[Refactor] Minor changes in tensordict construction by @vmoens in #1029
[CI] Fix Brax 0.9.0 by @vmoens in #1011
[Feature] Multiagent API in vmas by @matteobettini in #983
[Feature] Benchmarking worflow by @vmoens in #1028
[Benchmark] Fix adv benchmark by @vmoens in #1030
[Doc] Refactor DDPG and DQN tutos to narrow the scope by @vmoens in #979
Revert "[Doc] Refactor DDPG and DQN tutos to narrow the scope" by @vmoens in #1032
[BugFix] Advantage normalisation in ClipPPOLoss is done after computing gain1 by @albertbou92 in #1033
[BugFix] Codecov SHA error by @vmoens in #1035
[Doc] DDPG and DQN refactoring -- Doc cleaning by @vmoens in #1036
[BugFix,CI] Fix macos codecov install by @vmoens in #1039
[BugFix] kwargs update in distributed collectors by @vmoens in #1040
[Feature] make_composite_from_td by @vmoens in #1042
[Refactor] Import envpool locally to avoid importing gym at root level by @vmoens in #1041
[Minor] Fix a typo by @FrankTianTT in #1046
[BugFix] Fix param tying in loss modules by @vmoens in #1037
[Refactor] less ad-hoc disable_env_checker check by @vmoens in #1047
[Refactor] Improve distributed collectors by @vmoens in #1044
[Doc] Document tensordict modules by @vmoens in #1053
[Doc] Minor changes to contributing.md by @vmoens in #1054
[Doc] A bit more doc on modules by @vmoens in #1056
[Refactor] Import enum and interaction_type utils by @Goldspear in #1055
[Feature] Deduplicate calls to common layers in PPO by @vmoens in #1057
[BugFix] CompositeSpec nested key deletion by @btx0424 in #1059
[Feature] Add MaskedCategorical distribution by @xiaomengy in #1012
[Refactor] resetting envs in collectors always passes the _reset entry by @vmoens in #1061
[Refactor] Better integration of QValue tools by @vmoens in #1063
MUJOCO_INSTALLATION.md: Fix typo by @traversaro in #1064
[Refactor] Removes "reward" from root tensordicts by @vmoens in #1065
[Test] Fix tests for older pytorch versions by @vmoens in #1066
[Feature] Reward2go Transform by @BY571 in #1038
[CI] Reduce tests by @vmoens in #1071
[Feature] Skip existing for advantage modules by @vmoens in #1070
[BugFix] Fix parallel env data passing on cuda by @vmoens in #1024
[Refactor] Deprecate interaction_mode by @vmoens in #1067
[Doc] Update KB: cannot find -lGL by @vmoens in #1073
[Doc] fix figures display issues in documentation of actors.py by @DamienAllonsius in #1074
[Example] PPO simplified example by @albertbou92 in #1004
[Feature] Update td in step (not overwrite) by @vmoens in #1075
[CI] Remove migrated CircleCI macOS jobs by @seemethere in #1069
[Feature] Target Return Transform by @BY571 in #1045
[Test] Fix tensorboard tests with ImageIO 2.26 by @vmoens in #1083
[Feature] LSTMModule by @vmoens in #1084
[BugFix] Change default of skip_existing to None by @tcbegley in #1082
[Example] A2C simplified example by @albertbou92 in #1076
[BugFix] Fix output_spec transform calls by @vmoens in #1091
[Feature] Indexing Discrete and OneHot specs by @remidomingues in #1081
[Refactor] Refactor DQN by @vmoens in #1085
[Feature] Auto-init updaters and raise a warning if not present by @vmoens in #1092
[BugFix] Remove false warnings in losses by @vmoens in #1096
[CI, BugFix] Fix CI warnings and errors by @vmoens in #1100
[Refactor] Update vmap imports to torch by @vmoens in #1102
[Refactor] Make advantages non-differentiable by default (except in losses) by @vmoens in #1104
[Feature] Indexing specs by @remidomingues in #1105
[BugFix] Fix EnvPoool by @vmoens in #1106
[Feature,Doc] QValue refactoring and QNet + RNN tuto by @vmoens in #1060
[BugFix] Fix Gym imports by @vmoens in #1023
[CI] pytest should not skip tests for dependencies by @rohitnig in #1048
[BugFix, Doc] Fix tutos by @vmoens in #1107
[CI] Fix tutos (2) by @vmoens in #1109
[Doc] Fix doc rendering by @vmoens in #1112
Added the entry for skip-tests in the environment.yml by @rohitnig in #1113
[CI] Upgrade ubuntu version in GHA by @vmoens in #1116
Fix in windows unit test by @mischab in #1099
Revert "Fix in windows unit test" by @mischab in #1117
[Nova] Lint job on GHA by @osalpekar in #1114
[Nova] Remove CircleCI Wheels Builds by @osalpekar in #1121
[BugFix] Set exploration mode to MODE in all losses by default by @vmoens in #1123
[BugFix] Instruct the v...

Contributors

seemethere, traversaro, and 16 other contributors

16 Mar 20:31

vmoens

v0.1.0

a32d2e7

v0.1.0 - Beta

First official beta release of the library!

What's Changed

QuickFix Versioning by @fedebotu in #958
Version 0.0.5 by @vmoens in #957
[Minor] Warning when loading memmap storage on uninitialized td by @vmoens in #961
[Refactor] Defaults split_trajs to False by @vmoens in #947
[Feature] InitTracker transform by @vmoens in #962
[Feature] RenameTransform by @vmoens in #964
[Feature] Implicit Q-Learning (IQL) by @BY571 in #933
[Refactor] Refactor data collectors constructors by @vmoens in #970
[Feature, Refactor] Iterable replay buffers by @vmoens in #968
[Doc] README rewrite by @vmoens in #971
[Refactor] A less verbose torchrl by @vmoens in #973
[Feature] torch.distributed collectors by @vmoens in #934
[Feature] Offline datasets: D4RL by @vmoens in #928

Full Changelog: v0.0.5...v0.1.0

Contributors

vmoens, BY571, and fedebotu

08 Mar 20:58

vmoens

v0.0.5

48eca98

0.0.5

This release changes the env.step API; see #941 for more info.
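
As a rough illustration of what #941 describes (reward and done moving into the 'next' entry), the sketch below uses plain Python dicts in place of tensordicts. It is not TorchRL code: the function names, the key names other than 'next', and the exact pre-change layout are assumptions made for illustration only.

```python
# Illustration only: plain dicts stand in for tensordicts.
# Function names and the "observation"/"action" keys are hypothetical.

def step_old_layout(data, next_obs, reward, done):
    """Assumed layout before the change: reward and done at the root."""
    out = dict(data)
    out["reward"] = reward
    out["done"] = done
    out["next"] = {"observation": next_obs}
    return out


def step_new_layout(data, next_obs, reward, done):
    """Layout described by #941: reward and done live inside "next"."""
    out = dict(data)
    out["next"] = {"observation": next_obs, "reward": reward, "done": done}
    return out


current = {"observation": 0.0, "action": 1}
old = step_old_layout(current, next_obs=0.5, reward=1.0, done=False)
new = step_new_layout(current, next_obs=0.5, reward=1.0, done=False)

# With the new layout, everything produced by the transition is read from "next".
print(old["reward"], new["next"]["reward"])
```

Under this reading, code that previously read the reward or done flag from the root of the step output would read them from the 'next' entry instead.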

What's Changed

[BugFix] Fix dreamer training loop by @vmoens in #915
[Doc] PPO Tutorial by @vmoens in #913
[Doc] Create your pendulum tutorial by @vmoens in #911
[BugFix] Deploy doc by @vmoens in #920
[BugFix] Nvidia not found fix by @vmoens in #922
[Feature] Rework to_one_hot and to_categorical to take a tensor as parameter by @riiswa in #816
[Doc] Tutorial revamp by @vmoens in #926
[BugFix] Fix EnvPool spec shapes by @vmoens in #932
[BugFix] Fix CompositeSpec.to_numpy method by @riiswa in #931
[CI] Do not run nightly workflows on forked repos by @XuehaiPan in #936
[Refactor] set_default -> setdefault by @tcbegley in #935
[BugFix] Step and maybe reset by @vmoens in #938
[Doc] Minor doc improvements by @vmoens in #907
[Doc] Add debug doc by @acohen13 in #940
[BugFix] Propagate args to TransformedEnv's state_dict by @fedebotu in #944
[BugFix] Vmas expanded specs by @matteobettini in #942
[Quality] RB constructors cleanup by @vmoens in #945
[Doc] Refactor KB by @vmoens in #946
[BugFix] Upgrade vision's functional import by @vmoens in #948
[BugFix] Deprecate tensordict.set check skips in transforms by @vmoens in #951
[BugFix] Upgrade tensordict deps by @vmoens in #953
[CI] Fix windows CI by @vmoens in #954
[Refactor] Refactor composite spec keys to match tensordict by @vmoens in #956
[Refactor] Refactor the step to include reward and done in the 'next' tensordict by @vmoens in #941

New Contributors

@XuehaiPan made their first contribution in #936
@acohen13 made their first contribution in #940
@fedebotu made their first contribution in #944

Full Changelog: v0.0.4...v0.0.5

Contributors

tcbegley, XuehaiPan, and 5 other contributors

11 Feb 10:28

vmoens

v0.0.4b

eec263f

v0.0.4-beta

Pre-release

What's Changed

[CI, Doc] Update functorch source installation command by @zou3519 in #446
[BugFix] TransformedEnv attributes inheritance by @vmoens in #467
[Feature] Cleanup mocking envs init and new by @vmoens in #469
[Tests] Adding tensordict __repr__ tests by @sladebot in #435
[Logging]: implement MLFlow logging integration by @rayanht in #432
[BugFix] MLFlow import fix by @vmoens in #473
[BugFix] Fixed pip install by @brandonsj in #475
[Features]: Changed _inplace_update cls parameter passing in __new__ by @nicolas-dufour in #464
[Feature]: ModelBased Envs by @nicolas-dufour in #333
[Feature] make ReplayBufferTrainer compatible with storing trajectories by @vmoens in #476
[Tutorial] DQN tutorial by @vmoens in #474
[Feature] reader hooks for GymLike by @vmoens in #478
[BugFix] TensorSpec.zero(None) failure fix by @vmoens in #483
[Feature]: Support for planners and CEM by @nicolas-dufour in #384
[Feature] Replaced device_safe() with device by @ordinskiy in #485
[Feature]: TensorDictPrimer transform by @nicolas-dufour in #456
[Feature]: erase() method for torchrl.timeit by @nicolas-dufour in #480
[Feature] Added support for single collector in sync_async_collector by @nicolas-dufour in #482
[BugFix] removing unwanted device_safe() by @vmoens in #486
[Refactoring] Refactored get_stats_random_rollout by @nicolas-dufour in #481
[Feature] VIP Integration by @JasonMa2016 in #487
[Refactoring] Minor tweaks to recorder and logger by @nicolas-dufour in #489
[Feature]: Deactivate typechecks in envs by @nicolas-dufour in #490
[BugFix] Vectorized td_lambda with gamma tensor does not match the serial version by @vmoens in #400
[BugFix] Fix TensorDictPrimer init by @vmoens in #491
[Feature] Optional auto-reset when done for collectors and batched envs by @vmoens in #492
[BugFix] Defaulting passing_devices to None by @himjohntang in #477
Revert "[BugFix] Defaulting passing_devices to None" by @vmoens in #494
[BugFix] Multi-agent fixes by @vmoens in #488
[BugFix] Defaulting passing_devices to None by @vmoens in #495
[Feature] Lazy initialization of CatTensors by @vmoens in #497
[Cleanup] Removing cuda 10.2 references by @vmoens in #498
[BugFix] Migration to pytorch org by @vmoens in #499
[Refactoring] Import at root to enable vmap monkey-patching by @vmoens in #500
[BugFix] python version for linting checks by @vmoens in #502
[Feature] Replay Buffers refactor by @bamaxw in #330
[Feature] Rename step_tensordict in step_mdp by @romainjln in #512
[Lint] re-instantiate F821 by @vmoens in #516
[BugFix] run_type_checks for TransformedEnvs by @vmoens in #513
[BugFix] making first_dim and last_dim negative in FlattenObservation when a parent is set by @vmoens in #511
[Feature] Add info dict key-spec pairs to observation_spec by @tcbegley in #504
[BugFix] Changing the dm_control import to fail if not installed by @zeenolife in #515
[CI] Add coverage with codecov by @silvestrebahi in #523
Revert "[CI] Add coverage with codecov" by @vmoens in #525
[Quality] Use relative imports for local c++ deps by @apbard in #526
[Feature] Nightly release by @vmoens in #519
[Feature] Add make_tensordict() function by @sicong-huang in #522
[Doc] Misc readme fixes by @GavinPHR in #532
[BugFix] Replacing inference_mode decorator with no_grad to fix state_dict loading error by @GavinPHR in #530
[BugFix] Transformed ParallelEnv meta data are broken when passing to device by @vmoens in #531
[Doc] Add coverage banner by @vmoens in #533
[BugFix] Fix colab link of coding_dqn.ipynb by @Benjamin-eecs in #543
[BugFix] Fix optional imports by @vmoens in #535
[BugFix] Restore missing keys in data collector output by @tcbegley in #521
[Lint] reorganize imports by @apbard in #545
[BugFix] Single-cpu compatibility by @vmoens in #548
[BugFix] vision install and other deps in optdeps by @vmoens in #552
[Feature] Implemented device argument for modules.models by @yushiyangk in #524
[BugFix] Fix ellipsis indexing of 2d TensorDicts by @vmoens in #559
[BugFix] Additive gaussian exploration spec fix by @vmoens in #560
[BugFix] Disabling video step for wandb by @vmoens in #561
[BugFix] Various device fix by @vmoens in #558
[Feature] Allow collectors to accept regular modules as policies by @tcbegley in #546
[BugFix] Fix push binary nightly action by @psolikov in #566
[BugFix] TensorDict comparison by @vmoens in #567
[BugFix] Fix SyncDataCollector reset by @jrobine in #571
[Doc] Banners on README.md by @vmoens in #572
[Feature] Log printing in alphabetical order when creating a replay buffer by @nikhlrao in #573
[BugFix] Add eps to reward normalization by @vmoens in #574
[BugFix] Fix argument for PPOLoss.get_entropy_bonus() by @vmoens in #578
[Feature] Restructure torchrl/objectives by @sgrigory in #580
[Docs] Documentation revamp by @vmoens in #581
[Doc] Publishing on pytorch.org by @vmoens in #582
Revert "[Doc] Publishing on pytorch.org" by @vmoens in #584
[Doc] Publishing on pytorch.org by @vmoens in #585
Revert "[Doc] Publishing on pytorch.org" by @vmoens in #586
[Doc] Publishing on pytorch.org by @vmoens in #587
[Feature] More restrictive tests on docstrings by @vmoens in #457
[BugFix] Wrong stack import in tests by @vmoens in #590
[Feature] Exclude "_" out_keys in tensordictmodel by @jlesuffleur in #589
[Feature]: Dreamer support by @nicolas-dufour in #341
[Doc] Missing doc for prototype RB by @vmoens in #595
[Feature] Update list of supported libraries by @vmoens in #594
[BugFix] Fix timeit count registration by @vmoens in #598
[Naming] Renaming ProbabilisticTensorDictModule keys by @vmoens in #603
[Feature] Categorical encoding for action space by @artkorenev in #593
[BugFix] ReplayBuffer's storage now signal back when changes happen by @paulomarciano in #614
[Doc] Typos in tensordict tutorial by @PaLeroy in #621
[Doc] Integrate knowledge base in docs by @hatala91 in #622
[Doc] Updating docs requirements by @vmoens in #624
[Feature] Make torchrl runnable without functorch and with gym==0.13 by @vmoens in #386
[Feature] Habitat integration by @vmoens in #514
[Feature] Checkpointing by @vmoens in #549
Add support for null dim argument in TensorDict.squeeze by @jgonik in #608
[Version] Updating to torch 1.13 by @vmoens in #627
[Feature] Sub-memmap tensors by @vmoens in #626
[BugFix] copy_ changes the index if the dest and source memmap tensors share the same file location by @vmoens in #631
[F...

Contributors

kadeng, sladebot, and 50 other contributors

11 Feb 22:15

vmoens

v0.0.4

4a74149

v0.0.4

Pre-release

What's Changed

[CI, Doc] Update functorch source installation command by @zou3519 in #446
[BugFix] TransformedEnv attributes inheritance by @vmoens in #467
[Feature] Cleanup mocking envs init and new by @vmoens in #469
[Tests] Adding tensordict __repr__ tests by @sladebot in #435
[Logging]: implement MLFlow logging integration by @rayanht in #432
[BugFix] MLFlow import fix by @vmoens in #473
[BugFix] Fixed pip install by @brandonsj in #475
[Features]: Changed _inplace_update cls parameter passing in __new__ by @nicolas-dufour in #464
[Feature]: ModelBased Envs by @nicolas-dufour in #333
[Feature] make ReplayBufferTrainer compatible with storing trajectories by @vmoens in #476
[Tutorial] DQN tutorial by @vmoens in #474
[Feature] reader hooks for GymLike by @vmoens in #478
[BugFix] TensorSpec.zero(None) failure fix by @vmoens in #483
[Feature]: Support for planners and CEM by @nicolas-dufour in #384
[Feature] Replaced device_safe() with device by @ordinskiy in #485
[Feature]: TensorDictPrimer transform by @nicolas-dufour in #456
[Feature]: erase() method for torchrl.timeit by @nicolas-dufour in #480
[Feature] Added support for single collector in sync_async_collector by @nicolas-dufour in #482
[BugFix] removing unwanted device_safe() by @vmoens in #486
[Refactoring] Refactored get_stats_random_rollout by @nicolas-dufour in #481
[Feature] VIP Integration by @JasonMa2016 in #487
[Refactoring] Minor tweaks to recorder and logger by @nicolas-dufour in #489
[Feature]: Deactivate typechecks in envs by @nicolas-dufour in #490
[BugFix] Vectorized td_lambda with gamma tensor does not match the serial version by @vmoens in #400
[BugFix] Fix TensorDictPrimer init by @vmoens in #491
[Feature] Optional auto-reset when done for collectors and batched envs by @vmoens in #492
[BugFix] Defaulting passing_devices to None by @himjohntang in #477
Revert "[BugFix] Defaulting passing_devices to None" by @vmoens in #494
[BugFix] Multi-agent fixes by @vmoens in #488
[BugFix] Defaulting passing_devices to None by @vmoens in #495
[Feature] Lazy initialization of CatTensors by @vmoens in #497
[Cleanup] Removing cuda 10.2 references by @vmoens in #498
[BugFix] Migration to pytorch org by @vmoens in #499
[Refactoring] Import at root to enable vmap monkey-patching by @vmoens in #500
[BugFix] python version for linting checks by @vmoens in #502
[Feature] Replay Buffers refactor by @bamaxw in #330
[Feature] Rename step_tensordict in step_mdp by @romainjln in #512
[Lint] re-instantiate F821 by @vmoens in #516
[BugFix] run_type_checks for TransformedEnvs by @vmoens in #513
[BugFix] making first_dim and last_dim negative in FlattenObservation when a parent is set by @vmoens in #511
[Feature] Add info dict key-spec pairs to observation_spec by @tcbegley in #504
[BugFix] Changing the dm_control import to fail if not installed by @zeenolife in #515
[CI] Add coverage with codecov by @silvestrebahi in #523
Revert "[CI] Add coverage with codecov" by @vmoens in #525
[Quality] Use relative imports for local c++ deps by @apbard in #526
[Feature] Nightly release by @vmoens in #519
[Feature] Add make_tensordict() function by @sicong-huang in #522
[Doc] Misc readme fixes by @GavinPHR in #532
[BugFix] Replacing inference_mode decorator with no_grad to fix state_dict loading error by @GavinPHR in #530
[BugFix] Transformed ParallelEnv meta data are broken when passing to device by @vmoens in #531
[Doc] Add coverage banner by @vmoens in #533
[BugFix] Fix colab link of coding_dqn.ipynb by @Benjamin-eecs in #543
[BugFix] Fix optional imports by @vmoens in #535
[BugFix] Restore missing keys in data collector output by @tcbegley in #521
[Lint] reorganize imports by @apbard in #545
[BugFix] Single-cpu compatibility by @vmoens in #548
[BugFix] vision install and other deps in optdeps by @vmoens in #552
[Feature] Implemented device argument for modules.models by @yushiyangk in #524
[BugFix] Fix ellipsis indexing of 2d TensorDicts by @vmoens in #559
[BugFix] Additive gaussian exploration spec fix by @vmoens in #560
[BugFix] Disabling video step for wandb by @vmoens in #561
[BugFix] Various device fix by @vmoens in #558
[Feature] Allow collectors to accept regular modules as policies by @tcbegley in #546
[BugFix] Fix push binary nightly action by @psolikov in #566
[BugFix] TensorDict comparison by @vmoens in #567
[BugFix] Fix SyncDataCollector reset by @jrobine in #571
[Doc] Banners on README.md by @vmoens in #572
[Feature] Log printing in alphabetical order when creating a replay buffer by @nikhlrao in #573
[BugFix] Add eps to reward normalization by @vmoens in #574
[BugFix] Fix argument for PPOLoss.get_entropy_bonus() by @vmoens in #578
[Feature] Restructure torchrl/objectives by @sgrigory in #580
[Docs] Documentation revamp by @vmoens in #581
[Doc] Publishing on pytorch.org by @vmoens in #582
Revert "[Doc] Publishing on pytorch.org" by @vmoens in #584
[Doc] Publishing on pytorch.org by @vmoens in #585
Revert "[Doc] Publishing on pytorch.org" by @vmoens in #586
[Doc] Publishing on pytorch.org by @vmoens in #587
[Feature] More restrictive tests on docstrings by @vmoens in #457
[BugFix] Wrong stack import in tests by @vmoens in #590
[Feature] Exclude "_" out_keys in tensordictmodel by @jlesuffleur in #589
[Feature]: Dreamer support by @nicolas-dufour in #341
[Doc] Missing doc for prototype RB by @vmoens in #595
[Feature] Update list of supported libraries by @vmoens in #594
[BugFix] Fix timeit count registration by @vmoens in #598
[Naming] Renaming ProbabilisticTensorDictModule keys by @vmoens in #603
[Feature] Categorical encoding for action space by @artkorenev in #593
[BugFix] ReplayBuffer's storage now signal back when changes happen by @paulomarciano in #614
[Doc] Typos in tensordict tutorial by @PaLeroy in #621
[Doc] Integrate knowledge base in docs by @hatala91 in #622
[Doc] Updating docs requirements by @vmoens in #624
[Feature] Make torchrl runnable without functorch and with gym==0.13 by @vmoens in #386
[Feature] Habitat integration by @vmoens in #514
[Feature] Checkpointing by @vmoens in #549
Add support for null dim argument in TensorDict.squeeze by @jgonik in #608
[Version] Updating to torch 1.13 by @vmoens in #627
[Feature] Sub-memmap tensors by @vmoens in #626
[BugFix] copy_ changes the index if the dest and source memmap tensors share the same file location by @vmoens in #631
[F...

Contributors

kadeng, sladebot, and 50 other contributors
