
[Feature Request] Simplify the example and tutorial codes #1843

Closed
whatdhack opened this issue Jan 27, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@whatdhack

Motivation

It is hard to follow and understand the examples and tutorials. For example, comparing the two flavors of CartPole PyTorch code, the one from pytorch/tutorials is far easier to understand and follow than the one in pytorch/rl:

https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
https://github.com/pytorch/rl/blob/main/examples/dqn/dqn_cartpole.py

Solution

Clear and concise example code.

Alternatives

Additional context

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
@whatdhack whatdhack added the enhancement New feature or request label Jan 27, 2024
@vmoens
Contributor

vmoens commented Jan 29, 2024

Hi @whatdhack
Thanks for this suggestion. It's something that has come up over and over; we've had many discussions with @matteobettini @albertbou92 @BY571 @btx0424 @smorad and others about how to address this problem, and the fact that it keeps coming up means we haven't done a great job of dealing with it.

Let me bring a few datapoints to move the discussion forward:

  • Regarding the examples, our current rationale is that they are examples of SOTA implementations rather than simple code samples. The name is confusing and should be something like benchmarks, but that term is a bit overloaded (we have a benchmark folder dedicated to measuring runtimes). Baselines could do. We had some other ideas, but none really struck us as the perfect solution. In a way, we'd still like to be able to land an example that shows how to use a specific component without committing to a piece of code that performs better than any other implementation, and the name examples gives us that freedom. Most of the time we do, and will keep doing, our best to deliver the most fine-tuned implementation, of course.
  • Another thing to keep in mind regarding the examples is that we still want them to open the reader's mind to new possibilities. We tried to build them on top of various factories that you can easily hack to make your own piece of code and solve your own problem, while still exposing most of TorchRL's APIs to give you full visibility into what you can do with it. I understand the examples aren't as simple as they could be, but they are trainable pieces of code that encompass most of our code base, which isn't trivial given the current breadth of what we propose.
  • Regarding the tutorials, we have tried to simplify them on several occasions, yet there is always room for improvement. Our PPO tutorial is supposed to be a good example of (a) how to solve a problem in a few steps and (b) how to make components play with each other. We could have more of these, or even simpler versions for DQN etc. I'd be thrilled to know what people want to see and what would be a useful tutorial for beginners, if we got that wrong! Note that our tutorials are divided into three levels of difficulty to help users know where to start.
  • Now, comparing the PyTorch DQN tutorial with TorchRL's isn't really doing justice to either. The reason we kept the DQN tutorial in PyTorch is that it's a useful example for training users to make the best of PyTorch itself. I don't personally see it as a real "RL" example but more as a generalist ML example. In fact, it's a rather trivial problem and it's being solved in a very ad-hoc manner. With TorchRL, we want to give you the tools to quickly experiment with DQN on that problem and then switch to SAC or anything else, perhaps using one version of the replay buffer or another, collecting data across multiple processes, etc. (a rough sketch of that workflow follows this list). The goal of our tutorials is to teach you how to get there. Again, not to say that your assessment is wrong -- we must provide an easier way of getting acquainted with TorchRL -- but I'd keep in mind the difference in scope between these examples.
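
To make that concrete, here is a minimal sketch (an illustration against TorchRL's public API, not code from the repo or from this thread) of how the pieces compose: the env, policy, collector, and replay buffer are separate components, and the same scaffolding stays in place if you swap the Q-network policy for, say, a SAC actor.

```python
from torchrl.collectors import SyncDataCollector
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer
from torchrl.envs import GymEnv
from torchrl.modules import MLP, QValueActor

env = GymEnv("CartPole-v1")

# Q-network: maps observations to one value per discrete action.
value_net = MLP(
    in_features=env.observation_spec["observation"].shape[-1],
    out_features=env.action_spec.space.n,
    num_cells=[64, 64],
)
policy = QValueActor(value_net, in_keys=["observation"], spec=env.action_spec)

# The collector steps the env with the policy and yields TensorDict batches.
collector = SyncDataCollector(
    env, policy, frames_per_batch=256, total_frames=2_048
)
buffer = TensorDictReplayBuffer(storage=LazyTensorStorage(max_size=100_000))

for batch in collector:
    buffer.extend(batch)         # store the freshly collected transitions
    sample = buffer.sample(128)  # draw a minibatch for a training step
    # ... compute a loss (e.g. DQNLoss) on `sample` and optimize here ...
```

Because the policy, collector, and buffer are decoupled, switching algorithm mostly means swapping the policy module and loss while the data pipeline stays the same.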

I hope that helps. If you (or anyone else) have concrete suggestions for a clear and concise tutorial to add to the library, we'd be excited to get started working on it!

@vmoens
Contributor

vmoens commented Feb 2, 2024

I'm closing this for lack of feedback, but if there's anything actionable we can do, I'll be thrilled to consider it!

@vmoens vmoens closed this as completed Feb 2, 2024
@whatdhack
Author

Especially in the DQN example, some of the well-established logical divisions of deep learning are not followed. It is hard to rationalize why the SyncDataCollector has a policy network attached to it. Also, it looks like the DQN example calls the MLP three times in one iteration!
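
For context, a collector holds the policy because it is the component that actually steps the environment, and it needs the policy to choose an action at every step. A rough sketch of the loop a collector runs on our behalf (an illustration against TorchRL's public API, not the repo's actual code):

```python
from torchrl.envs import GymEnv
from torchrl.envs.utils import step_mdp
from torchrl.modules import MLP, QValueActor

env = GymEnv("CartPole-v1")
value_net = MLP(
    in_features=env.observation_spec["observation"].shape[-1],
    out_features=env.action_spec.space.n,
    num_cells=[64, 64],
)
policy = QValueActor(value_net, in_keys=["observation"], spec=env.action_spec)

# Roughly what SyncDataCollector does internally at each step:
td = env.reset()
for _ in range(8):
    td = policy(td)    # forward pass while collecting: pick an action
    td = env.step(td)  # apply it; results are written under td["next"]
    td = step_mdp(td)  # promote "next" to the root for the following step
```

As for the multiple MLP calls: a standard DQN iteration typically evaluates the network once to act, once on the sampled batch for the current Q-values, and (with a target network) a third time for the bootstrap target, so three calls per iteration is the usual pattern rather than redundancy.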

@vmoens
Copy link
Contributor

vmoens commented Feb 9, 2024

It's very much a work in progress, but here's the PR that will hopefully clarify things:

#1886

There's a link at the top of the PR to see the docs rendered with this work.
