Clearer separation between the trainer and the algorithm and refactoring of policy classes #1034
Status: Open · 3 of 9 tasks · Labels: blocked, breaking changes, major, refactoring, tentative
While Tianshou's idea of encapsulating everything learning-related into a single method of a base policy class keeps algorithms more or less self-contained, it has led to a cluttered trainer and to the need for kwargs during the update call to remain functional (see #949). Additionally, while Tianshou uses RL jargon such as "policies", its policy classes do not do what policies do in RL terms, i.e., map states to actions; instead, Tianshou interweaves the RL policy with parts of the RL algorithm. The trainer, in turn, implements some parts that may be algorithm-specific, such as sampling transitions and passing them to Tianshou policies for updates.
In order to create a cleaner framework, I think the following changes would improve things:
Separate trainer from algorithm logic
As already mentioned, the trainer currently holds the logic for training loops and logging, but also some algorithm logic. Half of the if statements and asserts in the `__next__` function are necessary only because the trainer doesn't really know what type of RL it is doing. Instead, I think it makes much more sense to move algorithm logic into new algorithm classes and let the trainer be responsible solely for running the algorithm, logging, and checkpointing. There would also be only one trainer, instead of the OnPolicyTrainer etc. Eventually, the trainer may even serve as the factory for the high-level API.

Separate algorithms from policies
While this adds an additional layer of abstraction/complexity, things will eventually become cleaner. Right now, loads of keyword arguments can be passed to the trainer, and they may not even be compatible with each other. A BaseAlgorithm class would allow for more explicit arguments for a given algorithm and would additionally accommodate unconventional algorithms such as ARS. It would also make the algorithm independent of code concerning the type of policy (continuous/discrete). The policy class is then merely the function that maps observations to actions.
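To make the proposed separation concrete, here is a rough sketch of how the three responsibilities could be split. All class and method names here are hypothetical illustrations of the idea, not existing or planned Tianshou API:

```python
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical sketch of the proposed trainer/algorithm/policy split.
# Names (Policy, Algorithm, Trainer, update, run) are illustrative only.

class Policy(ABC):
    """Merely the function mapping observations to actions."""

    @abstractmethod
    def forward(self, obs: Any) -> Any: ...

class Algorithm(ABC):
    """Owns the update logic (e.g. a PPO- or DQN-style rule) and its policy,
    so algorithm-specific arguments live here, not in the trainer."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    @abstractmethod
    def update(self, batch: Any) -> dict: ...

class Trainer:
    """The single trainer: runs the algorithm, logs, checkpoints.
    It is agnostic to what type of RL the algorithm performs."""

    def __init__(self, algorithm: Algorithm) -> None:
        self.algorithm = algorithm

    def run(self, batches: list) -> list[dict]:
        # Training loop only; no on-/off-policy branching needed here.
        return [self.algorithm.update(batch) for batch in batches]
```

With this split, the if/assert branching in the current `__next__` would move behind the `Algorithm.update` interface, and the trainer no longer needs incompatible kwargs.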