RLego

A collection of building blocks for the Reinforcement Learning problem

From RLax:

Many functions consider policies, actions, rewards, and values at consecutive timesteps in order to compute their outputs. In this case the suffixes _t and _tm1 are often used to clarify on which step each input was generated, e.g.:

    q_tm1: the action values in the source state of a transition.
    a_tm1: the action that was selected in the source state.
    r_t: the resulting reward collected in the destination state.
    discount_t: the discount associated with the transition.
    q_t: the action values in the destination state.
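
For example, a one-step Q-learning loss written in this convention might look like the following minimal sketch. The function name, shapes, and batch dimension are illustrative, not necessarily RLego's actual API:

```python
import torch

# Illustrative one-step Q-learning loss in the _tm1/_t naming convention.
def q_learning_loss(q_tm1, a_tm1, r_t, discount_t, q_t):
    # q_tm1:      (B, A) action values in the source state
    # a_tm1:      (B,)   action selected in the source state
    # r_t:        (B,)   reward collected in the destination state
    # discount_t: (B,)   discount associated with the transition
    # q_t:        (B, A) action values in the destination state
    target_t = r_t + discount_t * q_t.max(dim=-1).values
    q_a_tm1 = q_tm1.gather(-1, a_tm1.unsqueeze(-1)).squeeze(-1)
    td_error = target_t - q_a_tm1
    # Note: no stop-gradient is applied inside the objective
    # (see "Key differences" below).
    return 0.5 * td_error ** 2
```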

Key differences:

  • we do not allow stop-gradients in the objectives: it is more efficient to wrap evaluation in torch.no_grad, and detach() does not seem to play well with vmap (see the sketch below).
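
A hypothetical usage sketch of that design choice, reusing q_learning_loss from the sketch above (the toy network and tensor shapes are assumptions for illustration): the bootstrap values are computed under torch.no_grad rather than detach()-ed inside the objective.

```python
import torch

net = torch.nn.Linear(4, 2)            # toy Q-network: 4 obs dims, 2 actions
obs_tm1, obs_t = torch.randn(8, 4), torch.randn(8, 4)
a_tm1 = torch.randint(0, 2, (8,))
r_t = torch.randn(8)
discount_t = torch.full((8,), 0.99)

q_tm1 = net(obs_tm1)
with torch.no_grad():                  # no gradient flows through the bootstrap
    q_t = net(obs_t)

loss = q_learning_loss(q_tm1, a_tm1, r_t, discount_t, q_t).mean()
loss.backward()
```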
