sanitgupta/pac-planning
Introduction

Markov Decision Processes (MDPs) are a fundamental mathematical abstraction for modelling sequential decision making under uncertainty, and serve as the standard model for discrete-time stochastic control and reinforcement learning problems. Of particular importance is the planning problem: computing an optimal policy, which maps each state of the MDP to the action to follow in that state. An optimal policy is one that maximises the expected cumulative reward gathered while traversing the MDP.
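By way of illustration (this is a minimal sketch, not code from this repository), the snippet below builds a small randomly generated MDP and computes an optimal policy with value iteration; the state count `S`, action count `A`, and discount factor `gamma` are arbitrary assumptions for the example.

```python
import numpy as np

# Illustrative only: a tiny random MDP with S states and A actions.
S, A, gamma = 3, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(S, A))      # R[s, a] is the expected immediate reward

# Value iteration: apply the Bellman optimality operator until convergence.
V = np.zeros(S)
while True:
    Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)     # the optimal action at each state
print("optimal policy:", policy)
```

Since the Bellman optimality operator is a gamma-contraction, the loop above is guaranteed to converge, and acting greedily with respect to the resulting Q-values yields an optimal policy.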

We study the planning problem under the assumption that a near-perfect simulator of the MDP is available. Since gathering data can be expensive, the time required to find a near-optimal policy for many problems is dominated by the number of calls made to the simulator. A good MDP planning algorithm in this setting therefore minimises the number of simulator calls needed to learn, with high probability, a policy that is close to optimal. This is known as the probably approximately correct (PAC) framework.
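To make the simulator-call accounting concrete, here is a hypothetical generative-model interface of the kind PAC analyses are stated against; the class and function names are illustrative (not this repository's API), and the sample size comes from a standard Hoeffding bound, assuming rewards and values are bounded and non-negative.

```python
import math
import random

class GenerativeModel:
    """Hypothetical simulator: sample(s, a) returns a next state and a
    reward, and keeps count of how many times it has been called."""

    def __init__(self, P, R):
        self.P = P          # P[s][a] is a list of next-state probabilities
        self.R = R          # R[s][a] is the expected immediate reward
        self.calls = 0      # total number of simulator calls so far

    def sample(self, s, a):
        self.calls += 1
        probs = self.P[s][a]
        s_next = random.choices(range(len(probs)), weights=probs)[0]
        return s_next, self.R[s][a]

def estimate_q(sim, s, a, V, gamma, eps, delta):
    """Estimate Q(s, a) = R(s, a) + gamma * E[V(s')] to within eps, with
    probability at least 1 - delta, using a Hoeffding-style sample size."""
    v_max = max(V)  # each sampled gamma * V(s') lies in [0, gamma * v_max]
    n = max(1, math.ceil((gamma * v_max / eps) ** 2 * math.log(2 / delta) / 2))
    total = 0.0
    reward = 0.0
    for _ in range(n):
        s_next, reward = sim.sample(s, a)
        total += V[s_next]
    return reward + gamma * total / n
```

A PAC planning algorithm built on such an interface would route all of its Bellman backups through calls like `estimate_q` and report `sim.calls` as its sample complexity.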

About

PAC Optimal MDP Planning
