A Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming and reinforcement learning.
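For reference, a finite, discounted MDP is commonly written as a tuple of states, actions, transition probabilities, rewards, and a discount factor. The notation below ($\mathcal{S}$, $\mathcal{A}$, $P$, $R(s,a)$, $\gamma$) is an assumed convention, not necessarily the one used in the assignment. The second line is the Bellman optimality equation, the recursive relation that dynamic-programming methods solve for the optimal value function $V^*$:

```latex
\begin{aligned}
&\text{MDP: } (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \qquad 0 \le \gamma < 1 \\
&V^*(s) = \max_{a \in \mathcal{A}} \Big[ R(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^*(s') \Big]
  \qquad \forall s \in \mathcal{S}
\end{aligned}
```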
Three standard methods for solving MDPs are:
- Value Iteration
- Policy Iteration
- Linear Programming
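Value Iteration repeatedly applies the Bellman optimality backup $V(s) \leftarrow \max_a [R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s')]$ until the values stop changing, then reads off a greedy policy. The sketch below is a minimal pure-Python version under stated assumptions: the dictionary encoding (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the expected reward), the helper names `q_value` and `value_iteration`, the stopping threshold `theta`, and the two-state toy MDP are all hypothetical and not taken from the assignment.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not the assignment's format):
#   P[s][a] -> list of (probability, next_state) pairs
#   R[s][a] -> expected immediate reward for taking action a in state s
Transitions = Dict[int, Dict[int, List[Tuple[float, int]]]]
Rewards = Dict[int, Dict[int, float]]


def q_value(P: Transitions, R: Rewards, V: Dict[int, float],
            s: int, a: int, gamma: float) -> float:
    """One-step lookahead: R(s,a) + gamma * sum_{s'} P(s'|s,a) * V(s')."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P: Transitions, R: Rewards,
                    gamma: float = 0.9, theta: float = 1e-8):
    """Apply Bellman optimality backups until the largest change is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, R, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < theta:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, R, V, s, a, gamma)) for s in P}
    return V, policy


# Made-up 2-state, 2-action MDP for illustration only.
P_toy: Transitions = {
    0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
    1: {0: [(1.0, 1)], 1: [(1.0, 0)]},
}
R_toy: Rewards = {0: {0: 0.0, 1: -1.0}, 1: {0: 1.0, 1: 0.0}}

V, policy = value_iteration(P_toy, R_toy, gamma=0.9)
print(V, policy)  # V[0] ≈ 7.561, V[1] ≈ 10.0; policy {0: 1, 1: 0}
```

On this toy MDP the exact optimum is $V^*(1) = 1/(1-0.9) = 10$ and $V^*(0) = 6.2/0.82 \approx 7.561$, which the loop approaches to within a small multiple of `theta`. Updating `V` in place is a design choice; a synchronous sweep that writes into a fresh copy of `V` also converges, usually in somewhat more sweeps.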
This assignment uses Value Iteration and Linear Programming to solve MDPs.
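For the Linear Programming approach, one standard primal formulation for a finite, discounted MDP is shown below, assuming transition probabilities $P(s' \mid s,a)$, rewards $R(s,a)$, and a discount $\gamma \in [0,1)$. The state weights $\mu(s) > 0$ are an assumption here (any strictly positive choice, such as uniform weights, yields the optimal value function $V^*$ as the solution). An optimal policy then picks, in each state, an action whose constraint holds with equality. The assignment's own formulation may differ, for example by using the dual LP over state-action occupancy measures.

```latex
\begin{aligned}
\min_{V} \quad & \sum_{s \in \mathcal{S}} \mu(s)\, V(s) \\
\text{s.t.} \quad & V(s) \ge R(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V(s')
  \qquad \forall s \in \mathcal{S},\ \forall a \in \mathcal{A}
\end{aligned}
```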