MDP


The MDP package implements algorithms for Markov decision processes (MDPs).

Installation

MDP can be installed through the Julia package manager:

julia> Pkg.clone("https://github.com/adityam/MDP.jl")
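
Note that Pkg.clone was removed in Julia 1.0. On recent Julia versions, a sketch of an equivalent install (assuming the package can still be added directly from its repository URL) is

julia> using Pkg
julia> Pkg.add(url="https://github.com/adityam/MDP.jl")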

Supported algorithms

The MDP package currently implements the following algorithms:

  • Infinite horizon discounted setup

    • Value iteration
  • Finite horizon setup

    • Standard dynamic program

Other algorithms related to MDPs will be implemented as time permits. Pull requests welcome.

Specifying a model

There are two ways to specify the MDP model:

  1. By specifying the controlled transition matrices and the reward (or cost) function.

  2. By specifying the Bellman update operator.

Perhaps the easiest way to understand these is by means of examples.

Specifying a model: probabilistic method

This example is taken from Puterman, "Markov Decision Processes", Wiley 2005 (Section 3.1).

Consider a Markov decision process where

  • States S = {1, 2}

  • Actions A = {1, 2}

  • Rewards r(s,a) given by

    r(1,1) = 5      r(1,2) = 10
    r(2,1) = -1     r(2,2) = -Inf
    

    Note that r(2,2) = -Inf means that action 2 is infeasible at state 2.

  • Transition probabilities P(s_next | s_current, action)

      P(1 | 1, 1) = 0.5       P(2 | 1, 1) = 0.5
      P(1 | 2, 1) = 0.0       P(2 | 2, 1) = 1.0
    
      P(1 | 1, 2) = 0.0       P(2 | 1, 2) = 1.0
      P(1 | 2, 2) = 0.5       P(2 | 2, 2) = 0.5
    

This model is specified as follows:

  • Specify the rewards as an S x A matrix.

    r = [ 5  10
         -1  -Inf]
  • Specify the transition matrices as a list { P( . | ., 1), P( . | ., 2) }.

    P = Matrix[ [ 0.5  0.5 
                  0.0  1.0 ],
                [ 0.0  1.0
                  0.5  0.5 ] ]

    This list may also be specified as a list of sparse matrices (this can lead to considerable speedup when the transition matrices are sparse):

    using SparseArrays
    P = SparseMatrixCSC[sparse([0.5 0.5; 0.0 1.0]),
                        sparse([0.0 1.0; 0.5 0.5]) ]
    
  • Specify an MDP model

    model = ProbModel(r, P; objective=:Max)

    objective=:Max specifies that we want to maximize reward. To minimize a cost function, use objective=:Min.
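
    For the minimization case, the constructor is called the same way with a cost matrix. A minimal sketch (reusing P from above; the cost matrix c and the name model_min are hypothetical, not part of Puterman's example):

    # c is a hypothetical cost matrix (values chosen only for illustration)
    c = [ 2  1
          3  4 ]

    model_min = ProbModel(c, P; objective=:Min)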

To solve this model using value iteration, use

(v,g) = valueIteration(model; discount=0.95)

v is the value function (stored as an S-dimensional vector) and g is the policy (again, stored as an S-dimensional vector).

The optional arguments for valueIteration are as follows; a call that passes them explicitly is sketched after the list.

  • discount (default 0.95): The discount factor.

  • iterations (default 1_000): The maximum number of iterations to be performed.

  • tolerance (default 1e-4): Stop the iteration when the sup-norm of v - v_opt is less than tolerance, where v_opt is the optimal value function.

    To speed up convergence, we use the span seminorm and apply a correction before
    returning the value function. See Puterman, Sec. 6.6, for details.
    
  • initial_v (default: the all-zeros vector): The initial guess for the value function.
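
A minimal sketch of a call that passes these options explicitly, using the default values listed above (keyword names as documented in this list):

(v, g) = valueIteration(model; discount=0.95, iterations=1_000, tolerance=1e-4)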

Here is the full example:

using MDP

P = Matrix[ [ 0.5  0.5 
              0.0  1.0 ],
            [ 0.0  1.0
              0.5  0.5 ] ]

r = [ 5  10
     -1  -Inf ]

model = ProbModel(r, P; objective=:Max)

(v,g) = valueIteration(model; discount=0.95)

println(g)
println(v)

which outputs

[ Info: value iteration will converge in at most 20 iterations
[ Info: Reached precision 5.742009e-05 at iteration 18
[1, 1]
[-8.5714, -20.0]
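
As a sanity check (derived from the model above, not produced by the package): state 2 is absorbing and only action 1 is feasible there, so v(2) = -1/(1 - 0.95) = -20. In state 1, action 2 would give 10 + 0.95*(-20) = -9, while action 1 gives v(1) = 5 + 0.95*(0.5*v(1) + 0.5*(-20)), i.e. v(1) = -4.5/0.525 ≈ -8.5714; hence g = [1, 1], matching the output above.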
