MDP


The MDP package implements algorithms for Markov decision processes (MDPs).

Installation

MDP can be installed through the Julia package manager:

julia> Pkg.clone("https://github.com/adityam/MDP.jl")
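
Note that Pkg.clone was removed in Julia 1.0. On recent Julia versions, a sketch of an equivalent install (assuming the package can still be added directly from its repository URL) is

julia> using Pkg
julia> Pkg.add(url="https://github.com/adityam/MDP.jl")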

Supported algorithms

The MDP package currently implements the following algorithms:

  • Infinite horizon discounted setup

    • Value iteration
  • Finite horizon setup

    • Standard dynamic program

Other algorithms related to MDPs will be implemented as time permits. Pull requests welcome.

Specifying a model

There are two ways to specify the MDP model:

  1. By specifying the controlled transition matrices and the reward (or cost) function.

  2. By specifying the Bellman update operator.

Perhaps the easiest way to understand these is by means of examples.

Specifying a model: probabilistic method

This example is taken from Puterman, "Markov Decision Processes", Wiley 2005 (Section 3.1).

Consider a Markov decision process where

  • States S = {1, 2}

  • Actions A = {1, 2}

  • Rewards r(s,a) given by

    r(1,1) = 5      r(1,2) = 10
    r(2,1) = -1     r(2,2) = -Inf
    

    Note that r(2,2) = -Inf means that action 2 is infeasible at state 2.

  • Transition probabilities P(s_next | s_current, action)

      P(1 | 1, 1) = 0.5       P(2 | 1, 1) = 0.5
      P(1 | 2, 1) = 0.0       P(2 | 2, 1) = 1.0
    
      P(1 | 1, 2) = 0.0       P(2 | 1, 2) = 1.0
      P(1 | 2, 2) = 0.5       P(2 | 2, 2) = 0.5
    

This model is specified as follows:

  • Specify the rewards as an S x A matrix.

    r = [ 5  10
         -1  -Inf]
  • Specify the transition matrices as a list { P( . | ., 1), P( . | ., 2) }.

    P = Matrix[ [ 0.5  0.5 
                  0.0  1.0 ],
                [ 0.0  1.0
                  0.5  0.5 ] ]

    This list may also be specified as a list of sparse matrices (this can lead to considerable speedup when the transition matrices are sparse):

    using SparseArrays
    P = SparseMatrixCSC[sparse([0.5 0.5; 0.0 1.0]),
                        sparse([0.0 1.0; 0.5 0.5]) ]
    
  • Specify an MDP model

    model = ProbModel(r, P; objective=:Max)

    objective=:Max specifies that we want to maximize reward. To minimize a cost function, use objective=:Min.
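
    For the minimization case, the constructor is called the same way with a cost matrix. A minimal sketch (reusing P from above; the cost matrix c and the name model_min are hypothetical, not part of Puterman's example):

    # c is a hypothetical cost matrix (values chosen only for illustration)
    c = [ 2  1
          3  4 ]

    model_min = ProbModel(c, P; objective=:Min)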

To solve this model using value iteration, use

(v,g) = valueIteration(model; discount=0.95)

v is the value function (stored as an S-dimensional vector) and g is the policy (again, stored as an S-dimensional vector).

The optional arguments for valueIteration are as follows; a call that passes them explicitly is sketched after the list.

  • discount (default 0.95): The discount factor.

  • iterations (default 1_000): The maximum number of iterations to be performed.

  • tolerance (default 1e-4): Stop the iteration when the sup-norm of v - v_opt is less than tolerance, where v_opt is the optimal value function.

    To speed up convergence, we use the span seminorm and apply a correction before
    returning the value function. See Puterman, Sec. 6.6, for details.
    
  • initial_v (default: the all-zeros vector): The initial guess for the value function.
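
A minimal sketch of a call that passes these options explicitly, using the default values listed above (keyword names as documented in this list):

(v, g) = valueIteration(model; discount=0.95, iterations=1_000, tolerance=1e-4)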

Here is the full example:

using MDP

P = Matrix[ [ 0.5  0.5 
              0.0  1.0 ],
            [ 0.0  1.0
              0.5  0.5 ] ]

r = [ 5  10
     -1  -Inf ]

model = ProbModel(r, P; objective=:Max)

(v,g) = valueIteration(model; discount=0.95)

println(g)
println(v)

which outputs

[ Info: value iteration will converge in at most 20 iterations
[ Info: Reached precision 5.742009e-05 at iteration 18
[1, 1]
[-8.5714, -20.0]
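
As a sanity check (derived from the model above, not produced by the package): state 2 is absorbing and only action 1 is feasible there, so v(2) = -1/(1 - 0.95) = -20. In state 1, action 2 would give 10 + 0.95*(-20) = -9, while action 1 gives v(1) = 5 + 0.95*(0.5*v(1) + 0.5*(-20)), i.e. v(1) = -4.5/0.525 ≈ -8.5714; hence g = [1, 1], matching the output above.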
