Commit
add mrp and mdp
jzsherlock4869 committed Oct 12, 2020
1 parent 0c9b5e0 commit 21fea91
Showing 1 changed file with 16 additions and 8 deletions.
24 changes: 16 additions & 8 deletions markov_decision_process/README.md
@@ -9,13 +9,17 @@ In MRP, each state returns a reward, and the transition of states are only relat

The transition probability is:

-<img src="https://www.forkosh.com/mathtex.cgi? P(s_{t+1} = S_j | s_t = S_i) ">
-[//]: <> ($$ P(s_{t+1} = S_j | s_t = S_i) $$)
+<img src="https://latex.codecogs.com/gif.latex?P(s_{t+1} = S_j | s_t = S_i) ">
+
+[^_^]:
+($$ P(s_{t+1} = S_j | s_t = S_i) $$)

The reward of each state is defined as a one-param function:

-<img src="https://www.forkosh.com/mathtex.cgi? R(s_t = S_i) = E[r_t | s_t = S_i]">
-[//]: <> ($$ R(s_t = S_i) = E[r_t | s_t = S_i] $$)
+<img src="https://latex.codecogs.com/gif.latex?R(s_t = S_i) = E[r_t | s_t = S_i]">
+
+[^_^]:
+($$ R(s_t = S_i) = E[r_t | s_t = S_i] $$)

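As an illustration of the two MRP definitions above, here is a minimal sketch in Python. It is not part of the commit: the three states, the matrix `P`, the vector `R`, and the helpers `step` and `rollout` are all hypothetical and chosen only to show how a transition table and a one-parameter reward function fit together.

```python
import random

# Hypothetical 3-state MRP; the numbers are illustrative, not from the README.
STATES = ["S_0", "S_1", "S_2"]

# P[i][j] = P(s_{t+1} = S_j | s_t = S_i); each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
]

# R[i] = E[r_t | s_t = S_i]: the reward depends on the current state only.
R = [0.0, 1.0, -1.0]


def step(i, rng):
    """Sample the index of the next state from row i of P."""
    return rng.choices(range(len(STATES)), weights=P[i], k=1)[0]


def rollout(start, n_steps, seed=0):
    """Follow the chain for n_steps; return visited indices and their expected rewards."""
    rng = random.Random(seed)
    i = start
    path, rewards = [i], [R[i]]
    for _ in range(n_steps):
        i = step(i, rng)
        path.append(i)
        rewards.append(R[i])
    return path, rewards
```

Calling `rollout(0, 10)` returns eleven state indices and the matching expected rewards. No action appears anywhere in the sketch, which is what distinguishes an MRP from an MDP.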

## Markov Decision Process (MDP) : the state transition controlled by current state and action.
@@ -24,12 +28,16 @@ Markov decision process (MDP) is a discrete-time stochastic control process. It

The transition probability from state S_i to S_j under action A_k is defined as follows:

-<img src="https://www.forkosh.com/mathtex.cgi? P(s_{t+1} = S_j | s_t = S_i, a_t = A_k) ">
-[//] <> ($$ P(s_{t+1} = S_j | s_t = S_i, a_t = A_k) $$)
+<img src="https://latex.codecogs.com/gif.latex?P(s_{t+1} = S_j | s_t = S_i, a_t = A_k) ">
+
+[^_^]:
+($$ P(s_{t+1} = S_j | s_t = S_i, a_t = A_k) $$)

The reward function of MDP has two parameters:

-<img src="https://www.forkosh.com/mathtex.cgi? R(s_t = S_i, a = A_k) = E[r_{t+1} | s_t = S_i, a = A_k]">
-[//] <> ($$ R(s_t = S_i, a = A_k) = E[r_{t+1} | s_t = S_i, a = A_k] $$)
+<img src="https://latex.codecogs.com/gif.latex?R(s_t = S_i, a = A_k) = E[r_{t+1} | s_t = S_i, a = A_k]">
+
+[^_^]:
+($$ R(s_t = S_i, a = A_k) = E[r_{t+1} | s_t = S_i, a = A_k] $$)

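The MDP definitions above add an action index to both tables. The following Python sketch shows one way to represent that. It is not from the commit: the two states, two actions, the values in `P` and `R`, and the helper `step` are hypothetical, and `step` returns the expected reward `R[i][a]` rather than a sampled one to keep the example short.

```python
import random

# Hypothetical MDP with 2 states and 2 actions; values are illustrative only.
# P[i][a][j] = P(s_{t+1} = S_j | s_t = S_i, a_t = A_a)
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # from S_0 under A_0, A_1
    [[0.7, 0.3], [0.0, 1.0]],  # from S_1 under A_0, A_1
]

# R[i][a] = E[r_{t+1} | s_t = S_i, a = A_a]: two parameters, state and action.
R = [
    [0.0, 1.0],
    [2.0, -1.0],
]


def step(i, a, rng):
    """Take action a in state i; return (next state index, expected reward)."""
    j = rng.choices(range(len(P[i][a])), weights=P[i][a], k=1)[0]
    return j, R[i][a]
```

For example, `step(1, 1, random.Random(0))` always returns `(1, -1.0)` because `P[1][1]` puts all probability on staying in `S_1`. Choosing a different action in the same state selects a different row of probabilities and a different reward, which is the control aspect the MRP lacks.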
