# RL-path-planning

Path planning using reinforcement learning in an $n \times n$ grid environment, used to solve the Frozen Lake problem and a variation:

*(figure: the grid environment)*

The starting position is at (0, 0) (top left) and the goal position is at (n-1, n-1) (bottom right).
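
For concreteness, here is a minimal sketch of such an environment. `GridEnv`, its `reset`/`step` interface, and the sparse goal reward are assumptions made for illustration (the repository's actual environment, e.g. Frozen Lake with holes, will differ); the sketches in later sections reuse this interface.

```python
class GridEnv:
    """Hypothetical n x n grid world: start at (0, 0), goal at (n-1, n-1)."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, n):
        self.n = n
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        row = min(max(self.state[0] + dr, 0), self.n - 1)  # clip to the grid
        col = min(max(self.state[1] + dc, 0), self.n - 1)
        self.state = (row, col)
        done = self.state == (self.n - 1, self.n - 1)      # reached the goal
        return self.state, (1.0 if done else 0.0), done    # sparse reward
```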

## Current techniques

  1. Monte-Carlo control without exploring starts
  2. SARSA with an $\epsilon$-greedy behavior policy
  3. Q-learning with an $\epsilon$-greedy behavior policy

## Monte-Carlo control without exploring starts

*(figure: Monte-Carlo control without exploring starts)*
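
A sketch of one common way to realize this: first-visit Monte-Carlo control where an $\epsilon$-soft policy replaces exploring starts. It assumes the hypothetical `GridEnv` interface sketched above; `mc_control`, its defaults, and the incremental averaging are illustrative rather than the repository's code.

```python
import numpy as np
from collections import defaultdict

def mc_control(env, n_actions, episodes=20000, gamma=0.99, epsilon=0.1, seed=0):
    """First-visit MC control with an epsilon-soft policy (no exploring starts)."""
    rng = np.random.default_rng(seed)
    Q = defaultdict(lambda: np.zeros(n_actions))
    counts = defaultdict(lambda: np.zeros(n_actions))  # visit counts, for averaging

    for _ in range(episodes):
        # Generate one episode under the current epsilon-soft policy.
        episode, s, done = [], env.reset(), False
        while not done:
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))   # explore
            else:
                a = int(np.argmax(Q[s]))           # exploit
            s2, r, done = env.step(a)
            episode.append((s, a, r))
            s = s2

        # Update Q toward the first-visit returns, computed backwards.
        first = {}
        for t, (s, a, _) in enumerate(episode):
            first.setdefault((s, a), t)
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            G = gamma * G + r
            if first[(s, a)] == t:                 # first visit of (s, a) only
                counts[s][a] += 1
                Q[s][a] += (G - Q[s][a]) / counts[s][a]
    return Q
```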

## SARSA with an $\epsilon$-greedy behavior policy

Update rule:

$$Q\left(S_t,A_t\right) \leftarrow Q\left(S_t,A_t\right)+\alpha\left[R_{t+1}+\gamma Q\left(S_{t+1}, A_{t+1}\right)-Q\left(S_t,A_t\right)\right]$$

*(figure: SARSA)*
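
A minimal sketch of this update loop, assuming the `GridEnv` interface from above (`sarsa` and its hyperparameter defaults are illustrative). Because the target bootstraps on $A_{t+1}$, the action the behavior policy actually takes, SARSA is on-policy.

```python
import numpy as np
from collections import defaultdict

def sarsa(env, n_actions, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """On-policy TD control with an epsilon-greedy behavior policy."""
    rng = np.random.default_rng(seed)
    Q = defaultdict(lambda: np.zeros(n_actions))

    def select(s):
        # Epsilon-greedy selection (see the policy definition below).
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s, done = env.reset(), False
        a = select(s)
        while not done:
            s2, r, done = env.step(a)
            a2 = select(s2)
            # Target uses the next action actually taken: R + gamma * Q(S', A').
            target = r + gamma * Q[s2][a2] * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q
```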

## Q-learning with an $\epsilon$-greedy behavior policy

Update rule:

$$Q\left(S_t,A_t\right) \leftarrow Q\left(S_t,A_t\right)+\alpha\left[R_{t+1}+\gamma \max_{a'} Q\left(S_{t+1}, a'\right)-Q\left(S_t,A_t\right)\right]$$

*(figure: Q-learning)*
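
A sketch under the same assumptions (`q_learning` and its defaults are illustrative). The only change from the SARSA sketch is the target: it maximizes over the next actions instead of using the action actually taken, which makes Q-learning off-policy; it learns about the greedy policy while behaving $\epsilon$-greedily.

```python
import numpy as np
from collections import defaultdict

def q_learning(env, n_actions, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Off-policy TD control with an epsilon-greedy behavior policy."""
    rng = np.random.default_rng(seed)
    Q = defaultdict(lambda: np.zeros(n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy behavior policy (see the definition below).
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # Target maximizes over next actions: R + gamma * max_a' Q(S', a').
            target = r + gamma * np.max(Q[s2]) * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```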

## $\epsilon$-greedy policy

$$\pi\left(a \mid s\right) = \begin{cases} 1-\epsilon+\frac{\epsilon}{\left|A\left(s\right)\right|}, & \text{if}\ a = A^{*} \triangleq \operatorname{argmax}_{a} Q\left(s,a\right) \\ \frac{\epsilon}{\left|A\left(s\right)\right|}, & \text{if}\ a \neq A^{*} \end{cases}$$
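
A sketch of this distribution in code (function names are illustrative). Note that the two-branch sampling used in the sketches above, explore uniformly with probability $\epsilon$ and otherwise act greedily, draws from exactly this distribution, since the uniform branch can also pick $A^{*}$.

```python
import numpy as np

def epsilon_greedy_probs(q_s, epsilon):
    """Action probabilities pi(a|s) for a 1-D array q_s of values Q(s, a)."""
    n = len(q_s)
    probs = np.full(n, epsilon / n)               # epsilon / |A(s)| for every a
    probs[int(np.argmax(q_s))] += 1.0 - epsilon   # extra mass on A* = argmax_a Q(s, a)
    return probs

def epsilon_greedy_action(q_s, epsilon, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(len(q_s), p=epsilon_greedy_probs(q_s, epsilon)))
```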
