my solutions to david silver's easy21 assignment

axelahmer/easy21

Easy21

My solutions to David Silver's Easy21 assignment.

Code and discussion answers are not guaranteed to be correct.

q1. Monte Carlo Control

Figure: MC Control Vstar (optimal value function found by Monte Carlo control).
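As a rough illustration of what Monte Carlo control looks like here, the sketch below pairs a small Easy21 environment with an every-visit MC loop. It is not the repository's code: the names `draw`, `step`, and `mc_control` are made up for this example, and the schedules (step size 1/N(s,a), epsilon = N0 / (N0 + N(s)) with N0 = 100) are assumed from the assignment text.

```python
import random
from collections import defaultdict

HIT, STICK = 0, 1

def draw(rng):
    """Card value 1-10, uniform; red (subtracted) w.p. 1/3, black (added) w.p. 2/3."""
    value = rng.randint(1, 10)
    return -value if rng.random() < 1 / 3 else value

def step(state, action, rng):
    """Hypothetical helper: returns (next_state, reward, done); state = (dealer, player_sum)."""
    dealer, player = state
    if action == HIT:
        player += draw(rng)
        if player < 1 or player > 21:          # player goes bust
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # STICK: dealer keeps hitting until the sum reaches 17 or the dealer goes bust
    while 1 <= dealer < 17:
        dealer += draw(rng)
    if dealer < 1 or dealer > 21 or player > dealer:
        return (dealer, player), 1, True
    return (dealer, player), (0 if player == dealer else -1), True

def mc_control(episodes, n0=100, seed=0):
    """Every-visit MC control with epsilon-greedy exploration (assumed schedules)."""
    rng = random.Random(seed)
    Q = defaultdict(float)       # Q[(state, action)]
    n_s = defaultdict(int)       # state visit counts
    n_sa = defaultdict(int)      # state-action visit counts
    for _ in range(episodes):
        state = (rng.randint(1, 10), rng.randint(1, 10))  # both start with one black card
        visited, done = [], False
        while not done:
            n_s[state] += 1
            eps = n0 / (n0 + n_s[state])
            if rng.random() < eps:
                action = rng.choice((HIT, STICK))
            else:
                action = max((HIT, STICK), key=lambda a: Q[(state, a)])
            visited.append((state, action))
            state, reward, done = step(state, action, rng)
        # No discounting and only the final reward is nonzero,
        # so the return from every visited pair equals the final reward.
        for sa in visited:
            n_sa[sa] += 1
            Q[sa] += (reward - Q[sa]) / n_sa[sa]
    return Q
```

The value function in the figure would then correspond to taking the maximum of `Q` over the two actions for each (dealer, player sum) pair.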

q2. TD Learning

TD backup with time-varying alpha and epsilon (see assignment).

Figures: TD MSE against lambda; TD MSE against episodes.
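A single backward-view Sarsa(lambda) backup might look like the sketch below. The function name, the array layout (dealer card, player sum, action), and the choice of accumulating traces are assumptions made for illustration, and the step size 1/N(s,a) is taken from the assignment rather than from this repository's code.

```python
import numpy as np

def sarsa_lambda_backup(Q, E, N, sa, reward, next_sa, lam, gamma=1.0):
    """One backward-view Sarsa(lambda) backup, applied in place.

    Q, E, N: float arrays of shape (10, 21, 2) holding action values,
             eligibility traces, and visit counts.
    sa:      index tuple (dealer - 1, player - 1, action) of the current pair.
    next_sa: index tuple of the next pair, or None if the episode has ended.
    """
    N[sa] += 1
    E[sa] += 1.0                                   # accumulating trace
    q_next = 0.0 if next_sa is None else Q[next_sa]
    delta = reward + gamma * q_next - Q[sa]        # TD error
    alpha = 1.0 / N[sa]                            # time-varying step size
    Q += alpha * delta * E                         # update every traced pair
    E *= gamma * lam                               # decay all traces
    return delta
```

In use, `E` would be reset to zeros at the start of each episode, while `Q` and `N` persist across episodes.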

q3. Linear Function Approximation

TD backup with constant epsilon = 0.05 and alpha = 0.01.

Figures: LFA MSE against lambda; LFA MSE against episodes.
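For the linear approximator, the sketch below builds a binary feature vector and applies one Sarsa(lambda) step to the weights. The overlapping interval boundaries are assumed from the assignment's coarse coding (3 dealer intervals, 6 player intervals, 2 actions, giving 36 features); the function names are hypothetical and this is not the repository's implementation.

```python
import numpy as np

# Overlapping intervals, assumed from the assignment's feature definition.
DEALER_BINS = [(1, 4), (4, 7), (7, 10)]
PLAYER_BINS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]

def features(dealer, player, action):
    """Binary feature vector of length 3 * 6 * 2 = 36 for a state-action pair."""
    x = np.zeros((len(DEALER_BINS), len(PLAYER_BINS), 2))
    d_on = [i for i, (lo, hi) in enumerate(DEALER_BINS) if lo <= dealer <= hi]
    p_on = [j for j, (lo, hi) in enumerate(PLAYER_BINS) if lo <= player <= hi]
    for i in d_on:
        for j in p_on:
            x[i, j, action] = 1.0
    return x.ravel()

def lfa_backup(theta, trace, x, reward, x_next, lam, alpha=0.01, gamma=1.0):
    """One linear Sarsa(lambda) step, in place; x_next is None at episode end."""
    q_next = 0.0 if x_next is None else float(x_next @ theta)
    delta = reward + gamma * q_next - float(x @ theta)   # TD error
    trace *= gamma * lam                                 # decay old traces
    trace += x                                           # gradient of Q wrt theta is x
    theta += alpha * delta * trace
    return delta
```

Because the intervals overlap, a single state can activate up to four features per action, for example dealer card 4 with player sum 5.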

Discussion

What are the pros and cons of bootstrapping in Easy21?

Bootstrapping lets the value estimates, and hence the policy, be updated at every time step within a trajectory, because each update uses the estimated value of the next state rather than the actual return observed on reaching the terminal state. Bootstrapping also avoids propagating the full terminal reward back to every state visited in the episode, which otherwise causes an initially high mean squared error (see TD learning image 2). The main downside is that the bootstrapped target is a biased estimate of the value, although in the tabular case it will still converge to the true value in the limit.
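The difference between the two kinds of update target can be made concrete with a toy episode. In this sketch the helper names and the next-state value estimates (0.2 and 0.1) are made-up numbers for illustration only.

```python
def mc_targets(rewards, gamma=1.0):
    """Monte Carlo target for each step: the full return observed from that step."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def td_targets(rewards, next_values, gamma=1.0):
    """TD(0) target for each step: reward plus the current estimate of the next value."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]

rewards = [0, 0, -1]             # a three-step episode that ends in a loss
next_values = [0.2, 0.1, 0.0]    # hypothetical estimates; terminal value is 0

print(mc_targets(rewards))               # every step is pulled towards -1
print(td_targets(rewards, next_values))  # only the last step sees the -1 directly
```

With Monte Carlo, all three steps are updated towards the final reward; with TD(0), the earlier steps are updated towards the current estimates, which may themselves be wrong.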

Would you expect bootstrapping to help more in blackjack or Easy21? Why?

Unlike in regular blackjack, cards in Easy21 can have negative values, so episodes can be longer. The benefit of bootstrapping is therefore greater in Easy21: without bootstrapping, the reward of a long episode is propagated back to every state within the episode, even though states far apart are only loosely connected.
