Easy21

My solutions to David Silver's Easy21 assignment.

Code and discussion answers are not guaranteed to be correct.

q1. Monte Carlo Control

q2. TD Learning

TD backup with time-varying alpha and epsilon (see assignment)
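
As a rough illustration of what "time-varying" means here, the sketch below follows the schedules given in the assignment text: alpha_t = 1/N(s_t, a_t) and epsilon_t = N0/(N0 + N(s_t)) with N0 = 100. The function and variable names are hypothetical and are not taken from q2.py.

```python
from collections import defaultdict

N0 = 100  # exploration constant, as stated in the assignment

# Visit counters; the names are illustrative, not taken from q2.py.
state_visits = defaultdict(int)         # N(s)
state_action_visits = defaultdict(int)  # N(s, a)


def record_visit(state, action):
    """Count one visit to `state` in which `action` was taken."""
    state_visits[state] += 1
    state_action_visits[(state, action)] += 1


def epsilon(state):
    """Exploration rate: 1.0 for an unvisited state, decaying with visits."""
    return N0 / (N0 + state_visits[state])


def alpha(state, action):
    """Step size 1 / N(s, a); call only after the pair has been recorded."""
    return 1.0 / state_action_visits[(state, action)]
```

With these schedules every state starts fully exploratory, and each state-action value becomes a running average of its targets.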

q3. Linear Function Approximation

TD backup with constant epsilon = 0.05 and alpha = 0.01
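
A minimal sketch of a linear approximator for this setting, assuming the 36 overlapping binary features described in the assignment (3 dealer intervals x 6 player intervals x 2 actions). It shows a one-step update without eligibility traces for brevity. The names `features`, `q_value` and `td_update` are hypothetical and may not match q3.py.

```python
import numpy as np

# Overlapping intervals from the assignment's coarse coding.
DEALER_BINS = [(1, 4), (4, 7), (7, 10)]
PLAYER_BINS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
ACTIONS = ("hit", "stick")

ALPHA = 0.01    # constant step size
EPSILON = 0.05  # constant exploration rate (used by the epsilon-greedy policy)


def features(dealer, player, action):
    """Binary feature vector of length 36 for a (state, action) pair."""
    x = np.zeros((len(DEALER_BINS), len(PLAYER_BINS), len(ACTIONS)))
    k = ACTIONS.index(action)
    for i, (d_lo, d_hi) in enumerate(DEALER_BINS):
        for j, (p_lo, p_hi) in enumerate(PLAYER_BINS):
            if d_lo <= dealer <= d_hi and p_lo <= player <= p_hi:
                x[i, j, k] = 1.0
    return x.ravel()


def q_value(theta, dealer, player, action):
    """Q(s, a) is approximated as the dot product of features and weights."""
    return float(features(dealer, player, action) @ theta)


def td_update(theta, x, td_target):
    """One gradient step of the weights towards `td_target` (no traces)."""
    td_error = td_target - x @ theta
    return theta + ALPHA * td_error * x
```

Because the intervals overlap, a single state can activate up to four features per action, so one update generalises to neighbouring states.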

Discussion

What are the pros and cons of bootstrapping in Easy21?

Bootstrapping lets value estimates, and therefore the policy, be updated at every time step within a trajectory, using the estimated value of the next state rather than waiting for the actual return at the terminal state. It also avoids propagating large terminal rewards back to every state value in the episode, which causes an initially high mean squared error (see TD learning image 2). The main downside is that the bootstrapped target is a biased estimate of the true value, although the estimates still converge to the true values in the limit.
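
The difference between the two kinds of target can be sketched on a single episode. This is a simplified state-value illustration, assuming undiscounted returns, zero reward before the terminal step (as in Easy21), and a fixed step size. The function names are hypothetical.

```python
def mc_update(V, states, final_reward, alpha=0.1):
    """Monte Carlo: every visited state moves towards the actual return."""
    for s in states:
        V[s] += alpha * (final_reward - V[s])


def td0_update(V, states, final_reward, alpha=0.1):
    """TD(0): each state moves towards the estimate of its successor.

    Only the last state sees the terminal reward directly.
    """
    for t, s in enumerate(states):
        is_last = t == len(states) - 1
        target = final_reward if is_last else V[states[t + 1]]
        V[s] += alpha * (target - V[s])


episode = ["A", "B", "C"]  # states visited in order; the episode ends in a loss

V_mc = {s: 0.0 for s in episode}
mc_update(V_mc, episode, final_reward=-1.0)

V_td = {s: 0.0 for s in episode}
td0_update(V_td, episode, final_reward=-1.0)
```

After this one episode, the Monte Carlo update has shifted all three values towards -1, while the TD(0) update has only changed the value of the final state "C"; earlier states are adjusted in later episodes through the estimates of their successors.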

Would you expect bootstrapping to help more in blackjack or Easy21? Why?

Cards can have negative values in Easy21, unlike in regular blackjack, so episodes can be longer. Bootstrapping should therefore help more in Easy21: without bootstrapping, the reward of a long episode is propagated back to every state in the episode, even though those states are only loosely connected.
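
The claim about episode length can be checked with a rough simulation. The sketch below is not the repo's game.py: it assumes a hypothetical fixed policy (hit until the sum reaches 17) and models "regular blackjack" very crudely as the same 1-10 deck with every card added. The Easy21 variant subtracts a card with probability 1/3, as in the assignment.

```python
import random


def hits_in_episode(allow_red, rng, stick_at=17):
    """Number of hits the player takes before sticking or going bust."""
    total = rng.randint(1, 10)  # the first card is always black (added)
    hits = 0
    while 1 <= total < stick_at:
        hits += 1
        card = rng.randint(1, 10)
        # Red cards (probability 1/3) are subtracted in Easy21.
        if allow_red and rng.random() < 1 / 3:
            total -= card
        else:
            total += card
    return hits


def mean_hits(allow_red, episodes=20000, seed=0):
    """Average number of hits per episode under the fixed policy."""
    rng = random.Random(seed)
    return sum(hits_in_episode(allow_red, rng) for _ in range(episodes)) / episodes


easy21_mean = mean_hits(allow_red=True)
additive_mean = mean_hits(allow_red=False)
```

Under these assumptions the Easy21-style variant should need more hits per episode on average than the additive-only variant, which is consistent with the argument above. The exact figures depend on the policy and seed.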
