This package provides a framework for developing and comparing various bandit algorithms:
- Uniform Strategy (picks an arm uniformly at random)
- ϵ-greedy (see the sketch after this list)
    - ϵ-greedy
    - ϵ_n-greedy
- Upper Confidence Bound Policies (UCB1 sketched below)
    - UCB1
    - UCB-Normal
    - UCB-V
    - Bayes-UCB (for Bernoulli rewards)
    - KL-UCB
    - Discounted-UCB
    - Sliding Window UCB
- Thompson Sampling (sketched below)
    - Thompson Sampling
    - Dynamic Thompson Sampling
    - Optimistic Thompson Sampling
    - TSNormal (Thompson Sampling for Gaussian-distributed rewards)
    - Restarting Thompson Sampling
    - TS with Gaussian Prior
- EXP3 (sketched below)
    - EXP3
    - EXP3.1
    - EXP3-IX
- SoftMax
- REXP3
- Gradient Bandit
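For orientation, here is a minimal ϵ-greedy agent in Python. This is an illustrative sketch of the algorithm itself, not this package's API; the class and method names (`EpsilonGreedy`, `select_arm`, `update`) are hypothetical.

```python
import random

class EpsilonGreedy:
    """epsilon-greedy: explore a random arm with probability eps,
    otherwise exploit the arm with the highest empirical mean reward.
    (Illustrative sketch; not this package's implementation.)"""

    def __init__(self, n_arms, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        if random.random() < self.eps:                  # explore
            return random.randrange(len(self.counts))
        return max(range(len(self.values)),             # exploit
                   key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the empirical mean
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

The ϵ_n-greedy variant listed above differs only in that ϵ decays with time (e.g. ϵ_t = min(1, c/t) for a constant c), so exploration fades as the mean estimates sharpen.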
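UCB1 replaces randomized exploration with an optimism bonus: it plays the arm maximizing the empirical mean plus sqrt(2 ln t / n_i). A minimal sketch under the same hypothetical interface:

```python
import math

class UCB1:
    """UCB1: pull each arm once, then pick the arm maximizing
    mean_i + sqrt(2 * ln(t) / n_i). (Illustrative sketch.)"""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0  # total number of rounds so far

    def select_arm(self):
        self.t += 1
        for arm, n in enumerate(self.counts):
            if n == 0:              # initialization: try every arm once
                return arm
        ucb = [v + math.sqrt(2 * math.log(self.t) / n)
               for v, n in zip(self.values, self.counts)]
        return ucb.index(max(ucb))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```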
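Thompson Sampling for Bernoulli rewards keeps a Beta posterior per arm, samples a mean from each posterior, and plays the argmax. A minimal sketch, again not this package's implementation:

```python
import random

class ThompsonSamplingBernoulli:
    """Thompson Sampling for Bernoulli rewards with a Beta(1, 1) prior:
    sample a mean from each arm's posterior, play the argmax, and do a
    conjugate Beta update on the observed reward. (Illustrative sketch.)"""

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms  # 1 + number of successes
        self.beta = [1.0] * n_arms   # 1 + number of failures

    def select_arm(self):
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, arm, reward):   # reward in {0, 1}
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```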
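EXP3 targets the adversarial setting: it maintains exponential weights over arms, mixes them with uniform exploration, and updates the played arm with an importance-weighted reward estimate. A minimal sketch, assuming rewards in [0, 1]:

```python
import math
import random

class EXP3:
    """EXP3: exponential weights mixed with uniform exploration;
    the chosen arm's weight is boosted by exp(gamma * xhat / K),
    where xhat = reward / p(arm). (Illustrative sketch.)"""

    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * n_arms

    def _probs(self):
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select_arm(self):
        p = self._probs()
        return random.choices(range(len(p)), weights=p)[0]

    def update(self, arm, reward):   # reward assumed in [0, 1]
        p = self._probs()
        xhat = reward / p[arm]       # importance-weighted reward estimate
        self.weights[arm] *= math.exp(self.gamma * xhat / len(self.weights))
```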
The package also provides the following arm (reward) models for simulating bandit environments; a sketch of two of them follows the list:
- Bernoulli
- Beta
- Normal
- Sinusoidal (without noise)
- Pulse (without noise)
- Square
- Variational (without noise)
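To make the arm-model list concrete, here is a sketch of a stationary (Bernoulli) and a non-stationary, noiseless (Sinusoidal) arm, plus a toy experiment loop wiring an arm set to one of the agent sketches above. The `pull` interface is an assumption for illustration, not necessarily this package's interface.

```python
import math
import random

class BernoulliArm:
    """Stationary arm: each pull returns 1 with probability p, else 0."""

    def __init__(self, p):
        self.p = p

    def pull(self):
        return 1 if random.random() < self.p else 0

class SinusoidalArm:
    """Non-stationary, noiseless arm: the reward follows a sinusoid in time."""

    def __init__(self, period):
        self.period = period
        self.t = 0

    def pull(self):
        self.t += 1
        return 0.5 * (1 + math.sin(2 * math.pi * self.t / self.period))

# Toy experiment loop, reusing the ThompsonSamplingBernoulli sketch above.
arms = [BernoulliArm(p) for p in (0.2, 0.5, 0.8)]
agent = ThompsonSamplingBernoulli(len(arms))
for _ in range(1000):
    arm = agent.select_arm()
    agent.update(arm, arms[arm].pull())
```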