# Bernoulli-Bandits

Implementations of some popular bandit algorithms, both for the regret-minimization setting and for the best-arm selection setting.
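
As an illustration of the regret-minimization setting, here is a minimal Thompson Sampling sketch for Bernoulli arms with uniform Beta(1, 1) priors. This is not the repository's code; the function name, arm means, and horizon are hypothetical:

```python
import numpy as np

def thompson_sampling(true_means, horizon, rng=None):
    """Run Thompson Sampling on Bernoulli arms; return total reward collected."""
    rng = rng or np.random.default_rng()
    k = len(true_means)
    successes = np.zeros(k)  # adds to the alpha parameter of each arm's Beta posterior
    failures = np.zeros(k)   # adds to the beta parameter of each arm's Beta posterior
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate from each arm's posterior and pull the argmax.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Example: two arms with (hypothetical) means 0.5 and 0.6.
print(thompson_sampling([0.5, 0.6], horizon=1000))
```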

Also, some code to analyze Thompson Sampling in several ways, e.g. computing the probability of ending up in a particular state, the probability of picking a particular arm from a given state or at a given time step, and the expected reward after a given number of steps. Here, a state is the number of successes and failures encountered for each arm, i.e. the history of arms pulled and rewards obtained.
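
For instance, the probability that Thompson Sampling picks a given arm from a given state can be estimated by sampling from each arm's posterior. The following Monte Carlo sketch is an illustration only, not the repository's code; the function and parameter names are hypothetical:

```python
import numpy as np

def prob_pick_arm(state, arm, n_samples=100_000, rng=None):
    """Estimate the probability that Thompson Sampling picks `arm` from `state`.

    `state` is a list of (successes, failures) pairs, one per arm; under
    uniform priors each arm's posterior is Beta(successes + 1, failures + 1).
    Estimates P(theta_arm = max_j theta_j) by Monte Carlo.
    """
    rng = rng or np.random.default_rng()
    alphas = np.array([s + 1 for s, f in state])
    betas = np.array([f + 1 for s, f in state])
    # Draw posterior samples for all arms at once: shape (n_samples, k).
    draws = rng.beta(alphas, betas, size=(n_samples, len(state)))
    return float(np.mean(np.argmax(draws, axis=1) == arm))

# Example state: arm 0 has seen 3 successes / 1 failure, arm 1 has seen 1 / 1.
print(prob_pick_arm([(3, 1), (1, 1)], arm=0))
```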