Implementations of several popular bandit algorithms, for both the regret minimization setting and the best arm selection setting.
Also includes code to analyze Thompson Sampling in several ways, e.g. computing the probability of ending up in a particular state, the probability of picking a particular arm from a given state or at a given time step, and the expected reward at a given time step. Here, a state is the number of successes and failures recorded for each arm, i.e. the history of arms pulled and rewards obtained.
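As a rough illustration of the state notion described above, here is a minimal sketch of Thompson Sampling for Bernoulli bandits (the function name and signature are hypothetical, not taken from this repository): each arm keeps a (successes, failures) pair, which doubles as its Beta posterior parameters.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors.

    The per-arm (successes, failures) counts are exactly the 'state'
    referred to above: the full history of pulls and rewards.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate from each arm's Beta posterior
        # and play the arm with the largest sample.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Draw a Bernoulli reward and update that arm's state.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

For example, `thompson_sampling([0.3, 0.7], 1000)` runs for 1000 steps and returns the cumulative reward along with the final state, from which quantities like the expected reward at a given time can be estimated by averaging over many seeded runs.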