Reinforcement_Learning_Toys

Toy implementations of some reinforcement learning algorithms.

Topic	Link
Multi-armed bandit (epsilon-greedy/UCB)
Markov decision process	link

Detailed introduction

Multi-armed Bandits

The multi-armed bandit (MAB) problem is a simple, fundamental example in reinforcement learning, and it has been used in real-world tasks such as recommender systems.

Definition (from Wikipedia):

In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. wiki:multi-armed bandits

In the MAB problem, the agent uses the rewards received from its earlier actions to estimate the value of each arm, and tries to maximize the expected gain of each action.
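
The estimate-then-act loop described above can be illustrated with a minimal epsilon-greedy sketch. It is not taken from the `multi_armed_bandit` folder; the function name `run_epsilon_greedy`, its parameters, and the unit-variance Gaussian reward model are all assumptions made for illustration. Each arm's value is estimated as the running mean of the rewards it has produced so far.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Play a bandit with epsilon-greedy action selection (illustrative sketch).

    Returns the estimated value of each arm and the number of pulls per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # Q(a): running mean reward of each arm
    counts = [0] * n_arms       # N(a): how many times each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: Gaussian noise (std 1.0) around the true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update: Q <- Q + (r - Q) / N
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.0])
    print("estimated values:", [round(q, 2) for q in estimates])
    print("pull counts:", counts)
```

With a small `epsilon`, most pulls go to whichever arm currently looks best, while the occasional random pull keeps the estimates of the other arms from going stale. A UCB variant would replace the random exploration step with a confidence bonus added to each estimate.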

jzsherlock4869/reinforcement-learning-sutton-code
