Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Swaminathan, Adith; Joachims, Thorsten

arXiv:1502.02362 (cs)

[Submitted on 9 Feb 2015 (v1), last revised 20 May 2015 (this version, v2)]

Abstract: We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem through propensity scoring. Next, we prove generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. These constructive bounds give rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM can be used to derive a new learning method -- called Policy Optimizer for Exponential Models (POEM) -- for learning stochastic linear rules for structured output prediction. We present a decomposition of the POEM objective that enables efficient stochastic gradient optimization. POEM is evaluated on several multi-label classification problems showing substantially improved robustness and generalization performance compared to the state-of-the-art.
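To make the idea of a propensity-weighted empirical risk estimator with a variance term concrete, here is a minimal sketch. It is not the authors' POEM implementation, and the abstract does not state the exact objective. The code assumes that each logged sample carries a loss, the probability with which the logging policy chose the logged action, and the probability the candidate policy assigns to that same action. The function names, the clipping constant `clip`, the penalty weight `lam`, and the penalty form `lam * sqrt(variance / n)` are all illustrative assumptions.

```python
import math
from statistics import mean, variance


def ips_terms(losses, logged_propensities, new_policy_probs, clip=None):
    """Per-sample propensity-weighted losses: loss * (new prob / logged prob).

    `clip` (hypothetical parameter) optionally caps the importance weight,
    trading a little bias for lower variance.
    """
    terms = []
    for loss, p_logged, p_new in zip(losses, logged_propensities, new_policy_probs):
        if p_logged <= 0:
            raise ValueError("logged propensities must be positive")
        weight = p_new / p_logged
        if clip is not None:
            weight = min(weight, clip)
        terms.append(loss * weight)
    return terms


def variance_penalized_risk(losses, logged_propensities, new_policy_probs,
                            clip=None, lam=0.0):
    """Propensity-weighted empirical risk plus an assumed variance penalty."""
    terms = ips_terms(losses, logged_propensities, new_policy_probs, clip)
    n = len(terms)
    risk = mean(terms)
    # Sample variance of the weighted losses; zero when only one sample exists.
    var = variance(terms) if n > 1 else 0.0
    return risk + lam * math.sqrt(var / n)


# Toy log: four interactions, logging policy chose each action with prob 0.5.
losses = [1.0, 0.0, 1.0, 0.0]
logged = [0.5, 0.5, 0.5, 0.5]
candidate = [0.25, 0.75, 0.25, 0.75]

plain = variance_penalized_risk(losses, logged, candidate)
penalized = variance_penalized_risk(losses, logged, candidate, lam=1.0)
clipped = variance_penalized_risk(losses, logged, candidate, clip=0.4)
```

With `lam=0.0` the function reduces to the plain propensity-weighted estimate. Raising `lam` favors candidate policies whose weighted losses vary less across the log, which is the intuition behind bounds that account for estimator variance.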

Comments:	10 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1502.02362 [cs.LG]
	(or arXiv:1502.02362v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1502.02362

Submission history

From: Adith Swaminathan
[v1] Mon, 9 Feb 2015 05:09:25 UTC (52 KB)
[v2] Wed, 20 May 2015 23:29:49 UTC (54 KB)
