Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

Hanna, Josiah P.; Stone, Peter; Niekum, Scott

arXiv:1606.06126 (cs)

[Submitted on 20 Jun 2016 (v1), last revised 24 Sep 2018 (this version, v3)]

Abstract: For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high-confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for the case where the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
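
The general idea of bootstrapping a model-based estimate to obtain an approximate lower confidence bound can be illustrated with a small sketch. This is not the authors' implementation: it assumes a tabular MDP, a maximum-likelihood transition and reward model, a percentile bootstrap over whole trajectories, and a value of zero for state-action pairs that never appear in a resample. The function names, the toy data set, and the target policy are all hypothetical.

```python
import random

def fit_model(trajectories, n_states, n_actions):
    """Count-based (maximum-likelihood) model from (s, a, r, s_next) tuples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for traj in trajectories:
        for s, a, r, s_next in traj:
            counts[s][a][s_next] += 1
            reward_sum[s][a] += r
    return counts, reward_sum

def model_value(model, policy, start_state, horizon):
    """Finite-horizon expected return of `policy` inside the fitted model."""
    counts, reward_sum = model
    n_states = len(counts)
    v = [0.0] * n_states
    for _ in range(horizon):
        new_v = [0.0] * n_states
        for s in range(n_states):
            for a, prob in enumerate(policy[s]):
                n = sum(counts[s][a])
                if n == 0:
                    continue  # assumption: unseen (s, a) pairs contribute zero
                q = reward_sum[s][a] / n + sum(
                    c / n * v[s2] for s2, c in enumerate(counts[s][a])
                )
                new_v[s] += prob * q
        v = new_v
    return v[start_state]

def bootstrap_lower_bound(trajectories, policy, n_states, n_actions,
                          start_state, horizon, delta=0.05, n_boot=200, seed=0):
    """Percentile-bootstrap lower bound: the delta-quantile of model estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement, then refit the model.
        resample = [rng.choice(trajectories) for _ in trajectories]
        model = fit_model(resample, n_states, n_actions)
        estimates.append(model_value(model, policy, start_state, horizon))
    estimates.sort()
    return estimates[min(int(delta * n_boot), n_boot - 1)]

# Hypothetical one-step problem: state 0 is the start, state 1 is terminal.
data = ([[(0, 0, 0.0, 1)]] * 20      # action 0, reward 0
        + [[(0, 1, 1.0, 1)]] * 20    # action 1, reward 1
        + [[(0, 1, 0.0, 1)]] * 10)   # action 1, reward 0
target_policy = [[0.1, 0.9], [0.5, 0.5]]  # action probabilities per state

point_estimate = model_value(fit_model(data, 2, 2), target_policy, 0, 1)
lower_bound = bootstrap_lower_bound(data, target_policy, 2, 2, 0, 1)
```

The resulting bound is approximate rather than guaranteed, which mirrors the trade-off the abstract describes: a bootstrap quantile can be much tighter than an exact bound on limited data, but it inherits any bias in the fitted model.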

Comments:	Published in proceedings of the 16th International Conference on Autonomous Agents and Multi-agent Systems
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1606.06126 [cs.AI]
	(or arXiv:1606.06126v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1606.06126

Submission history

From: Josiah Hanna
[v1] Mon, 20 Jun 2016 14:06:22 UTC (510 KB)
[v2] Wed, 8 Mar 2017 23:26:07 UTC (512 KB)
[v3] Mon, 24 Sep 2018 17:13:08 UTC (513 KB)
