Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Kim, Yeoneung; Yang, Insoon; Jun, Kwang-Sung


arXiv:2111.03289 (stat)

[Submitted on 5 Nov 2021 (v1), last revised 4 Feb 2023 (this version, v4)]

Abstract: In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees, yet it is challenging because variances are often not known a priori. Recently, Zhang et al. (2021) made considerable progress by obtaining a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve $\tilde O(\min\{d\sqrt{K}, d^{1.5}\sqrt{\sum_{k=1}^K \sigma_k^2}\} + d^2)$, where $d$ is the dimension of the features, $K$ is the time horizon, $\sigma_k^2$ is the noise variance at time step $k$, and $\tilde O$ ignores polylogarithmic dependence; this is a factor of $d^3$ improvement. For linear mixture MDPs, under the assumption that the maximum cumulative reward in an episode lies in $[0,1]$, we achieve a horizon-free regret bound of $\tilde O(d \sqrt{K} + d^2)$, where $d$ is the number of base models and $K$ is the number of episodes. This is a factor of $d^{3.5}$ improvement in the leading term and $d^7$ in the lower-order term. Our analysis critically relies on a novel peeling-based regret analysis that leverages the elliptical potential 'count' lemma.
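To get a feel for how the two bounds stated in the abstract scale, the following is a minimal sketch that evaluates only their leading expressions. It assumes all constants and polylogarithmic factors hidden by $\tilde O$ are dropped, so the numbers indicate scale only and are not actual regret values. The function names `variance_adaptive_bound` and `horizon_free_mdp_bound` are hypothetical and do not come from the paper.

```python
import math


def variance_adaptive_bound(d, variances):
    """Evaluate min{d*sqrt(K), d^1.5*sqrt(sum_k sigma_k^2)} + d^2.

    Illustrative only: constants and polylog factors hidden by the
    tilde-O notation are dropped (an assumption of this sketch).
    `variances` holds sigma_k^2 for k = 1..K, so K = len(variances).
    """
    K = len(variances)
    worst_case_term = d * math.sqrt(K)                 # d * sqrt(K)
    adaptive_term = d ** 1.5 * math.sqrt(sum(variances))  # d^1.5 * sqrt(sum sigma_k^2)
    return min(worst_case_term, adaptive_term) + d ** 2


def horizon_free_mdp_bound(d, K):
    """Evaluate d*sqrt(K) + d^2 for d base models and K episodes.

    Same caveat: constants and polylog factors are dropped.
    """
    return d * math.sqrt(K) + d ** 2


if __name__ == "__main__":
    d, K = 4, 10_000
    # High-noise regime: the d*sqrt(K) term is the smaller one.
    print(variance_adaptive_bound(d, [1.0] * K))
    # Low-noise regime: the variance-dependent term is the smaller one.
    print(variance_adaptive_bound(d, [1e-4] * K))
    print(horizon_free_mdp_bound(d, K))
```

In this sketch the `min` switches to the variance-dependent term once $\sum_{k} \sigma_k^2$ falls below $K/d$, which is the regime where adapting to low variance pays off.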

Comments: Accepted to NeurIPS 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as: arXiv:2111.03289 [stat.ML] (or arXiv:2111.03289v4 [stat.ML] for this version)
https://doi.org/10.48550/arXiv.2111.03289

Submission history

From: Kwang-Sung Jun
[v1] Fri, 5 Nov 2021 06:47:27 UTC (90 KB)
[v2] Fri, 28 Jan 2022 19:19:48 UTC (55 KB)
[v3] Thu, 20 Oct 2022 23:17:11 UTC (71 KB)
[v4] Sat, 4 Feb 2023 21:49:44 UTC (448 KB)
