Upper and lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo

Austin Brown (ORCID: 0000-0003-1576-8381), Department of Statistical Sciences, University of Toronto, Toronto, Canada
Jeffrey S. Rosenthal (ORCID: 0000-0002-5118-6808), Department of Statistical Sciences, University of Toronto, Toronto, Canada
Abstract

We investigate lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo under any adaptation strategy. In particular, we prove general lower bounds in total variation and on the weak convergence rate under general adaptation plans. If the adaptation diminishes sufficiently fast, we also develop comparable convergence rate upper bounds that are capable of approximately matching the convergence rate in the subgeometric lower bound. These results provide insight into the optimal design of adaptation strategies and also limitations on the convergence behavior of adaptive Markov chain Monte Carlo. Applications to an adaptive unadjusted Langevin algorithm as well as adaptive Metropolis-Hastings with independent proposals and random-walk proposals are explored.

MSC: 60J05; 60J22; 60G07

Keywords: adaptive Metropolis-Hastings; lower bounds for adaptive MCMC; weak convergence of adaptive MCMC

1 Introduction

Let $\pi$ be a Borel probability measure on a Polish space $\mathcal{X}$. Adaptive Markov chain Monte Carlo [Haario et al., 2001, Roberts and Rosenthal, 2007] is a widely successful framework to simulate realizations from $\pi$ when optimal tuning parameters for the Markov chain are not readily available. The adaptive process $(\Gamma_t, X_t)_{t=1}^{\infty}$ is constructed from a family of Markov kernels indexed by a set of potential tuning parameters. The discrete-time adaptive process first updates the tuning parameter $\Gamma_t \mid (\Gamma_s, X_s)_{0 \le s \le t-1}$ with an adaptation strategy that uses the previous history, and then updates $X_t \mid \Gamma_t, X_{t-1}$ using a Markov transition kernel. The goal is for the adaptive process to “learn” optimal tuning parameters so that the marginal distribution of the random variable $X_t$ closely approximates the measure $\pi$.

With a wide range of possible adaptation strategies, the theoretical convergence rates of adaptive algorithms are less understood than those of non-adaptive Markov chain Monte Carlo (MCMC), where fixed tuning parameters are chosen carefully beforehand. A theoretical understanding of the rate of convergence is essential in applications, as it helps to ensure a stable and reliable Monte Carlo simulation. Nevertheless, adaptive MCMC can exhibit empirical performance surpassing that of standard MCMC even though much of the theoretical understanding is lacking. For example, adaptive MCMC is widely used to automatically learn the covariance in random-walk Metropolis-Hastings [Haario et al., 2001], a quantity that is often difficult or impossible to choose optimally when the tuning parameters must be fixed in advance.

The main contributions of this paper are general subgeometric lower bounds on the convergence rate of adaptive MCMC, both in total variation and for weak convergence, paired with upper bounds under strong conditions on the rate at which adaptation diminishes. Applications of the theory are demonstrated on an adaptive unadjusted Langevin algorithm, an adaptive Metropolis-Hastings independence sampler, and an adaptive random-walk Metropolis-Hastings algorithm. The lower bounds for convergence hold under arbitrary adaptation plans and serve as a benchmark for the best achievable convergence behavior of adaptive MCMC. The techniques for obtaining these lower bounds are based on finding large discrepancies between the tail probabilities of the marginal adaptive process and the target measure $\pi$. Since the convergence rate is determined by tail properties, this may guide further theoretical understanding of some modern adaptation strategies that restrict adaptation to compact sets [Pompe et al., 2020]. Convergence rate lower bounds can also be of practical use in applications to determine if an appropriate rate is achievable so that central limit theorems may hold [Andrieu and Moulines, 2006, Laitinen and Vihola, 2024].

One barrier in developing lower bounds for adaptive MCMC is the non-Markovian, non-reversible nature of these processes, which means that spectral analysis for reversible Markov processes is not directly available. To the best of our knowledge, the lower bounds for weak convergence developed here are novel, even when applied to non-adapted Markov chains, and general total variation lower bounds have not yet been explored for adaptive MCMC. In specific situations, adaptive random-walk algorithms have been shown to improve “local” behavior but fail to adapt to “global” properties of the target measure, such as the tail probabilities, and have been proven to exhibit poor convergence properties [Schmidler and Woodard, 2011]. Related research develops general lower bounds in total variation for Markov processes [Hairer, 2009, Theorem 3.6, Corollary 3.7]. More recently, this technique has also been extended to polynomial rate lower bounds in unbounded Wasserstein distances for some Markov processes [Sandrić et al., 2022, Theorem 1.2]. When the tail decay of the target measure is unavailable, lower bounds for Markov processes in total variation have recently been developed, but a precise computation of the constants is not available [Brešar and Mijatović, 2024].

In addition to lower bounds, we develop explicit quantitative subgeometric upper bounds in total variation that can match the lower bound rate if the adaptation diminishes sufficiently fast. The condition required on the adaptation is similar to the well-known diminishing adaptation condition [Roberts and Rosenthal, 2007] often used for the asymptotic convergence of adaptive MCMC. To the best of our knowledge, this is the first subgeometric upper bound to quantify the mixing for adaptive MCMC in total variation. In comparison, existing convergence results require strong assumptions for adaptive MCMC and are not quantitative [Andrieu and Moulines, 2006] or develop central limit theorems through Poisson’s equation [Laitinen and Vihola, 2024].

The organization of this article is as follows. Section 2 first develops lower bounds in total variation for large classes of adaptation strategies and then extends these lower bounds to weak convergence when the state space is Euclidean. A lower bound is shown on a concrete example for the adapted unadjusted Langevin algorithm. Section 3 proves comparable upper bounds under diminishing conditions on the adaptation plans that are capable of approximately matching the lower bound rates. Section 4 illustrates the lower bounds on a toy example with an adaptive Metropolis-Hastings independence sampler, and Section 5 applies the lower bounds to the popular adaptive random-walk Metropolis-Hastings. Section 6 provides a final discussion on the results and future research directions.

2 Lower bounds on the convergence of adaptive MCMC

For two Borel probability measures $\mu, \nu$ on $\mathcal{X}$, let $\mathcal{C}(\mu, \nu)$ be the set of all couplings, that is, Borel probability measures $\xi$ on $\mathcal{X} \times \mathcal{X}$ satisfying $\xi(\cdot \times \mathcal{X}) = \mu$ and $\xi(\mathcal{X} \times \cdot) = \nu$. The total variation distance between $\mu$ and $\nu$ is then the smallest probability of the off-diagonal over all possible couplings, that is,

$$\lVert \mu - \nu \rVert_{\text{TV}} = \inf_{\xi \in \mathcal{C}(\mu, \nu)} \xi(\{(x, y) \in \mathcal{X} \times \mathcal{X} : x \ne y\}).$$
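For intuition, on a finite state space the infimum over couplings is attained by a maximal coupling, and the distance reduces to $1 - \sum_x \mu(x) \wedge \nu(x)$, which equals half the $\ell_1$ distance between the two probability mass functions. The following minimal sketch illustrates this; the function names and the example distributions are illustrative assumptions and do not appear in the paper.

```python
def tv_distance(mu, nu):
    """Total variation distance between two pmfs given as dicts state -> probability.

    Uses the maximal-coupling identity TV = 1 - sum_x min(mu(x), nu(x)).
    """
    states = set(mu) | set(nu)
    overlap = sum(min(mu.get(s, 0.0), nu.get(s, 0.0)) for s in states)
    return 1.0 - overlap


def half_l1_distance(mu, nu):
    """Equivalent expression: half of the l1 distance between the pmfs."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(s, 0.0) - nu.get(s, 0.0)) for s in states)


# Illustrative pmfs (assumed for this example only).
mu = {0: 0.5, 1: 0.5}
nu = {0: 0.25, 1: 0.25, 2: 0.5}
print(tv_distance(mu, nu), half_l1_distance(mu, nu))
```

Both expressions agree on this example, as they must for any pair of probability mass functions.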

Denote the min and max of $a, b \in \mathbb{R}$ by $a \wedge b$ and $a \vee b$ respectively. On a Polish space $(\mathcal{X}, d)$ where $d : \mathcal{X} \times \mathcal{X} \to [0, \infty)$ is a metric, we denote the Wasserstein distance that metrizes the weak convergence of probability measures [Dudley, 2018, Theorem 11.3.3] by

$$\mathcal{W}_{d \wedge 1}(\mu, \nu) = \inf_{\xi \in \mathcal{C}(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} \left[ d(x, y) \wedge 1 \right] \xi(dx, dy).$$
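As a side remark that follows directly from the two definitions (it is not a statement taken from the source), the truncated metric is dominated pointwise by the off-diagonal indicator, so this Wasserstein distance never exceeds the total variation distance:

```latex
% For every coupling \xi \in \mathcal{C}(\mu, \nu):
%   d(x, y) \wedge 1 \le \mathbf{1}\{x \ne y\} for all x, y.
\int_{\mathcal{X} \times \mathcal{X}} \left[ d(x, y) \wedge 1 \right] \xi(dx, dy)
  \le \xi(\{(x, y) : x \ne y\})
\quad\Longrightarrow\quad
\mathcal{W}_{d \wedge 1}(\mu, \nu) \le \lVert \mu - \nu \rVert_{\text{TV}} .
```

In particular, any lower bound on $\mathcal{W}_{d \wedge 1}$ is also a lower bound in total variation.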

Let $\mathcal{X}$ be a Polish space and $\mathcal{Y}$ be a Borel measurable space, equipped with their Borel sigma-algebras $\mathcal{B}(\mathcal{X})$ and $\mathcal{B}(\mathcal{Y})$ respectively, where $\mathcal{X}$ is the state space and $\mathcal{Y}$ is the space of tuning parameters. We now define the adaptive process $(\Gamma_t, X_t)_{t=0}^{\infty}$ on $\mathcal{Y} \times \mathcal{X}$ using the filtration $\mathcal{H}_t = \mathcal{B}(\Gamma_s, X_s, 0 \le s \le t)$. Let $\mathcal{Q}$ denote an adaptation plan, that is, a map $t \mapsto \mathcal{Q}_t$ for all $t \in \mathbb{Z}_+$, where $\mathcal{Q}_t : (\mathcal{Y} \times \mathcal{X})^t \times \mathcal{B}(\mathcal{Y}) \to [0, 1]$ is a Borel probability kernel. The kernels $\mathcal{Q}_t$ act on Borel functions $g : \mathcal{Y} \to \mathbb{R}$ and Borel measures $\nu$ on $(\mathcal{Y} \times \mathcal{X})^t$ with

$$\begin{aligned}
(\mathcal{Q}_t g)(\gamma_0, x_0, \ldots, \gamma_{t-1}, x_{t-1}) &= \int_{\mathcal{Y}} g(\gamma_t) \, \mathcal{Q}_t(\gamma_0, x_0, \ldots, \gamma_{t-1}, x_{t-1}, d\gamma_t) \\
(\nu \mathcal{Q}_t)(\cdot) &= \int_{(\mathcal{Y} \times \mathcal{X})^t} \mathcal{Q}_t(\gamma_0, x_0, \ldots, \gamma_{t-1}, x_{t-1}, \cdot) \, \nu(d\gamma_0, dx_0, \ldots, d\gamma_{t-1}, dx_{t-1})
\end{aligned}$$

for all $t \in \mathbb{Z}_+$ and $(\gamma_0, x_0, \ldots, \gamma_{t-1}, x_{t-1}) \in (\mathcal{Y} \times \mathcal{X})^t$. Initialized at fixed $(x_0, \gamma_0) \in \mathcal{X} \times \mathcal{Y}$, the discrete-time adaptive process first updates the tuning parameter

$$\Gamma_t \mid (\Gamma_s, X_s)_{0 \le s \le t-1} \sim \mathcal{Q}_t((\Gamma_s, X_s)_{0 \le s \le t-1}, \cdot)$$

using an adaptation plan. Let $(\mathcal{P}_\gamma)_{\gamma \in \mathcal{Y}}$ be a family of Borel Markov kernels where $\mathcal{P}_\gamma : \mathcal{X} \times \mathcal{B}(\mathcal{X}) \to [0, 1]$ for each $\gamma \in \mathcal{Y}$ and, for each $x \in \mathcal{X}$, $\gamma \mapsto \mathcal{P}_\gamma(x, \cdot)$ is Borel measurable. The Markov family acts on Borel functions $f : \mathcal{X} \to \mathbb{R}$ and Borel measures $\mu$ on $\mathcal{X}$ with

$$(\mathcal{P}_\gamma f)(x) = \int_{\mathcal{X}} f(y) \, \mathcal{P}_\gamma(x, dy), \qquad (\mu \mathcal{P}_\gamma)(\cdot) = \int_{\mathcal{X}} \mathcal{P}_\gamma(x, \cdot) \, \mu(dx)$$

for all $(x, \gamma) \in \mathcal{X} \times \mathcal{Y}$. The process then updates the state given the updated tuning parameter

$$X_t \mid \Gamma_t, X_{t-1} \sim \mathcal{P}_{\Gamma_t}(X_{t-1}, \cdot)$$

using the Markov kernel.
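The two-step update above can be sketched in code. The following is a minimal illustration, assuming a one-dimensional standard normal target, a random-walk Metropolis kernel $\mathcal{P}_\gamma$ whose tuning parameter $\gamma$ is the proposal scale, and a hypothetical deterministic adaptation rule that sets $\gamma$ from the sample standard deviation of the past states. None of these specific choices, names, or constants come from the paper.

```python
import math
import random


def log_target(x):
    """Log-density of the illustrative target (standard normal), up to a constant."""
    return -0.5 * x * x


def adapt(history, gamma_min=0.1, gamma_max=10.0):
    """Hypothetical adaptation plan Q_t: a deterministic function of the history.

    history is the list [(gamma_0, x_0), ..., (gamma_{t-1}, x_{t-1})].
    The factor 2.38 and the clipping bounds are illustrative choices only.
    """
    xs = [x for (_, x) in history]
    if len(xs) < 2:
        return history[-1][0]  # not enough data yet: keep the previous scale
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return min(max(2.38 * math.sqrt(var), gamma_min), gamma_max)


def rwm_step(gamma, x, rng):
    """One draw from P_gamma(x, .): random-walk Metropolis with proposal scale gamma."""
    y = x + gamma * rng.gauss(0.0, 1.0)
    accept_prob = math.exp(min(0.0, log_target(y) - log_target(x)))
    return y if rng.random() < accept_prob else x


def run_adaptive_chain(gamma0, x0, n_steps, seed=0):
    """Simulate (Gamma_t, X_t) for t = 0, ..., n_steps."""
    rng = random.Random(seed)
    history = [(gamma0, x0)]
    for _ in range(n_steps):
        gamma_t = adapt(history)                      # Gamma_t | history ~ Q_t
        x_t = rwm_step(gamma_t, history[-1][1], rng)  # X_t | Gamma_t, X_{t-1} ~ P_{Gamma_t}
        history.append((gamma_t, x_t))
    return history


chain = run_adaptive_chain(gamma0=1.0, x0=5.0, n_steps=200)
```

The adaptation rule here depends on the whole history, which is exactly why the pair $(\Gamma_t, X_t)$ is in general not a Markov chain.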

Let $\mathcal{S}(\mathcal{X}, \mathcal{Y})$ denote the set of all possible adaptation plans $\mathcal{Q}$ that define the Borel kernels $\mathcal{Q}_t$ updating the tuning parameters at every iteration time $t$. For a chosen adaptive strategy $\mathcal{Q} \in \mathcal{S}(\mathcal{X}, \mathcal{Y})$, we denote the marginal of the adaptive process at iteration time $t$ by $X_t \sim \mathcal{A}_{\mathcal{Q}}^{(t)}((\gamma_0, x_0), \cdot)$. We will develop conditions to lower bound the total variation over all feasible adaptation strategies, that is, to lower bound

$$\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X}, \mathcal{Y})} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot) - \pi \right\rVert_{\text{TV}}$$

for $t \in \mathbb{Z}_+$.

The main tool will be a function prescribing a subgeometric rate, defined implicitly as an inverse, which we now introduce. For concave functions $\varphi : (0, \infty) \to (0, \infty)$ and $w_0 \in [1, \infty)$, define

$$H_{w_0, \varphi}(w) = \int_{w_0}^{w} \frac{dv}{\varphi(v)} \tag{1}$$

for all $w \ge w_0$. The assumptions on $\varphi$ imply that it is non-decreasing, that $H_{w_0, \varphi}(\cdot)$ is strictly increasing, and that the inverse $H^{-1}_{w_0, \varphi}(\cdot)$ exists. Depending on the form of $\varphi$, the inverse function $H^{-1}_{w_0, \varphi}(\cdot)$ defines a polynomial, subgeometric, or geometric function increasing to infinity.
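As a worked illustration, the inverse can be computed in closed form for two simple concave choices of $\varphi$. The specific forms below, with assumed constants $c > 0$ and $0 \le \beta < 1$, are examples chosen for this note and are not taken from the source.

```latex
% Polynomial case: \varphi(v) = c v^{\beta}, with c > 0 and 0 \le \beta < 1.
H_{w_0, \varphi}(w) = \frac{w^{1-\beta} - w_0^{1-\beta}}{c (1 - \beta)},
\qquad
H^{-1}_{w_0, \varphi}(t) = \left( w_0^{1-\beta} + c (1 - \beta) t \right)^{\frac{1}{1-\beta}} .

% Geometric case: \varphi(v) = c v, with c > 0.
H_{w_0, \varphi}(w) = \frac{1}{c} \log \frac{w}{w_0},
\qquad
H^{-1}_{w_0, \varphi}(t) = w_0 e^{c t} .
```

The first inverse grows polynomially in $t$ and the second grows geometrically, matching the two extremes described above.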

The first lower bound in total variation uses a technique extended from [Hairer, 2009, Corollary 3.7] to adaptive MCMC over all adaptive strategies.

Theorem 1.

Assume there is a Borel function $W : \mathcal{X} \to [1, \infty)$ and constants $C, \kappa > 0$ where

$$\pi(W \ge r) \ge C r^{-\kappa} \tag{2}$$

holds for all $r > 0$ and there is an $\alpha > \kappa$ and a concave function $\varphi : (0, \infty) \to (0, \infty)$ such that

$$(\mathcal{P}_\gamma W^\alpha)(x) - W(x)^\alpha \le \varphi(W(x)^\alpha) \tag{3}$$

holds for all $(x, \gamma) \in \mathcal{X} \times \mathcal{Y}$. Then for all $t \in \mathbb{Z}_+$,

$$\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X}, \mathcal{Y})} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot) - \pi \right\rVert_{\text{TV}} \ge \frac{M}{\left( H^{-1}_{W(x_0)^\alpha, \varphi}(t) \right)^{\frac{\kappa}{\alpha - \kappa}}}$$

where

$$M = C^{\frac{\alpha}{\alpha - \kappa}} \left[ (\kappa / \alpha)^{\frac{\kappa}{\alpha - \kappa}} - (\kappa / \alpha)^{\frac{\alpha}{\alpha - \kappa}} \right]. \tag{4}$$
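To make the bound concrete, the sketch below evaluates $M$ from (4) and the right-hand side of Theorem 1 for the illustrative choice $\varphi(v) = c v^{\beta}$ with $0 \le \beta < 1$, for which $H^{-1}_{w_0, \varphi}(t) = (w_0^{1-\beta} + c(1-\beta)t)^{1/(1-\beta)}$. It also checks numerically that $M h^{-\kappa/(\alpha-\kappa)}$ equals the maximum over $r$ of $C r^{-\kappa} - h r^{-\alpha}$, attained at $r^* = (\alpha h / (\kappa C))^{1/(\alpha - \kappa)}$, which is the optimization carried out at the end of the proof. All function names and numerical values are assumptions made for this example.

```python
def constant_M(C, kappa, alpha):
    """Constant M from equation (4); requires alpha > kappa > 0 and C > 0."""
    e = alpha - kappa
    ratio = kappa / alpha
    return C ** (alpha / e) * (ratio ** (kappa / e) - ratio ** (alpha / e))


def h_inverse_poly(t, w0, c, beta):
    """Closed-form H^{-1}_{w0, phi}(t) for the assumed phi(v) = c * v**beta, 0 <= beta < 1."""
    return (w0 ** (1.0 - beta) + c * (1.0 - beta) * t) ** (1.0 / (1.0 - beta))


def tv_lower_bound(t, w0, C, kappa, alpha, c, beta):
    """Right-hand side of Theorem 1, with w0 playing the role of W(x0)**alpha."""
    h = h_inverse_poly(t, w0, c, beta)
    return constant_M(C, kappa, alpha) / h ** (kappa / (alpha - kappa))


def tail_gap(r, h, C, kappa, alpha):
    """The quantity C r^{-kappa} - h r^{-alpha} that the proof maximizes over r."""
    return C * r ** (-kappa) - h * r ** (-alpha)


# Illustrative values (assumed): C = 0.5, kappa = 1, alpha = 2, phi(v) = sqrt(v), w0 = 1.
C, kappa, alpha, c, beta, w0 = 0.5, 1.0, 2.0, 1.0, 0.5, 1.0
for t in (0, 2, 10, 100):
    h = h_inverse_poly(t, w0, c, beta)
    r_star = (alpha * h / (kappa * C)) ** (1.0 / (alpha - kappa))
    print(t, tv_lower_bound(t, w0, C, kappa, alpha, c, beta), tail_gap(r_star, h, C, kappa, alpha))
```

With these values $H^{-1}(t) = (1 + t/2)^2$, so the bound decays like $t^{-2}$, a polynomial rate.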
Proof.

Let $V(x) = W^\alpha(x)$, and let $t \in \mathbb{Z}_+$, so then we have

$$\mathbb{E}\left( V(X_{t+1}) \mid \mathcal{H}_t \right) - V(X_t) \le \varphi(V(X_t)).$$

Since $\mathbb{E}[V(X_1)] - V(x_0) \le \varphi(V(x_0))$, assume by induction that for all $k \le t$, $\mathbb{E}[V(X_{k+1})] - \mathbb{E}[V(X_k)] \le \varphi(\mathbb{E}[V(X_k)])$ and $\mathbb{E}[V(X_k)] < \infty$. By the induction hypothesis and Jensen’s inequality,

$$\begin{aligned}
\mathbb{E}[V(X_{t+1})] - \mathbb{E}[V(X_t)] &= \mathbb{E}\left[ \mathbb{E}\left( V(X_{t+1}) \mid \mathcal{H}_t \right) - V(X_t) \right] \\
&\le \mathbb{E}\left[ \varphi(V(X_t)) \right] \\
&\le \varphi\left( \mathbb{E}[V(X_t)] \right).
\end{aligned} \tag{5}$$

The inverse function theorem implies the derivative

$$\frac{d}{ds} H^{-1}_{V(x_0), \varphi}(s) = \varphi\left( H^{-1}_{V(x_0), \varphi}(s) \right).$$

Since $H^{-1}_{V(x_0), \varphi}(0) \ge V(x_0)$, assume by induction that $H^{-1}_{V(x_0), \varphi}(k) \ge \mathbb{E}[V(X_k)]$ for all $k \le t$. Since $\varphi$ is non-decreasing, the fundamental theorem of calculus, the induction hypothesis, and (5) give

$$\begin{aligned}
H^{-1}_{V(x_0), \varphi}(t+1) &= H^{-1}_{V(x_0), \varphi}(t) + \int_t^{t+1} \varphi\left( H^{-1}_{V(x_0), \varphi}(s) \right) ds \\
&\ge H^{-1}_{V(x_0), \varphi}(t) + \varphi\left( H^{-1}_{V(x_0), \varphi}(t) \right) \\
&\ge \mathbb{E}[V(X_t)] + \varphi\left( \mathbb{E}[V(X_t)] \right) \\
&\ge \mathbb{E}[V(X_{t+1})].
\end{aligned}$$

By Markov’s inequality,

\[
\mathbb{P}(W(X_t) \geq r) \leq \frac{\mathbb{E}\left[W(X_t)^{\alpha}\right]}{r^{\alpha}} \leq \frac{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)}{r^{\alpha}}.
\]

Optimizing over $r$ gives the lower bound

\begin{align*}
\left\lVert \mathcal{A}^{(t)}(\gamma_0, x_0, \cdot) - \pi \right\rVert_{\text{TV}}
&\geq \pi(W \geq r) - \mathbb{P}(W(X_t) \geq r) \geq \frac{C}{r^{\kappa}} - \frac{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)}{r^{\alpha}} \\
&\geq \frac{M}{\left(H^{-1}_{W(x_0)^{\alpha},\varphi}(t)\right)^{\frac{\kappa}{\alpha-\kappa}}}.
\end{align*}
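The optimization over $r$ can be made explicit. The following is a sketch under stated assumptions: $\alpha > \kappa$, the shorthand $h = H^{-1}_{W(x_0)^{\alpha},\varphi}(t)$ (our notation, not the paper's), and a maximizer that lies in the range of $r$ where (2) applies. Setting the derivative of $g(r) = C r^{-\kappa} - h r^{-\alpha}$ to zero gives a single critical point:

```latex
\[
g'(r) = -\kappa C r^{-\kappa-1} + \alpha h r^{-\alpha-1} = 0
\quad\Longleftrightarrow\quad
r_* = \left(\frac{\alpha h}{\kappa C}\right)^{\frac{1}{\alpha-\kappa}},
\]
\[
g(r_*) = C r_*^{-\kappa}\left(1 - \frac{\kappa}{\alpha}\right)
       = C\left(1 - \frac{\kappa}{\alpha}\right)
         \left(\frac{\kappa C}{\alpha}\right)^{\frac{\kappa}{\alpha-\kappa}}
         h^{-\frac{\kappa}{\alpha-\kappa}} .
\]
```

With $\kappa = 1$ and $\alpha = 2$ the prefactor reduces to $C^2/4$, which agrees with the constant appearing in Example 2. This is offered as a sanity check on the exponent $\kappa/(\alpha-\kappa)$, not as a restatement of the definition of $M$ in (4).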

Assumption (3) of Theorem 1 requires the Markov family $(\mathcal{P}_{\gamma})_{\gamma \in \mathcal{Y}}$ to satisfy a simultaneous growth condition for some concave function $\varphi$. We look at some concrete examples of concave functions that lead to common subgeometric convergence rates that have been explored previously for upper bounds [Douc et al., 2004].

Example 2.

(Polynomial lower bounds) Assume (2) holds with constants $C > 0$ and $\kappa = 1$ and, additionally, (3) holds with function $W(\cdot)$, $\alpha = 2$, and $\varphi(w) = c w^{\beta}$ for some constants $c > 0$ and $\beta \in (0,1)$. Then a straightforward calculation gives $H^{-1}_{W(x_0)^2,\varphi}(t) = \left((1-\beta) c t + W(x_0)^{2(1-\beta)}\right)^{\frac{1}{1-\beta}}$, and Theorem 1 implies that for all $t \in \mathbb{Z}_+$,

\[
\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X},\mathcal{Y})} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot) - \pi \right\rVert_{\text{TV}} \geq \frac{C^2}{4\left((1-\beta) c t + W(x_0)^{2(1-\beta)}\right)^{\frac{1}{1-\beta}}}.
\]
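As a rough numerical companion to Example 2, the sketch below evaluates the closed form of $H^{-1}$ for $\varphi(w) = c w^{\beta}$, checks that it dominates the worst-case discrete recursion $v_{k+1} = v_k + \varphi(v_k)$ used in the induction argument, and evaluates the right-hand side of the polynomial lower bound. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def h_inv_poly(t, v0, c, beta):
    """Closed form of H^{-1}_{v0,phi}(t) for phi(w) = c * w**beta, 0 < beta < 1.

    It solves y'(t) = phi(y(t)) with y(0) = v0.
    """
    return ((1.0 - beta) * c * t + v0 ** (1.0 - beta)) ** (1.0 / (1.0 - beta))


def discrete_growth(t, v0, c, beta):
    """Iterate the worst-case recursion v_{k+1} = v_k + phi(v_k) from v0."""
    v = v0
    for _ in range(t):
        v = v + c * v ** beta
    return v


def tv_lower_bound_poly(t, w0, C, c, beta):
    """Right-hand side of the Example 2 bound (kappa = 1, alpha = 2)."""
    return C ** 2 / (4.0 * h_inv_poly(t, w0 ** 2, c, beta))


if __name__ == "__main__":
    # Hypothetical illustrative values, not taken from the source.
    w0, C, c, beta = 1.0, 1.0, 0.5, 0.5
    for t in (0, 1, 10, 100, 1000):
        envelope = h_inv_poly(t, w0 ** 2, c, beta)
        recursion = discrete_growth(t, w0 ** 2, c, beta)
        bound = tv_lower_bound_poly(t, w0, C, c, beta)
        print(t, recursion <= envelope, bound)
```

With $\beta = 1/2$ the bound decays like $t^{-2}$, so no adaptation strategy satisfying the assumptions can converge in total variation faster than this polynomial rate.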
Example 3.

(Subgeometric lower bounds) If (2) holds with constants $C > 0$ and $\kappa = 1$ and (3) holds with $W(\cdot)$, $\alpha = 2$, and $\varphi(x) = c(x + K_{\beta})/\log(x + K_{\beta})^{\beta}$ where $K_{\beta} = \exp(\beta + 1)$, then

\[
H^{-1}_{W(x_0)^2,\varphi}(t) \leq \left(W(x_0)^2 + K_{\beta}\right) \exp\left((1+\beta) c t^{\frac{1}{1+\beta}}\right).
\]

Then, by Theorem 1, for all $t \in \mathbb{Z}_+$,

\[
\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X},\mathcal{Y})} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot) - \pi \right\rVert_{\text{TV}} \geq \frac{C^2}{4\left(W(x_0)^2 + K_{\beta}\right)} \exp\left(-(1+\beta) c t^{\frac{1}{1+\beta}}\right).
\]
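The quantities in Example 3 can also be evaluated numerically. The sketch below relies on a substitution that is ours rather than the paper's: with $z = \log(y + K_{\beta})$, the equation $y' = \varphi(y)$ becomes $z' = c/z^{\beta}$, which integrates in closed form. The code then compares the resulting $H^{-1}$ with the displayed envelope and evaluates the lower bound. Function names are hypothetical, and the comparison is run only for the illustrative values $c = 1$, $\beta = 1/2$, $W(x_0) = 1$.

```python
import math


def h_inv_subgeometric(t, v0, c, beta):
    """H^{-1}_{v0,phi}(t) for phi(x) = c*(x+K)/log(x+K)**beta, K = exp(beta+1).

    With z = log(y + K), y' = phi(y) becomes z' = c / z**beta, so
    z(t)**(1+beta) = (1+beta)*c*t + z(0)**(1+beta).
    """
    k_beta = math.exp(beta + 1.0)
    z0 = math.log(v0 + k_beta)
    z = ((1.0 + beta) * c * t + z0 ** (1.0 + beta)) ** (1.0 / (1.0 + beta))
    return math.exp(z) - k_beta


def stated_envelope(t, v0, c, beta):
    """Right-hand side of the displayed upper bound on H^{-1} in Example 3."""
    k_beta = math.exp(beta + 1.0)
    return (v0 + k_beta) * math.exp((1.0 + beta) * c * t ** (1.0 / (1.0 + beta)))


def tv_lower_bound_subgeometric(t, w0, C, c, beta):
    """Right-hand side of the Example 3 total variation bound."""
    return C ** 2 / (4.0 * stated_envelope(t, w0 ** 2, c, beta))


if __name__ == "__main__":
    # Hypothetical illustrative values, not taken from the source.
    w0, C, c, beta = 1.0, 1.0, 1.0, 0.5
    for t in (0, 1, 10, 100, 1000):
        exact = h_inv_subgeometric(t, w0 ** 2, c, beta)
        envelope = stated_envelope(t, w0 ** 2, c, beta)
        print(t, exact <= envelope, tv_lower_bound_subgeometric(t, w0, C, c, beta))
```

The bound decays like a stretched exponential in $t^{1/(1+\beta)}$, which is slower than any geometric rate but faster than any polynomial rate.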

Now we obtain a matching weak lower bound rate under essentially the same conditions as total variation in Euclidean spaces. Let $\lVert \cdot \rVert$ denote the Euclidean norm.

Theorem 4.

Let $\mathcal{X} = \mathbb{R}^d$ for $d \in \mathbb{Z}_+$. Assume (2) holds with $C, \kappa$ and (3) holds with $W(\cdot)$ and $\alpha$, and let $M$ be defined as in (4). Assume that for each $r > 0$, the set $\{x \in \mathbb{R}^d : W(x) \leq r\}$ is compact. Then for any $\epsilon \in (0,1)$,

\[
\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X},\mathcal{Y})} \; \inf_{\xi \in \mathcal{C}\left[\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot), \pi\right]} \xi\left(\{(x,y) \in \mathcal{X} \times \mathcal{X} : \lVert x - y \rVert > \delta_{\epsilon}\}\right) \geq \frac{(1-\epsilon) M}{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)^{\frac{\kappa}{\alpha-\kappa}}}
\]

holds for some $\delta_{\epsilon} \in (0,1)$ and all $t \geq H_{W(x_0)^{\alpha},\varphi}\left(\kappa C (1-\epsilon)^{\alpha}/\alpha\right)$. In particular,

\[
\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X},\mathcal{Y})} \mathcal{W}_{\lVert \cdot \rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot), \pi\right) \geq \frac{\delta_{\epsilon} (1-\epsilon) M}{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)^{\frac{\kappa}{\alpha-\kappa}}}.
\]
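The passage from the coupling probability to the Wasserstein bound rests on a pointwise inequality. As a brief sketch, assuming the usual definition $\mathcal{W}_{\lVert \cdot \rVert \wedge 1}(\mu, \nu) = \inf_{\xi \in \mathcal{C}[\mu,\nu]} \int (\lVert x - y \rVert \wedge 1)\, d\xi(x,y)$ (the paper's own definition appears outside this section), for any $\delta \in (0,1)$ and any coupling $\xi$:

```latex
\[
\int \left(\lVert x - y \rVert \wedge 1\right) d\xi(x,y)
\;\geq\; \int \delta\, \mathbf{1}\{\lVert x - y \rVert > \delta\}\, d\xi(x,y)
\;=\; \delta\, \xi\left(\{(x,y) : \lVert x - y \rVert > \delta\}\right).
\]
```

Here $\lVert x - y \rVert > \delta$ implies $\lVert x - y \rVert \wedge 1 \geq \delta$ because $\delta < 1$. Taking the infimum over $\xi$ on both sides with $\delta = \delta_{\epsilon}$ yields the Wasserstein bound from the coupling bound.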
Proof.

Let $r \geq 1$ and let $T = \{x \in \mathcal{X} : W(x) \geq r\}$. Since $W$ is continuous, $T$ is closed, and by Strassen's theorem ([Strassen, 1965] and [Villani, 2003, Corollary 1.28]), for any $\delta > 0$,

\[
\inf_{\xi \in \mathcal{C}\left[\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot), \pi\right]} \xi\left(\{(x,y) \in \mathcal{X} \times \mathcal{X} : \lVert x - y \rVert > \delta\}\right) \geq \pi(T) - \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, T^{\delta})
\]

where $T^{\delta} = \{y \in \mathbb{R}^d : \text{dist}(y, T) \leq \delta\}$ and $\text{dist}(y, T) = \inf_{x \in T} \lVert x - y \rVert$. Thus, we will find a discrepancy between $\pi\left(\{W \geq r\}\right)$ and $\mathcal{A}_{\mathcal{Q}}^{(t)}\left(\gamma_0, x_0, \{W \geq (1-\epsilon) r\}\right)$ for small $\epsilon$; the intuition is illustrated in Figure 1.

Figure 1: The diagram illustrates the intuition for a discrepancy between the set $\{W \geq r\}$ for the adaptive process and the target measure, and also $\{W \geq (1-\epsilon) r\}$ for small $\epsilon$.

Let $\partial A = \text{cl}(A) \setminus \text{int}(A)$ denote the boundary of a set $A$, where cl is the closure and int is the interior. Since $\mathbb{R}^d$ is convex, we have that $\text{dist}(x, T) = \text{dist}(x, \partial T)$ (see Lemma 20). Since $K = \{x \in \mathbb{R}^d : W(x) \leq r\}$ is compact, $W$ is uniformly continuous on $K$. For $\epsilon \in (0,1)$, we can then choose $\delta_{\epsilon}$, depending on $\epsilon$, sufficiently small so that $W(x) \geq (1-\epsilon) r$ if $\text{dist}(x, T) \leq \delta_{\epsilon}$, and so

\[
\mathbb{P}(X_t \in T^{\delta_{\epsilon}}) \leq \mathbb{P}(W(X_t) \geq (1-\epsilon) r).
\]

Markov’s inequality and (3) imply that

\[
\mathbb{P}(W(X_t) \geq (1-\epsilon) r) \leq \frac{\mathbb{E}\left[W^{\alpha}(X_t)\right]}{(1-\epsilon)^{\alpha} r^{\alpha}} \leq \frac{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)}{(1-\epsilon)^{\alpha} r^{\alpha}}.
\]

Optimizing over $r$, we obtain, for $t$ large enough so that

\[
r = \left(\frac{\alpha}{\kappa C (1-\epsilon)^{\alpha}} H^{-1}_{W(x_0)^{\alpha},\varphi}(t)\right)^{\frac{1}{\alpha-\kappa}} \geq 1
\]

and this yields the lower bound

\begin{align*}
\delta_{\epsilon}^{-1} \mathcal{W}_{\lVert \cdot \rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot), \pi\right)
&\geq \inf_{\xi \in \mathcal{C}\left[\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot), \pi\right]} \xi\left(\{(x,y) : \lVert x - y \rVert > \delta_{\epsilon}\}\right) \\
&\geq \frac{C}{r^{\kappa}} - \frac{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)}{(1-\epsilon)^{\alpha} r^{\alpha}} \\
&\geq (1-\epsilon)^{\frac{\alpha\kappa}{\alpha-\kappa}} \frac{M}{H^{-1}_{W(x_0)^{\alpha},\varphi}(t)^{\frac{\kappa}{\alpha-\kappa}}}
\end{align*}

where $M$ is defined by (4). The conclusion follows since $\epsilon$ is arbitrary. ∎
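The choice of $r$ in the proof and the resulting factor $(1-\epsilon)^{\alpha\kappa/(\alpha-\kappa)}$ can be checked numerically. The sketch below is a minimal illustration with hypothetical function names and parameter values; the variable `h` stands in for $H^{-1}_{W(x_0)^{\alpha},\varphi}(t)$ at a fixed $t$.

```python
def r_star(h, C, kappa, alpha, eps=0.0):
    """Radius chosen in the proof: the maximizer of gap(r, ...) over r > 0."""
    return (alpha * h / (kappa * C * (1.0 - eps) ** alpha)) ** (1.0 / (alpha - kappa))


def gap(r, h, C, kappa, alpha, eps=0.0):
    """Tail discrepancy C / r^kappa - h / ((1 - eps)^alpha * r^alpha)."""
    return C / r ** kappa - h / ((1.0 - eps) ** alpha * r ** alpha)


if __name__ == "__main__":
    # Hypothetical illustrative values, not taken from the source.
    h, C, kappa, alpha, eps = 50.0, 1.0, 1.0, 2.0, 0.1
    r = r_star(h, C, kappa, alpha, eps)
    best = gap(r, h, C, kappa, alpha, eps)
    # No rescaled radius should beat the closed-form maximizer.
    assert all(gap(r * s, h, C, kappa, alpha, eps) <= best for s in (0.5, 0.9, 1.1, 2.0))
    # Shrinking the level set by (1 - eps) costs this multiplicative factor.
    factor = best / gap(r_star(h, C, kappa, alpha), h, C, kappa, alpha)
    print(r >= 1.0, factor, (1.0 - eps) ** (alpha * kappa / (alpha - kappa)))
```

The requirement $r \geq 1$ is what forces the lower bound on $t$ in the statement of Theorem 4: since $H^{-1}$ is increasing, $r \geq 1$ holds once $H^{-1}_{W(x_0)^{\alpha},\varphi}(t) \geq \kappa C (1-\epsilon)^{\alpha}/\alpha$.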

Theorem 4 can be interpreted as giving the best possible rate of convergence for adaptive MCMC satisfying (3) with a target measure satisfying (2). The conclusion of Theorem 4 can also be extended to general path-connected state spaces $\mathcal{X}$. The mild assumption of compact level sets for the function $W$ holds in many applications. However, a significant drawback of the Wasserstein lower bound is that its constant is non-explicit, in contrast to the explicit lower bound in total variation.

What is surprising about the lower bounds in this section is that they require only the Markov family $(\mathcal{P}_{\gamma})_{\gamma \in \mathcal{Y}}$ to satisfy (3) and do not directly depend on the adaptation strategy. For example, it is a common scenario in adaptive MCMC for the parameter space $\mathcal{Y}$ to be compact. In this case, the simultaneous growth condition (3) often holds if a Markov kernel satisfies some mild regularity conditions and (3) holds with only fixed parameters.

Example 5.

(Adaptive Unadjusted Langevin algorithm) Consider the multivariate Student's t-distribution $\pi$ on $\mathbb{R}^d$ with $d \geq 1$ and $v > 0$ degrees of freedom. The Lebesgue density is defined by

\[
D_{\pi}(x) = \frac{(v+d)/2}{\Gamma(v/2)(v\pi)^{d/2}} \exp(-U(x))
\]

where $U(x) = \frac{v+d}{2} \log\left(1 + \lVert x \rVert^2\right)$. The adapted unadjusted Langevin process $(\Gamma_t, X_t)_{t \geq 0}$ on $(0,1) \times \mathbb{R}^d$ is defined by

\[
X_{t+1} = X_t - \Gamma_{t+1} \nabla U(X_t) + \sqrt{2\Gamma_{t+1}}\, Z_{t+1}
\]

where $\Gamma_{t+1} \in (0,1)$ and $Z_{t+1}$ is an independent standard normal random vector. Subgeometric drift conditions have been shown for unadjusted Langevin in the non-adaptive case for heavy-tailed target measures [Kamatani, 2009].
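To make the recursion concrete, here is a minimal simulation sketch using only the Python standard library. The gradient follows from $U(x) = \frac{v+d}{2}\log(1 + \lVert x \rVert^2)$, so $\nabla U(x) = (v+d)\,x/(1 + \lVert x \rVert^2)$. The step-size rule `adapt_step` is a hypothetical placeholder and not the paper's adaptation strategy; any rule mapping the past history into $(0,1)$ could be substituted, and the lower bounds of this section apply regardless of that choice.

```python
import math
import random


def grad_U(x, v):
    """Gradient of U(x) = ((v + d) / 2) * log(1 + ||x||^2)."""
    d = len(x)
    sq_norm = sum(xi * xi for xi in x)
    return [(v + d) * xi / (1.0 + sq_norm) for xi in x]


def adapt_step(t, gamma0=0.5, lo=1e-3, hi=0.999):
    """HYPOTHETICAL adaptation rule (not from the source): a decaying
    step size clipped so that it stays strictly inside (0, 1)."""
    return min(hi, max(lo, gamma0 / math.sqrt(1.0 + t)))


def adaptive_ula(n_steps, d, v, x0=None, seed=0):
    """Simulate X_{t+1} = X_t - G_{t+1} grad U(X_t) + sqrt(2 G_{t+1}) Z_{t+1}."""
    rng = random.Random(seed)
    x = list(x0) if x0 is not None else [0.0] * d
    path, steps = [list(x)], []
    for t in range(n_steps):
        gamma = adapt_step(t)
        g = grad_U(x, v)
        x = [xi - gamma * gi + math.sqrt(2.0 * gamma) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        path.append(list(x))
        steps.append(gamma)
    return path, steps


if __name__ == "__main__":
    path, steps = adaptive_ula(n_steps=1000, d=2, v=3.0)
    print(len(path), min(steps), max(steps))
```

Starting the chain at the origin mirrors the initialization $x_0 = 0$ used in the Wasserstein bound of this example.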

Let $\alpha > 0$ and $W(x) = \left(1 + \lVert x \rVert^2\right)^{(v+d)/2}$. By Itô's formula, for large enough $\lVert x \rVert$, there is a constant $\epsilon > 0$ such that the second term is bounded using the moment generating function of non-central chi-square random variables by

\begin{align*}
&\mathbb{E}\left[W^{\alpha}(X_{t+1}) \,\middle|\, \Gamma_{t+1} = \gamma, X_t = x\right] - W^{\alpha}(x) \\
&\quad = \mathbb{E}\left[\int_0^{\gamma} \nabla W(x)^{\alpha} \cdot dX_t\right] + \mathbb{E}\left[\int_0^{\gamma} \text{tr}\left(\nabla^2 W(x)^{\alpha} x\right) dt\right] \\
&\quad \leq \alpha (v+d) \left[-\gamma_* (v+2) + \alpha (v+d) + \epsilon\right] \left(1 + \lVert x \rVert^2\right)^{\alpha(v+d)/2 - 1}.
\end{align*}

It follows that for some constant $C_{\alpha} > 0$ and for all $x, \gamma$,

\[
\mathbb{E}\left[W^{\alpha}(X_{t+1}) \,\middle|\, \Gamma_{t+1} = \gamma, X_t = x\right] - W^{\alpha}(x) \leq C_{\alpha} W^{\alpha}(x)^{1 - \frac{2}{\alpha(v+d)}}.
\]

One has the lower bound, for some constant $C > 0$,

\[
\pi(W \geq r) \geq \frac{C}{r^{1 - 2/(v+d)}}.
\]

If $v + d - 2 > 0$, then by Theorem 4 there is a constant $M > 0$ such that

\[
\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X},\mathcal{Y})} \mathcal{W}_{\lVert \cdot \rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, 0, \cdot), \pi\right) \geq \frac{M}{(1+t)^{v+d-2}}.
\]
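The exponent in this bound can be traced through the general recipe with a little exact arithmetic. For $\varphi(w) = c w^{\beta}$, $H^{-1}(t)$ grows like $t^{1/(1-\beta)}$, so the bound of Theorem 4 decays like $t^{-p}$ with $p = \kappa / ((\alpha - \kappa)(1 - \beta))$. The sketch below plugs in $\kappa = 1 - 2/(v+d)$ and $\beta = 1 - 2/(\alpha(v+d))$ from this example. The source does not say which $\alpha$ is used to reach the exponent $v + d - 2$; the bookkeeping only observes that $\alpha = 2\kappa$ reproduces it, and that reading is an assumption on our part. Function names are hypothetical.

```python
from fractions import Fraction


def decay_exponent(kappa, alpha, beta):
    """Exponent p with H^{-1}(t)^(-kappa/(alpha-kappa)) ~ t^(-p)
    when phi(w) = c * w**beta, so that H^{-1}(t) ~ t^(1/(1-beta))."""
    return kappa / ((alpha - kappa) * (1 - beta))


def ula_exponent(v, d, alpha):
    """Use kappa = 1 - 2/(v+d) and beta = 1 - 2/(alpha*(v+d)) from Example 5."""
    n = Fraction(v) + d
    kappa = 1 - 2 / n
    beta = 1 - 2 / (alpha * n)
    return decay_exponent(kappa, alpha, beta)


if __name__ == "__main__":
    v, d = 3, 2  # hypothetical illustrative values
    kappa = 1 - Fraction(2, v + d)
    # Assumed choice alpha = 2 * kappa; prints the exponent and v + d - 2.
    print(ula_exponent(v, d, 2 * kappa), v + d - 2)
```

Whatever the exact constant, the exponent is finite, which is the point made next: the rate is polynomial and cannot be geometric.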

Of particular interest is that the rate cannot be geometric even when considering weak convergence.

In certain situations, the tail probability decay of $\pi$ in (2) may be difficult to establish. In this case, we instead look for a function that is not integrable with respect to $\pi$, at the cost of obtaining a lower bound only along a subsequence. An analogous result will also hold in total variation.

Theorem 6.

Let $\mathcal{X}=\mathbb{R}^{d}$ for $d\in\mathbb{Z}_{+}$. Assume there is a Borel function $W:\mathcal{X}\to[1,\infty)$ such that $\int_{\mathcal{X}}W\,d\pi=\infty$ and, for some $\alpha>1$ and some concave function $\varphi:(0,\infty)\to(0,\infty)$,

$$(\mathcal{P}_{\gamma}W^{\alpha})(x)-W(x)^{\alpha}\leq\varphi(W(x)^{\alpha}) \tag{6}$$

holds for all $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$. Assume additionally that for each $r>0$, the set $\{x\in\mathbb{R}^{d}:W(x)\leq r\}$ is compact. Then there is a constant $M_{*}>0$ and a subsequence $t_{n}\in\mathbb{Z}_{+}$ increasing to infinity such that for any $\epsilon\in(0,1)$ with $\alpha>1+\epsilon$,

$$\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})}\mathcal{W}_{\lVert\cdot\rVert\wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t_{n},\mathcal{Q})}(\gamma_{0},x_{0},\cdot),\pi\right)\geq\frac{M_{*}}{\left(H_{W^{\alpha}(x_{0}),\varphi}^{-1}(t_{n})\right)^{\frac{1+\epsilon}{\alpha-1-\epsilon}}}.$$
Proof.

Since $\int_{\mathcal{X}}W\,d\pi=\infty$, there is a sequence $(r_{n})_{n}$ with $\lim_{n}r_{n}=\infty$ such that with $T_{n}=\{x:W(x)\geq r_{n}\}$,

$$\pi(T_{n})\geq\frac{2^{\alpha+1}}{r_{n}^{1+\epsilon}}.$$

The conclusion follows by Theorem 4. ∎

3 Subgeometric upper bounds for adaptive MCMC

This section is dedicated to studying conditions such that an upper bound convergence rate can be obtained for adaptive MCMC comparable to the lower bounds in the previous section. We first consider an alternative to the diminishing adaptation condition [Roberts and Rosenthal, 2007] that is stronger in the sense that it requires a specified rate of decay.

Definition 7.

An adaptive process satisfies expected diminishing adaptation with function $G:\mathbb{Z}_{+}\to(0,\infty)$ strictly decreasing to zero if, for all $t\in\mathbb{Z}_{+}$,

$$\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{t}=x\right]\leq G(t). \tag{7}$$

Proposition 17 ensures Borel measurability of the total variation in (7). One way to satisfy this condition is as follows: if $\rho$ is a metric on $\mathcal{Y}$ and $\sup_{x}\mathbb{E}\left[\rho(\Gamma_{t+1},\Gamma_{t})\mid X_{t}=x\right]\leq G(t)$, then the expected diminishing adaptation condition can be shown through Lipschitz continuity of $\mathcal{P}_{\gamma}$. For example, if for each $x\in\mathcal{X}$, $\gamma\mapsto\mathcal{P}_{\gamma}(x,\cdot)$ is $\rho$-Lipschitz with constant $L_{x}$, then

$$\sup_{x\in\mathcal{X}}\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\leq\left(\sup_{x\in\mathcal{X}}L_{x}\right)\rho(\Gamma_{t+1},\Gamma_{t}).$$
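As a brief worked step, assume additionally that $L=\sup_{x\in\mathcal{X}}L_{x}$ is finite (an assumption made here for illustration). Taking conditional expectations given $X_{t}=x$ on both sides of the Lipschitz bound then gives, for each $x\in\mathcal{X}$,

$$\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{t}=x\right]\leq L\,\mathbb{E}\left[\rho(\Gamma_{t+1},\Gamma_{t})\mid X_{t}=x\right]\leq L\,G(t),$$

so (7) would hold with $G$ replaced by $L\,G$ whenever the Lipschitz bound above is available.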

This has been shown to hold generally for adaptive Metropolis-Hastings with symmetric proposals [Andrieu and Moulines, 2006]. Next, we consider a simultaneous version of a subgeometric drift condition on the Markov family.

Definition 8.

A Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$ satisfies a simultaneous subgeometric drift condition if there is a Borel function $V:\mathcal{X}\to[1,\infty)$ and a concave function $\varphi:[0,\infty)\to[0,\infty)$ strictly increasing to infinity with $\lim_{v\to\infty}\varphi(v)/v=0$ and a constant $K\geq 0$ such that

$$(\mathcal{P}_{\gamma}V)(x)-V(x)\leq-\varphi(V(x))+K \tag{8}$$

holds for every $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$.

Here we assume $\lim_{v\to\infty}\varphi(v)/v=0$ to exclude the geometric case. Subgeometric drift conditions for Markov chains have been studied previously [Jarner and Roberts, 2002, Douc et al., 2004], but we adjust the previous conditions to hold over the feasible tuning parameters $\mathcal{Y}$. We now combine this drift condition with a simultaneous local contracting condition.

Definition 9.

A Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$ satisfies a simultaneously locally contracting condition on a set $C\subseteq\mathcal{X}\times\mathcal{X}$ if there is a constant $\alpha\in(0,1)$ where

$$\left\lVert\mathcal{P}_{\gamma}(x,\cdot)-\mathcal{P}_{\gamma}(y,\cdot)\right\rVert_{\text{TV}}\leq 1-\alpha \tag{9}$$

holds for all $(x,y)\in C$ and $\gamma\in\mathcal{Y}$.

Local coupling conditions have been studied in the subgeometric case for Markov chains [Durmus et al., 2016]. For example, a minorization condition can be used to verify the Markov family is simultaneously locally contracting (see [Roberts and Rosenthal, 2007]). Under these three conditions, we can establish an upper bound for the adaptation process.

Theorem 10.

Assume the expected diminishing adaptation condition (7) holds with $G(\cdot)$ decreasing to zero. Additionally, assume the following conditions hold for the Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$:

  1. $\pi\mathcal{P}_{\gamma}=\pi$ for all $\gamma\in\mathcal{Y}$.

  2. A simultaneous subgeometric drift condition (8) holds with a Borel function $V:\mathcal{X}\to[0,\infty)$.

  3. A simultaneous locally contracting condition (9) holds on the set $C=\{(x,y)\in\mathcal{X}\times\mathcal{X}:V(x)+V(y)\leq 2K/(1-\delta)\}$ for some $\delta\in(0,1)$.

Then for all $\epsilon\in(0,1)$ and all $t\in\mathbb{Z}_{+}$,

$$\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T_{\epsilon,t}+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}\leq\frac{\delta+\left[r(1)+1\right][V(x_{0})+\int V\,d\pi+KT_{\epsilon,t}+C}{\delta H_{1,\varphi}^{-1}\left(\frac{t}{-\log(H_{1,\varphi}^{-1}(t))/\log(1-\alpha)+1}\right)}+\epsilon$$

where $T_{\epsilon,t}=(1/G)^{-1}(t^{2}/\epsilon)$ and

$$r(\cdot)=\varphi(H_{1,\varphi}^{-1}(\cdot)),\qquad R=\varphi^{-1}(2K/(1-\delta)),\qquad C=\left[r(1)+1\right]\left\{R+\frac{r(1)}{r(0)}(R+4K)\right\}.$$

Theorem 10 requires satisfying expected diminishing adaptation (7) with a sufficiently fast rate. Table 1 compares approximate upper bounds for different combinations of $\varphi(\cdot)$ and $G(\cdot)$. The upper and lower bounds may also be combined; in particular, Theorem 10 can guarantee that the adaptive process approximately achieves the lower bound rate if the adaptation diminishes sufficiently fast. For example, if in addition to the assumptions of Theorem 10 there are constants $C,\kappa>0$ such that

$$\pi(V\geq r)\geq Cr^{-\kappa},\qquad(\mathcal{P}_{\gamma}V^{2\kappa})(x)-V(x)^{2\kappa}\leq\varphi(V(x)^{2\kappa})$$

hold for every $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$, then Theorem 1 and Theorem 10 imply that there are constants $M^{*},\alpha>0$ such that

$$\frac{C^{2}}{4H_{V(x_{0})^{2\kappa},\varphi}^{-1}(t)}\leq\left\lVert\mathcal{A}_{\mathcal{Q}}^{t}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}\leq\frac{M^{*}T_{\epsilon,t}}{H_{1,\varphi}^{-1}\left(\frac{t}{-\log(H_{1,\varphi}^{-1}(t))/\log(1-\alpha)+1}\right)}+\epsilon$$

holds for all $t$ and $\epsilon$. Similarly, Theorem 4 can be used to give a weak lower bound. As an example, consider a target measure on $\mathbb{R}^{d}$ with potential $U:\mathbb{R}^{d}\to\mathbb{R}$ defined by $\pi(dx)\propto\exp(-U(x))dx$ and Lyapunov function defined by $\exp(\alpha U(x))$ for $\alpha>0$. Then with $\alpha<1$, this can be used to obtain an upper bound, and with $\alpha>1$, this can be used to obtain a lower bound.

Examples of upper bound rates from Theorem 10

| $G(t)$ | $\varphi(w)=cw^{\beta}$, $\beta\in(0,1)$ | $\varphi(x)=c(x+K)/\log(x+K)^{\beta}$, $\beta\in(0,1)$ |
| --- | --- | --- |
| $\exp(-\alpha t)$, $\alpha>0$ | $\propto\frac{\log(t)}{(1+(1-\beta)ct)^{1/(1-\beta)}}$ | $\propto\log(t)\exp\left(-(1+\beta)ct^{\frac{1}{1+\beta}}\right)$ |
| $\exp(-t^{\alpha})$, $\alpha\in(0,1)$ | $\propto\frac{\log(t)^{1/\alpha}}{(1+(1-\beta)ct)^{1/(1-\beta)}}$ | $\propto\log(t)^{1/\alpha}\exp\left(-(1+\beta)ct^{\frac{1}{1+\beta}}\right)$ |
| $t^{-\alpha}$, $\alpha>1$ | $\propto\frac{t^{2/\alpha}}{(1+(1-\beta)ct)^{1/(1-\beta)}}$ | $\propto t^{2/\alpha}\exp\left(-(1+\beta)ct^{\frac{1}{1+\beta}}\right)$ |

Table 1: Upper bound convergence rate comparisons from Theorem 10 for different combinations of $\varphi(\cdot)$ and $G(\cdot)$. The table entries specify a convergence rate upper bound up to an explicit constant.
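The polynomial-drift column of Table 1 can be explored numerically with a short sketch. It assumes the convention $H_{1,\varphi}(u)=\int_{1}^{u}ds/\varphi(s)$, which is not restated in this section but is consistent with the table entry $(1+(1-\beta)ct)^{1/(1-\beta)}$ for $\varphi(w)=cw^{\beta}$. All function names are hypothetical, the stopping time is left real-valued rather than rounded to an integer, and the computed shape drops all constants as well as the logarithmic correction inside $H_{1,\varphi}^{-1}$.

```python
import math


def h_poly(u, c, beta):
    """Assumed definition: H_{1,phi}(u) = int_1^u ds / (c * s**beta), beta in (0, 1)."""
    return (u ** (1.0 - beta) - 1.0) / (c * (1.0 - beta))


def h_inv_poly(t, c, beta):
    """Closed-form inverse of h_poly: (1 + (1 - beta) * c * t) ** (1 / (1 - beta))."""
    return (1.0 + (1.0 - beta) * c * t) ** (1.0 / (1.0 - beta))


def stop_time_exp(t, eps, alpha):
    """T_{eps,t} = (1/G)^{-1}(t^2 / eps) for G(t) = exp(-alpha * t)."""
    return math.log(t * t / eps) / alpha


def stop_time_power(t, eps, alpha):
    """T_{eps,t} = (1/G)^{-1}(t^2 / eps) for G(t) = t ** (-alpha)."""
    return (t * t / eps) ** (1.0 / alpha)


def approx_rate(t, eps, c, beta, stop_time):
    """Leading-order shape T_{eps,t} / H^{-1}_{1,phi}(t); constants and log correction dropped."""
    return stop_time / h_inv_poly(t, c, beta)


if __name__ == "__main__":
    c, beta, eps = 1.0, 0.5, 0.1  # illustrative values only
    for t in (10, 100, 1000):
        fast = approx_rate(t, eps, c, beta, stop_time_exp(t, eps, alpha=1.0))
        slow = approx_rate(t, eps, c, beta, stop_time_power(t, eps, alpha=4.0))
        print(t, fast, slow)
```

In this sketch, exponentially diminishing adaptation costs only a logarithmic factor, whereas polynomially diminishing adaptation costs a factor $t^{2/\alpha}$, matching the first and last rows of the table.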
Proof of Theorem 10.

We first specify a finite adaptation plan $\mathcal{Q}^{T}$ with a time $T\in\mathbb{Z}_{+}$ defining a stopping point of adaptation. This defines an adaptive process where for all $t\geq T$, $\Gamma_{t}=\Gamma_{T}$ and $(\Gamma_{t},X_{t})\mid(\Gamma_{s},X_{s})_{s\leq t-1}\sim\delta_{\Gamma_{T}}(\cdot)\mathcal{P}_{\Gamma_{T}}(X_{t-1},\cdot)$ where $\delta_{\Gamma_{T}}$ is the Dirac measure at $\Gamma_{T}$. Using the finite adaptation process, we have the upper bound via the triangle inequality

$$\begin{aligned}
&\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}\\
&\leq\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)\right\rVert_{\text{TV}}+\left\lVert\mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}.
\end{aligned} \tag{10}$$

We will bound each term on the right hand side of (10) separately. For the first term in (10), fix $\epsilon\in(0,1)$ and choose $T_{\epsilon,t}=(1/G)^{-1}(t^{2}/\epsilon)$. Using the triangle inequality, we have that

$$\begin{aligned}
&\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{T+t}}(x,\cdot)-\mathcal{P}_{\Gamma_{T}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{T+t-1}=x\right]\\
&\leq\sum_{s=1}^{t}\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{T+s}}(x,\cdot)-\mathcal{P}_{\Gamma_{T+s-1}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{T+s-1}=x\right]\\
&\leq tG(T)\\
&\leq\epsilon/t.
\end{aligned}$$
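The last two inequalities use only that $G$ is decreasing and the choice of $T=T_{\epsilon,t}$. As a sketch, assuming $(1/G)^{-1}$ is an exact inverse so that $1/G(T_{\epsilon,t})=t^{2}/\epsilon$ (if $T_{\epsilon,t}$ is rounded up to an integer, the equality below becomes an inequality in the same direction), each summand is at most $G(T+s-1)$ by (7), and

$$\sum_{s=1}^{t}G(T+s-1)\leq tG(T),\qquad G(T_{\epsilon,t})=\frac{\epsilon}{t^{2}}\;\Longrightarrow\;tG(T_{\epsilon,t})=\frac{\epsilon}{t}.$$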

Since $\mathcal{X}$ is Polish, Proposition 17 ensures the total variation is Borel measurable.

Let $(\Gamma_t, X_t)_{t \ge 0}$ be an adaptive process initialized at $x_0, \gamma_0$ and let $(\Gamma'_t, X'_t)_{t \ge 0}$ be the finite adaptation process initialized at the same point; in the displays below, we write $Y_s = X'_s$. Since both processes are initialized at the same point, we can construct a coupling where $X_s = X_s'$ for $s \le T$ and

\begin{align*}
\mathbb{P}\left(X_{t+T} = Y_{t+T}\right) &= \mathbb{P}\left(X_{t+T} = Y_{t+T} \mid X_{t+T-1} = Y_{t+T-1}\right) \mathbb{P}\left(X_{t+T-1} = Y_{t+T-1}\right) \\
&\ge (1 - \epsilon/t)\, \mathbb{P}\left(X_{t+T-1} = Y_{t+T-1}\right) \\
&\ge (1 - \epsilon/t)^{t} \\
&\ge 1 - \epsilon.
\end{align*}
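The last inequality in the display above is Bernoulli's inequality, $(1 - \epsilon/t)^t \ge 1 - \epsilon$ for $0 \le \epsilon \le 1$ and $t \ge 1$. The following is a minimal numerical sketch of that step only; the function name `agreement_lower_bound` is a hypothetical helper introduced here for illustration and is not part of the original argument.

```python
import math


def agreement_lower_bound(eps: float, t: int) -> float:
    """Return (1 - eps/t)**t, the product of t one-step agreement bounds."""
    if t < 1 or not 0.0 <= eps <= 1.0:
        raise ValueError("need t >= 1 and 0 <= eps <= 1")
    return (1.0 - eps / t) ** t


# Check Bernoulli's inequality on a small grid of (eps, t) values.
for eps in (0.01, 0.1, 0.5):
    for t in (1, 2, 10, 1000):
        value = agreement_lower_bound(eps, t)
        # Small tolerance guards against floating-point rounding at t = 1.
        assert value >= 1.0 - eps - 1e-12

# As t grows, the product approaches exp(-eps), which still exceeds 1 - eps.
limit_gap = abs(agreement_lower_bound(0.1, 10**6) - math.exp(-0.1))
```

For large $t$ the product approaches $\exp(-\epsilon)$, so the bound $1 - \epsilon$ is slightly conservative but sufficient for the coupling argument.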

Since $\mathcal{X}$ is Polish, the total variation distance is bounded by the probability that the coupling constructed above fails to agree, so that

\[
\left\lVert \mathcal{A}_{\mathcal{Q}}^{(T_{\epsilon,t}+t)}((\gamma_0,x_0),\cdot) - \mathcal{A}_{\mathcal{Q}^{T_{\epsilon,t}}}^{(T_{\epsilon,t}+t)}((\gamma_0,x_0),\cdot) \right\rVert_{\text{TV}} \le \mathbb{P}\left(X_{t+T} \ne Y_{t+T}\right) \le \epsilon.
\]

To bound the second term in (10), we adapt previous arguments for subgeometric upper bounds for non-adapted Markov chains [Durmus et al., 2016] to adaptive MCMC, with improved and explicit constants. Since $\mathcal{X}$ is Polish, there is a Borel measurable conditional total variation distance by [Villani, 2009, Theorem 4.8] so that

\[
\left\lVert \mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_0,x_0),\cdot) - \pi \right\rVert_{\text{TV}} \le \mathbb{E}\left[ \left\lVert \mathcal{P}_{\Gamma_T}^{t}(X_T,\cdot) - \pi \right\rVert_{\text{TV}} \right].
\]

Let $\tau_C = \inf\{n \ge 1 : X_n, Y_n \in C\}$ be the first hitting time of the set $C$. For $n \in \mathbb{Z}_+$, let $\theta_n$ denote the shift operator applied $n$ times so that $\theta_n(X_i) = X_{i+n}$ for all $i \in \mathbb{Z}_+$. Define the successive hitting times of $C$ recursively by

\[
\tau_1 = \tau_C, \qquad \tau_{n+1} = \tau_n + \tau_C \circ \theta_{\tau_n} = \sum_{k=1}^{n+1} \tau_k
\]

for each $n \in \mathbb{Z}_+$. The inverse function theorem implies the derivative $r(s) = \frac{d}{ds} H^{-1}_{1,\varphi}(s) = \varphi(H^{-1}_{1,\varphi}(s))$ for $s \ge 0$. Thus, $H_{1,\varphi}^{-1}$ is convex since its derivative is monotone increasing by Lemma 18. By Markov's inequality and Jensen's inequality,

\begin{align*}
\mathbb{P}(\tau_m \ge t) &\le \frac{\mathbb{E}\left[ H_{1,\varphi}^{-1}\left( \frac{1}{m} \sum_{k=1}^{m} \tau_k \right) \right]}{H_{1,\varphi}^{-1}(t/m)} \\
&\le \frac{\frac{1}{m} \sum_{k=1}^{m} \mathbb{E}\left[ H_{1,\varphi}^{-1}(\tau_k) \right]}{H_{1,\varphi}^{-1}(t/m)}.
\end{align*}
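To make the objects in this step concrete, the following sketch works through a polynomial special case. It assumes, for illustration only, that $\varphi(v) = v^{a}$ with $a \in (0,1)$ and that $H_{1,\varphi}(u) = \int_1^u dx/\varphi(x)$, which gives the closed form $H_{1,\varphi}^{-1}(s) = (1 + (1-a)s)^{1/(1-a)}$. The function names are hypothetical. The code checks numerically that $r = \varphi \circ H_{1,\varphi}^{-1}$ is the derivative of $H_{1,\varphi}^{-1}$ and that the Jensen step holds on a sample of hitting times.

```python
def phi(v: float, a: float) -> float:
    """Assumed polynomial drift rate phi(v) = v**a with 0 < a < 1."""
    return v ** a


def H(u: float, a: float) -> float:
    """H(u) = integral from 1 to u of dx / phi(x), in closed form."""
    return (u ** (1.0 - a) - 1.0) / (1.0 - a)


def H_inv(s: float, a: float) -> float:
    """Inverse of H: (1 + (1 - a) * s) ** (1 / (1 - a))."""
    return (1.0 + (1.0 - a) * s) ** (1.0 / (1.0 - a))


def r(s: float, a: float) -> float:
    """Rate function r(s) = phi(H_inv(s))."""
    return phi(H_inv(s, a), a)


def jensen_gap(taus, a: float) -> float:
    """mean(H_inv(tau_k)) - H_inv(mean(tau_k)); nonnegative by convexity."""
    m = len(taus)
    mean_tau = sum(taus) / m
    return sum(H_inv(tau, a) for tau in taus) / m - H_inv(mean_tau, a)


a = 0.5
step = 1e-6
for s in (0.0, 1.0, 5.0, 50.0):
    # Central finite difference of H_inv compared against r(s).
    finite_diff = (H_inv(s + step, a) - H_inv(s - step, a)) / (2.0 * step)
    assert abs(finite_diff - r(s, a)) <= 1e-5 * max(1.0, r(s, a))

# Jensen's inequality for the convex function H_inv on sample hitting times.
assert jensen_gap([1, 4, 7, 20], a) >= 0.0
```

With $a = 1/2$ the rate is $H_{1,\varphi}^{-1}(s) = (1 + s/2)^2$, a quadratic polynomial, which is the kind of subgeometric rate the bound above is designed to capture.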

For any $t, m \in \mathbb{Z}_+$ with $t \ge m$, the local coupling condition (9) implies an upper bound via a coupling argument with [Jarner and Tweedie, 2001, Lemma 3.1] so that for all $\gamma \in \mathcal{Y}$ and $x, y \in \mathcal{X}$,

\begin{align*}
\left\lVert \mathcal{P}_{\gamma}^{t}(x,\cdot) - \mathcal{P}_{\gamma}^{t}(y,\cdot) \right\rVert_{\text{TV}} &\le \inf_{\xi \in \mathcal{C}\left( \mathcal{P}_{\gamma}^{t}(x,\cdot), \mathcal{P}_{\gamma}^{t}(y,\cdot) \right)} \xi(\{u, v : u \ne v, \tau_m < t\}) + \mathbb{P}(\tau_m \ge t) \\
&\le (1-\alpha)^{m} + \mathbb{P}\left( \sum_{k=1}^{m} \tau_k > t \right) \\
&\le (1-\alpha)^{m} + \frac{\frac{1}{m} \sum_{k=1}^{m} \mathbb{E}\left[ H_{1,\varphi}^{-1}(\tau_k) \right]}{H_{1,\varphi}^{-1}(t/m)}.
\end{align*}

Since $\varphi$ is concave, it is subadditive so $\varphi(V(x) + V(y)) \le \varphi(V(x)) + \varphi(V(y))$. Since $\varphi$ is strictly increasing, by the drift condition,

\begin{align*}
(\mathcal{P}_{\gamma} V)(x) + (\mathcal{P}_{\gamma} V)(y) - [V(x) + V(y)] &\le -[\varphi(V(x)) + \varphi(V(y))] + 2K \\
&\le -[\varphi(V(x) + V(y))] + 2K
\end{align*}

holds for all $x, y \in \mathcal{X}$. Using Lemma 19,

\[
(\mathcal{P}_{\gamma} V)(x) + (\mathcal{P}_{\gamma} V)(y) - [V(x) + V(y)] \le -\delta [\varphi(V(x) + V(y))] + (R + 2K) I_C(x,y).
\]

By [Douc et al., 2004, Proposition 2.2],

\[
\sup_{x,y \in C} \mathbb{E}_{x,y}\left( \sum_{i=0}^{\tau_C - 1} r(i) \right) \le \frac{\varphi^{-1}(2K/(1-\delta))}{\delta} + \frac{(R + 2K) r(1)}{\delta r(0)}.
\]

We have that $r(\cdot) = \varphi(H_{1,\varphi}^{-1}(\cdot))$ is log-concave and so $r(s+t) \le r(s) r(t)$ for all $t, s \ge 0$ [Douc et al., 2004, see the proof of Proposition 2.1]. We then have the upper bound for $k \ge 2$,

\begin{align*}
\mathbb{E}\left[ H_{1,\varphi}^{-1}(\tau_k) \right] &= \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( \int_{0}^{\tau_k} r(s)\, ds \right) \right] \\
&\le \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( \sum_{i=1}^{\tau_C} r(i) \right) \right] \\
&\le \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( r(\tau_C) \right) \right] - r(0) + \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( \sum_{i=0}^{\tau_C - 1} r(i) \right) \right] \\
&\le r(1)\, \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( r(\tau_C - 1) \right) \right] - r(0) + \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( \sum_{i=0}^{\tau_C - 1} r(i) \right) \right] \\
&\le [r(1) + 1]\, \mathbb{E}\left[ \mathbb{E}_{X_{\tau_{k-1}}}\left( \sum_{i=0}^{\tau_C - 1} r(i) \right) \right] - r(0) \\
&\le \frac{r(1) + 1}{\delta} \left\{ R\left( 1 + \frac{r(1)}{r(0)} \right) + \frac{2K r(1)}{r(0)} \right\}.
\end{align*}

For $k = 1$, similarly, we have

\[
\mathbb{E}\left[ H_{1,\varphi}^{-1}(\tau_C) \right] \le \frac{r(1) + 1}{\delta} \left\{ V(x) + V(y) + \frac{R r(1)}{r(0)} + \frac{2K r(1)}{r(0)} \right\}.
\]

Combining these upper bounds,

\[
\mathbb{E}\left[ H_{1,\varphi}^{-1}\left( \frac{1}{m} \sum_{k=1}^{m} \tau_k \right) \right] \le \frac{r(1) + 1}{\delta} \left\{ V(x) + V(y) + R + \frac{r(1)}{r(0)} (R + 4K) \right\}.
\]

The simultaneous subgeometric drift condition (8) implies

\[
\mathbb{E}\left[ V(X_T) \right] \le V(x_0) + KT.
\]

Choosing $m \equiv m_t = \lceil \log(H_{1,\varphi}^{-1}(t)) / \log(1/(1-\alpha)) \rceil$, we have the upper bound

\begin{align*}
&\left\lVert \mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_0,x_0),\cdot) - \pi \right\rVert_{\text{TV}} \\
&\le (1-\alpha)^{m_t} + \frac{[r(1) + 1] \left\{ V(x_0) + KT + \int V d\pi + R + \frac{r(1)}{r(0)} (R + 4K) \right\}}{\delta H_{1,\varphi}^{-1}(t/m_t)} \\
&\le \frac{\delta + [r(1) + 1] \left\{ V(x_0) + KT + \int V d\pi \right\} + C}{\delta H_{1,\varphi}^{-1}(t/m_t)}.
\end{align*}
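The choice of $m_t$ makes the geometric term $(1-\alpha)^{m_t}$ no larger than $1/H_{1,\varphi}^{-1}(t)$, so it can be absorbed into the subgeometric term. The sketch below illustrates this for the same assumed polynomial case used for illustration, $\varphi(v) = v^{a}$ with $H_{1,\varphi}^{-1}(s) = (1 + (1-a)s)^{1/(1-a)}$; the values of `alpha` and `a` and the function names are hypothetical choices, not taken from the paper.

```python
import math


def H_inv(s: float, a: float) -> float:
    """Assumed polynomial rate: inverse of H(u) = integral_1^u x**(-a) dx."""
    return (1.0 + (1.0 - a) * s) ** (1.0 / (1.0 - a))


def num_blocks(t: float, alpha: float, a: float) -> int:
    """m_t = ceil(log(H_inv(t)) / log(1 / (1 - alpha)))."""
    return math.ceil(math.log(H_inv(t, a)) / math.log(1.0 / (1.0 - alpha)))


alpha, a = 0.5, 0.5
for t in (10, 100, 1000, 10_000):
    m = num_blocks(t, alpha, a)
    assert m <= t  # the bound is stated for t >= m
    geometric_term = (1.0 - alpha) ** m
    # The geometric term is dominated by 1 / H_inv(t) (up to rounding).
    assert geometric_term <= (1.0 + 1e-12) / H_inv(t, a)
    # The subgeometric term decays like 1 / H_inv(t / m_t).
    subgeometric_rate = 1.0 / H_inv(t / m, a)
    print(f"t={t}: m_t={m}, (1-alpha)^m_t={geometric_term:.3e}, "
          f"1/H_inv(t/m_t)={subgeometric_rate:.3e}")
```

Since $m_t$ grows only logarithmically in $H_{1,\varphi}^{-1}(t)$, the argument $t/m_t$ loses a logarithmic factor relative to $t$, which is the price of this choice in the final rate.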

4 Example: adaptive Metropolis-Hastings independence sampler

In many cases, it is difficult to choose a proposal for Metropolis-Hastings that approximately matches the tail behavior of a complex target measure, and adaptive MCMC is often employed. The point of this toy example is to concretely demonstrate this scenario. We will use the upper and lower bounds on the convergence to investigate the sensitivity to different adaptation strategies. Consider the target measure $\pi(dx) = \exp(-x) I_{[0,\infty)}(x) dx$. Let $(\gamma_*, \gamma^*)$ be an interval of potential tuning parameters with $1 < \gamma_* < \gamma^*$ and consider a Metropolis-Hastings Markov chain with independent proposal $\gamma \exp(-\gamma x) I_{[0,\infty)}(x)$ and Markov kernel defined for $(x, \gamma) \in [0,\infty) \times (\gamma_*, \gamma^*)$ and all Borel sets $A \subseteq [0,\infty)$ by

\[
\mathcal{P}_{\gamma}(x, A) = \int_A a_{\gamma}(x,y)\, \gamma \exp(-\gamma y)\, dy + \delta_x(A) R_{\gamma}(x) \tag{11}
\]

where the acceptance function is $a_{\gamma}(x,y) = \exp\left[ (\gamma - 1)(y - x) \right] \wedge 1$ and the rejection probability is $R_{\gamma}(x) = 1 - \int_0^{\infty} a_{\gamma}(x,y)\, \gamma \exp(-\gamma y)\, dy$. Since we restrict $\gamma > 1$, the tail probabilities of the proposal and the Metropolis-Hastings kernel are lighter than those of the target. Due to this restriction, we will have a polynomial lower bound over any possible adaptation plan.
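The kernel (11) can be simulated directly. The following is a minimal sketch for a fixed tuning parameter $\gamma$ (no adaptation is performed here); the function names `acceptance`, `mh_step`, and `run_chain` are hypothetical and chosen for illustration. The proposal is drawn from the exponential density $\gamma \exp(-\gamma y)$ and accepted with probability $a_{\gamma}(x,y)$.

```python
import math
import random


def acceptance(x: float, y: float, gamma: float) -> float:
    """a_gamma(x, y) = min(1, exp((gamma - 1) * (y - x)))."""
    exponent = (gamma - 1.0) * (y - x)
    # Avoid overflow in exp for large positive exponents.
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def mh_step(x: float, gamma: float, rng: random.Random) -> float:
    """One step of the independence sampler with proposal rate gamma."""
    y = rng.expovariate(gamma)  # proposal density gamma * exp(-gamma * y)
    if rng.random() < acceptance(x, y, gamma):
        return y  # accept the proposal
    return x  # reject and stay at the current state


def run_chain(x0: float, gamma: float, n_steps: int, seed: int = 0):
    """Simulate n_steps of the fixed-gamma chain started at x0."""
    rng = random.Random(seed)
    chain = [x0]
    for _ in range(n_steps):
        chain.append(mh_step(chain[-1], gamma, rng))
    return chain


chain = run_chain(x0=0.0, gamma=2.0, n_steps=5000, seed=1)
# Fraction of steps at which the state changed (accepted moves).
moves = sum(1 for u, v in zip(chain, chain[1:]) if u != v) / (len(chain) - 1)
print(f"empirical acceptance rate: {moves:.3f}")
```

Because $\gamma > 1$, proposals rarely land far in the tail, and moves from a large state $x$ toward smaller values are accepted with the small probability $\exp[(\gamma-1)(y-x)]$; this is the mechanism behind the slow, polynomial convergence discussed in this section.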

Proposition 11.

Let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot)$ be the marginal of the adaptive independent Metropolis-Hastings process at time $t \in \mathbb{Z}_+$ from (11) with adaptation parameter set $(\gamma_*, \gamma^*)$ and initialized at $x_0, \gamma_0 \in (0,\infty) \times (\gamma_*, \gamma^*)$. Then

\[
\inf_{\mathcal{Q} \in \mathcal{S}([0,\infty), (\gamma_*, \gamma^*))} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, 0, \cdot) - \pi \right\rVert_{\text{TV}} \ge \frac{M_*}{\left( c_* t + 1 \right)^{\frac{1}{\gamma_* - 1}}}
\]

where $M_* = \gamma_*^{-1/(\gamma_* - 1)} - \gamma_*^{-\gamma_*/(\gamma_* - 1)}$ and $c_* = \frac{\gamma^*}{\gamma_* - 1} + \gamma^* - 1$.
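The constants in Proposition 11 are explicit, so the lower bound can be evaluated directly. The sketch below computes $M_*$, $c_*$, and the bound $M_*/(c_* t + 1)^{1/(\gamma_* - 1)}$; the function names and the parameter values $\gamma_* = 2$, $\gamma^* = 3$ are hypothetical choices for illustration.

```python
def M_star(gamma_lo: float) -> float:
    """M_* = gamma_lo**(-1/(gamma_lo-1)) - gamma_lo**(-gamma_lo/(gamma_lo-1))."""
    power = 1.0 / (gamma_lo - 1.0)
    return gamma_lo ** (-power) - gamma_lo ** (-gamma_lo * power)


def c_star(gamma_lo: float, gamma_hi: float) -> float:
    """c_* = gamma_hi / (gamma_lo - 1) + gamma_hi - 1."""
    return gamma_hi / (gamma_lo - 1.0) + gamma_hi - 1.0


def tv_lower_bound(t: int, gamma_lo: float, gamma_hi: float) -> float:
    """Lower bound M_* / (c_* t + 1)**(1 / (gamma_lo - 1)) at time t."""
    if not 1.0 < gamma_lo < gamma_hi:
        raise ValueError("need 1 < gamma_lo < gamma_hi")
    exponent = 1.0 / (gamma_lo - 1.0)
    return M_star(gamma_lo) / (c_star(gamma_lo, gamma_hi) * t + 1.0) ** exponent


for t in (1, 10, 100, 1000):
    print(t, tv_lower_bound(t, gamma_lo=2.0, gamma_hi=3.0))
```

The polynomial decay exponent $1/(\gamma_* - 1)$ depends only on the lower endpoint $\gamma_*$: as $\gamma_*$ approaches 1 the exponent grows and the lower bound decays faster, while larger $\gamma_*$ forces slower convergence regardless of the adaptation strategy.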

Proof.

Define $W(x) = \exp(x)$, and we have by a standard computation $\pi(W(x) \ge r) = r^{-1}$ for $r \ge 1$. Assume $1 < \gamma_* < \gamma$. For $\alpha < \gamma$, the following identity holds:

\begin{align*}
&(\mathcal{P}_{\gamma} W^{\alpha})(x) - W^{\alpha}(x) \\
&= \int_{[0,\infty)} \exp(\alpha y) \left[ 1 \wedge \left\{ \exp\left[ (\gamma - 1)(y - x) \right] \right\} \right] \gamma \exp(-\gamma y)\, dy - \exp(\alpha x)\, \mathbb{E}\left( a_{\gamma}(x, Y) \right).
\end{align*}

We also have for any $\alpha<\gamma$,

$$
\begin{aligned}
&\int_{[0,\infty)}\exp(\alpha y)\,1\wedge\left\{\exp\left[(\gamma-1)(y-x)\right]\right\}\gamma\exp(-\gamma y)\,dy\\
&=\frac{\gamma\exp\left[(1-\gamma)x\right]}{\alpha-1}\int_{0}^{x}(\alpha-1)\exp\left[(\alpha-1)y\right]dy
+\frac{\gamma}{\gamma-\alpha}\int_{x}^{\infty}(\gamma-\alpha)\exp\left[-(\gamma-\alpha)y\right]dy\\
&=\left\{\frac{\gamma}{\alpha-1}+\frac{\gamma}{\gamma-\alpha}\right\}\exp\left[-(\gamma-\alpha)x\right]-\frac{\gamma}{\alpha-1}\exp\left[-(\gamma-1)x\right].
\end{aligned}
\tag{12}
$$
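As a sanity check on identity (12), the following minimal sketch compares the closed form against a trapezoidal quadrature of the left-hand integral. The function names, the truncation point `y_max`, the grid size, and the test values of $(\alpha,\gamma,x)$ are illustrative assumptions and not part of the paper.

```python
import numpy as np

def integral_numeric(alpha, gamma, x, y_max=100.0, n=400_001):
    # Trapezoidal approximation of the integral in (12), truncated at y_max (assumed large enough).
    y = np.linspace(0.0, y_max, n)
    # log of the acceptance term 1 ∧ exp[(gamma - 1)(y - x)]
    log_accept = np.minimum(0.0, (gamma - 1.0) * (y - x))
    f = gamma * np.exp((alpha - gamma) * y + log_accept)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(y)) / 2.0)

def integral_closed_form(alpha, gamma, x):
    # Right-hand side of (12); requires alpha != 1 and alpha < gamma.
    lead = gamma / (alpha - 1.0) + gamma / (gamma - alpha)
    tail = gamma / (alpha - 1.0) * np.exp(-(gamma - 1.0) * x)
    return float(lead * np.exp(-(gamma - alpha) * x) - tail)

if __name__ == "__main__":
    for alpha, gamma, x in [(1.5, 2.0, 1.0), (0.0, 2.0, 1.0), (1.2, 3.0, 0.5)]:
        print(alpha, gamma, x, integral_numeric(alpha, gamma, x), integral_closed_form(alpha, gamma, x))
```

Taking $\alpha=0$ in this check recovers the expected acceptance probability $\mathbb{E}(a_{\gamma}(x,Y))$, since the factor $\exp(\alpha y)$ is then identically one.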

Using (12) with $\alpha=0$, $\mathbb{E}\left(a_{\gamma}(x,Y)\right)=\gamma\exp((1-\gamma)x)+(1-\gamma)\exp(-\gamma x)$. So then

$$
\begin{aligned}
&(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)\\
&=\left\{\frac{\gamma}{\alpha-1}+\frac{\gamma}{\gamma-\alpha}+\gamma-1\right\}\exp\left[-(\gamma-\alpha)x\right]-\frac{\gamma}{\alpha-1}\exp\left[-(\gamma-1)x\right]-\gamma W^{\alpha}(x)^{\frac{1+\alpha-\gamma}{\alpha}}.
\end{aligned}
$$

It follows that

$$
\begin{aligned}
(\mathcal{P}_{\gamma}W^{\gamma_{*}})(x)-W^{\gamma_{*}}(x)
&=\left\{\frac{\gamma}{\gamma_{*}-1}+\frac{\gamma}{\gamma-\gamma_{*}}+\gamma-1\right\}\exp(-(\gamma-\gamma_{*})x)\\
&\quad-\frac{\gamma}{\gamma_{*}-1}\exp(-(\gamma-1)x)-\gamma W^{\gamma_{*}}(x)^{\frac{1+\gamma_{*}-\gamma}{\gamma_{*}}}\\
&\leq c_{*}.
\end{aligned}
\tag{13}
$$

With the upper bound in (13), we can now use Theorem 1 with $\varphi\equiv c_{*}$, $\kappa=1$, and $\alpha=\gamma_{*}$. It follows that the lower bound

$$
\left\lVert\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)-\pi\right\rVert_{\text{TV}}\geq\frac{M_{*}}{\left[c_{*}t+\exp(\gamma_{*}x_{0})\right]^{\frac{1}{\gamma_{*}-1}}}
$$

holds for every $t\in\mathbb{Z}_{+}$ uniformly in the adaptation strategy $\mathcal{Q}$. ∎

In Table 2, we compute the lower bound in Proposition 11 for different choices of $(\gamma_{*},\gamma^{*})$. The large values from Table 2 illustrate that even in this toy example, it is possible to observe poor convergence behavior of adaptive MCMC with certain tuning parameter sets independently of the adaptation strategy. However, this limitation on the convergence rate can be avoided if the adaptation plan is capable of crossing the critical boundary $\gamma=1$. By Theorem 4, the lower bound rate in Proposition 11 will be the same even when converging weakly.

Lower bound computations from Proposition 11

| Iteration | $(\gamma_{*},\gamma^{*})=(3,5)$ | $(\gamma_{*},\gamma^{*})=(4,6)$ | $(\gamma_{*},\gamma^{*})=(8,10)$ |
|---|---|---|---|
| $10^{3}$ | $0.0048$ | $0.0247$ | $0.173$ |
| $10^{4}$ | $0.0015$ | $0.0115$ | $0.125$ |
| $10^{5}$ | $0.0005$ | $0.00532$ | $0.0898$ |
| $10^{6}$ | $0.0001$ | $0.00247$ | $0.0646$ |

Table 2: Lower bound computations from Proposition 11 for the adaptive Metropolis-Hastings independence sampler.
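The polynomial rate in the lower bound can be read off Table 2 directly: for large $t$ the bound behaves like $t^{-1/(\gamma_{*}-1)}$, so a tenfold increase in $t$ should shrink it by roughly $10^{1/(\gamma_{*}-1)}$. The sketch below checks this against the tabulated values for the $(4,6)$ and $(8,10)$ columns; the $(3,5)$ column is left out because its entries are rounded too coarsely for a ratio check. The function `lower_bound` evaluates the displayed formula, but the values passed for `m_star` and `c_star` are placeholders, since $M_{*}$ and $c_{*}$ are defined in Proposition 11 and are not reproduced here.

```python
import math

def lower_bound(t, gamma_lo, m_star, c_star, x0=0.0):
    # M_* / [c_* t + exp(gamma_* x0)]^{1/(gamma_* - 1)}; m_star and c_star are supplied by the caller.
    return m_star / (c_star * t + math.exp(gamma_lo * x0)) ** (1.0 / (gamma_lo - 1.0))

# Values copied from Table 2, keyed by gamma_*; rows are t = 10^3, 10^4, 10^5, 10^6.
TABLE_2 = {
    4.0: [0.0247, 0.0115, 0.00532, 0.00247],
    8.0: [0.173, 0.125, 0.0898, 0.0646],
}

def decade_ratios(column):
    # Ratio of consecutive rows, i.e. the shrinkage per tenfold increase in t.
    return [a / b for a, b in zip(column, column[1:])]

if __name__ == "__main__":
    for gamma_lo, column in TABLE_2.items():
        predicted = 10.0 ** (1.0 / (gamma_lo - 1.0))
        print(gamma_lo, predicted, decade_ratios(column))
    # Placeholder constants, for illustration only.
    print(lower_bound(1e6, 4.0, m_star=1.0, c_star=1.0))
```

The slow shrinkage per decade for larger $\gamma_{*}$ is what makes the entries in the last column of Table 2 remain large even after $10^{6}$ iterations.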

We now look at the upper bounds from Section 3, where we require that adaptation be restricted to a compact set. This is a commonly used strategy in adaptive MCMC [Pompe et al., 2020].

Proposition 12.

For $t\in\mathbb{Z}_{+}$, let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)$ be the marginal of an adaptive independent Metropolis-Hastings process and let $M_{*},c_{*}$ be as defined in Proposition 11. Assume for each $t\in\mathbb{Z}_{+}$, $\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)=\mathcal{P}_{\Gamma_{t}}(x,\cdot)$ for all $x\geq r$ for some $r>0$ and

$$
\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\Gamma_{t+1}-\Gamma_{t}\right\rVert_{F}\mid X_{t}=x\right]\leq G(t)
$$

for some function $G(\cdot)$ strictly decreasing to zero. Additionally, assume $\gamma^{*}<2-\epsilon$ for some $\epsilon\in(0,1)$. Then for all $\delta\in(0,1)$ and $t\in\mathbb{Z}_{+}$ with $T_{\delta,t}=(1/G)^{-1}\left(Jt^{2}/\delta\right)$,

$$
\begin{aligned}
\frac{M_{*}}{\left[c_{*}(T_{\delta,t}+t)+1\right]^{\frac{1}{\gamma_{*}-1}}}
&\leq\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T_{\delta,t}+t)}((\gamma_{0},0),\cdot)-\pi\right\rVert_{\text{TV}}\\
&\leq\frac{2\left[r(1)+1\right]KT_{\delta,t}+C}{\left[1+c\,\frac{t}{-\log(1+ct)/\log(1-\alpha)+1}\right]^{\frac{1-\epsilon}{\gamma^{*}-1}}}+\delta
\end{aligned}
$$

where

$$
\begin{aligned}
&J=2/\gamma_{*}^{2}+r+\frac{1}{\gamma_{*}},\quad
c=\frac{\gamma^{*}-1}{1-\epsilon},\quad
K=\frac{\gamma^{*}}{\gamma_{*}-1+\epsilon},\quad
\alpha=\frac{\gamma_{*}}{(2K)^{\frac{\gamma^{*}-1}{1-\epsilon}}},\quad
r(1)=\left(c+1\right)^{\frac{2-\epsilon-\gamma^{*}}{\gamma^{*}-1}},\\
&R=(4K)^{\frac{1-\epsilon}{2-\epsilon-\gamma^{*}}},\quad
C=1+2\left[r(1)+1\right][1+\epsilon^{-1}]+\left[r(1)+1\right]\left\{R+r(1)(R+4K)\right\}.
\end{aligned}
$$
Proof.

We will use Theorem 10 to establish the upper bound. We first verify the expected diminishing adaptation condition (7). Let $\varphi:\mathbb{R}\to[0,1]$ and, for $x>0$, let $\psi_{x}(y)=\varphi(y)-\varphi(x)$. Then

$$
\begin{aligned}
\mathcal{P}_{\gamma'}\varphi(x)-\mathcal{P}_{\gamma}\varphi(x)
&=\mathcal{P}_{\gamma'}\psi_{x}(x)-\mathcal{P}_{\gamma}\psi_{x}(x)\\
&=\int\psi_{x}(y)\left[a_{\gamma'}(x,y)\gamma'\exp(-\gamma'y)-a_{\gamma}(x,y)\gamma\exp(-\gamma y)\right]dy\\
&\leq|\gamma'-\gamma|\int|y-x|\gamma'\exp(-\gamma'y)\,dy+\int|\gamma'\exp(-\gamma'y)-\gamma\exp(-\gamma y)|\,dy\\
&\leq(2/\gamma_{*}^{2}+|x|)|\gamma'-\gamma|+\frac{1}{\gamma_{*}}|\gamma'-\gamma|.
\end{aligned}
$$

Let $J=2/\gamma_{*}^{2}+r+\frac{1}{\gamma_{*}}$. Then expected diminishing adaptation (7) holds since

$$
\begin{aligned}
\sup_{x\in[0,\infty)}\mathbb{E}\left(\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{t}=x\right)
&\leq J\sup_{|x|\leq r}\mathbb{E}\left(|\Gamma_{t+1}-\Gamma_{t}|\mid X_{t}=x\right)\\
&\leq JG(t).
\end{aligned}
$$

Next, we verify the simultaneous subgeometric drift condition. Let $V(x)=\exp(x)$, and using the identity (12), for $\epsilon\in(0,1)$,

$$
(\mathcal{P}_{\gamma}V^{1-\epsilon})(x)-V^{1-\epsilon}(x)\leq\left\{\frac{\gamma^{*}}{\gamma_{*}-1+\epsilon}+\gamma^{*}-1\right\}-\gamma_{*}V(x)^{\frac{2-\epsilon-\gamma^{*}}{1-\epsilon}}.
$$

Now we verify the simultaneous local coupling condition by means of a minorization condition. If $V(x)\leq R$, then $\exp((\gamma-1)x)\leq R^{\frac{\gamma-1}{1-\epsilon}}$. Define $\nu_{\gamma}(\cdot)=Z^{-1}\int_{\cdot}1\wedge\exp((\gamma-1)y)\,\gamma\exp(-\gamma y)\,dy$ where $Z$ is the normalizing constant. So then

$$
\inf_{V(x)\leq R}\mathcal{P}_{\gamma}(x,\cdot)\geq\frac{\gamma_{*}}{R^{\frac{\gamma^{*}-1}{1-\epsilon}}}\nu_{\gamma}(\cdot).
$$

If adaptation diminishes fast enough, Proposition 12 shows the upper bound rate is essentially $(1+t)^{\frac{1-\epsilon}{1-\gamma^{*}}}$ and depends on the largest tuning parameter $\gamma^{*}$. This is due to the adaptation plan possibly concentrating on $\gamma^{*}$, which is farthest from the optimal choice. On the other hand, the lower bound rate $(1+t)^{\frac{1}{1-\gamma_{*}}}$ depends on the smallest tuning parameter $\gamma_{*}$, which is closest to the optimal choice. In particular, there can be a gap between the upper and lower bounds on the convergence, characterized by the potential tuning parameters. In Table 3, we compare the upper bound convergence rates with $\epsilon=0.01$, $G(t)=\exp(-t)$, and $\delta=(1+ct)^{\frac{1-\epsilon}{1-\gamma^{*}}}$. We observe that the upper bound is sensitive to the tuning of $\gamma^{*}$.

Upper bound computations from Proposition 12

| Iteration $t$ | $(\gamma_{*},\gamma^{*})=(1.2,1.5)$ | $(\gamma_{*},\gamma^{*})=(1.2,1.6)$ | $(\gamma_{*},\gamma^{*})=(1.2,1.7)$ |
|---|---|---|---|
| $10^{3}$ | $2.048\cdot10^{-1}$ | $7.859\cdot10^{0}$ | $7.368\cdot10^{2}$ |
| $10^{4}$ | $2.217\cdot10^{-3}$ | $1.784\cdot10^{-1}$ | $2.867\cdot10^{1}$ |
| $10^{5}$ | $2.379\cdot10^{-5}$ | $4.018\cdot10^{-5}$ | $1.106\cdot10^{0}$ |
| $10^{6}$ | $2.550\cdot10^{-7}$ | $9.04\cdot10^{-5}$ | $4.26\cdot10^{-2}$ |

Table 3: A comparison of the upper bounds from Proposition 12 for the adaptive Metropolis-Hastings independence sampler with $\epsilon=0.01$, $G(t)=\exp(-t)$, and $\delta=(1+ct)^{\frac{1-\epsilon}{1-\gamma^{*}}}$.
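For the choice $G(t)=\exp(-t)$ used in Table 3, the map $1/G$ is $s\mapsto\exp(s)$, so its inverse is the logarithm and $T_{\delta,t}=\log(Jt^{2}/\delta)$. The sketch below evaluates $\delta$ and $T_{\delta,t}$ under these settings. The threshold `r` enters $J$ but its value for Table 3 is not stated, so `r=1.0` is purely an assumption, and the function name is hypothetical. The sketch does not attempt to reproduce the entries of Table 3.

```python
import math

def delta_and_burn_in(t, gamma_lo, gamma_hi, r=1.0, eps=0.01):
    # J and c as defined in Proposition 12; r = 1.0 is an assumed placeholder.
    J = 2.0 / gamma_lo**2 + r + 1.0 / gamma_lo
    c = (gamma_hi - 1.0) / (1.0 - eps)
    # delta = (1 + c t)^{(1 - eps)/(1 - gamma^*)}, the choice used for Table 3.
    delta = (1.0 + c * t) ** ((1.0 - eps) / (1.0 - gamma_hi))
    # With G(t) = exp(-t), (1/G)^{-1} is the natural logarithm.
    T = math.log(J * t**2 / delta)
    return delta, T

if __name__ == "__main__":
    for t in (10**3, 10**4, 10**5, 10**6):
        print(t, delta_and_burn_in(t, gamma_lo=1.2, gamma_hi=1.5))
```

Under this exponentially diminishing adaptation, $T_{\delta,t}$ grows only logarithmically in $t$, so the total iteration count $T_{\delta,t}+t$ is dominated by $t$. In general $T_{\delta,t}$ need not be an integer, and rounding it up is one natural convention.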

5 Example: Adaptive random walk Metropolis

Adaptive random-walk Metropolis is a popular simulation algorithm for Bayesian statistics [Haario et al., 2001]. Let $U:\mathbb{R}^{d}\to\mathbb{R}$ and consider the target measure $\pi(dx)=Z^{-1}\exp(-U(x))dx$ with normalizing constant $Z=\int_{\mathbb{R}^{d}}\exp(-U(x))dx$. We make the following regularity assumptions on the target $\pi$, which have been used previously to show convergence results in MCMC [Douc et al., 2004]. Let $\left\lVert\cdot\right\rVert_{F}$ denote the Frobenius norm.

Assumption 13.

Suppose $U$ is twice continuously differentiable and there exists a minimizer $x_{*}$ such that $U(x_{*})=0$. Assume there are constants $d_{k},D_{k},r>0$ for $k=1,2,3$ such that for all $x\in\mathbb{R}^{d}$ with $\left\lVert x\right\rVert\geq R$ for some $R>0$,

$$
\begin{aligned}
&d_{1}\left\lVert x\right\rVert^{m}\leq U(x)\leq D_{1}\left\lVert x\right\rVert^{m}, &&(14)\\
&d_{2}\left\lVert x\right\rVert^{m-1}\leq\left\lVert\nabla U(x)\right\rVert\leq D_{2}\left\lVert x\right\rVert^{m-1},\qquad
\frac{\nabla U(x)}{\left\lVert U(x)\right\rVert}\cdot\frac{x}{\left\lVert x\right\rVert}\geq r, &&(15)\\
&d_{3}\left\lVert x\right\rVert^{m-2}\leq\left\lVert\nabla^{2}U(x)\right\rVert_{F}\leq D_{3}\left\lVert x\right\rVert^{m-2}. &&(16)
\end{aligned}
$$

While Assumption 13 is strong, the Weibull distribution is one example that satisfies it (see [Fort and Moulines, 2000]). Let $g_{K}$ be a Lebesgue density on $\mathbb{R}^{d}$ used for the proposal, supported on a compact set $K\subset\mathbb{R}^{d}$ and satisfying

$$
g_{K}(\xi)=g_{K}(\left\lVert\xi\right\rVert)\text{ for all }\xi\in\mathbb{R}^{d},\qquad
\inf_{\xi\in K}g_{K}(\xi)>0.
\tag{17}
$$

For $\gamma \in \mathcal{Y}$, define the random-walk Metropolis Markov family for $x \in \mathbb{R}^d$ and Borel $A \subseteq \mathbb{R}^d$ by

$$\mathcal{P}_{\gamma}(x, A) = \int_{A} a(x, x + \gamma^{1/2}\xi) g_K(\xi) d\xi + \delta_x(A) R_{\gamma}(x) \tag{18}$$

with acceptance function $a(x, y) = \exp[U(x) - U(y)] \wedge 1$ and rejection probability $R_{\gamma}(x) = 1 - \int_{\mathbb{R}^d} a(x, x + \gamma^{1/2}\xi) g_K(\xi) d\xi$. We define an adaptive random-walk Metropolis process by adapting the covariance of the proposal [Haario et al., 2001]: the dynamics $\Gamma_t | (\Gamma_s, X_s)_{s \leq t-1}$ first update the covariance matrix, and then $X_t | \Gamma_t, X_{t-1}$ updates the current state with random-walk Metropolis.
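A single transition of the kernel (18) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that $g_K$ is the uniform density on the closed unit ball (one radially symmetric choice that is bounded below on its support, as (17) requires), and the names `sym_sqrt`, `sample_unit_ball`, and `rwm_step` are hypothetical.

```python
import numpy as np

def sym_sqrt(gamma):
    """Symmetric square root gamma^{1/2} of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(gamma)
    return (v * np.sqrt(w)) @ v.T

def sample_unit_ball(rng, d):
    """Draw xi uniformly from the closed unit ball in R^d (assumed choice of g_K)."""
    z = rng.normal(size=d)
    z /= np.linalg.norm(z)                  # uniform direction on the sphere
    return z * rng.uniform() ** (1.0 / d)   # radius with density proportional to r^(d-1)

def rwm_step(x, gamma, U, rng):
    """One random-walk Metropolis step targeting a density proportional to exp(-U)."""
    xi = sample_unit_ball(rng, x.shape[0])
    y = x + sym_sqrt(gamma) @ xi            # proposal x + gamma^{1/2} xi
    log_accept = min(0.0, U(x) - U(y))      # log of a(x, y) = exp[U(x) - U(y)] ^ 1
    if np.log(rng.uniform()) < log_accept:
        return y, True                      # accepted: move to the proposal
    return x, False                         # rejected: stay at x
```

Because the proposal has compact support, each step moves the chain by at most the square root of the largest eigenvalue of $\gamma$, which is at most $\sqrt{\lambda^*}$ for $\gamma \in \mathcal{Y}$.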

For the tuning parameter set $\mathcal{Y}$, we consider the set of symmetric positive definite matrices on $\mathbb{R}^d$ whose eigenvalues are bounded by constants $\lambda_*, \lambda^* > 0$, that is,

$$\mathcal{Y} = \left\{\gamma \in \mathbb{R}^{d \times d} : \lambda_* I \leq \gamma \leq \lambda^* I, \gamma^T = \gamma\right\}. \tag{19}$$

One example is to adapt a sample covariance matrix scaled by $h > 0$ using the following identity

$$\Gamma_t = \frac{h}{t} \sum_{s=0}^{t} (X_s - \bar{X}_t)(X_s - \bar{X}_t)^T = \frac{t-1}{t} \Gamma_{t-1} + \frac{h}{t+1} (X_t - \bar{X}_{t-1})(X_t - \bar{X}_{t-1})^T \tag{20}$$

where $\bar{X}_t = (t+1)^{-1} \sum_{s=0}^{t} X_s$ [Haario et al., 2001, Andrieu and Moulines, 2006]. The set $\mathcal{Y}$ is convex, and one way to ensure the updates remain in $\mathcal{Y}$ is to truncate the eigenvalues of (20).
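The recursion in (20) and the eigenvalue truncation can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: "truncation" is read here as clipping each eigenvalue into $[\lambda_*, \lambda^*]$, and the function names `update_mean_cov` and `truncate_eigenvalues` are hypothetical.

```python
import numpy as np

def update_mean_cov(mean_prev, gamma_prev, x_t, t, h=1.0):
    """One step of the recursion (20) for t >= 1.

    mean_prev is the mean of X_0, ..., X_{t-1}; gamma_prev is Gamma_{t-1}
    (its value is irrelevant at t = 1 because it is multiplied by zero).
    Returns (mean of X_0, ..., X_t, Gamma_t).
    """
    diff = x_t - mean_prev
    gamma_t = ((t - 1) / t) * gamma_prev + (h / (t + 1)) * np.outer(diff, diff)
    mean_t = mean_prev + diff / (t + 1)
    return mean_t, gamma_t

def truncate_eigenvalues(gamma, lam_lo, lam_hi):
    """Clip the eigenvalues of a symmetric matrix into [lam_lo, lam_hi]."""
    sym = 0.5 * (gamma + gamma.T)        # guard against round-off asymmetry
    w, v = np.linalg.eigh(sym)
    w = np.clip(w, lam_lo, lam_hi)
    return (v * w) @ v.T                 # rebuild V diag(w) V^T
```

A typical use would start from `mean = X[0]` and a zero matrix, call `update_mean_cov` for $t = 1, 2, \dots$, and pass the result through `truncate_eigenvalues` before using it as the proposal covariance.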

Under Assumption (13), we first obtain a lower bound on the convergence rate for the adaptive random-walk Metropolis process.

Proposition 14.

For $t \in \mathbb{Z}_+$, let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot)$ be the marginal of the adaptive random-walk Metropolis process initialized at $(x_0, \gamma_0) \in \mathbb{R}^d \times \mathcal{Y}$ from the Metropolis-Hastings family (18) and adaptation parameter set (19). If Assumption (13) holds for $\pi$, then there are constants $c_* > 0$ depending on $m$ and $M_*$ depending on $m$ and $x_0$ such that

$$\inf_{\mathcal{Q} \in \mathcal{S}(\mathcal{X}, \mathcal{Y})} \mathcal{W}_{\lVert \cdot \rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot), \pi\right) \geq M_* \exp\left(-c_* t^{\frac{m}{2-m}}\right).$$

To proceed, we first establish a simultaneous growth condition on the Markov family.

Lemma 15.

With $W(x) = \exp(U(x))$, there are constants $M_0, N_0, L > 0$ depending on $m$ such that for any $\alpha > 0$ and $(x, \gamma) \in \mathcal{X} \times \mathcal{Y}$,

$$(\mathcal{P}_{\gamma} W^{\alpha})(x) - W^{\alpha}(x) \leq M_0 \alpha^{3 - 2/m} \left[\alpha - N_0\right] \varphi(W^{\alpha}(x)) + L \tag{21}$$

where for $K_m = \exp(2/(m-2) + 1)$ and $w \geq 1$,

$$\varphi(w) = \frac{w + K_m}{\log\left[w + K_m\right]^{2/m - 2}}.$$
Proof.

Similar to [Fort and Moulines, 2000, Lemma B.4], using the fundamental theorem of calculus and (15),

$$\begin{aligned}
\sup_{\xi \in K} \frac{W^{\alpha}(x + \gamma^{1/2}\xi)}{W^{\alpha}(x)} &= 1 + \int_0^1 \frac{W^{\alpha}(x + t\gamma^{1/2}\xi)}{W^{\alpha}(x)} \gamma \nabla U(x + t\gamma\xi) dt \\
&\leq 1 + \lVert \gamma \rVert \sup_{\xi \in K} \lVert \xi \rVert \sup_{\xi \in K} \frac{W^{\alpha}(x + t\gamma^{1/2}\xi)}{W^{\alpha}(x)} \int_0^1 \lVert x + t\gamma^{1/2}\xi \rVert^{m-1} dt.
\end{aligned}$$

It follows that

$$\lim_{r \to \infty} \sup_{\lVert x \rVert \geq r} \sup_{\xi \in K} \frac{W^{\alpha}(x + \gamma^{1/2}\xi)}{W^{\alpha}(x)} \leq 1.$$

Using the fundamental theorem of calculus twice with (15) and (16), there is a constant $M > 0$ such that for large enough $\lVert x \rVert$

$$\begin{aligned}
&\left| \frac{W^{\alpha}(x + \gamma^{1/2}\xi)}{W^{\alpha}(x)} - 1 - \alpha \gamma^{1/2} \nabla U(x) \cdot \xi \right| \\
&\leq \int_0^1 \frac{W^{\alpha}(x + t\gamma^{1/2}\xi)}{W^{\alpha}(x)} \left\{ \alpha^2 [\gamma^{1/2} \nabla U(x + t\gamma^{1/2}\xi) \cdot \xi]^2 + \alpha \xi \cdot \gamma^{1/2} \nabla^2 U(x + t\gamma^{1/2}\xi) \gamma^{1/2} \xi \right\} (1 - t) dt \\
&\leq \alpha^2 M \lVert x \rVert^{2(m-1)}.
\end{aligned}$$

Similarly, we obtain

$$\left| \exp(U(x) - U(x + \gamma^{1/2}\xi)) - 1 + \gamma^{1/2} \nabla U(x) \cdot \xi \right| \leq M \lVert x \rVert^{2(m-1)}.$$

Let $r_{\gamma}(x) = \{y \in \mathbb{R}^d : U(x) < U(x + \gamma^{1/2} y)\}$ denote the rejection region. By [Fort and Moulines, 2000, Lemma B.3] combined with (15), there is a constant $a > 0$ such that for large enough $\lVert x \rVert$

$$\int_{r_{\gamma}(x)} \left[\gamma^{1/2} \nabla U(x) \cdot \xi\right]^2 g_K(\xi) d\xi \geq a \lVert \nabla U(x) \rVert^2.$$

Applying these bounds, for large enough $\lVert x \rVert$,

$$\begin{aligned}
&\frac{(\mathcal{P}_{\gamma} W^{\alpha})(x)}{W^{\alpha}(x)} - 1 \\
&\leq \alpha \int_{\mathbb{R}^d} \gamma^{1/2} \nabla U(x) \cdot \xi \, a(x, x + \gamma^{1/2}\xi) g_K(\xi) d\xi + M \alpha^2 \lVert x \rVert^{2(m-1)} \\
&= \alpha \int_{r_{\gamma}(x)} \gamma^{1/2} \nabla U(x) \cdot \xi \left( \exp[U(x) - U(x + \gamma^{1/2}\xi)] - 1 \right) g_K(\xi) d\xi + M \alpha^2 \lVert x \rVert^{2(m-1)} \\
&\leq -\alpha \int_{r_{\gamma}(x)} \left[\gamma^{1/2} \nabla U(x) \cdot \xi\right]^2 g_K(\xi) d\xi + M \alpha^2 \lVert x \rVert^{2(m-1)} \\
&\leq -\alpha a \lVert \nabla U(x) \rVert^2 + M \alpha^2 \lVert x \rVert^{2(m-1)}.
\end{aligned}$$

Applying (14) and (15), there are constants $M_0, N_0, K_m > 0$ such that for $\lVert x \rVert$ sufficiently large

$$\begin{aligned}
(\mathcal{P}_{\gamma} W^{\alpha})(x) - W^{\alpha}(x) &\leq M_0 \alpha \left[\alpha - N_0\right] \frac{W^{\alpha}(x)}{U(x)^{2/m - 2}} \\
&\leq M_0 \alpha^{3 - 2/m} \left[\alpha - N_0\right] \frac{W^{\alpha}(x) + K_m}{\log(W^{\alpha}(x) + K_m)^{2/m - 2}}.
\end{aligned}$$

For small $\lVert x \rVert$, by continuity, the sub-level sets of $W$ are compact and $(\mathcal{P}_{\gamma} W^{\alpha})(x) - W^{\alpha}(x)$ is bounded on compact sets, so the conclusion follows at once. ∎

We may now apply Lemma 15 to obtain the lower bound.

Proof of Proposition 14.

Changing to polar coordinates, we have for $r$ large enough

$$\begin{aligned}
\pi\left(\exp(U(x)) \geq r\right) &\geq \pi\left(\lVert x \rVert^m \geq \log(r)\right) \\
&\geq \frac{2\pi^{d/2}}{Z \Gamma(d/2)} \int_{s^m \geq \log(r)} s^{d-1} \exp(-s^m) ds \\
&\geq \frac{2\pi^{d/2}}{Z m \Gamma(d/2)} \frac{1}{r},
\end{aligned}$$

where $Z$ is the normalizing constant and $\Gamma(t) = \int_0^{\infty} u^{t-1} \exp(-u) du$ for $t > 0$ is the Gamma function.

By Lemma 15, for $\alpha$ sufficiently large, we have constants $M, K_m > 0$ depending on $\alpha, m$ such that

$$(\mathcal{P}_{\gamma} W^{\alpha})(x) - W^{\alpha}(x) \leq M \frac{W^{\alpha}(x) + K_m}{\log(W^{\alpha}(x) + K_m)^{2/m - 2}}$$

holds for all $(x, \gamma) \in \mathbb{R}^d \times \mathcal{Y}$. Therefore, there is a constant $c_m > 0$ depending on $m$ such that

$$\begin{aligned}
H_{\varphi, W^{\alpha}(x_0)}(u) &= M \int_{W^{\alpha}(x_0)}^{u} \frac{\log(x + K_m)^{2/m - 2}}{x + K_m} dx, \\
H_{\varphi, W^{\alpha}(x_0)}^{-1}(t) &\leq M \exp(\alpha U(x_0)) \exp\left(c_m t^{\frac{m}{2-m}}\right).
\end{aligned}$$
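As a hedged sketch of where the exponent $m/(2-m)$ comes from, the integral defining $H$ can be evaluated in closed form. The shorthand $q = 2/m - 1$ and $w_0 = W^{\alpha}(x_0)$ is introduced only for this sketch, and the last two inequalities assume $0 < m \leq 1$ (so that $1/q = m/(2-m) \leq 1$) and $w_0 \geq 1$. The rate constant obtained this way is $c_m = (q/M)^{1/q}$, and the prefactor $1 + K_m$ plays the role of the generic constant.

```latex
\begin{aligned}
H_{\varphi, w_0}(u)
  &= M \int_{w_0}^{u} \frac{\log(x + K_m)^{q-1}}{x + K_m}\,dx
   = \frac{M}{q}\left[\log(u + K_m)^{q} - \log(w_0 + K_m)^{q}\right], \\
H_{\varphi, w_0}^{-1}(t)
  &= \exp\left(\left[\log(w_0 + K_m)^{q} + \tfrac{q}{M}\,t\right]^{1/q}\right) - K_m \\
  &\leq (w_0 + K_m)\exp\left(\left(\tfrac{q}{M}\right)^{1/q} t^{\frac{m}{2-m}}\right) \\
  &\leq (1 + K_m)\exp(\alpha U(x_0))\exp\left(c_m t^{\frac{m}{2-m}}\right).
\end{aligned}
```

The first inequality uses $(a + b)^{1/q} \leq a^{1/q} + b^{1/q}$ for $a, b \geq 0$ when $1/q \leq 1$, and the second uses $w_0 + K_m \leq (1 + K_m) w_0$ when $w_0 \geq 1$.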

Since $W(\cdot)$ has compact sublevel sets, the lower bound then follows by Theorem 4. ∎

We now investigate an upper bound under the expected diminishing adaptation condition (7) that can approximately achieve the lower bound rate. The following upper and lower bounds show that the convergence of adaptive random-walk Metropolis in this situation is not geometric. One drawback is that we do not obtain explicit constants in the upper and lower bounds.

Proposition 16.

For $t \in \mathbb{Z}_+$, let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0, x_0, \cdot)$ be the marginal of an adaptive random-walk Metropolis process as in Proposition 14. Assume the proposal $g$ is a truncated Gaussian on a centered closed ball and

$$\sup_{x \in \mathcal{X}} \mathbb{E}\left(\lVert \Gamma_{t+1} - \Gamma_t \rVert_F \mid X_t = x\right) \leq G(t)$$

with $G(\cdot)$ strictly decreasing to zero. Then there are constants $M_*, M^*$ depending on $x_0$ and $c_*, c^*, J^* > 0$ such that for all $\epsilon \in (0, 1)$ and all $t \in \mathbb{Z}_+$ with $T_{\epsilon, t} \geq F^{-1}\left(J^* t^2 / \epsilon\right)$,

$$M_* \exp\left[-c_* (T_{\epsilon, t} + t)^{\frac{m}{2-m}}\right] \leq \mathcal{W}_{\lVert \cdot \rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(T_{\epsilon, t} + t)}((\gamma_0, x_0), \cdot), \pi\right) \leq M^* T_{\epsilon, t} \exp\left[-c^* t^{\frac{m}{2-m}}\right] + \epsilon.$$
Proof.

We apply Theorem 10 to obtain the conclusion. Choosing $\alpha$ sufficiently small in the simultaneous drift condition from Lemma 15, a compactness and continuity argument shows that a simultaneous minorization condition holds.

It remains to verify expected diminishing adaptation (7). For $\gamma \in \mathcal{Y}$, define

$$f_\gamma(y) = \exp\left(-\frac{1}{2}\log(\det(\gamma)) - \frac{1}{2}y^T\gamma^{-1}y\right).$$

Following [Andrieu and Moulines, 2006, Lemma 13], the mean value theorem gives the upper bound

$$\begin{aligned}
\int_{\mathbb{R}^d}\left|f_{\gamma'}(y) - f_\gamma(y)\right|dy
&\leq \frac{1}{2}\int_{\mathbb{R}^d}\int_0^1 f_{\gamma_t}(y)\left|\text{tr}\left(\gamma_t^{-1}(\gamma'-\gamma) + \gamma_t^{-1}yy^T\gamma_t^{-1}(\gamma'-\gamma)\right)\right|dt\,dy \\
&\leq \frac{d(2\pi)^{d/2}}{\lambda_*}\left\lVert\gamma'-\gamma\right\rVert_F.
\end{aligned}$$

Since the proposal $g_K$ is symmetric, for Borel $\varphi : \mathcal{X} \to [0,1]$, let $\psi(x,y) = (\varphi(y)-\varphi(x))a(x,y)$ and

$$\begin{aligned}
\mathcal{P}_{\gamma'}\varphi(x) - \mathcal{P}_\gamma\varphi(x)
&= \int\psi(x,y)g_K({\gamma'}^{-1/2}(y-x))dy - \int\psi(x,y)g_K(\gamma^{-1/2}(y-x))dy \\
&\leq \frac{2}{\int_K\exp(-\lVert z\rVert^2/2)dz}\int_{x+K}\left|f_{\gamma'}(y) - f_\gamma(y)\right|dy \\
&\leq J^*\left\lVert\gamma'-\gamma\right\rVert_F
\end{aligned}$$

where $J^* = 2d(2\pi)^{d/2}/[\lambda_*\int_K\exp(-\lVert z\rVert^2/2)dz]$. Taking the supremum over $\varphi$, we then have for each $t \in \mathbb{Z}_+$,

$$\begin{aligned}
\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot) - \mathcal{P}_{\Gamma_t}(x,\cdot)\right\rVert_{\text{TV}} \mid X_t = x\right]
&\leq J^*\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\Gamma_{t+1}-\Gamma_t\right\rVert_F \mid X_t = x\right] \\
&\leq J^*G(t).
\end{aligned}$$
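The Lipschitz-type bound on $\int|f_{\gamma'} - f_\gamma|dy$ used in this proof can be checked numerically in a simple case. The following is a minimal sketch in dimension $d = 1$, where $\gamma$ is a scalar variance and the Frobenius norm reduces to $|\gamma'-\gamma|$. It assumes that $\lambda_*$ is a lower bound on the variances in the parameter set; the value `lam_star = 0.5`, the test pairs, and the helper names are illustrative choices and are not taken from the paper.

```python
import math

def f(gamma, y):
    """Unnormalized Gaussian density f_gamma(y) from the proof, in dimension d = 1."""
    return math.exp(-0.5 * math.log(gamma) - 0.5 * y * y / gamma)

def l1_distance(gamma_a, gamma_b, half_width=40.0, n=20_000):
    """Trapezoidal approximation of the integral of |f_a - f_b| on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * abs(f(gamma_a, y) - f(gamma_b, y))
    return total * h

d = 1
lam_star = 0.5  # ASSUMPTION: lower bound on the variances allowed in the parameter set
pairs = [(0.5, 0.6), (1.0, 1.3), (2.0, 0.7), (4.0, 4.01)]  # illustrative (gamma, gamma') pairs

results = []
for gamma, gamma_prime in pairs:
    lhs = l1_distance(gamma, gamma_prime)
    rhs = d * (2.0 * math.pi) ** (d / 2) / lam_star * abs(gamma_prime - gamma)
    results.append((lhs, rhs))
    print(f"gamma={gamma}, gamma'={gamma_prime}: L1 distance {lhs:.4f}, bound {rhs:.4f}")
```

Every variance in the test pairs is at least `lam_star`, so each computed distance should fall below the corresponding bound. This is only a sanity check of one inequality in one dimension, not a verification of the theorem.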

6 Final discussion

The general lower bound convergence rates developed here in combination with upper bounds for adaptive MCMC can provide useful guidance in designing adaptation strategies in MCMC. We showed that the lower bound for weak convergence in Theorem 4 can produce the same rate as the lower bound in total variation from Theorem 1. We also used a novel expected diminishing adaptation condition (7) to show these lower bounds can be accompanied by upper bounds with subgeometric convergence rates. Our contributions are useful not only in understanding the convergence of adaptive MCMC, but also for gaining intuition for constructing adaptation strategies in practice.

Choosing an optimal adaptation strategy for an adaptive MCMC simulation remains a difficult task in general and our subgeometric upper bounds are limited by requiring the adaptation to diminish sufficiently fast according to (7). While this is to be expected, some interesting future research directions could include finding more precise classes of adaptation strategies that are capable of achieving upper bounds that can approximately match the lower bound rate. Another area of interest is studying requirements on the adaptation that result in geometric convergence rates for adaptive MCMC.

Appendix A Supporting technical results

The following is a technical result to ensure Borel measurability of conditional Wasserstein distances used in adaptive MCMC.

Proposition 17.

Let $\mathcal{X}$ be a Polish space and $\mathcal{Y}$ be a Borel measurable space. Assume $\gamma \mapsto \mu_\gamma$ is Borel measurable where $\mu_\gamma$ is a Borel probability measure on $\mathcal{X}$ and $\gamma \in \mathcal{Y}$. Let $c : \mathcal{X}\times\mathcal{X} \to [0,\infty)$ be a lower semicontinuous function and for each $\gamma,\gamma' \in \mathcal{Y}$, let

$$\mathcal{W}_c\left(\mu_\gamma,\mu_{\gamma'}\right) = \inf_{\xi\in\mathcal{C}\left(\mu_\gamma,\mu_{\gamma'}\right)}\int_{\mathcal{X}\times\mathcal{X}}c(u,v)\xi(du,dv).$$

Then there is a Borel measurable choice of the function

$$\gamma,\gamma' \mapsto \mathcal{W}_c(\mu_\gamma,\mu_{\gamma'}).$$
Proof.

First assume $c(\cdot,\cdot)$ is continuous. Let $T$ be the set of Borel optimal couplings $\xi^* \in P(\mathcal{X}\times\mathcal{X})$ satisfying

$$\inf_{\xi\in\mathcal{C}\left(\mu,\nu\right)}\int_{\mathcal{X}\times\mathcal{X}}c(u,v)\xi(du,dv) = \int_{\mathcal{X}\times\mathcal{X}}c(u,v)\xi^*(du,dv)$$

for some Borel probability measures $\mu,\nu$ on $\mathcal{X}$. Define $\Phi : T \to P(\mathcal{X})\times P(\mathcal{X})$ to be the map sending an optimal coupling to its pair of marginals. By [Villani, 2009, Theorem 4.1], $\Phi$ is surjective, and by [Villani, 2009, Theorem 5.20], $\mathcal{C}\left(\mu_\gamma,\mu_{\gamma'}\right)$ is a Polish space. By the Lusin-Novikov uniformization theorem [Kechris, 2012, Theorem 18.10], there is a Borel measurable right inverse $\Phi^{-1}$. The map $\psi : \gamma,\gamma' \mapsto (\mu_\gamma,\mu_{\gamma'})$ is Borel measurable. Thus, $\Phi^{-1}(\psi)$ is Borel measurable and so is

$$\gamma,\gamma' \mapsto \int_{\mathcal{X}\times\mathcal{X}}c(u,v)\xi^*_{\gamma,\gamma'}(du,dv).$$

Since $c(\cdot,\cdot)$ is lower semicontinuous, it is the monotone limit of continuous functions $(c_n)_n$. Then by monotone convergence, $\int c_n(u,v)\xi^*_{\gamma,\gamma'}(du,dv) \to \int c(u,v)\xi^*_{\gamma,\gamma'}(du,dv)$. Since this is a limit of a measurable sequence, the conclusion follows at once. ∎

The following provides useful properties for the function $H_{1,\varphi}$ and $\varphi$ defined in Section 2.

Lemma 18.

Let $\varphi : (0,\infty)\to(0,\infty)$ be concave and define for $w \geq 1$,

$$H_{1,\varphi}(w) = \int_1^w\frac{1}{\varphi(v)}dv. \tag{22}$$

Then $\varphi$ is non-decreasing and $H_{1,\varphi}(\cdot)$ is strictly increasing.

Proof.

Since $1/\varphi > 0$, the fundamental theorem of calculus implies $H_{1,\varphi}(\cdot)$ is strictly increasing, and this implies that $H_{1,\varphi}^{-1}(\cdot)$ exists. We need to show that $\varphi$ is non-decreasing. Suppose by contradiction that $\varphi(v) < \varphi(u)$ for some $u < v$. By the subgradient inequality using concavity, any subgradient $\partial\varphi(v)$ of $\varphi$ at $v$ satisfies $\varphi(u) \leq \varphi(v) + \partial\varphi(v)(u-v)$, which forces $\partial\varphi(v) < 0$. The same inequality gives $\varphi(w) \leq \varphi(v) + \partial\varphi(v)(w-v)$ for all $w > v$. But for large $w$, this contradicts that $\varphi \geq 0$. ∎
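As an illustration of Lemma 18, the following sketch evaluates $H_{1,\varphi}$ numerically for the concave rate function $\varphi(v) = v^{\alpha}$ with $\alpha = 1/2$ and compares it with the closed form $(w^{1-\alpha}-1)/(1-\alpha)$. The choice of $\varphi$, the grid, and the function names are assumptions made for this example and are not prescribed by the lemma.

```python
import math

ALPHA = 0.5  # illustrative exponent in (0, 1), so phi is concave and positive on (0, inf)

def phi(v, alpha=ALPHA):
    """Example concave rate function phi(v) = v**alpha."""
    return v ** alpha

def H(w, alpha=ALPHA, n=20_000):
    """Midpoint-rule approximation of H_{1,phi}(w), the integral of 1/phi over [1, w]."""
    h = (w - 1.0) / n
    return sum(h / phi(1.0 + (i + 0.5) * h, alpha) for i in range(n))

def H_closed(w, alpha=ALPHA):
    """Closed form of the same integral for phi(v) = v**alpha."""
    return (w ** (1.0 - alpha) - 1.0) / (1.0 - alpha)

grid = [1.0, 2.0, 5.0, 10.0, 50.0]
values = [H(w) for w in grid]
for w, numeric in zip(grid, values):
    print(f"w={w}: numeric {numeric:.6f}, closed form {H_closed(w):.6f}")
```

On this grid the numerical values increase with $w$ and agree with the closed form up to quadrature error, matching the two conclusions of the lemma for this particular $\varphi$.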

The next simple lemma is used for drift conditions.

Lemma 19.

Assume there is a Borel function $V : \mathcal{X}\to[0,\infty)$, a strictly increasing function $\varphi : [0,\infty)\to[0,\infty)$, and a constant $K > 0$ such that

$$(\mathcal{P}_\gamma V)(x) - V(x) \leq -\varphi(V(x)) + K \tag{23}$$

holds for every $(x,\gamma) \in \mathcal{X}\times\mathcal{Y}$. Then for any $\delta \in (0,1)$ and $C_\delta = \{x\in\mathcal{X} : V(x) \leq \varphi^{-1}(K/(1-\delta))\}$,

$$(\mathcal{P}_\gamma V)(x) - V(x) \leq -\delta\varphi(V(x)) + \left[\varphi^{-1}(K/(1-\delta)) + K\right]I_{C_\delta}(x)$$

holds for all $x \in \mathcal{X}$.

Proof.

By the drift condition, with $R = K/(1-\delta)$,

$$(\mathcal{P}_\gamma V)(x) - V(x) \leq \left(\frac{K}{R}-1\right)\varphi(V(x)) \leq -\delta\varphi(V(x))$$

holds for all $\varphi(V(x)) \geq K/(1-\delta)$. Since $\varphi$ is strictly increasing,

$$(\mathcal{P}_\gamma V)(x) \leq V(x) + K \leq \varphi^{-1}(K/(1-\delta)) + K$$

for all $\varphi(V(x)) \leq K/(1-\delta)$. ∎
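The conversion in Lemma 19 can be checked on a toy example by comparing the right-hand side of the drift condition (23) with the right-hand side of the localized bound over a grid of values $v = V(x)$. The sketch below assumes $\varphi(v) = \sqrt{v}$, $K = 2$, and $\delta = 1/2$; these values and all names are hypothetical choices for illustration only.

```python
import math

K = 2.0      # illustrative drift constant
DELTA = 0.5  # illustrative delta in (0, 1)

def phi(v):
    """Example strictly increasing function phi(v) = sqrt(v) on [0, inf)."""
    return math.sqrt(v)

def phi_inv(s):
    """Inverse of phi."""
    return s * s

radius = phi_inv(K / (1.0 - DELTA))  # level defining C_delta = {V <= radius}
b = radius + K                       # constant multiplying the indicator of C_delta

def original_rhs(v):
    """Right-hand side of the drift condition (23) at V(x) = v."""
    return -phi(v) + K

def localized_rhs(v):
    """Right-hand side of the localized drift bound at V(x) = v."""
    indicator = 1.0 if v <= radius else 0.0
    return -DELTA * phi(v) + b * indicator

grid = [0.01 * i for i in range(10_001)]  # values of V(x) in [0, 100]
violations = [v for v in grid if original_rhs(v) > localized_rhs(v) + 1e-12]
print(f"radius={radius}, b={b}, violations={len(violations)}")
```

Since any kernel satisfying (23) has drift at most `original_rhs(v)`, an empty list of violations indicates that the localized bound also holds on this grid for the chosen constants.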

The following is a standard result in topology.

Lemma 20.

Let $A$ be a closed set in $\mathbb{R}^d$. Then for any $x \notin A$, $d(x,A) = d(x,\partial A)$.

References

  • Andrieu and Moulines [2006] Christophe Andrieu and Éric Moulines. On the ergodicity properties of some adaptive MCMC algorithms. The Annals of Applied Probability, 16(3):1462–1505, 2006. doi: 10.1214/105051606000000286.
  • Brešar and Mijatović [2024] Miha Brešar and Aleksandar Mijatović. Subexponential lower bounds for f-ergodic Markov processes. Probability Theory and Related Fields, pages 1–58, 2024.
  • Douc et al. [2004] Randal Douc, Gersende Fort, Eric Moulines, and Philippe Soulier. Practical drift conditions for subgeometric rates of convergence. The Annals of Applied Probability, 14(3):1353–1377, 2004.
  • Dudley [2018] Richard M Dudley. Real analysis and probability. Chapman and Hall/CRC, 2018.
  • Durmus et al. [2016] Alain Durmus, Gersende Fort, and Éric Moulines. Subgeometric rates of convergence in Wasserstein distance for Markov chains. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 52(4):1799–1822, 2016. doi: 10.1214/15-aihp699.
  • Fort and Moulines [2000] Gersende Fort and Eric Moulines. V-subgeometric ergodicity for a Hastings–Metropolis algorithm. Statistics and Probability Letters, 49(4):401–410, 2000.
  • Haario et al. [2001] Heikki Haario, Eero Saksman, and Johanna Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242, 2001.
  • Hairer [2009] Martin Hairer. How hot can a heat bath get? Communications in Mathematical Physics, 292(1):131–177, 2009.
  • Jarner and Tweedie [2001] S. F. Jarner and R. L. Tweedie. Locally contracting iterated functions and stability of Markov chains. Journal of Applied Probability, 38(2):494–507, 2001. ISSN 00219002.
  • Jarner and Roberts [2002] Søren F. Jarner and Gareth O. Roberts. Polynomial convergence rates of Markov chains. The Annals of Applied Probability, 12(1):224–247, 2002. doi: 10.1214/aoap/1015961162.
  • Kamatani [2009] Kengo Kamatani. Metropolis–Hastings algorithms with acceptance ratios of nearly 1. Annals of the Institute of Statistical Mathematics, 61(4):949–967, 2009. ISSN 1572-9052. doi: 10.1007/s10463-008-0180-6.
  • Kechris [2012] Alexander Kechris. Classical descriptive set theory, volume 156. Springer Science & Business Media, 2012.
  • Laitinen and Vihola [2024] Pietari Laitinen and Matti Vihola. An invitation to adaptive Markov chain Monte Carlo convergence theory, 2024.
  • Pompe et al. [2020] Emilia Pompe, Chris Holmes, and Krzysztof Latuszynski. A framework for adaptive MCMC targeting multimodal distributions. The Annals of Statistics, 48(5):2930–2952, 2020. doi: 10.1214/19-aos1916.
  • Roberts and Rosenthal [2007] Gareth O. Roberts and Jeffrey S. Rosenthal. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44(2):458–475, 2007.
  • Sandrić et al. [2022] Nikola Sandrić, Ari Arapostathis, and Guodong Pang. Subexponential upper and lower bounds in Wasserstein distance for Markov processes. Applied Mathematics & Optimization, 85(3):37, May 2022.
  • Schmidler and Woodard [2011] Scott C. Schmidler and Dawn B. Woodard. Lower bounds on the convergence rates of adaptive MCMC methods. Tech. rep., Duke Univ., 2011.
  • Strassen [1965] V. Strassen. The existence of probability measures with given marginals. The Annals of Mathematical Statistics, 36(2):423–439, 1965.
  • Villani [2003] Cédric Villani. Topics in Optimal Transportation. Providence, RI: Amer. Math. Soc., 2003.
  • Villani [2009] Cédric Villani. Optimal Transport: Old and New. Berlin: Springer, 2009.