1 Introduction
Let $\pi$ be a Borel probability measure on a Polish space $\mathcal{X}$.
Adaptive Markov chain Monte Carlo [Haario et al., 2001, Roberts and Rosenthal, 2007] is a widely successful framework for simulating realizations from $\pi$ when optimal tuning parameters for the Markov chain are not readily available.
The adaptive process $(\Gamma_t, X_t)_{t=1}^{\infty}$ is constructed from a family of Markov kernels indexed by a set of potential tuning parameters.
The discrete-time adaptive process first updates the tuning parameter $\Gamma_t \mid (\Gamma_s, X_s)_{0 \le s \le t-1}$ with an adaptation strategy that uses the previous history, and then updates $X_t \mid \Gamma_t, X_{t-1}$ using a Markov transition kernel.
The goal is for the adaptive process to “learn” optimal tuning parameters so that the marginal distribution of the random variable $X_t$ closely approximates the measure $\pi$.
Because the range of possible adaptation strategies is so wide, theoretical convergence rates of adaptive algorithms are less understood than those of non-adaptive Markov chain Monte Carlo (MCMC), where fixed tuning parameters are chosen carefully beforehand.
In particular, a theoretical understanding of the rate of convergence is essential in applications as it helps to ensure a stable and reliable Monte Carlo simulation.
However, adaptive MCMC can exhibit empirical performance surpassing that of standard MCMC even though much of the theoretical understanding is lacking.
For example, adaptive MCMC is widely used to automatically learn the covariance in random-walk Metropolis-Hastings [Haario et al., 2001 ] , which is often difficult or impossible to choose optimally with only fixed tuning parameter choices.
The main contributions of this paper are general subgeometric lower bounds on the convergence rate of adaptive MCMC, both in total variation and in weak convergence, paired with upper bounds under strong conditions on the rate at which adaptation diminishes.
Applications of the theory are demonstrated on an adaptive unadjusted Langevin algorithm, Metropolis-Hastings independence sampler, and an adaptive Metropolis-Hastings random-walk.
The lower bounds for convergence hold under arbitrary adaptation plans and serve as a measurement of the optimal convergence behavior for adaptive MCMC.
The techniques for obtaining these lower bounds are based on finding large discrepancies between the tail probabilities of the marginal adaptive process and the target measure $\pi$.
Since the convergence rate is determined by tail properties, this may guide further theoretical understanding of some modern adaptation strategies that restrict adaptation to compact sets [Pompe et al., 2020 ] .
Convergence rate lower bounds can also be of practical use in applications to determine if an appropriate rate is achievable so that central limit theorems may hold [Andrieu and Moulines, 2006 , Laitinen and Vihola, 2024 ] .
One barrier to developing lower bounds for adaptive MCMC is the non-Markovian, non-reversible nature of these processes, which means that spectral analysis for reversible Markov processes is not directly available.
To the best of our knowledge, the lower bounds for weak convergence developed here are novel, even when applied to non-adapted Markov chains, and general total variation lower bounds have not yet been explored for adaptive MCMC.
In specific situations, adaptive random-walk algorithms have been shown to improve “local” behavior but fail to adapt to “global” properties of the target measure, such as the tail probabilities, and have been proven to exhibit poor convergence properties [Schmidler and Woodard, 2011].
Related research develops general lower bounds in total variation for Markov processes [Hairer, 2009 , Theorem 3.6, Corollary 3.7] .
More recently, this technique has also been extended to polynomial rate lower bounds in unbounded Wasserstein distances for some Markov processes [Sandrić et al., 2022 , Theorem 1.2] .
When the tail decay of the target measure is unavailable, lower bounds for Markov processes in total variation have recently been developed, but a precise computation of the constants is not available [Brešar and Mijatović, 2024 ] .
In addition to lower bounds, we develop explicit quantitative subgeometric upper bounds in total variation that can match the lower bound rate if the adaptation diminishes sufficiently fast.
The condition required on the adaptation is similar to the well-known diminishing adaptation condition [Roberts and Rosenthal, 2007 ] often used for the asymptotic convergence of adaptive MCMC.
To the best of our knowledge, this is the first subgeometric upper bound to quantify the mixing for adaptive MCMC in total variation.
In comparison, existing convergence results require strong assumptions for adaptive MCMC and are not quantitative [Andrieu and Moulines, 2006 ] or develop central limit theorems through Poisson’s equation [Laitinen and Vihola, 2024 ] .
The organization of this article is as follows.
Section 2 first develops lower bounds in total variation for large classes of adaptation strategies and then extends these lower bounds to weak convergence when the state space is Euclidean.
A lower bound is shown on a concrete example for the adapted unadjusted Langevin algorithm.
Section 3 proves comparable upper bounds under diminishing conditions on the adaptation plans that are capable of approximately matching the lower bound rates.
Section 4 illustrates the lower bounds on a toy example with an adaptive Metropolis-Hastings independence sampler, and Section 5 applies the lower bounds to the popular adaptive random-walk Metropolis-Hastings.
Section 6 provides a final discussion on the results and future research directions.
2 Lower bounds on the convergence of adaptive MCMC
For two Borel probability measures $\mu, \nu$ on $\mathcal{X}$, let $\mathcal{C}(\mu,\nu)$ be the set of all couplings, that is, Borel probability measures $\xi$ on $\mathcal{X}\times\mathcal{X}$ satisfying $\xi(\cdot\times\mathcal{X})=\mu$ and $\xi(\mathcal{X}\times\cdot)=\nu$. Denote the total variation distance between $\mu$ and $\nu$ as the smallest probability of the off-diagonal over all possible couplings, that is,
\[
\left\lVert\mu-\nu\right\rVert_{\text{TV}}=\inf_{\xi\in\mathcal{C}(\mu,\nu)}\xi(\{(x,y)\in\mathcal{X}\times\mathcal{X}:x\not=y\}).
\]
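For intuition, the coupling definition can be checked on a finite state space, where the infimum is attained and equals half the $\ell_1$ distance between the probability vectors. The following is a minimal sketch under that finite-state assumption; the function names and the two example distributions are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def total_variation(p, q):
    """TV distance between two finite distributions given as dicts state -> mass."""
    states = set(p) | set(q)
    return sum(abs(p.get(s, 0) - q.get(s, 0)) for s in states) / 2


def maximal_coupling(p, q):
    """Build a coupling (dict (x, y) -> mass) whose off-diagonal mass equals TV."""
    states = sorted(set(p) | set(q))
    overlap = {s: min(p.get(s, 0), q.get(s, 0)) for s in states}
    # Put as much mass as possible on the diagonal {x = y}.
    coupling = {(s, s): m for s, m in overlap.items() if m > 0}
    tv = 1 - sum(overlap.values())
    if tv > 0:
        # The leftover parts of p and q have disjoint supports, so spreading
        # them as a product never adds mass back onto the diagonal.
        p_rest = {s: p.get(s, 0) - overlap[s] for s in states}
        q_rest = {s: q.get(s, 0) - overlap[s] for s in states}
        for x in states:
            for y in states:
                m = p_rest[x] * q_rest[y] / tv
                if m > 0:
                    coupling[(x, y)] = coupling.get((x, y), 0) + m
    return coupling


# Hypothetical example distributions on {0, 1, 2}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

xi = maximal_coupling(mu, nu)
off_diagonal = sum(m for (x, y), m in xi.items() if x != y)
print(total_variation(mu, nu))  # 1/2
print(off_diagonal == total_variation(mu, nu))
```

Exact rational arithmetic is used so that the marginals of the constructed coupling can be compared to $\mu$ and $\nu$ without rounding error.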
Denote the min and max of $a,b\in\mathbb{R}$ by $a\wedge b$ and $a\vee b$ respectively.
On a Polish space $(\mathcal{X},d)$ where $d:\mathcal{X}\times\mathcal{X}\to[0,\infty)$ is a metric, we denote the Wasserstein distance that metrizes the weak convergence of probability measures [Dudley, 2018, Theorem 11.3.3]
\[
\mathcal{W}_{d\wedge 1}(\mu,\nu)=\inf_{\xi\in\mathcal{C}(\mu,\nu)}\int_{\mathcal{X}\times\mathcal{X}}\left[d(x,y)\wedge 1\right]\xi(dx,dy).
\]
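As a short worked comparison of the two distances just defined (a standard observation, stated here as a reading aid rather than as a result of the paper): since $d(x,y)\wedge 1 \le \mathbf{1}\{x\neq y\}$ pointwise, integrating against any coupling and taking the infimum gives

```latex
\[
\mathcal{W}_{d\wedge 1}(\mu,\nu)
  = \inf_{\xi\in\mathcal{C}(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{X}} \left[d(x,y)\wedge 1\right] \xi(dx,dy)
  \le \inf_{\xi\in\mathcal{C}(\mu,\nu)} \xi\left(\{(x,y) : x \neq y\}\right)
  = \left\lVert \mu-\nu \right\rVert_{\text{TV}} .
\]
```

In particular, any lower bound on $\mathcal{W}_{d\wedge 1}$ is also a lower bound on the total variation distance.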
Let $\mathcal{X}$ be a Polish space and $\mathcal{Y}$ be a Borel measurable space, equipped with their Borel sigma-algebras $\mathcal{B}(\mathcal{X})$ and $\mathcal{B}(\mathcal{Y})$ respectively, where $\mathcal{X}$ is the state space and $\mathcal{Y}$ is the space for tuning parameters.
We now define the adaptive process $(\Gamma_t,X_t)_{t=0}^{\infty}$ on $\mathcal{Y}\times\mathcal{X}$ using the filtration $\mathcal{H}_t=\mathcal{B}(\Gamma_s,X_s,0\leq s\leq t)$.
Let $\mathcal{Q}$ define an adaptation plan, which denotes the map $t\mapsto\mathcal{Q}_t$ for all $t\in\mathbb{Z}_+$ where $\mathcal{Q}_t:(\mathcal{Y}\times\mathcal{X})^t\times\mathcal{B}(\mathcal{Y})\to[0,1]$ is a Borel probability kernel.
The kernels $\mathcal{Q}_t$ act on Borel functions $g:\mathcal{Y}\to\mathbb{R}$ and Borel measures $\nu$ on $(\mathcal{Y}\times\mathcal{X})^t$ with
\begin{align*}
(\mathcal{Q}_t g)(\gamma_0,x_0,\ldots,\gamma_{t-1},x_{t-1}) &= \int_{\mathcal{Y}} g(\gamma_t)\,\mathcal{Q}_t(\gamma_0,x_0,\ldots,\gamma_{t-1},x_{t-1},d\gamma_t) \\
(\nu\mathcal{Q}_t)(\cdot) &= \int_{(\mathcal{Y}\times\mathcal{X})^t} \mathcal{Q}_t(\gamma_0,x_0,\ldots,\gamma_{t-1},x_{t-1},\cdot)\,\nu(d\gamma_0,dx_0,\ldots,d\gamma_{t-1},dx_{t-1})
\end{align*}
for all $t\in\mathbb{Z}_+$ and $(\gamma_0,x_0,\ldots,\gamma_{t-1},x_{t-1})\in(\mathcal{Y}\times\mathcal{X})^t$.
Initialized at fixed $(x_0,\gamma_0)\in\mathcal{X}\times\mathcal{Y}$, the discrete-time adaptive process first updates the tuning parameter
\[
\Gamma_t \mid (\Gamma_s,X_s)_{0\leq s\leq t-1}\sim\mathcal{Q}_t((\Gamma_s,X_s)_{0\leq s\leq t-1},\cdot)
\]
using an adaptation plan.
Let $(\mathcal{P}_\gamma)_{\gamma\in\mathcal{Y}}$ be a family of Borel Markov kernels where $\mathcal{P}_\gamma:\mathcal{X}\times\mathcal{B}(\mathcal{X})\to[0,1]$ for each $\gamma\in\mathcal{Y}$ and, for each $x\in\mathcal{X}$, the map $\gamma\mapsto\mathcal{P}_\gamma(x,\cdot)$ is Borel measurable.
The Markov family acts on Borel functions $f:\mathcal{X}\to\mathbb{R}$ and Borel measures $\mu$ on $\mathcal{X}$ with
\begin{align*}
(\mathcal{P}_\gamma f)(x) &= \int_{\mathcal{X}} f(y)\,\mathcal{P}_\gamma(x,dy) \\
(\mu\mathcal{P}_\gamma)(\cdot) &= \int_{\mathcal{X}} \mathcal{P}_\gamma(x,\cdot)\,\mu(dx)
\end{align*}
for all $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$.
The process then updates the state given the updated tuning parameter
\[
X_t \mid \Gamma_t,X_{t-1}\sim\mathcal{P}_{\Gamma_t}(X_{t-1},\cdot)
\]
using the Markov kernel.
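The two-stage update above (first $\Gamma_t$ from $\mathcal{Q}_t$ given the history, then $X_t$ from $\mathcal{P}_{\Gamma_t}$) can be sketched as a simple loop. This is an illustrative sketch only: the standard normal target, the particular scaling rule in `adaptation_plan`, and all function names are assumptions chosen for the example and are not the constructions analysed in the paper.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a hypothetical target pi (standard normal)."""
    return -0.5 * x * x


def adaptation_plan(history):
    """Q_t: map the history (gamma_s, x_s), s < t, to a proposal scale gamma_t.

    Hypothetical deterministic rule: 2.38 times the empirical standard
    deviation of the past states, floored at 0.1 so the scale stays positive.
    """
    xs = [x for _, x in history]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return max(2.38 * math.sqrt(var), 0.1)


def markov_kernel(gamma, x, rng):
    """P_gamma(x, .): one random-walk Metropolis step with proposal scale gamma."""
    proposal = x + gamma * rng.gauss(0.0, 1.0)
    log_ratio = log_target(proposal) - log_target(x)
    accept_prob = math.exp(min(0.0, log_ratio))
    return proposal if rng.random() < accept_prob else x


def run_adaptive_process(gamma0, x0, n_steps, seed=0):
    """Simulate (Gamma_t, X_t) for t = 0, ..., n_steps from fixed (gamma0, x0)."""
    rng = random.Random(seed)
    history = [(gamma0, x0)]
    for _ in range(n_steps):
        gamma_t = adaptation_plan(history)           # Gamma_t | history ~ Q_t
        x_prev = history[-1][1]
        x_t = markov_kernel(gamma_t, x_prev, rng)    # X_t | Gamma_t, X_{t-1}
        history.append((gamma_t, x_t))
    return history


history = run_adaptive_process(gamma0=1.0, x0=0.0, n_steps=500)
print("final scale:", history[-1][0])
```

The adaptation rule here is a Dirac kernel, but the loop is unchanged if `adaptation_plan` draws $\Gamma_t$ at random from the history.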
Let $\mathcal{S}(\mathcal{X},\mathcal{Y})$ denote the set of all possible adaptation plans $\mathcal{Q}$ that define the Borel kernels $\mathcal{Q}_t$ updating the tuning parameters at every iteration time $t$.
For a chosen adaptive strategy $\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})$, we denote the marginal of the adaptive process at iteration time $t$ by $X_t\sim\mathcal{A}_{\mathcal{Q}}^{(t)}((\gamma_0,x_0),\cdot)$.
We will develop conditions to lower bound the total variation over all feasible adaptation strategies, that is, to lower bound
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})}\left\lVert\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0,x_0,\cdot)-\pi\right\rVert_{\text{TV}}
\]
for $t\in\mathbb{Z}_+$.
The main tool will be a function prescribing a subgeometric rate, defined implicitly as an inverse, which we now define.
For a concave function $\varphi:(0,\infty)\to(0,\infty)$ and $w_0\in[1,\infty)$, define
\[
H_{w_0,\varphi}(w)=\int_{w_0}^{w}\frac{dv}{\varphi(v)} \tag{1}
\]
for all $w\geq w_0$.
The assumptions on $\varphi$ imply that it is non-decreasing, that $H_{w_0,\varphi}(\cdot)$ is strictly increasing, and that the inverse $H^{-1}_{w_0,\varphi}(\cdot)$ exists.
Depending on the form of $\varphi$, the inverse function $H^{-1}_{w_0,\varphi}(\cdot)$ defines a polynomial, subgeometric, or geometric function increasing to infinity.
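To make the role of $\varphi$ concrete, the inverse of (1) can be computed numerically and compared with closed forms in two simple cases. The sketch below is an illustration under stated assumptions: the choices $\varphi(v)=\sqrt{v}$ and $\varphi(v)=v/2$, the midpoint quadrature, and the function names are all hypothetical. For $\varphi(v)=\sqrt{v}$ one gets the polynomial rate $H^{-1}_{w_0,\varphi}(t)=(\sqrt{w_0}+t/2)^2$, and for $\varphi(v)=v/2$ the geometric rate $H^{-1}_{w_0,\varphi}(t)=w_0e^{t/2}$.

```python
import math


def H(w, w0, phi, n=2000):
    """Approximate H_{w0,phi}(w) = int_{w0}^{w} dv / phi(v) by the midpoint rule."""
    step = (w - w0) / n
    return sum(step / phi(w0 + (i + 0.5) * step) for i in range(n))


def H_inverse(t, w0, phi, tol=1e-9):
    """Invert H by bisection, using that H is strictly increasing with H(w0) = 0."""
    lo, hi = w0, w0 + 1.0
    while H(hi, w0, phi) < t:          # grow the bracket until it contains the root
        hi = w0 + 2.0 * (hi - w0)
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if H(mid, w0, phi) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Polynomial growth: phi(v) = sqrt(v), closed form (sqrt(w0) + t/2)^2.
poly_numeric = H_inverse(10.0, 1.0, math.sqrt)
poly_exact = (math.sqrt(1.0) + 10.0 / 2.0) ** 2

# Geometric growth: phi(v) = v/2, closed form w0 * exp(t/2).
geom_numeric = H_inverse(4.0, 1.0, lambda v: 0.5 * v)
geom_exact = 1.0 * math.exp(4.0 / 2.0)

print(poly_numeric, poly_exact)
print(geom_numeric, geom_exact)
```

Bisection is used because only monotonicity of $H_{w_0,\varphi}$ is needed; any root finder would serve equally well.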
The first lower bound in total variation uses a technique extended from [Hairer, 2009 , Corollary 3.7] to adaptive MCMC over all adaptive strategies.
Theorem 1.
Assume there is a Borel function $W:\mathcal{X}\to[1,\infty)$ and constants $C,\kappa>0$ where
\[
\pi(W\geq r)\geq Cr^{-\kappa} \tag{2}
\]
holds for all $r>0$ and there is an $\alpha>\kappa$ and a concave function $\varphi:(0,\infty)\to(0,\infty)$ such that
\[
(\mathcal{P}_\gamma W^{\alpha})(x)-W(x)^{\alpha}\leq\varphi(W(x)^{\alpha}) \tag{3}
\]
holds for all $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$.
Then for all $t\in\mathbb{Z}_+$,
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})}\left\lVert\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_0,x_0,\cdot)-\pi\right\rVert_{\text{TV}}\geq\frac{M}{\left(H_{W(x_0)^{\alpha},\varphi}^{-1}(t)\right)^{\frac{\kappa}{\alpha-\kappa}}}
\]
where
\[
M=C^{\frac{\alpha}{\alpha-\kappa}}\left[(\kappa/\alpha)^{\frac{\kappa}{\alpha-\kappa}}-(\kappa/\alpha)^{\frac{\alpha}{\alpha-\kappa}}\right]. \tag{4}
\]
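The bound in Theorem 1 is explicit once $C$, $\kappa$, $\alpha$ and $H^{-1}_{W(x_0)^{\alpha},\varphi}$ are known, so it can be evaluated directly. The following sketch does this for arbitrary illustrative inputs; the numbers, the constant choice $\varphi(v)=c$ (for which $H^{-1}_{w_0,\varphi}(t)=w_0+ct$), and the function names are assumptions made for the example and do not come from the paper.

```python
def lower_bound_constant(C, kappa, alpha):
    """The constant M in (4); requires alpha > kappa > 0 and C > 0."""
    if not (alpha > kappa > 0 and C > 0):
        raise ValueError("need alpha > kappa > 0 and C > 0")
    ratio = kappa / alpha
    e1 = kappa / (alpha - kappa)
    e2 = alpha / (alpha - kappa)
    # ratio < 1 and e1 < e2, so the bracket is strictly positive.
    return C ** e2 * (ratio ** e1 - ratio ** e2)


def tv_lower_bound(t, w_x0, C, kappa, alpha, h_inverse):
    """Right-hand side of Theorem 1 at time t.

    h_inverse(t, w0) must return H^{-1}_{w0,phi}(t); w_x0 stands for W(x_0).
    """
    growth = h_inverse(t, w_x0 ** alpha)
    return lower_bound_constant(C, kappa, alpha) / growth ** (kappa / (alpha - kappa))


def h_inverse_constant_phi(t, w0, c=1.0):
    """Closed form of H^{-1}_{w0,phi}(t) for the hypothetical choice phi(v) = c."""
    return w0 + c * t


# Hypothetical inputs: C = 1, kappa = 1, alpha = 2, W(x_0) = 1, phi = 1.
# Then M = 1/4 and the bound is (1/4) / (1 + t), a polynomial rate.
for t in (1, 10, 100):
    print(t, tv_lower_bound(t, 1.0, 1.0, 1.0, 2.0, h_inverse_constant_phi))
```

Passing $H^{-1}$ as a callable keeps the evaluation independent of how the inverse is obtained, whether in closed form or numerically.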
Proof.
Let $V(x)=W^{\alpha}(x)$ and let $t\in\mathbb{Z}_+$. Since (3) holds uniformly over $\gamma\in\mathcal{Y}$, we have
\[
\mathbb{E}\left(V(X_{t+1}) \mid \mathcal{H}_t\right)-V(X_t)\leq\varphi(V(X_t)).
\]
Since $\mathbb{E}\left[V(X_1)\right]-V(x_0)\leq\varphi(V(x_0))$, assume by induction that for all $k\leq t$,
\[
\mathbb{E}\left[V(X_{k+1})\right]-\mathbb{E}[V(X_k)]\leq\varphi(\mathbb{E}[V(X_k)])
\]
and $\mathbb{E}[V(X_k)]<\infty$.
By the induction hypothesis and Jensen’s inequality,
\begin{align*}
\mathbb{E}\left[V(X_{t+1})\right]-\mathbb{E}\left[V(X_t)\right]
&=\mathbb{E}\left[\mathbb{E}\left(V(X_{t+1}) \mid \mathcal{H}_t\right)-V(X_t)\right] \\
&\leq\mathbb{E}\left[\varphi(V(X_t))\right] \\
&\leq\varphi\left(\mathbb{E}\left[V(X_t)\right]\right). \tag{5}
\end{align*}
The inverse function theorem implies the derivative
d d s H V ( x 0 ) , φ − 1 ( s ) = φ ( H V ( x 0 ) , φ − 1 ( s ) ) . 𝑑 𝑑 𝑠 subscript superscript 𝐻 1 𝑉 subscript 𝑥 0 𝜑
𝑠 𝜑 subscript superscript 𝐻 1 𝑉 subscript 𝑥 0 𝜑
𝑠 \frac{d}{ds}H^{-1}_{V(x_{0}),\varphi}(s)=\varphi(H^{-1}_{V(x_{0}),\varphi}(s)). divide start_ARG italic_d end_ARG start_ARG italic_d italic_s end_ARG italic_H start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_V ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) , italic_φ end_POSTSUBSCRIPT ( italic_s ) = italic_φ ( italic_H start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_V ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) , italic_φ end_POSTSUBSCRIPT ( italic_s ) ) .
Since H V ( x 0 ) , φ − 1 ( 0 ) ≥ V ( x 0 ) subscript superscript 𝐻 1 𝑉 subscript 𝑥 0 𝜑
0 𝑉 subscript 𝑥 0 H^{-1}_{V(x_{0}),\varphi}(0)\geq V(x_{0}) italic_H start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_V ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) , italic_φ end_POSTSUBSCRIPT ( 0 ) ≥ italic_V ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) , assume by induction H V ( x 0 ) , φ − 1 ( k ) ≥ 𝔼 [ V ( X k ) ] subscript superscript 𝐻 1 𝑉 subscript 𝑥 0 𝜑
𝑘 𝔼 delimited-[] 𝑉 subscript 𝑋 𝑘 H^{-1}_{V(x_{0}),\varphi}(k)\geq\mathbb{E}[V(X_{k})] italic_H start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_V ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) , italic_φ end_POSTSUBSCRIPT ( italic_k ) ≥ blackboard_E [ italic_V ( italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ] for all k ≤ t 𝑘 𝑡 k\leq t italic_k ≤ italic_t .
Since $\varphi$ is non-decreasing, the fundamental theorem of calculus and (5) give
\begin{align*}
H^{-1}_{V(x_{0}),\varphi}(t+1)
&= H^{-1}_{V(x_{0}),\varphi}(t) + \int_{t}^{t+1} \varphi\left(H^{-1}_{V(x_{0}),\varphi}(s)\right) ds \\
&\geq H^{-1}_{V(x_{0}),\varphi}(t) + \varphi\left(H^{-1}_{V(x_{0}),\varphi}(t)\right) \\
&\geq \mathbb{E}\left[V(X_{t})\right] + \varphi\left(\mathbb{E}\left[V(X_{t})\right]\right) \\
&\geq \mathbb{E}\left[V(X_{t+1})\right].
\end{align*}
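The first inequality in the display, $H^{-1}(t+1) \geq H^{-1}(t) + \varphi(H^{-1}(t))$, can be checked numerically. The following is a minimal sketch, assuming the polynomial choice $\varphi(w) = c w^{\beta}$ and the closed form of $H^{-1}$ from Example 2; the function names and the constants `v`, `c`, `beta` are illustrative and do not come from the paper.

```python
def h_inv(t, v, c, beta):
    """Closed form of H^{-1}_{v,phi}(t) for phi(w) = c * w**beta with 0 < beta < 1."""
    return ((1.0 - beta) * c * t + v ** (1.0 - beta)) ** (1.0 / (1.0 - beta))


def phi(w, c, beta):
    """Concave, non-decreasing growth function phi(w) = c * w**beta."""
    return c * w ** beta


def induction_gaps(v, c, beta, horizon):
    """Return H^{-1}(t+1) - [H^{-1}(t) + phi(H^{-1}(t))] for t = 0, ..., horizon - 1."""
    gaps = []
    for t in range(horizon):
        current = h_inv(t, v, c, beta)
        gaps.append(h_inv(t + 1, v, c, beta) - (current + phi(current, c, beta)))
    return gaps


# Illustrative constants; every gap should be non-negative.
gaps = induction_gaps(v=4.0, c=0.5, beta=0.5, horizon=50)
print(min(gaps) >= 0.0)
```

A non-negative gap at every step is what the induction needs: the deterministic sequence $H^{-1}(t)$ grows at least as fast as the one-step drift bound allows $\mathbb{E}[V(X_{t})]$ to grow.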
By Markov’s inequality,
\[
\mathbb{P}(W(X_{t}) \geq r) \leq \frac{\mathbb{E}\left[W(X_{t})^{\alpha}\right]}{r^{\alpha}} \leq \frac{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)}{r^{\alpha}}.
\]
Optimizing over $r$ gives the lower bound
\begin{align*}
\left\lVert \mathcal{A}^{(t)}(\gamma_{0},x_{0},\cdot) - \pi \right\rVert_{\text{TV}}
&\geq \pi(W \geq r) - \mathbb{P}(W(X_{t}) \geq r) \geq \frac{C}{r^{\kappa}} - \frac{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)}{r^{\alpha}} \\
&\geq \frac{M}{\left(H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)\right)^{\frac{\kappa}{\alpha-\kappa}}}.
\end{align*}
∎
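The optimization over $r$ in the last step of the proof can be made explicit. The sketch below maximizes $g(r) = C r^{-\kappa} - h r^{-\alpha}$, where $h$ stands for $H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)$ and $\alpha > \kappa > 0$ is assumed. Setting $g'(r) = 0$ gives $r^{*} = (\alpha h / (\kappa C))^{1/(\alpha-\kappa)}$ and $g(r^{*}) = C(1 - \kappa/\alpha)(\kappa C/\alpha)^{\kappa/(\alpha-\kappa)} h^{-\kappa/(\alpha-\kappa)}$. The prefactor is derived here by calculus and is only presumed to coincide with the constant $M$ of (4), which is not restated in this section; for $\kappa = 1$, $\alpha = 2$ it equals $C^{2}/4$, in agreement with Example 2. All function names are hypothetical.

```python
def g(r, C, kappa, alpha, h):
    """Objective C / r**kappa - h / r**alpha; h plays the role of H^{-1}(t)."""
    return C / r ** kappa - h / r ** alpha


def r_star(C, kappa, alpha, h):
    """Stationary point of g, valid for alpha > kappa > 0."""
    return (alpha * h / (kappa * C)) ** (1.0 / (alpha - kappa))


def optimal_value(C, kappa, alpha, h):
    """Closed-form g(r_star); the prefactor m is the presumed constant M."""
    p = kappa / (alpha - kappa)
    m = C * (1.0 - kappa / alpha) * (kappa * C / alpha) ** p
    return m / h ** p


# Illustrative constants (not taken from the paper).
C, kappa, alpha, h = 0.3, 1.0, 2.0, 7.0
rs = r_star(C, kappa, alpha, h)
best = g(rs, C, kappa, alpha, h)

# Crude grid search around r_star as an independent check of the maximizer.
grid_best = max(g(rs * (0.5 + 0.001 * i), C, kappa, alpha, h) for i in range(1, 2001))
print(best, grid_best, optimal_value(C, kappa, alpha, h))
```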
Assumption (3) of Theorem 1 requires the Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$ to satisfy a simultaneous growth condition for some concave function $\varphi$.
We look at some concrete examples of concave functions that lead to common subgeometric convergence rates, which have been explored previously for upper bounds [Douc et al., 2004].
Example 2.
(Polynomial lower bounds)
Assume (2) holds with constants $C > 0$ and $\kappa = 1$ and, additionally, (3) holds with function $W(\cdot)$, $\alpha = 2$, and $\varphi(w) = c w^{\beta}$ for some constants $c > 0$ and $\beta \in (0,1)$. Then a straightforward calculation gives
\[
H^{-1}_{W(x_{0})^{2},\varphi}(t) = \left((1-\beta) c t + W(x_{0})^{2(1-\beta)}\right)^{\frac{1}{1-\beta}}
\]
and Theorem 1 implies that for all $t \in \mathbb{Z}_{+}$,
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot) - \pi \right\rVert_{\text{TV}} \geq \frac{C^{2}}{4\left((1-\beta) c t + W(x_{0})^{2(1-\beta)}\right)^{\frac{1}{1-\beta}}}.
\]
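To get a feel for the polynomial bound in Example 2, the sketch below evaluates its right-hand side and checks the closed form of $H^{-1}$ by a round trip through numerical integration. It assumes that $H_{v,\varphi}(u) = \int_{v}^{u} dx / \varphi(x)$, which is consistent with the closed form above but is not restated in this section. The constants `C`, `w0` (standing for $W(x_{0})$), `c`, and `beta` are illustrative.

```python
def h_inv_poly(t, w0, c, beta):
    """H^{-1}_{W(x0)^2, phi}(t) for phi(w) = c * w**beta, as in Example 2."""
    return ((1.0 - beta) * c * t + w0 ** (2.0 * (1.0 - beta))) ** (1.0 / (1.0 - beta))


def tv_lower_bound(t, C, w0, c, beta):
    """Right-hand side of the Example 2 bound: C^2 / (4 * H^{-1}(t))."""
    return C ** 2 / (4.0 * h_inv_poly(t, w0, c, beta))


def h_forward(u, v, c, beta, n=20_000):
    """Midpoint rule for int_v^u dx / (c * x**beta); assumed definition of H_{v,phi}(u)."""
    step = (u - v) / n
    return sum(step / (c * (v + (i + 0.5) * step) ** beta) for i in range(n))


# Illustrative constants (not taken from the paper).
C, w0, c, beta = 0.5, 2.0, 1.0, 0.5
for t in (1, 10, 100, 1000):
    print(t, tv_lower_bound(t, C, w0, c, beta))

# Round trip: H(H^{-1}(t)) should return t up to quadrature error.
t_check = 10.0
round_trip = h_forward(h_inv_poly(t_check, w0, c, beta), w0 ** 2, c, beta)
print(round_trip)
```

With $\beta = 1/2$ the bound decays like $t^{-2}$; in general the rate is $t^{-1/(1-\beta)}$, so values of $\beta$ closer to $1$ give a faster-decaying lower bound.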
Example 3.
(Subgeometric lower bounds)
If (2) holds with constants $C > 0$ and $\kappa = 1$ and (3) holds with $W(\cdot)$, $\alpha = 2$, and $\varphi(x) = c(x + K_{\beta}) / \log(x + K_{\beta})^{\beta}$ where $K_{\beta} = \exp(\beta + 1)$, then
\[
H^{-1}_{W(x_{0})^{2},\varphi}(t) \leq (W(x_{0})^{2} + K_{\beta}) \exp\left((1+\beta) c t^{\frac{1}{1+\beta}}\right).
\]
Then, by Theorem 1, for all $t \in \mathbb{Z}_{+}$,
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})} \left\lVert \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot) - \pi \right\rVert_{\text{TV}} \geq \frac{C^{2}}{4(W(x_{0})^{2} + K_{\beta})} \exp\left(-(1+\beta) c t^{\frac{1}{1+\beta}}\right).
\]
We now obtain a matching lower bound for weak convergence in Euclidean spaces under essentially the same conditions as for total variation.
Let $\lVert \cdot \rVert$ denote the Euclidean norm.
Theorem 4.
Let $\mathcal{X} = \mathbb{R}^{d}$ for $d \in \mathbb{Z}_{+}$.
Assume (2) holds with $C, \kappa$ and (3) holds with $W(\cdot)$ and $\alpha$, and let $M$ be defined as in (4).
Assume that for each $r > 0$, the sets $\{x \in \mathbb{R}^{d} : W(x) \leq r\}$ are compact.
Then for any $\epsilon \in (0,1)$,
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})} \inf_{\xi\in\mathcal{C}\left[\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot),\pi\right]} \xi(\{x,y \in \mathcal{X}\times\mathcal{X} : \lVert x - y \rVert > \delta_{\epsilon}\}) \geq \frac{(1-\epsilon) M}{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)^{\frac{\kappa}{\alpha-\kappa}}}
\]
holds for some $\delta_{\epsilon} \in (0,1)$ and all $t \geq H_{W(x_{0})^{\alpha},\varphi}\left(\kappa C (1-\epsilon)^{\alpha} / \alpha\right)$. In particular,
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})} \mathcal{W}_{\lVert\cdot\rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot), \pi\right) \geq \frac{\delta_{\epsilon}(1-\epsilon) M}{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)^{\frac{\kappa}{\alpha-\kappa}}}.
\]
Proof.
Let $r \geq 1$ and let $T = \{x \in \mathcal{X} : W(x) \geq r\}$.
Since $W$ is continuous, $T$ is closed, and by Strassen's theorem ([Strassen, 1965] and [Villani, 2003, Corollary 1.28]), for any $\delta > 0$,
\[
\inf_{\xi\in\mathcal{C}\left[\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot),\pi\right]} \xi(\{x,y \in \mathcal{X}\times\mathcal{X} : \lVert x - y \rVert > \delta\}) \geq \pi(T) - \mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},T^{\delta})
\]
where $T^{\delta} = \{y \in \mathbb{R}^{d} : \text{dist}(y,T) \leq \delta\}$ and $\text{dist}(y,T) = \inf_{x\in T} \lVert x - y \rVert$.
Thus, we will find a discrepancy between $\pi\left(\{W \geq r\}\right)$ and $\mathcal{A}_{\mathcal{Q}}^{(t)}\left(\gamma_{0},x_{0},\{W \geq (1-\epsilon) r\}\right)$ for small $\epsilon$; the intuition is illustrated in Figure 1.
Figure 1: The diagram illustrates the intuition for a discrepancy between the adaptive process and the target measure on the set $\{W \geq r\}$, and also on $\{W \geq (1-\epsilon) r\}$ for small $\epsilon$.
Let $\partial A = \text{cl}(A) \setminus \text{int}(A)$ denote the boundary of a set $A$, where cl is the closure and int is the interior. Since $\mathbb{R}^{d}$ is convex, we have that $\text{dist}(x,T) = \text{dist}(x,\partial T)$ (see Lemma 20).
Since $K = \{x \in \mathbb{R}^{d} : W(x) \leq r\}$ is compact, $W$ is uniformly continuous on $K$.
For $\epsilon \in (0,1)$, we can then choose $\delta_{\epsilon}$, depending on $\epsilon$, sufficiently small so that $W(x) \geq (1-\epsilon) r$ whenever $\text{dist}(x,T) \leq \delta_{\epsilon}$, and so
\[
\mathbb{P}(X_{t} \in T^{\delta_{\epsilon}}) \leq \mathbb{P}(W(X_{t}) \geq (1-\epsilon) r).
\]
Markov's inequality and (3) imply that
\[
\mathbb{P}(W(X_{t}) \geq (1-\epsilon) r) \leq \frac{\mathbb{E}\left[W^{\alpha}(X_{t})\right]}{(1-\epsilon)^{\alpha} r^{\alpha}} \leq \frac{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)}{(1-\epsilon)^{\alpha} r^{\alpha}}.
\]
Optimizing over $r$, we take $t$ large enough so that
\[
r = \left(\frac{\alpha}{\kappa C (1-\epsilon)^{\alpha}} H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)\right)^{\frac{1}{\alpha-\kappa}} \geq 1
\]
and this yields the lower bound
\begin{align*}
\delta_{\epsilon}^{-1} \mathcal{W}_{\lVert\cdot\rVert \wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot), \pi\right)
&\geq \inf_{\xi\in\mathcal{C}\left[\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot),\pi\right]} \xi(\{x,y : \lVert x - y \rVert > \delta_{\epsilon}\}) \\
&\geq \frac{C}{r^{\kappa}} - \frac{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)}{(1-\epsilon)^{\alpha} r^{\alpha}} \\
&\geq (1-\epsilon)^{\frac{\alpha\kappa}{\alpha-\kappa}} \frac{M}{H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)^{\frac{\kappa}{\alpha-\kappa}}}
\end{align*}
where $M$ is defined by (4).
The conclusion follows since $\epsilon$ is arbitrary.
∎
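The optimized radius and the resulting constant in the proof of Theorem 4 can be verified numerically. The sketch below plugs the stated $r$ into $C r^{-\kappa} - h (1-\epsilon)^{-\alpha} r^{-\alpha}$, with $h$ standing for $H_{W(x_{0})^{\alpha},\varphi}^{-1}(t)$, and compares the result with $(1-\epsilon)^{\alpha\kappa/(\alpha-\kappa)} M / h^{\kappa/(\alpha-\kappa)}$. The form $M = C(1-\kappa/\alpha)(\kappa C/\alpha)^{\kappa/(\alpha-\kappa)}$ used here is an assumption obtained by direct calculus, since (4) is not restated in this section; names and constants are illustrative.

```python
def proof_quantities(C, kappa, alpha, eps, h):
    """Optimized r and the resulting bound from the proof of Theorem 4.

    h stands for H^{-1}_{W(x0)^alpha, phi}(t); requires alpha > kappa > 0.
    """
    scaled = h / (1.0 - eps) ** alpha  # Markov's inequality inflates h by (1 - eps)^(-alpha)
    r = (alpha * scaled / (kappa * C)) ** (1.0 / (alpha - kappa))
    value = C / r ** kappa - scaled / r ** alpha
    return r, value


def predicted_value(C, kappa, alpha, eps, h):
    """(1 - eps)^(alpha*kappa/(alpha-kappa)) * M / h^(kappa/(alpha-kappa))."""
    p = kappa / (alpha - kappa)
    m = C * (1.0 - kappa / alpha) * (kappa * C / alpha) ** p  # assumed form of M in (4)
    return (1.0 - eps) ** (alpha * p) * m / h ** p


# Illustrative constants (not taken from the paper).
C, kappa, alpha, eps, h = 0.4, 1.0, 3.0, 0.1, 5.0
r, value = proof_quantities(C, kappa, alpha, eps, h)
print(r, value, predicted_value(C, kappa, alpha, eps, h))
```

The requirement $r \geq 1$ is equivalent to $h \geq \kappa C (1-\epsilon)^{\alpha} / \alpha$, which is where the time threshold in the statement of Theorem 4 comes from.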
Theorem 4 can be interpreted as giving the best possible rate of convergence for adaptive MCMC satisfying (3) when the target measure satisfies (2).
The conclusion of Theorem 4 can also be extended to general path-connected state spaces $\mathcal{X}$.
The mild assumption of compact level sets for the function $W$ holds in many applications. However, a significant drawback of the Wasserstein lower bound is that its constant is not explicit, in contrast to the explicit lower bound in total variation.
What is surprising about the lower bounds in this section is that they require only that the Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$ satisfy (3) and do not directly depend on the adaptation strategy.
For example, it is a common scenario in adaptive MCMC for the parameter space $\mathcal{Y}$ to be compact.
In this case, the simultaneous growth condition (3) often holds if the Markov kernels satisfy some mild regularity conditions and (3) holds for fixed parameters.
Example 5.
(Adaptive unadjusted Langevin algorithm)
Consider the multivariate Student's t-distribution $\pi$ on $\mathbb{R}^{d}$ with $d \geq 1$ and $v > 0$ degrees of freedom. The Lebesgue density is defined by
\[
D_{\pi}(x) = \frac{(v+d)/2}{\Gamma(v/2)(v\pi)^{d/2}} \exp(-U(x))
\]
where $U(x) = \frac{v+d}{2} \log(1 + \lVert x \rVert^{2})$.
The adaptive unadjusted Langevin process $(\Gamma_{t}, X_{t})_{t\geq 0}$ on $(0,1) \times \mathbb{R}^{d}$ is defined by
\[
X_{t+1} = X_{t} - \Gamma_{t+1} \nabla U(X_{t}) + \sqrt{2\Gamma_{t+1}} Z_{t+1}
\]
where $\Gamma_{t+1} \in (0,1)$ and $Z_{t+1}$ is an independent standard normal random vector.
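The recursion above can be simulated directly. The following is a minimal sketch using only the Python standard library, with $\nabla U(x) = (v+d)\,x / (1 + \lVert x \rVert^{2})$ computed from the stated $U$. The adaptation rule for $\Gamma_{t+1}$ (a clipped function of the running mean of past squared gradient norms) is purely hypothetical and stands in for whatever strategy is actually used; the paper's lower bounds do not depend on this choice. The function names, `gamma_min`, `gamma_max`, and the seed are likewise illustrative.

```python
import math
import random


def grad_U(x, v):
    """Gradient of U(x) = (v + d)/2 * log(1 + |x|^2), i.e. (v + d) * x / (1 + |x|^2)."""
    d = len(x)
    scale = (v + d) / (1.0 + sum(xi * xi for xi in x))
    return [scale * xi for xi in x]


def adaptive_ula(x0, v, n_steps, gamma_min=1e-3, gamma_max=0.5, seed=0):
    """Simulate (Gamma_t, X_t); the adaptation rule below is purely illustrative."""
    rng = random.Random(seed)
    x = list(x0)
    avg_grad_sq = 0.0  # running mean of |grad U(X_s)|^2 over the history
    gammas, path = [], [list(x)]
    for t in range(n_steps):
        grad = grad_U(x, v)
        avg_grad_sq += (sum(gi * gi for gi in grad) - avg_grad_sq) / (t + 1)
        # Hypothetical rule: smaller steps when past gradients were large,
        # clipped so that Gamma_{t+1} stays inside (0, 1).
        gamma = min(gamma_max, max(gamma_min, 1.0 / (1.0 + avg_grad_sq)))
        noise_scale = math.sqrt(2.0 * gamma)
        x = [xi - gamma * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, grad)]
        gammas.append(gamma)
        path.append(list(x))
    return gammas, path


gammas, path = adaptive_ula(x0=[1.0, -1.0], v=3.0, n_steps=1000)
print(gammas[-1], path[-1])
```

Here $\Gamma_{t+1}$ is computed from $X_{0}, \dots, X_{t}$ only, so the tuning parameter is updated from the history before the state is moved, matching the order of updates described in the introduction.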
Subgeometric drift conditions have been shown for unadjusted Langevin in the non-adaptive case for heavy tailed target measures [Kamatani, 2009 ] .
Let α > 0 𝛼 0 \alpha>0 italic_α > 0 and W ( x ) = ( 1 + ∥ x ∥ 2 ) ( v + d ) / 2 𝑊 𝑥 superscript 1 superscript delimited-∥∥ 𝑥 2 𝑣 𝑑 2 W(x)=(1+\left\lVert x\right\rVert^{2})^{(v+d)/2} italic_W ( italic_x ) = ( 1 + ∥ italic_x ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_v + italic_d ) / 2 end_POSTSUPERSCRIPT .
By Ito’s formula, for large enough ∥ x ∥ delimited-∥∥ 𝑥 \left\lVert x\right\rVert ∥ italic_x ∥ , there is a constant ϵ > 0 italic-ϵ 0 \epsilon>0 italic_ϵ > 0 such that the second term is bounded using the moment generating function of non-central chi-square random variables by
\begin{align*}
&\mathbb{E}\left[W^{\alpha}(X_{t+1}) \mid \Gamma_{t+1}=\gamma, X_{t}=x\right]-W^{\alpha}(x) \\
&\quad=\mathbb{E}\left[\int_{0}^{\gamma}\nabla W(x)^{\alpha}\cdot dX_{t}\right]+\mathbb{E}\left[\int_{0}^{\gamma}\text{tr}\left(\nabla^{2}W(x)^{\alpha}x\right)dt\right] \\
&\quad\leq\alpha(v+d)\left[-\gamma_{*}(v+2)+\alpha(v+d)+\epsilon\right]\left(1+\lVert x\rVert^{2}\right)^{\alpha(v+d)/2-1}.
\end{align*}
It follows that for some constant $C_{\alpha}>0$ and for all $x,\gamma$,
\[
\mathbb{E}\left[W^{\alpha}(X_{t+1}) \mid \Gamma_{t+1}=\gamma, X_{t}=x\right]-W^{\alpha}(x)\leq C_{\alpha}W^{\alpha}(x)^{1-\frac{2}{\alpha(v+d)}}.
\]
One has the lower bound for some constant $C>0$
\[
\pi(W\geq r)\geq\frac{C}{r^{1-2/(v+d)}}.
\]
If $v+d-2>0$, then by Theorem 4 there is a constant $M>0$ such that
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})}\mathcal{W}_{\lVert\cdot\rVert\wedge 1}(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},0,\cdot),\pi)\geq\frac{M}{(1+t)^{v+d-2}}.
\]
Of particular interest is that the rate cannot be geometric even when considering weak convergence.
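To see concretely why a bound of order $(1+t)^{-(v+d-2)}$ rules out geometric convergence, the following minimal Python sketch compares a polynomial lower bound with a geometric rate $\rho^{t}$. The constants `M`, `k` (standing in for $v+d-2$) and `rho` are illustrative assumptions, not values taken from the example above.

```python
def polynomial_bound(t, M=1.0, k=2.0):
    """Polynomial lower bound M / (1 + t)^k; k stands in for v + d - 2."""
    return M / (1.0 + t) ** k


def geometric_rate(t, rho=0.9):
    """A hypothetical geometric rate rho^t with rho in (0, 1)."""
    return rho ** t


def crossover_time(M=1.0, k=2.0, rho=0.9, t_max=10**6):
    """Smallest integer t >= 1 at which rho^t drops below M / (1 + t)^k.

    Found by direct search; returns None if no such t <= t_max exists.
    """
    for t in range(1, t_max + 1):
        if geometric_rate(t, rho) < polynomial_bound(t, M, k):
            return t
    return None
```

For any fixed `rho < 1` the search terminates, because $t\log(1/\rho)$ eventually exceeds $k\log(1+t)-\log M$. No geometric rate can therefore stay above the polynomial lower bound for all $t$.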
In certain situations, the tail probability decay on $\pi$ in (2) may be difficult to establish.
In this case, we consider finding a function that is not integrable with respect to $\pi$, at the cost of obtaining a lower bound only along a subsequence.
An analogous result will also hold in total variation.
Theorem 6 .
Let $\mathcal{X}=\mathbb{R}^{d}$ for $d\in\mathbb{Z}_{+}$.
Assume there is a Borel function $W:\mathcal{X}\to[1,\infty)$ such that $\int_{\mathcal{X}}W\,d\pi=\infty$ and, for some $\alpha>1$ and some concave function $\varphi:(0,\infty)\to(0,\infty)$,
\[
(\mathcal{P}_{\gamma}W^{\alpha})(x)-W(x)^{\alpha}\leq\varphi(W(x)^{\alpha}) \tag{6}
\]
holds for all $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$.
Assume additionally that for each $r>0$, the set $\{x\in\mathbb{R}^{d}:W(x)\leq r\}$ is compact.
Then there is a constant $M_{*}>0$ and a subsequence $t_{n}\in\mathbb{Z}_{+}$ increasing to infinity such that for any $\epsilon\in(0,1)$ with $\alpha>1+\epsilon$,
\[
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})}\mathcal{W}_{\lVert\cdot\rVert\wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t_{n},\mathcal{Q})}(\gamma_{0},x_{0},\cdot),\pi\right)\geq\frac{M_{*}}{\left(H_{W^{\alpha}(x_{0}),\varphi}^{-1}(t_{n})\right)^{\frac{1+\epsilon}{\alpha-1-\epsilon}}}.
\]
Proof.
Since $\int_{\mathcal{X}}W\,d\pi=\infty$, there is a sequence $(r_{n})_{n}$ with $\lim_{n}r_{n}=\infty$ such that with $T_{n}=\{x:W(x)\geq r_{n}\}$,
\[
\pi(T_{n})\geq\frac{2^{\alpha+1}}{r_{n}^{1+\epsilon}}.
\]
The conclusion follows by Theorem 4 .
∎
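The existence of the sequence $(r_{n})_{n}$ in the proof of Theorem 6 uses only the non-integrability of $W$. As a sketch of the omitted step (our elaboration, writing $c=2^{\alpha+1}$): suppose instead that there is some $r_{0}>0$ with $\pi(W\geq r)<c\,r^{-(1+\epsilon)}$ for all $r\geq r_{0}$. The layer-cake formula would then give
\[
\int_{\mathcal{X}}W\,d\pi=\int_{0}^{\infty}\pi(W\geq r)\,dr\leq r_{0}+c\int_{r_{0}}^{\infty}r^{-(1+\epsilon)}\,dr=r_{0}+\frac{c}{\epsilon\,r_{0}^{\epsilon}}<\infty,
\]
which contradicts $\int_{\mathcal{X}}W\,d\pi=\infty$. Hence $\pi(W\geq r)\geq c\,r^{-(1+\epsilon)}$ for arbitrarily large $r$, and these values of $r$ can be arranged into a sequence tending to infinity.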
3 Subgeometric upper bounds for adaptive MCMC
This section is dedicated to studying conditions such that an upper bound convergence rate can be obtained for adaptive MCMC comparable to the lower bounds in the previous section.
We first consider an alternative to the diminishing adaptation condition [Roberts and Rosenthal, 2007] that is stronger in the sense that it requires a specified rate of decay.
Definition 7 .
An adaptive process satisfies expected diminishing adaptation with function $G:\mathbb{Z}_{+}\to(0,\infty)$ strictly decreasing to zero if for all $t\in\mathbb{Z}_{+}$,
\[
\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{t}=x\right]\leq G(t). \tag{7}
\]
Proposition 17 ensures Borel measurability of the total variation in (7).
One way to satisfy this condition is the following: if $\rho$ is a metric on $\mathcal{Y}$ and $\sup_{x}\mathbb{E}\left[\rho(\Gamma_{t+1},\Gamma_{t})\mid X_{t}=x\right]\leq G(t)$, then the expected diminishing adaptation condition can be shown through Lipschitz continuity of $\mathcal{P}_{\gamma}$. For example, if for each $x\in\mathcal{X}$, $\gamma\mapsto\mathcal{P}_{\gamma}(x,\cdot)$ is $\rho$-Lipschitz with constant $L_{x}$, then
\[
\sup_{x\in\mathcal{X}}\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\leq\left(\sup_{x\in\mathcal{X}}L_{x}\right)\rho(\Gamma_{t+1},\Gamma_{t}).
\]
This has been shown to hold generally for adaptive Metropolis-Hastings with symmetric proposals [Andrieu and Moulines, 2006].
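As an illustration of how a bound on $\rho(\Gamma_{t+1},\Gamma_{t})$ can arise, the following Python sketch uses a Robbins-Monro style update of a scalar tuning parameter with $\rho$ the absolute difference. The update rule, the target acceptance rate `0.44`, the step-size sequence, and the coin-flip stand-in for the acceptance indicator are all hypothetical choices made for this sketch. They are not an adaptation strategy taken from the text.

```python
import random


def step_size(t, c=1.0, beta=0.75):
    """Hypothetical decaying step size; plays the role of G(t)."""
    return c / (t + 1) ** beta


def adapt(gamma, accepted, t, target=0.44, lo=-5.0, hi=5.0):
    """One Robbins-Monro update of a scalar tuning parameter.

    The raw increment is step_size(t) * (accepted - target), which is at
    most step_size(t) in absolute value because |accepted - target| <= 1.
    Clipping to [lo, hi] is 1-Lipschitz, so it cannot enlarge the move.
    """
    proposal = gamma + step_size(t) * (float(accepted) - target)
    return min(max(proposal, lo), hi)


def max_violation(n_steps=1000, seed=0):
    """Largest value of |Gamma_{t+1} - Gamma_t| - G(t) along one path."""
    rng = random.Random(seed)
    gamma, worst = 0.0, -float("inf")
    for t in range(1, n_steps + 1):
        # Acceptance indicator simulated as a coin flip, for illustration only.
        new_gamma = adapt(gamma, rng.random() < 0.3, t)
        worst = max(worst, abs(new_gamma - gamma) - step_size(t))
        gamma = new_gamma
    return worst
```

Here the increment bound holds on every path, so it also holds in conditional expectation. Combined with a uniform Lipschitz constant, this would give (7) with $G$ replaced by $(\sup_{x}L_{x})\,G$.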
Next, we consider a simultaneous version of a subgeometric drift condition on the Markov family.
Definition 8 .
A Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$ satisfies a simultaneous subgeometric drift condition if there is a Borel function $V:\mathcal{X}\to[1,\infty)$, a concave function $\varphi:[0,\infty)\to[0,\infty)$ strictly increasing to infinity with $\lim_{v\to\infty}\varphi(v)/v=0$, and a constant $K\geq 0$ such that
\[
(\mathcal{P}_{\gamma}V)(x)-V(x)\leq-\varphi(V(x))+K \tag{8}
\]
holds for every $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$.
Here we assume $\lim_{v\to\infty}\varphi(v)/v=0$ to exclude the geometric case. Subgeometric drift conditions for Markov chains have been studied previously [Jarner and Roberts, 2002, Douc et al., 2004], but we adjust the previous conditions to hold over feasible tuning parameters $\mathcal{Y}$. We now combine this drift condition with a simultaneous local contracting condition.
Definition 9 .
A Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$ satisfies a simultaneously locally contracting condition on a set $C\subseteq\mathcal{X}\times\mathcal{X}$ if there is a constant $\alpha\in(0,1)$ where
\[
\left\lVert\mathcal{P}_{\gamma}(x,\cdot)-\mathcal{P}_{\gamma}(y,\cdot)\right\rVert_{\text{TV}}\leq 1-\alpha \tag{9}
\]
holds for all $(x,y)\in C$ and $\gamma\in\mathcal{Y}$.
Local coupling conditions have been studied in the subgeometric case for Markov chains [Durmus et al., 2016].
For example, a minorization condition can be used to verify the Markov family is simultaneously locally contracting (see [Roberts and Rosenthal, 2007]).
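As a sketch of the standard argument (an elaboration, not taken from the text above): assume total variation is normalized as $\sup_{A}|\mu(A)-\mu'(A)|$, and suppose there are a set $S\subseteq\mathcal{X}$ and a probability measure $\nu$ (both hypothetical names) with $\mathcal{P}_{\gamma}(x,\cdot)\geq\alpha\nu(\cdot)$ for all $x\in S$ and $\gamma\in\mathcal{Y}$. For $x,y\in S$ the residual measures $\mathcal{P}_{\gamma}(x,\cdot)-\alpha\nu$ and $\mathcal{P}_{\gamma}(y,\cdot)-\alpha\nu$ are nonnegative with total mass $1-\alpha$, so
\[
\left\lVert\mathcal{P}_{\gamma}(x,\cdot)-\mathcal{P}_{\gamma}(y,\cdot)\right\rVert_{\text{TV}}=\sup_{A}\left|\left[\mathcal{P}_{\gamma}(x,A)-\alpha\nu(A)\right]-\left[\mathcal{P}_{\gamma}(y,A)-\alpha\nu(A)\right]\right|\leq 1-\alpha,
\]
and the contraction bound holds on any $C\subseteq S\times S$.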
Under these three conditions, we can establish an upper bound for the adaptation process.
Theorem 10 .
Assume the expected diminishing adaptation condition (7) holds with $G(\cdot)$ decreasing to zero.
Additionally assume the following conditions hold for the Markov family $(\mathcal{P}_{\gamma})_{\gamma\in\mathcal{Y}}$:
1. $\pi\mathcal{P}_{\gamma}=\pi$ for all $\gamma\in\mathcal{Y}$.
2. A simultaneous subgeometric drift condition (8) holds with a Borel function $V:\mathcal{X}\to[0,\infty)$.
3. A simultaneous locally contracting condition (9) holds on the set $C=\{(x,y)\in\mathcal{X}\times\mathcal{X}:V(x)+V(y)\leq 2K/(1-\delta)\}$ for some $\delta\in(0,1)$.
Then for all $\epsilon\in(0,1)$ and all $t\in\mathbb{Z}_{+}$,
\[
\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T_{\epsilon,t}+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}\leq\frac{\delta+\left[r(1)+1\right][V(x_{0})+\int Vd\pi+KT_{\epsilon,t}+C}{\delta H_{1,\varphi}^{-1}\left(\frac{t}{-\log(H_{1,\varphi}^{-1}(t))/\log(1-\alpha)+1}\right)}+\epsilon
\]
where $T_{\epsilon,t}=(1/G)^{-1}(t^{2}/\epsilon)$ and
\begin{align*}
r(\cdot)&=\varphi(H_{1,\varphi}^{-1}(\cdot)),\qquad R=\varphi^{-1}(2K/(1-\delta)), \\
C&=\left[r(1)+1\right]\left\{R+\frac{r(1)}{r(0)}(R+4K)\right\}.
\end{align*}
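To make the quantities $H_{1,\varphi}^{-1}$ and $r$ concrete, the following Python sketch evaluates them in closed form for a polynomial drift function $\varphi(v)=c\,v^{a}$ with $0<a<1$. It assumes the reading $H_{1,\varphi}(x)=\int_{1}^{x}dv/\varphi(v)$, which is not restated in this section. The function name `make_polynomial_rate` and the parameters `c`, `a` are hypothetical.

```python
def make_polynomial_rate(c=1.0, a=0.5):
    """Closed forms for phi(v) = c * v**a with c > 0 and 0 < a < 1.

    Assumes H(x) = integral from 1 to x of dv / phi(v); this is an
    assumption about the notation, not a definition from the section.
    """
    if not (c > 0.0 and 0.0 < a < 1.0):
        raise ValueError("need c > 0 and 0 < a < 1")

    def phi(v):
        return c * v ** a

    def H(x):
        # Integral of v**(-a) / c from 1 to x.
        return (x ** (1.0 - a) - 1.0) / (c * (1.0 - a))

    def H_inv(t):
        # Solves H(x) = t for x.
        return (1.0 + c * (1.0 - a) * t) ** (1.0 / (1.0 - a))

    def r(t):
        # Rate function r(t) = phi(H_inv(t)).
        return phi(H_inv(t))

    return phi, H, H_inv, r
```

Under these assumptions $H_{1,\varphi}^{-1}(t)$ grows like $t^{1/(1-a)}$, so the denominator of the bound grows polynomially in $t$ up to the logarithmic correction inside its argument. With $a=1/2$ and $c=1$, for instance, $H_{1,\varphi}^{-1}(t)=(1+t/2)^{2}$.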
Theorem 10 requires satisfying expected diminishing adaptation (7) with a sufficiently fast rate.
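The role of the adaptation rate can be seen through $T_{\epsilon,t}=(1/G)^{-1}(t^{2}/\epsilon)$. The sketch below computes it for a hypothetical polynomial rate $G(s)=c\,s^{-\beta}$; the function names and the parameters `c`, `beta` are assumptions made for illustration.

```python
import math


def G(s, c=1.0, beta=1.0):
    """Hypothetical polynomial adaptation rate G(s) = c * s**(-beta)."""
    return c / s ** beta


def stopping_time(t, eps, c=1.0, beta=1.0):
    """An integer T with G(T) <= eps / t**2, from T = (1/G)^{-1}(t**2 / eps).

    For G(s) = c * s**(-beta), (1/G)^{-1}(u) = (c * u)**(1 / beta).
    Rounding up keeps the inequality because G is decreasing.
    """
    exact = (c * t ** 2 / eps) ** (1.0 / beta)
    return max(1, math.ceil(exact))
```

Under this choice $T_{\epsilon,t}$ scales like $(t^{2}/\epsilon)^{1/\beta}$. A slowly decaying adaptation rate (small `beta`) therefore forces a much longer adaptation phase, which enters the numerator of the bound through the term $KT_{\epsilon,t}$.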
Table 1 compares approximate upper bounds for different combinations of $\varphi(\cdot)$ and $G(\cdot)$.
The upper and lower bounds may also be combined; in particular, Theorem 10 can guarantee that the adaptive process approximately achieves the lower bound rate if the adaptation diminishes sufficiently fast.
For example, suppose that in addition to the assumptions of Theorem 10, there are constants $C,\kappa>0$ such that
\begin{align*}
&\pi(V\geq r)\geq Cr^{-\kappa}, \\
&(\mathcal{P}_{\gamma}V^{2\kappa})(x)-V(x)^{2\kappa}\leq\varphi(V(x)^{2\kappa})
\end{align*}
holds for every $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$. Then Theorem 1 and Theorem 10 imply that there are constants $M^{*},\alpha>0$ such that
\[
\frac{C^{2}}{4H_{V(x_{0})^{2\kappa},\varphi}^{-1}(t)}\leq\left\lVert\mathcal{A}_{\mathcal{Q}}^{t}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}\leq\frac{M^{*}T_{\epsilon,t}}{H_{1,\varphi}^{-1}\left(\frac{t}{-\log(H_{1,\varphi}^{-1}(t))/\log(1-\alpha)+1}\right)}+\epsilon
\]
holds for all $t$ and $\epsilon$. Similarly, Theorem 4 can be used to give a weak lower bound.
As an example, consider a target measure on $\mathbb{R}^{d}$ with potential $U:\mathbb{R}^{d}\to\mathbb{R}$ defined by $\pi(dx)\propto\exp(-U(x))dx$ and Lyapunov function defined by $\exp(\alpha U(x))$ for $\alpha>0$.
Then with $\alpha<1$, this can be used to obtain an upper bound, and with $\alpha>1$, this can be used to obtain a lower bound.
Table 1: Upper bound convergence rate comparisons from Theorem 10 for different combinations of $\varphi(\cdot)$ and $G(\cdot)$. The table entries specify a convergence rate upper bound up to an explicit constant.
Proof of Theorem 10 .
We first specify a finite adaptation plan $\mathcal{Q}^{T}$ with a time $T\in\mathbb{Z}_{+}$ defining a stopping point of adaptation.
This defines an adaptive process where for all $t\geq T$, $\Gamma_{t}=\Gamma_{T}$ and $(\Gamma_{t},X_{t})\mid(\Gamma_{s},X_{s})_{s\leq t-1}\sim\delta_{\Gamma_{T}}(\cdot)\mathcal{P}_{\Gamma_{T}}(X_{t-1},\cdot)$, where $\delta_{\Gamma_{T}}$ is the Dirac measure at $\Gamma_{T}$.
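A finite adaptation process of this kind can be sketched in Python as follows. This is only an illustration, not the construction used in the proof: the one-dimensional Student-t style target, the random-walk Metropolis kernel, the Robbins-Monro scale update, and the names `log_target` and `finite_adaptation_rwm` are all assumptions made for the example.

```python
import math
import random


def log_target(x, v=3.0):
    """Unnormalized log-density of a 1-d Student-t with v degrees of freedom."""
    return -0.5 * (v + 1.0) * math.log1p(x * x / v)


def finite_adaptation_rwm(n_steps, T, x0=0.0, log_scale0=0.0, seed=0):
    """Random-walk Metropolis whose log-scale Gamma_t is frozen for t >= T."""
    rng = random.Random(seed)
    x, log_scale = x0, log_scale0
    accepted_prev = None
    xs, log_scales = [], []
    for t in range(1, n_steps + 1):
        # Update Gamma_t from the history up to time t - 1, only while t <= T.
        if t <= T and accepted_prev is not None:
            log_scale += (float(accepted_prev) - 0.44) / t ** 0.75
        # Update X_t given Gamma_t and X_{t-1} with a symmetric Gaussian proposal.
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        log_ratio = log_target(proposal) - log_target(x)
        accepted_prev = rng.random() < math.exp(min(0.0, log_ratio))
        if accepted_prev:
            x = proposal
        xs.append(x)
        log_scales.append(log_scale)
    return xs, log_scales
```

After time `T` the recorded tuning parameter is constant, so from then on the sketch runs an ordinary Markov chain with the kernel selected at time `T`. Each Metropolis kernel with a symmetric proposal leaves the target invariant, in line with the first assumption of Theorem 10.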
Using the finite adaptation process, we have the upper bound via the triangle inequality
\begin{align*}
&\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}} \\
&\quad\leq\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)\right\rVert_{\text{TV}}+\left\lVert\mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}. \tag{10}
\end{align*}
We will bound each term on the right-hand side of (10) separately.
For the first term in (10), fix $\epsilon\in(0,1)$ and choose $T_{\epsilon,t}=(1/G)^{-1}(t^{2}/\epsilon)$.
Using the triangle inequality, we have that
\begin{align*}
&\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{T+t}}(x,\cdot)-\mathcal{P}_{\Gamma_{T}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{T+t-1}=x\right] \\
&\quad\leq\sum_{s=1}^{t}\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{T+s}}(x,\cdot)-\mathcal{P}_{\Gamma_{T+s-1}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{T+s-1}=x\right] \\
&\quad\leq tG(T) \\
&\quad\leq\epsilon/t.
\end{align*}
Since $\mathcal{X}$ is Polish, Proposition 17 ensures the total variation is Borel measurable.
Let $(\Gamma_{t},X_{t})_{t\geq 0}$ be an adaptive process initialized at $x_{0},\gamma_{0}$ and $(\Gamma^{\prime}_{t},X^{\prime}_{t})_{t\geq 0}$ be the finite adaptation process initialized similarly.
Since both of these processes are initialized at the same point, we can construct a coupling where $X_{s}=X_{s}^{\prime}$ for $s\leq T$ and
\begin{align*}
\mathbb{P}\left(X_{t+T}=Y_{t+T}\right)
&=\mathbb{P}\left(X_{t+T}=Y_{t+T}|X_{t+T-1}=Y_{t+T-1}\right)\mathbb{P}\left(X_{t+T-1}=Y_{t+T-1}\right) \\
&\geq(1-\epsilon/t)\mathbb{P}\left(X_{t+T-1}=Y_{t+T-1}\right) \\
&\geq(1-\epsilon/t)^{t} \\
&\geq 1-\epsilon.
\end{align*}
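The final step in this chain is Bernoulli's inequality, $(1-\epsilon/t)^{t}\geq 1-\epsilon$ for integer $t\geq 1$ and $0\leq\epsilon\leq t$. The following is a small numerical sanity check of that step only, not part of the argument; the helper names `coupling_lower_bound` and `check_bernoulli` and the grid of test values are illustrative assumptions.

```python
def coupling_lower_bound(eps: float, t: int) -> float:
    """Return (1 - eps/t)**t, the product of t per-step bounds 1 - eps/t."""
    if t < 1 or not 0.0 <= eps <= t:
        raise ValueError("need integer t >= 1 and 0 <= eps <= t")
    return (1.0 - eps / t) ** t


def check_bernoulli(eps_values, t_values, tol=1e-12):
    """Check (1 - eps/t)**t >= 1 - eps on a grid, up to floating-point tolerance."""
    return all(
        coupling_lower_bound(eps, t) >= 1.0 - eps - tol
        for eps in eps_values
        for t in t_values
    )


if __name__ == "__main__":
    # Illustrative grid; any 0 <= eps <= 1 and integer t >= 1 would do.
    print(check_bernoulli([0.01, 0.1, 0.5, 1.0], [1, 2, 10, 1000]))
```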
Since $\mathcal{X}$ is Polish, it follows immediately that the optimal coupling is controlled by the coupling we have constructed, so that
\[
\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T_{\epsilon,t}+t)}((\gamma_{0},x_{0}),\cdot)-\mathcal{A}_{\mathcal{Q}^{T_{\epsilon,t}}}^{(T_{\epsilon,t}+t)}((\gamma_{0},x_{0}),\cdot)\right\rVert_{\text{TV}}\leq\mathbb{P}\left(X_{t+T}\not=Y_{t+T}\right)\leq\epsilon.
\]
To bound the second term in (10), we adapt previous arguments for subgeometric upper bounds for non-adaptive Markov chains [Durmus et al., 2016], modified for adaptive MCMC and with improved, explicit constants.
Since $\mathcal{X}$ is Polish, there is a Borel measurable conditional total variation distance by [Villani, 2009, Theorem 4.8] so that
\begin{align*}
\left\lVert\mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}
&\leq\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{T}}^{t}(X_{T},\cdot)-\pi\right\rVert_{\text{TV}}\right].
\end{align*}
Let $\tau_{C}=\inf\{n\geq 1:X_{n},Y_{n}\in C\}$ be the first hit time to the set $C$.
For $n\in\mathbb{Z}_{+}$, let $\theta_{n}$ denote the shift operator applied $n$ times so that $\theta_{n}(X_{i})=X_{i+n}$ for all $i\in\mathbb{Z}_{+}$.
Define the successive hit times to $C$ recursively by
\begin{align*}
&\tau_{1}=\tau_{C}, \\
&\tau_{n+1}=\tau_{n}+\tau_{C}\circ\theta_{\tau_{n}}=\sum_{k=1}^{n+1}\tau_{k}
\end{align*}
for each $n\in\mathbb{Z}_{+}$.
The inverse function theorem implies the derivative
\[
r(s)=\frac{d}{ds}H^{-1}_{1,\varphi}(s)=\varphi(H^{-1}_{1,\varphi}(s))
\]
for $s\geq 0$.
Thus, $H_{1,\varphi}^{-1}$ is convex since its derivative is monotone increasing by Lemma 18.
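To make the rate function concrete, the sketch below works through one hypothetical example. It assumes the polynomial choice $\varphi(v)=v^{a}$ with $a\in(0,1)$ and the convention $H_{1,\varphi}(u)=\int_{1}^{u}ds/\varphi(s)$. Both are assumptions made here for illustration and may differ from the definitions used elsewhere in the paper. Under these assumptions, $H_{1,\varphi}^{-1}(s)=(1+(1-a)s)^{1/(1-a)}$ and $r(s)=(1+(1-a)s)^{a/(1-a)}$. The code checks numerically that $r$ agrees with the derivative of $H_{1,\varphi}^{-1}$ and is increasing on a small grid.

```python
def h_inv(s: float, a: float) -> float:
    """Inverse of H(u) = (u**(1 - a) - 1) / (1 - a), assuming phi(v) = v**a."""
    return (1.0 + (1.0 - a) * s) ** (1.0 / (1.0 - a))


def rate(s: float, a: float) -> float:
    """r(s) = phi(h_inv(s)) = (1 + (1 - a) * s)**(a / (1 - a))."""
    return (1.0 + (1.0 - a) * s) ** (a / (1.0 - a))


def central_difference(f, s: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(s)."""
    return (f(s + h) - f(s - h)) / (2.0 * h)


def check_rate(a: float, grid=(0.0, 0.5, 1.0, 2.0, 5.0, 10.0), tol=1e-5) -> bool:
    """Check r = (h_inv)' and that r is increasing on the (illustrative) grid."""
    derivative_ok = all(
        abs(central_difference(lambda u: h_inv(u, a), s) - rate(s, a))
        <= tol * max(1.0, rate(s, a))
        for s in grid
    )
    values = [rate(s, a) for s in grid]
    increasing = all(lo < hi for lo, hi in zip(values, values[1:]))
    return derivative_ok and increasing


if __name__ == "__main__":
    for a in (0.25, 0.5, 0.75):  # hypothetical exponents
        print(a, check_rate(a))
```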
By Markov’s inequality and Jensen’s inequality,
\begin{align*}
\mathbb{P}(\tau_{m}\geq t)
&\leq\frac{\mathbb{E}\left[H_{1,\varphi}^{-1}\left(\frac{1}{m}\sum_{k=1}^{m}\tau_{k}\right)\right]}{H_{1,\varphi}^{-1}(t/m)} \\
&\leq\frac{\frac{1}{m}\sum_{k=1}^{m}\mathbb{E}\left[H_{1,\varphi}^{-1}(\tau_{k})\right]}{H_{1,\varphi}^{-1}(t/m)}.
\end{align*}
For any $t,m\in\mathbb{Z}_{+}$ with $t\geq m$, the local coupling condition (9) implies an upper bound via a coupling argument with [Jarner and Tweedie, 2001, Lemma 3.1] so that for all $\gamma\in\mathcal{Y}$ and $x,y\in\mathcal{X}$,
\begin{align*}
\left\lVert\mathcal{P}_{\gamma}^{t}(x,\cdot)-\mathcal{P}_{\gamma}^{t}(y,\cdot)\right\rVert_{\text{TV}}
&\leq\inf_{\xi\in\mathcal{C}\left(\mathcal{P}_{\gamma}^{t}(x,\cdot),\mathcal{P}_{\gamma}^{t}(y,\cdot)\right)}\xi(\{u,v:u\not=v,\tau_{m}<t\})+\mathbb{P}(\tau_{m}\geq t) \\
&\leq(1-\alpha)^{m}+\mathbb{P}\left(\sum_{k=1}^{m}\tau_{k}>t\right) \\
&\leq(1-\alpha)^{m}+\frac{\frac{1}{m}\sum_{k=1}^{m}\mathbb{E}\left[H_{1,\varphi}^{-1}(\tau_{k})\right]}{H_{1,\varphi}^{-1}(t/m)}.
\end{align*}
Since $\varphi$ is concave, it is subadditive so $\varphi(V(x)+V(y))\leq\varphi(V(x))+\varphi(V(y))$.
Since $\varphi$ is strictly increasing, by the drift condition,
\begin{align*}
(\mathcal{P}_{\gamma}V)(x)+(\mathcal{P}_{\gamma}V)(y)-[V(x)+V(y)]
&\leq-[\varphi(V(x))+\varphi(V(y))]+2K \\
&\leq-[\varphi(V(x)+V(y))]+2K
\end{align*}
holds for all $x,y\in\mathcal{X}$.
Using Lemma 19,
\begin{align*}
(\mathcal{P}_{\gamma}V)(x)+(\mathcal{P}_{\gamma}V)(y)-[V(x)+V(y)]\leq-\delta[\varphi(V(x)+V(y))]+(R+2K)I_{C}(x,y).
\end{align*}
By [Douc et al., 2004, Proposition 2.2],
\[
\sup_{x,y\in C}\mathbb{E}_{x,y}\left(\sum_{i=0}^{\tau_{C}-1}r(i)\right)\leq\frac{\varphi^{-1}(2K/(1-\delta))}{\delta}+\frac{(R+2K)r(1)}{\delta r(0)}.
\]
We have that $r(\cdot)=\varphi(H_{1,\varphi}^{-1}(\cdot))$ is log-concave and so $r(s+t)\leq r(s)r(t)$ for all $t,s\geq 0$ [Douc et al., 2004, see the proof of Proposition 2.1].
We then have the upper bound for $k\geq 2$,
\begin{align*}
\mathbb{E}\left[H_{1,\varphi}^{-1}(\tau_{k})\right]
&=\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(\int^{\tau_{k}}_{0}r(s)ds\right)\right] \\
&\leq\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(\sum_{i=1}^{\tau_{C}}r(i)\right)\right] \\
&\leq\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(r(\tau_{C})\right)\right]-r(0)+\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(\sum_{i=0}^{\tau_{C}-1}r(i)\right)\right] \\
&\leq r(1)\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(r(\tau_{C}-1)\right)\right]-r(0)+\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(\sum_{i=0}^{\tau_{C}-1}r(i)\right)\right] \\
&\leq[r(1)+1]\mathbb{E}\left[\mathbb{E}_{X_{\tau_{k-1}}}\left(\sum_{i=0}^{\tau_{C}-1}r(i)\right)\right]-r(0) \\
&\leq\frac{r(1)+1}{\delta}\left\{R\left(1+\frac{r(1)}{r(0)}\right)+\frac{2Kr(1)}{r(0)}\right\}.
\end{align*}
For $k=1$, similarly, we have
\begin{align*}
\mathbb{E}\left[H_{1,\varphi}^{-1}(\tau_{C})\right]\leq\frac{r(1)+1}{\delta}\left\{V(x)+V(y)+\frac{Rr(1)}{r(0)}+\frac{2Kr(1)}{r(0)}\right\}.
\end{align*}
Combining these upper bounds,
\begin{align*}
\mathbb{E}\left[H_{1,\varphi}^{-1}\left(\frac{1}{m}\sum_{k=1}^{m}\tau_{k}\right)\right]\leq\frac{r(1)+1}{\delta}\left\{V(x)+V(y)+R+\frac{r(1)}{r(0)}(R+4K)\right\}.
\end{align*}
The simultaneous subgeometric drift condition (8) implies
\[
\mathbb{E}\left[V(X_{T})\right]\leq V(x_{0})+KT.
\]
Choosing $m\equiv m_{t}=\lceil\log(H_{1,\varphi}^{-1}(t))/\log(1/(1-\alpha))\rceil$, we have the upper bound
\begin{align*}
\left\lVert\mathcal{A}_{\mathcal{Q}^{T}}^{(T+t)}((\gamma_{0},x_{0}),\cdot)-\pi\right\rVert_{\text{TV}}
&\leq(1-\alpha)^{m_{t}}+\frac{\left[r(1)+1\right]\left\{V(x_{0})+KT+\int Vd\pi+R+\frac{r(1)}{r(0)}(R+4K)\right\}}{\delta H_{1,\varphi}^{-1}(t/m_{t})} \\
&\leq\frac{\delta+\left[r(1)+1\right]\left\{V(x_{0})+KT+\int Vd\pi\right\}+C}{\delta H_{1,\varphi}^{-1}(t/m_{t})}.
\end{align*}
∎
4 Example: adaptive Metropolis-Hastings independence sampler
In many cases, it is difficult to choose a proposal for Metropolis-Hastings that approximately matches the tail behavior of a complex target measure and adaptive MCMC is often employed.
The point of this toy example is to concretely demonstrate this scenario.
We will use the upper and lower bounds on the convergence to investigate the sensitivity to different adaptation strategies.
Consider the target measure $\pi(dx)=\exp(-x)I_{[0,\infty)}(x)dx$.
Let $(\gamma_{*},\gamma^{*})$ be the interval for some potential tuning parameters $1<\gamma_{*}<\gamma^{*}$ and consider a Metropolis-Hastings Markov chain with independent proposal
\[
\gamma\exp(-\gamma x)I_{[0,\infty)}(x)
\]
and Markov kernel defined for $x,\gamma\in[0,\infty)\times(\gamma_{*},\gamma^{*})$ and all Borel sets $A\subseteq[0,\infty)$ by
\begin{align}
\mathcal{P}_{\gamma}(x,A)=\int_{A}a_{\gamma}(x,y)\gamma\exp(-\gamma y)dy+\delta_{x}(A)R_{\gamma}(x) \tag{11}
\end{align}
where the acceptance function is $a_{\gamma}(x,y)=\exp\left[(\gamma-1)(y-x)\right]\wedge 1$ and the rejection probability is $R_{\gamma}(x)=1-\int_{0}^{\infty}a_{\gamma}(x,y)\gamma\exp(-\gamma y)dy$.
Since we restrict $\gamma>1$, the tail probabilities of the proposal and the Metropolis-Hastings kernel are lighter than the target.
Due to this restriction, we will have a polynomial lower bound over any possible adaptation plan.
Proposition 11 .
Let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)$ be the marginal of the adaptive independent Metropolis-Hastings process at time $t\in\mathbb{Z}_{+}$ from (11) with adaptation parameter set $(\gamma_{*},\gamma^{*})$ and initialized at $x_{0},\gamma_{0}\in(0,\infty)\times(\gamma_{*},\gamma^{*})$.
Then
$$\inf_{\mathcal{Q}\in\mathcal{S}([0,\infty),(\gamma_{*},\gamma^{*}))}\left\lVert\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},0,\cdot)-\pi\right\rVert_{\text{TV}}\geq\frac{M_{*}}{\left(c_{*}t+1\right)^{\frac{1}{\gamma_{*}-1}}}$$
where $M_{*}=\gamma_{*}^{-1/(\gamma_{*}-1)}-\gamma_{*}^{-\gamma_{*}/(\gamma_{*}-1)}$ and $c_{*}=\frac{\gamma^{*}}{\gamma_{*}-1}+\gamma^{*}-1$.
Proof.
Define $W(x)=\exp(x)$, and we have by a standard computation $\pi(W(x)\geq r)=r^{-1}$.
Assume $1<\gamma_{*}<\gamma$.
For $\alpha<\gamma$, the identity holds
$$\begin{aligned}
&(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)\\
&=\int_{[0,\infty)}\exp(\alpha y)\,1\wedge\left\{\exp\left[(\gamma-1)(y-x)\right]\right\}\gamma\exp(-\gamma y)\,dy-\exp(\alpha x)\mathbb{E}\left(a_{\gamma}(x,Y)\right).
\end{aligned}$$
We also have for any $\alpha<\gamma$,
$$\begin{aligned}
&\int_{[0,\infty)}\exp(\alpha y)\,1\wedge\left\{\exp\left[(\gamma-1)(y-x)\right]\right\}\gamma\exp(-\gamma y)\,dy\\
&=\frac{\gamma\exp\left[(1-\gamma)x\right]}{\alpha-1}\int_{0}^{x}(\alpha-1)\exp\left[(\alpha-1)y\right]dy+\frac{\gamma}{\gamma-\alpha}\int_{x}^{\infty}(\gamma-\alpha)\exp\left[-(\gamma-\alpha)y\right]dy\\
&=\left\{\frac{\gamma}{\alpha-1}+\frac{\gamma}{\gamma-\alpha}\right\}\exp\left[-(\gamma-\alpha)x\right]-\frac{\gamma}{\alpha-1}\exp\left[-(\gamma-1)x\right].
\end{aligned}\tag{12}$$
Using (12),
$$\mathbb{E}\left(a_{\gamma}(x,Y)\right)=\gamma\exp((1-\gamma)x)+(1-\gamma)\exp(-\gamma x).$$
So then
$$\begin{aligned}
&(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)\\
&=\left\{\frac{\gamma}{\alpha-1}+\frac{\gamma}{\gamma-\alpha}+\gamma-1\right\}\exp\left[-(\gamma-\alpha)x\right]-\frac{\gamma}{\alpha-1}\exp\left[-(\gamma-1)x\right]-\gamma W^{\alpha}(x)^{\frac{1+\alpha-\gamma}{\alpha}}.
\end{aligned}$$
It follows that
$$\begin{aligned}
&(\mathcal{P}_{\gamma}W^{\gamma_{*}})(x)-W^{\gamma_{*}}(x)\\
&=\left\{\frac{\gamma}{\gamma_{*}-1}+\frac{\gamma}{\gamma-\gamma_{*}}+\gamma-1\right\}\exp(-(\gamma-\gamma_{*})x)-\frac{\gamma}{\gamma_{*}-1}\exp(-(\gamma-1)x)-\gamma W^{\gamma_{*}}(x)^{\frac{1+\gamma_{*}-\gamma}{\gamma_{*}}}\\
&\leq c_{*}.
\end{aligned}\tag{13}$$
With the upper bound in (13), we can now use Theorem 1 with $\varphi\equiv c_{*}$, $\kappa=1$, and $\alpha=\gamma_{*}$.
We then have the lower bound
$$\left\lVert\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)-\pi\right\rVert_{\text{TV}}\geq\frac{M_{*}}{\left[c_{*}t+\exp(\gamma_{*}x_{0})\right]^{\frac{1}{\gamma_{*}-1}}}$$
holds for every $t\in\mathbb{Z}_{+}$ uniformly in the adaptation strategy $\mathcal{Q}$.
∎
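As a sanity check on the computations in the proof, the following Python sketch compares the closed form on the right-hand side of (12) with a numerical integral of the left-hand side, and recovers the expected acceptance probability as the special case $\alpha=0$. The helper names, the Simpson rule, and the truncation point `tail` are illustrative choices rather than anything from the paper; the truncation is only accurate when $\gamma-\alpha$ is not too close to zero.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def integral_numeric(gamma, alpha, x, tail=80.0):
    """Left-hand side of (12), truncated at y = x + tail."""
    def integrand(y):
        # log of exp(alpha*y) * min(1, exp((gamma-1)(y-x))) * exp(-gamma*y)
        log_val = (alpha - gamma) * y + min(0.0, (gamma - 1.0) * (y - x))
        return gamma * math.exp(log_val)
    # Split at y = x, where the acceptance function has a kink.
    return simpson(integrand, 0.0, x) + simpson(integrand, x, x + tail)


def integral_closed(gamma, alpha, x):
    """Right-hand side of (12); requires alpha != 1 and alpha < gamma."""
    lead = gamma / (alpha - 1.0) + gamma / (gamma - alpha)
    return (lead * math.exp(-(gamma - alpha) * x)
            - gamma / (alpha - 1.0) * math.exp(-(gamma - 1.0) * x))


def mean_accept(gamma, x):
    """Closed form for E(a_gamma(x, Y)) stated after (12)."""
    return gamma * math.exp((1.0 - gamma) * x) + (1.0 - gamma) * math.exp(-gamma * x)
```

For instance, with $\gamma=2$ and $\alpha=1.2$, `integral_numeric(2.0, 1.2, x)` and `integral_closed(2.0, 1.2, x)` should agree to quadrature accuracy, and `integral_closed(2.0, 0.0, x)` should coincide with `mean_accept(2.0, x)`.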
In Table 2, we compute the lower bound in Proposition 11 for different choices of $(\gamma_{*},\gamma^{*})$.
The large values from Table 2 illustrate that even in this toy example, it is possible to observe poor convergence behavior of adaptive MCMC with certain tuning parameter sets independently of the adaptation strategy.
However, this limitation on the convergence rate can be avoided if the adaptation plan is capable of crossing the critical boundary $\gamma=1$.
By Theorem 4, the lower bound rate in Proposition 11 will be the same even when converging weakly.
Table 2: Lower bound computations from Proposition 11 for the adaptive Metropolis-Hastings independence sampler.
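The lower bound of Proposition 11 is explicit, so it can be evaluated directly. The sketch below is one way to do so in Python; `lower_bound` implements the bound from the proof (with a general starting point $x_{0}$), while `min_steps_for_tolerance` is a hypothetical helper that inverts the bound to find how many iterations are needed before the lower bound itself drops below a tolerance. Neither name comes from the paper, and the parameter values in the usage note are illustrative rather than the entries of Table 2.

```python
import math


def _constants(gamma_lo, gamma_hi):
    """M_* and c_* from Proposition 11 (gamma_lo = gamma_*, gamma_hi = gamma^*)."""
    if not (1.0 < gamma_lo < gamma_hi):
        raise ValueError("need 1 < gamma_lo < gamma_hi")
    p = 1.0 / (gamma_lo - 1.0)
    m_star = gamma_lo ** (-p) - gamma_lo ** (-gamma_lo * p)
    c_star = gamma_hi / (gamma_lo - 1.0) + gamma_hi - 1.0
    return m_star, c_star, p


def lower_bound(t, gamma_lo, gamma_hi, x0=0.0):
    """M_* / [c_* t + exp(gamma_* x0)]^(1 / (gamma_* - 1))."""
    m_star, c_star, p = _constants(gamma_lo, gamma_hi)
    return m_star / (c_star * t + math.exp(gamma_lo * x0)) ** p


def min_steps_for_tolerance(tol, gamma_lo, gamma_hi, x0=0.0):
    """Smallest real t at which the lower bound equals tol (0 if already below).

    Hypothetical helper: total variation cannot be below tol before this time,
    whatever adaptation strategy is used.
    """
    m_star, c_star, p = _constants(gamma_lo, gamma_hi)
    t = ((m_star / tol) ** (1.0 / p) - math.exp(gamma_lo * x0)) / c_star
    return max(t, 0.0)
```

For example, `lower_bound(1, 2.0, 3.0)` uses $M_{*}=1/4$ and $c_{*}=5$, giving $0.25/6$; larger $\gamma_{*}$ makes the exponent $1/(\gamma_{*}-1)$ smaller and hence the decay of the bound slower.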
We now look at upper bounds from Section 3, where we require that adaptation be restricted to a compact set. This is a commonly used strategy in adaptive MCMC [Pompe et al., 2020].
Proposition 12 .
For $t\in\mathbb{Z}_{+}$, let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)$ be the marginal of an adaptive independent Metropolis-Hastings process and $M_{*},c_{*}$ defined in Proposition 11.
Assume for each $t\in\mathbb{Z}_{+}$, $\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)=\mathcal{P}_{\Gamma_{t}}(x,\cdot)$ for all $x\geq r$ for some $r>0$ and
$$\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\Gamma_{t+1}-\Gamma_{t}\right\rVert_{F}\mid X_{t}=x\right]\leq G(t)$$
for some function $G(\cdot)$ strictly decreasing to zero.
Additionally, assume $\gamma^{*}<2-\epsilon$ for some $\epsilon\in(0,1)$.
Then for all $\delta\in(0,1)$ and $t\in\mathbb{Z}_{+}$ with $T_{\delta,t}=(1/G)^{-1}\left(Jt^{2}/\delta\right)$,
$$\begin{aligned}
\frac{M_{*}}{\left[c_{*}(T_{\delta,t}+t)+1\right]^{\frac{1}{\gamma_{*}-1}}}
&\leq\left\lVert\mathcal{A}_{\mathcal{Q}}^{(T_{\delta,t}+t)}((\gamma_{0},0),\cdot)-\pi\right\rVert_{\text{TV}}\\
&\leq\frac{2\left[r(1)+1\right]KT_{\delta,t}+C}{\left[1+c\frac{t}{-\log(1+ct)/\log(1-\alpha)+1}\right]^{\frac{1-\epsilon}{\gamma^{*}-1}}}+\delta
\end{aligned}$$
where
$$\begin{aligned}
&J=2/\gamma_{*}^{2}+r+\frac{1}{\gamma_{*}},\quad c=\frac{\gamma^{*}-1}{1-\epsilon},\quad K=\frac{\gamma^{*}}{\gamma_{*}-1+\epsilon},\quad\alpha=\frac{\gamma_{*}}{(2K)^{\frac{\gamma^{*}-1}{1-\epsilon}}},\quad r(1)=\left(c+1\right)^{\frac{2-\epsilon-\gamma^{*}}{\gamma^{*}-1}},\\
&R=(4K)^{\frac{1-\epsilon}{2-\epsilon-\gamma^{*}}},\quad C=1+2\left[r(1)+1\right][1+\epsilon^{-1}]+\left[r(1)+1\right]\left\{R+r(1)(R+4K)\right\}.
\end{aligned}$$
Proof.
We will use Theorem 10 to establish the upper bound.
We first verify the expected diminishing adaptation condition (7).
Let $\varphi:\mathbb{R}\to[0,1]$ and, for $x>0$, let $\psi_{x}(y)=\varphi(y)-\varphi(x)$.
Then
$$\begin{aligned}
\mathcal{P}_{\gamma^{\prime}}\varphi(x)-\mathcal{P}_{\gamma}\varphi(x)
&=\mathcal{P}_{\gamma^{\prime}}\psi_{x}(x)-\mathcal{P}_{\gamma}\psi_{x}(x)\\
&=\int\psi_{x}(y)\left[a_{\gamma^{\prime}}(x,y)\gamma^{\prime}\exp(-\gamma^{\prime}y)-a_{\gamma}(x,y)\gamma\exp(-\gamma y)\right]dy\\
&\leq|\gamma^{\prime}-\gamma|\int|y-x|\gamma^{\prime}\exp(-\gamma^{\prime}y)\,dy+\int|\gamma^{\prime}\exp(-\gamma^{\prime}y)-\gamma\exp(-\gamma y)|\,dy\\
&\leq(2/\gamma_{*}^{2}+|x|)|\gamma^{\prime}-\gamma|+\frac{1}{\gamma_{*}}|\gamma^{\prime}-\gamma|.
\end{aligned}$$
Let $J=2/\gamma_{*}^{2}+r+\frac{1}{\gamma_{*}}$, and so expected diminishing adaptation (7) holds since
$$\begin{aligned}
\sup_{x\in[0,\infty)}\mathbb{E}\left(\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{t}=x\right)
&\leq J\sup_{|x|\leq r}\mathbb{E}\left(|\Gamma_{t+1}-\Gamma_{t}|\mid X_{t}=x\right)\\
&\leq JG(t).
\end{aligned}$$
Next, we verify the simultaneous subgeometric drift condition.
Let $V(x)=\exp(x)$, and using the identity (12), for $\epsilon\in(0,1)$,
$$(\mathcal{P}_{\gamma}V^{1-\epsilon})(x)-V^{1-\epsilon}(x)\leq\left\{\frac{\gamma^{*}}{\gamma_{*}-1+\epsilon}+\gamma^{*}-1\right\}-\gamma_{*}V(x)^{\frac{2-\epsilon-\gamma^{*}}{1-\epsilon}}.$$
Now we verify the simultaneous local coupling condition with a minorization condition.
If $V(x)\leq R$, then $\exp((\gamma-1)x)\leq R^{\frac{\gamma-1}{1-\epsilon}}$.
Define $\nu_{\gamma}(\cdot)=Z^{-1}\int_{\cdot}1\wedge\exp((\gamma-1)y)\gamma\exp(-\gamma y)\,dy$ where $Z$ is the normalizing constant.
So then
$$\inf_{V(x)\leq R}\mathcal{P}_{\gamma}(x,\cdot)\geq\frac{\gamma_{*}}{R^{\frac{\gamma^{*}-1}{1-\epsilon}}}\nu_{\gamma}(\cdot).$$
∎
If adaptation diminishes fast enough, Proposition 12 shows the upper bound rate is essentially $(1+t)^{\frac{1-\epsilon}{1-\gamma^{*}}}$ and depends on the largest tuning parameter $\gamma^{*}$. This is due to the adaptation plan possibly concentrating on $\gamma^{*}$, which is farthest from the optimal choice. On the other hand, the lower bound rate $(1+t)^{\frac{1}{1-\gamma_{*}}}$ depends on the smallest tuning parameter $\gamma_{*}$ being closest to the optimal choice.
In particular, there can be a gap in the upper and lower bounds on the convergence characterized by potential tuning parameters.
In Table 3, we compare the upper bound convergence rates with $\epsilon=.01$, $G(t)=\exp(-t)$, and $\delta=(1+ct)^{\frac{1-\epsilon}{1-\gamma^{*}}}$.
We observe that the upper bound is sensitive to the tuning of $\gamma^{*}$.
Table 3: A comparison of the upper bounds from Proposition 12 for the adaptive Metropolis-Hastings independence sampler with $\epsilon=.01$, $G(t)=\exp(-t)$, and $\delta=(1+ct)^{\frac{1-\epsilon}{1-\gamma^{*}}}$.
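The upper bound of Proposition 12 can also be evaluated numerically once the constants are computed. The Python sketch below does this under the choices stated for Table 3, namely $G(t)=\exp(-t)$ and $\delta=(1+ct)^{\frac{1-\epsilon}{1-\gamma^{*}}}$. It reads $(1/G)^{-1}$ as the functional inverse of $1/G$, so that $T_{\delta,t}=\log(Jt^{2}/\delta)$; this reading, the function names, and the value of $r$ used in the usage note are assumptions, and the outputs are not claimed to reproduce the entries of Table 3.

```python
import math


def prop12_constants(gamma_lo, gamma_hi, eps, r):
    """Constants of Proposition 12 (gamma_lo = gamma_*, gamma_hi = gamma^*)."""
    if not (0.0 < eps < 1.0 and 1.0 < gamma_lo < gamma_hi < 2.0 - eps and r > 0.0):
        raise ValueError("need 1 < gamma_lo < gamma_hi < 2 - eps, eps in (0, 1), r > 0")
    J = 2.0 / gamma_lo ** 2 + r + 1.0 / gamma_lo
    c = (gamma_hi - 1.0) / (1.0 - eps)
    K = gamma_hi / (gamma_lo - 1.0 + eps)
    alpha = gamma_lo / (2.0 * K) ** ((gamma_hi - 1.0) / (1.0 - eps))
    r1 = (c + 1.0) ** ((2.0 - eps - gamma_hi) / (gamma_hi - 1.0))
    R = (4.0 * K) ** ((1.0 - eps) / (2.0 - eps - gamma_hi))
    C = (1.0 + 2.0 * (r1 + 1.0) * (1.0 + 1.0 / eps)
         + (r1 + 1.0) * (R + r1 * (R + 4.0 * K)))
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie in (0, 1) for log(1 - alpha) to be defined")
    return {"J": J, "c": c, "K": K, "alpha": alpha, "r1": r1, "R": R, "C": C}


def upper_bound(t, gamma_lo, gamma_hi, eps, r):
    """Right-hand side of Proposition 12 for G(t) = exp(-t) and t >= 1."""
    if t < 1:
        raise ValueError("t must be at least 1")
    k = prop12_constants(gamma_lo, gamma_hi, eps, r)
    rate = (1.0 - eps) / (gamma_hi - 1.0)
    delta = (1.0 + k["c"] * t) ** (-rate)        # delta as chosen for Table 3
    T = math.log(k["J"] * t ** 2 / delta)        # inverse of 1/G = exp at J t^2 / delta
    inner = -math.log(1.0 + k["c"] * t) / math.log(1.0 - k["alpha"]) + 1.0
    denom = (1.0 + k["c"] * t / inner) ** rate
    return (2.0 * (k["r1"] + 1.0) * k["K"] * T + k["C"]) / denom + delta
```

With the illustrative values $\gamma_{*}=1.1$, $\gamma^{*}=1.5$, $\epsilon=.01$, and $r=1$, the constant $C$ is large, so the bound only becomes informative (below one) for large $t$. Note that the bound applies to the process at time $T_{\delta,t}+t$, not at time $t$.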
5 Example: Adaptive random walk Metropolis
Adaptive random-walk Metropolis is a popular simulation algorithm for Bayesian statistics [Haario et al., 2001].
Let $U:\mathbb{R}^{d}\to\mathbb{R}$ and consider the target measure defined by $\pi(dx)=Z^{-1}\exp(-U(x))dx$ with normalizing constant $Z=\int_{\mathbb{R}^{d}}\exp(-U(x))dx$.
We make the following regularity assumptions on the target $\pi$, which have been used previously to show convergence results in MCMC [Douc et al., 2004].
Let $\lVert\cdot\rVert_{F}$ denote the Frobenius norm.
Assumption 13 .
Suppose $U$ is twice continuously differentiable and there exists a minimizer $x_{*}$ such that $U(x_{*})=0$.
Assume there are constants $d_{k},D_{k},r>0$ for $k=1,2,3$ such that for all $x\in\mathbb{R}^{d}$ with $\lVert x\rVert\geq R$ for some $R>0$,
\begin{align}
& d_{1}\lVert x\rVert^{m}\leq U(x)\leq D_{1}\lVert x\rVert^{m}, \tag{14}\\
& \begin{aligned}
& d_{2}\lVert x\rVert^{m-1}\leq\lVert\nabla U(x)\rVert\leq D_{2}\lVert x\rVert^{m-1},\\
& \frac{\nabla U(x)}{\lVert U(x)\rVert}\cdot\frac{x}{\lVert x\rVert}\geq r,
\end{aligned} \tag{15}\\
& d_{3}\lVert x\rVert^{m-2}\leq\lVert\nabla^{2}U(x)\rVert_{F}\leq D_{3}\lVert x\rVert^{m-2}. \tag{16}
\end{align}
While Assumption 13 is strong, the Weibull distribution is one example that satisfies it (see [Fort and Moulines, 2000]).
Let $g_{K}$ be a Lebesgue density on $\mathbb{R}^{d}$ used for the proposal, supported on a compact set $K\subset\mathbb{R}^{d}$ and satisfying
\begin{align}
\begin{aligned}
& g_{K}(\xi)=g_{K}(\lVert\xi\rVert)\text{ for all }\xi\in\mathbb{R}^{d},\\
& \inf_{\xi\in K}g_{K}(\xi)>0.
\end{aligned} \tag{17}
\end{align}
For $\gamma\in\mathcal{Y}$, define the random-walk Metropolis Markov family for $x\in\mathbb{R}^{d}$ and Borel $A\subseteq\mathbb{R}^{d}$ by
\begin{align}
\mathcal{P}_{\gamma}(x,A)=\int_{A}a(x,x+\gamma^{1/2}\xi)g_{K}(\xi)d\xi+\delta_{x}(A)R_{\gamma}(x) \tag{18}
\end{align}
with acceptance function $a(x,y)=\exp[U(x)-U(y)]\wedge 1$ and rejection probability $R_{\gamma}(x)=1-\int_{\mathbb{R}^{d}}a(x,x+\gamma^{1/2}\xi)g_{K}(\xi)d\xi$.
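As a rough illustration of one draw from the kernel (18), the following sketch proposes $y=x+\gamma^{1/2}\xi$ and accepts with probability $a(x,y)$. The choices here are assumptions made only for the example: the potential $U(x)=\lVert x\rVert^{m}$ with $m=0.5$, the uniform density on the unit ball as $g_{K}$, and the names `potential`, `sample_unit_ball`, and `rwm_step` are all hypothetical. The matrix square root $\gamma^{1/2}$ is passed in precomputed as `gamma_sqrt`.

```python
import math
import random


def potential(x, m=0.5):
    """Illustrative potential U(x) = ||x||^m (an assumption for this sketch)."""
    return math.sqrt(sum(v * v for v in x)) ** m


def sample_unit_ball(d, rng):
    """Draw xi uniformly from the unit ball in R^d: a radially symmetric,
    compactly supported density, chosen here as one possible g_K."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    radius = rng.random() ** (1.0 / d)  # radius with density proportional to s^(d-1)
    return [radius * v / norm for v in z]


def rwm_step(x, gamma_sqrt, rng, U=potential):
    """One random-walk Metropolis move: propose y = x + gamma^{1/2} xi and
    accept with probability a(x, y) = min(1, exp(U(x) - U(y)))."""
    d = len(x)
    xi = sample_unit_ball(d, rng)
    y = [x[i] + sum(gamma_sqrt[i][j] * xi[j] for j in range(d)) for i in range(d)]
    log_accept = min(0.0, U(x) - U(y))
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) < log_accept:
        return y, True
    return x, False
```

On rejection the chain stays at $x$, which corresponds to the term $\delta_{x}(A)R_{\gamma}(x)$ in (18).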
We define an adaptive random-walk Metropolis process by adapting the covariance of the proposal [Haario et al., 2001] with dynamics
$\Gamma_{t}|(\Gamma_{s},X_{s})_{s\leq t-1}$ first updating the covariance matrix, and then $X_{t}|\Gamma_{t},X_{t-1}$ updating the current state with random-walk Metropolis.
For the tuning parameter set $\mathcal{Y}$, we consider the set of symmetric positive definite matrices on $\mathbb{R}^{d}$ such that the eigenvalues are bounded by constants $\lambda_{*},\lambda^{*}>0$, that is,
\begin{align}
\mathcal{Y}=\left\{\gamma\in\mathbb{R}^{d\times d}:\lambda_{*}I\leq\gamma\leq\lambda^{*}I,\ \gamma^{T}=\gamma\right\}. \tag{19}
\end{align}
One example is to adapt a sample covariance matrix scaled by $h>0$ using the following identity
\begin{align}
\Gamma_{t}=\frac{h}{t}\sum_{s=0}^{t}(X_{s}-\bar{X}_{t})(X_{s}-\bar{X}_{t})^{T}=\frac{t-1}{t}\Gamma_{t-1}+\frac{h}{t+1}(X_{t}-\bar{X}_{t-1})(X_{t}-\bar{X}_{t-1})^{T} \tag{20}
\end{align}
where $\bar{X}_{t}=(t+1)^{-1}\sum_{s=0}^{t}X_{s}$ [Haario et al., 2001, Andrieu and Moulines, 2006].
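The recursion in (20) can be checked numerically. The sketch below, with hypothetical helper names (`batch_gamma`, `recursive_gamma`, `check_identity`) and synthetic Gaussian draws standing in for the chain, compares the recursive update against the batch formula at every step. At $t=1$ the previous matrix has weight $(t-1)/t=0$, so the initial value passed for $\Gamma_{0}$ is arbitrary.

```python
import random


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


def batch_gamma(xs, h):
    """Left side of (20): (h / t) * sum_{s=0}^{t} (X_s - mean_t)(X_s - mean_t)^T,
    where t = len(xs) - 1."""
    t = len(xs) - 1
    d = len(xs[0])
    mean = [sum(x[i] for x in xs) / (t + 1) for i in range(d)]
    total = [[0.0] * d for _ in range(d)]
    for x in xs:
        dev = [x[i] - mean[i] for i in range(d)]
        o = outer(dev, dev)
        for i in range(d):
            for j in range(d):
                total[i][j] += o[i][j]
    return [[h * total[i][j] / t for j in range(d)] for i in range(d)]


def recursive_gamma(gamma_prev, mean_prev, x_new, t, h):
    """Right side of (20): map (Gamma_{t-1}, mean_{t-1}) to (Gamma_t, mean_t), t >= 1."""
    d = len(x_new)
    dev = [x_new[i] - mean_prev[i] for i in range(d)]
    o = outer(dev, dev)
    gamma = [[(t - 1) / t * gamma_prev[i][j] + h / (t + 1) * o[i][j]
              for j in range(d)] for i in range(d)]
    mean = [mean_prev[i] + dev[i] / (t + 1) for i in range(d)]
    return gamma, mean


def check_identity(n_steps=50, d=3, h=2.0, seed=0):
    """Largest entrywise gap between the two sides of (20) over n_steps updates."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)]]
    gamma = [[0.0] * d for _ in range(d)]  # placeholder Gamma_0, weight 0 at t = 1
    mean = list(xs[0])
    worst = 0.0
    for t in range(1, n_steps + 1):
        x_new = [rng.gauss(0.0, 1.0) for _ in range(d)]
        xs.append(x_new)
        gamma, mean = recursive_gamma(gamma, mean, x_new, t, h)
        ref = batch_gamma(xs, h)
        worst = max(worst, max(abs(gamma[i][j] - ref[i][j])
                               for i in range(d) for j in range(d)))
    return worst
```

The recursive form needs only the previous matrix and running mean, so each update costs $O(d^{2})$ rather than a pass over the whole history.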
The set $\mathcal{Y}$ is convex and one way to ensure the updates remain in $\mathcal{Y}$ is to truncate the eigenvalues of (20).
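One possible form of this truncation is sketched here. The operator name $\Pi_{\mathcal{Y}}$ and the use of a spectral decomposition are illustrative choices on our part, not a construction specified in the text, and we assume $\lambda_{*}\leq\lambda^{*}$. For a symmetric matrix $\Gamma=Q\,\mathrm{diag}(\lambda_{1},\dots,\lambda_{d})\,Q^{T}$ with $Q$ orthogonal, set
\begin{align*}
\Pi_{\mathcal{Y}}(\Gamma)=Q\,\mathrm{diag}\left(\min\{\max\{\lambda_{1},\lambda_{*}\},\lambda^{*}\},\dots,\min\{\max\{\lambda_{d},\lambda_{*}\},\lambda^{*}\}\right)Q^{T}.
\end{align*}
The result is symmetric with all eigenvalues in $[\lambda_{*},\lambda^{*}]$, so it lies in $\mathcal{Y}$, and any matrix already in $\mathcal{Y}$ is left unchanged.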
Under Assumption 13, we first obtain a lower bound on the convergence rate for the adaptive random-walk Metropolis process.
Proposition 14 .
For $t\in\mathbb{Z}_{+}$, let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)$ be the marginal of the adaptive random-walk Metropolis process initialized at $(x_{0},\gamma_{0})\in\mathbb{R}^{d}\times\mathcal{Y}$ from the Metropolis-Hastings family (18) and adaptation parameter set (19).
If Assumption 13 holds for $\pi$, then there are constants $c_{*}>0$ depending on $m$ and $M_{*}$ depending on $m$ and $x_{0}$ such that
\begin{align*}
\inf_{\mathcal{Q}\in\mathcal{S}(\mathcal{X},\mathcal{Y})}\mathcal{W}_{\lVert\cdot\rVert\wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot),\pi\right)\geq M_{*}\exp\left(-c_{*}t^{\frac{m}{2-m}}\right).
\end{align*}
In order to proceed, we will first establish a simultaneous growth condition on the Markov family.
Lemma 15 .
With $W(x)=\exp(U(x))$, there are constants $M_{0},N_{0},L>0$ depending on $m$ such that for any $\alpha>0$ and $(x,\gamma)\in\mathcal{X}\times\mathcal{Y}$,
\begin{align}
(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)\leq M_{0}\alpha^{3-2/m}\left[\alpha-N_{0}\right]\varphi(W^{\alpha}(x))+L \tag{21}
\end{align}
where for $K_{m}=\exp(2/(m-2)+1)$ and $w\geq 1$,
\begin{align*}
\varphi(w)=\frac{w+K_{m}}{\log\left[w+K_{m}\right]^{2/m-2}}.
\end{align*}
Proof.
Similar to [Fort and Moulines, 2000, Lemma B.4], using the fundamental theorem of calculus and (15),
\begin{align*}
\sup_{\xi\in K}\frac{W^{\alpha}(x+\gamma^{1/2}\xi)}{W^{\alpha}(x)}
&=1+\int_{0}^{1}\frac{W^{\alpha}(x+t\gamma^{1/2}\xi)}{W^{\alpha}(x)}\gamma\nabla U(x+t\gamma\xi)dt\\
&\leq 1+\lVert\gamma\rVert\sup_{\xi\in K}\lVert\xi\rVert\sup_{\xi\in K}\frac{W^{\alpha}(x+t\gamma^{1/2}\xi)}{W^{\alpha}(x)}\int_{0}^{1}\lVert x+t\gamma^{1/2}\xi\rVert^{m-1}dt.
\end{align*}
It follows that
\begin{align*}
\lim_{r\to\infty}\sup_{\lVert x\rVert\geq r}\sup_{\xi\in K}\frac{W^{\alpha}(x+\gamma^{1/2}\xi)}{W^{\alpha}(x)}\leq 1.
\end{align*}
Using the fundamental theorem of calculus twice with (15) and (16), there is a constant $M>0$ such that for large enough $\lVert x\rVert$
\begin{align*}
&\left|\frac{W^{\alpha}(x+\gamma^{1/2}\xi)}{W^{\alpha}(x)}-1-\alpha\gamma^{1/2}\nabla U(x)\cdot\xi\right|\\
&\leq\int_{0}^{1}\frac{W^{\alpha}(x+t\gamma^{1/2}\xi)}{W^{\alpha}(x)}\left\{\alpha^{2}[\gamma^{1/2}\nabla U(x+t\gamma^{1/2}\xi)\cdot\xi]^{2}+\alpha\xi\cdot\gamma^{1/2}\nabla^{2}U(x+t\gamma^{1/2}\xi)\gamma^{1/2}\xi\right\}(1-t)dt\\
&\leq\alpha^{2}M\lVert x\rVert^{2(m-1)}.
\end{align*}
Similarly, we obtain
\begin{align*}
\left|\exp(U(x)-U(x+\gamma^{1/2}\xi))-1+\gamma^{1/2}\nabla U(x)\cdot\xi\right|\leq M\lVert x\rVert^{2(m-1)}.
\end{align*}
Let $r_{\gamma}(x)=\{y\in\mathbb{R}^{d}:U(x)<U(x+\gamma^{1/2}y)\}$ denote the rejection region.
By [Fort and Moulines, 2000, Lemma B.3] combined with (15), there is a constant $a>0$ such that for large enough $\lVert x\rVert$
\begin{align*}
\int_{r_{\gamma}(x)}\left[\gamma^{1/2}\nabla U(x)\cdot\xi\right]^{2}g_{K}(\xi)d\xi\geq a\lVert\nabla U(x)\rVert^{2}.
\end{align*}
Applying these bounds, for large enough $\lVert x\rVert$,
\begin{align*}
\frac{(\mathcal{P}_{\gamma}W^{\alpha})(x)}{W^{\alpha}(x)}-1
&\leq\alpha\int_{\mathbb{R}^{d}}\gamma^{1/2}\nabla U(x)\cdot\xi\,a(x,x+\gamma^{1/2}\xi)g_{K}(\xi)d\xi+M\alpha^{2}\lVert x\rVert^{2(m-1)}\\
&=\alpha\int_{r_{\gamma}(x)}\gamma^{1/2}\nabla U(x)\cdot\xi\left(\exp[U(x)-U(x+\gamma^{1/2}\xi)]-1\right)g_{K}(\xi)d\xi+M\alpha^{2}\lVert x\rVert^{2(m-1)}\\
&\leq-\alpha\int_{r_{\gamma}(x)}\left[\gamma^{1/2}\nabla U(x)\cdot\xi\right]^{2}g_{K}(\xi)d\xi+M\alpha^{2}\lVert x\rVert^{2(m-1)}\\
&\leq-\alpha a\lVert\nabla U(x)\rVert^{2}+M\alpha^{2}\lVert x\rVert^{2(m-1)}.
\end{align*}
Applying (14) and (15), there are constants $M_{0},N_{0},K_{m}>0$ such that for $\lVert x\rVert$ sufficiently large
\begin{align*}
(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)
&\leq M_{0}\alpha\left[\alpha-N_{0}\right]\frac{W^{\alpha}(x)}{U(x)^{2/m-2}}\\
&\leq M_{0}\alpha^{3-2/m}\left[\alpha-N_{0}\right]\frac{W^{\alpha}(x)+K_{m}}{\log(W^{\alpha}(x)+K_{m})^{2/m-2}}.
\end{align*}
For small $\lVert x\rVert$, by continuity the sub-level sets of $W$ are compact and $(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)$ is bounded on compact sets, so the conclusion follows at once.
∎
We may now apply Lemma 15 to obtain the lower bound.
Proof of Proposition 14 .
Changing to polar coordinates, we have for r 𝑟 r italic_r large enough
\begin{align*}
\pi\left(\exp(U(x))\geq r\right)
&\geq\pi\left(\left\lVert x\right\rVert^{m}\geq\log(r)\right) \\
&\geq\frac{2\pi^{d/2}}{Z\Gamma(d/2)}\int_{s^{m}\geq\log(r)}s^{d-1}\exp(-s^{m})\,ds \\
&\geq\frac{2\pi^{d/2}}{Zm\Gamma(d/2)}\frac{1}{r},
\end{align*}
where $Z$ is the normalizing constant and $\Gamma(t)=\int_{0}^{\infty}u^{t-1}\exp(-u)\,du$ for $t>0$ is the Gamma function.
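As a numerical sanity check of the last inequality, the following sketch evaluates the radial integral over the region $\lVert x\rVert^{m}\geq\log(r)$ after the substitution $u=s^{m}$ and compares it with $1/(mr)$. The function name `tail_integral` and the values $d=2$, $m=1/2$ are illustrative assumptions, not part of the original argument; the comparison relies on $d\geq m$ and $\log(r)\geq 1$, so that $u^{d/m-1}\geq 1$ on the integration region.

```python
import math


def tail_integral(d, m, r, n=20_000, extra=60.0):
    """Approximate the integral of s^(d-1) * exp(-s^m) over {s : s^m >= log r}.

    After u = s^m the integrand becomes u^(d/m - 1) * exp(-u) / m on
    [log r, infinity). The upper limit is truncated at log r + extra
    (an assumption of this sketch), which only makes the value smaller.
    """
    a = math.log(r)
    b = a + extra
    h = (b - a) / n

    def g(u):
        return u ** (d / m - 1.0) * math.exp(-u) / m

    # Composite trapezoidal rule on [a, b].
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h


if __name__ == "__main__":
    d, m = 2, 0.5  # hypothetical dimension and tail exponent
    for r in (math.e, 10.0, 100.0, 1e4):
        print(r, tail_integral(d, m, r), 1.0 / (m * r))
```

For these parameter values the computed integral stays above $1/(mr)$, which matches the polynomial lower bound on the tail of $\pi$ used in the proof.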
By Lemma 15, for $\alpha$ sufficiently large, we have constants $M,K_{m}>0$ depending on $\alpha,m$ such that
$$(\mathcal{P}_{\gamma}W^{\alpha})(x)-W^{\alpha}(x)\leq M\frac{W^{\alpha}(x)+K_{m}}{\log(W^{\alpha}(x)+K_{m})^{2/m-2}}$$
holds for all $(x,\gamma)\in\mathbb{R}^{d}\times\mathcal{Y}$.
Therefore, there is a constant $c_{m}>0$ depending on $m$ such that
\begin{align*}
H_{\varphi,W^{\alpha}(x_{0})}(u)
&=M\int_{W^{\alpha}(x_{0})}^{u}\frac{\log(x+K_{m})^{2/m-2}}{x+K_{m}}\,dx, \\
H_{\varphi,W^{\alpha}(x_{0})}^{-1}(t)
&\leq M\exp(\alpha U(x_{0}))\exp\left(c_{m}t^{\frac{m}{2-m}}\right).
\end{align*}
Since $W(\cdot)$ has compact sublevel sets, the lower bound then follows by Theorem 4.
∎
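The bound on $H_{\varphi,W^{\alpha}(x_{0})}^{-1}$ in the proof above can be traced through an explicit antiderivative. The derivation below is a sketch under assumptions not stated in this passage: $0<m\leq 1$, $w_{0}+K_{m}\geq 1$ with $w_{0}=W^{\alpha}(x_{0})$, and $W=\exp(U)$ with $U\geq 0$. The shorthand $q=(2-m)/m$ and the dummy variable $v$ are introduced only for this illustration.

```latex
\begin{align*}
% Antiderivative of log(v+K_m)^{q-1}/(v+K_m) with q = 2/m - 1 = (2-m)/m:
H_{\varphi,w_{0}}(u)
  &= M\int_{w_{0}}^{u}\frac{\log(v+K_{m})^{q-1}}{v+K_{m}}\,dv
   = \frac{M}{q}\left[\log(u+K_{m})^{q}-\log(w_{0}+K_{m})^{q}\right]. \\
% Solve H(u) = t for u:
H_{\varphi,w_{0}}^{-1}(t)+K_{m}
  &= \exp\left\{\left(\log(w_{0}+K_{m})^{q}+\frac{q\,t}{M}\right)^{1/q}\right\}. \\
% For q >= 1, (a+b)^{1/q} <= a^{1/q} + b^{1/q} when a, b >= 0:
H_{\varphi,w_{0}}^{-1}(t)
  &\leq (w_{0}+K_{m})\exp\left\{\left(\frac{q}{M}\right)^{1/q}t^{\frac{m}{2-m}}\right\}
   \leq (1+K_{m})\exp(\alpha U(x_{0}))\exp\left\{\left(\frac{q}{M}\right)^{1/q}t^{\frac{m}{2-m}}\right\}.
\end{align*}
```

Under these assumptions one admissible choice is $c_{m}=(q/M)^{1/q}$, and the exponent $1/q=m/(2-m)$ is the source of the subgeometric rate.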
We now investigate an upper bound under the expected diminishing adaptation condition (7) that can approximately achieve the lower bound rate.
The following upper and lower bounds show that the convergence of adaptive random-walk Metropolis in this situation is not geometric.
One drawback is that we do not obtain explicit constants in the upper and lower bounds.
Proposition 16 .
For $t\in\mathbb{Z}_{+}$, let $\mathcal{A}_{\mathcal{Q}}^{(t)}(\gamma_{0},x_{0},\cdot)$ be the marginal of an adaptive random-walk Metropolis process as in Proposition 14.
Assume the proposal $g$ is a truncated Gaussian on a centered closed ball and
$$\sup_{x\in\mathcal{X}}\mathbb{E}\left(\left\lVert\Gamma_{t+1}-\Gamma_{t}\right\rVert_{F}\mid X_{t}=x\right)\leq G(t)$$
with $G(\cdot)$ strictly decreasing to zero.
Then there are constants $M_{*},M^{*}$ depending on $x_{0}$ and $c_{*},c^{*},J^{*}>0$ such that for all $\epsilon\in(0,1)$ and all $t\in\mathbb{Z}_{+}$ with $T_{\epsilon,t}\geq F^{-1}\left(J^{*}t^{2}/\epsilon\right)$,
$$M_{*}\exp\left[-c_{*}(T_{\epsilon,t}+t)^{\frac{m}{2-m}}\right]\leq\mathcal{W}_{\left\lVert\cdot\right\rVert\wedge 1}\left(\mathcal{A}_{\mathcal{Q}}^{(T_{\epsilon,t}+t)}((\gamma_{0},x_{0}),\cdot),\pi\right)\leq M^{*}T_{\epsilon,t}\exp\left[-c^{*}t^{\frac{m}{2-m}}\right]+\epsilon.$$
Proof.
We will apply Theorem 10 to obtain the conclusion.
We choose $\alpha$ sufficiently small in the simultaneous drift condition from Lemma 15, and a compactness and continuity argument shows that a simultaneous minorization condition holds.
It remains to verify expected diminishing adaptation (7).
For $\gamma\in\mathcal{Y}$, define
$$f_{\gamma}(y)=\exp\left(-\frac{1}{2}\log(\det(\gamma))-\frac{1}{2}y^{T}\gamma^{-1}y\right).$$
Following [Andrieu and Moulines, 2006, Lemma 13], the mean value theorem gives the upper bound
\begin{align*}
\int_{\mathbb{R}^{d}}\left|f_{\gamma'}(y)-f_{\gamma}(y)\right|dy
&\leq\frac{1}{2}\int_{\mathbb{R}^{d}}\int_{0}^{1}f_{\gamma_{t}}(y)\left|\text{tr}\left(\gamma_{t}^{-1}(\gamma'-\gamma)+\gamma_{t}^{-1}yy^{T}\gamma_{t}^{-1}(\gamma'-\gamma)\right)\right|dt\,dy \\
&\leq\frac{d(2\pi)^{d/2}}{\lambda_{*}}\left\lVert\gamma'-\gamma\right\rVert_{F}.
\end{align*}
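The Lipschitz-type bound above can be checked numerically in dimension $d=1$, where $\gamma$ is a positive scalar and the Frobenius norm reduces to an absolute value. The sketch below is only an illustration: the helper names `l1_distance` and `lipschitz_bound` are hypothetical, and $\lambda_{*}$ is taken to be a lower bound on the admissible values of $\gamma$, which is an assumption about notation defined outside this passage.

```python
import math


def f(gamma, y):
    """Unnormalized Gaussian f_gamma(y) in dimension d = 1."""
    return math.exp(-0.5 * math.log(gamma) - 0.5 * y * y / gamma)


def l1_distance(g1, g2, half_width=40.0, n=40_000):
    """Trapezoidal approximation of the integral of |f_g1 - f_g2| over the real line.

    The integration range is truncated to [-half_width, half_width],
    an assumption of this sketch that is harmless for moderate gamma.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * abs(f(g1, y) - f(g2, y))
    return total * h


def lipschitz_bound(g1, g2, lam_min, d=1):
    """Right-hand side d * (2*pi)^(d/2) / lambda_* * |g1 - g2|."""
    return d * (2.0 * math.pi) ** (d / 2.0) / lam_min * abs(g1 - g2)


if __name__ == "__main__":
    for g1, g2 in ((1.0, 1.5), (0.5, 2.0), (2.0, 2.1)):
        lam_min = min(g1, g2)  # hypothetical lower bound lambda_*
        print(g1, g2, l1_distance(g1, g2), lipschitz_bound(g1, g2, lam_min))
```

In each of these cases the computed $L^{1}$ distance is below the stated bound, consistent with the mean value theorem argument.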
Since the proposal $g_{K}$ is symmetric, for Borel $\varphi:\mathcal{X}\to[0,1]$ and $\psi(x,y)=(\varphi(y)-\varphi(x))a(x,y)$, we have
\begin{align*}
\mathcal{P}_{\gamma'}\varphi(x)-\mathcal{P}_{\gamma}\varphi(x)
&=\int\psi(x,y)g_{K}({\gamma'}^{-1/2}(y-x))\,dy-\int\psi(x,y)g_{K}(\gamma^{-1/2}(y-x))\,dy \\
&\leq\frac{2}{\int_{K}\exp(-\left\lVert z\right\rVert^{2}/2)\,dz}\int_{x+K}\left|f_{\gamma'}(y)-f_{\gamma}(y)\right|dy \\
&\leq J^{*}\left\lVert\gamma'-\gamma\right\rVert_{F}
\end{align*}
where $J^{*}=2d(2\pi)^{d/2}/[\lambda_{*}\int_{K}\exp(-\left\lVert z\right\rVert^{2}/2)\,dz]$.
Taking the supremum over $\varphi$, we then have for each $t\in\mathbb{Z}_{+}$,
\begin{align*}
\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\mathcal{P}_{\Gamma_{t+1}}(x,\cdot)-\mathcal{P}_{\Gamma_{t}}(x,\cdot)\right\rVert_{\text{TV}}\mid X_{t}=x\right]
&\leq J^{*}\sup_{x\in\mathcal{X}}\mathbb{E}\left[\left\lVert\Gamma_{t+1}-\Gamma_{t}\right\rVert_{F}\mid X_{t}=x\right] \\
&\leq J^{*}G(t).
\end{align*}
∎
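To illustrate the kind of adaptation covered by the condition on $\lVert\Gamma_{t+1}-\Gamma_{t}\rVert_{F}$, the following one-dimensional sketch runs an adaptive random-walk Metropolis chain with a truncated Gaussian proposal. Everything specific here is an assumption made for illustration and is not taken from the paper: the target $\pi(x)\propto\exp(-\lvert x\rvert^{m})$, the function name `adaptive_rwm`, the clipping interval `[lo, hi]` for the scale parameter, and the step size $1/(t+1)$. With these choices each increment is bounded by $(\mathrm{hi}-\mathrm{lo})/(t+1)$ uniformly in the state, so $G(t)=(\mathrm{hi}-\mathrm{lo})/(t+1)$ is strictly decreasing to zero.

```python
import math
import random


def adaptive_rwm(m=0.5, x0=0.0, gamma0=1.0, lo=0.1, hi=10.0,
                 radius=3.0, steps=2000, seed=0):
    """1-D adaptive random-walk Metropolis sketch (illustrative assumptions only).

    Target: pi(x) proportional to exp(-|x|^m). Proposal: standard Gaussian
    truncated to [-radius, radius], scaled by sqrt(gamma). gamma0 is assumed
    to lie in [lo, hi].
    """
    rng = random.Random(seed)
    x, gamma = x0, gamma0
    xs, gammas, increments = [], [], []
    for t in range(1, steps + 1):
        # Adaptation: move gamma toward the clipped squared state with step
        # 1/(t+1); gamma stays in [lo, hi] as a convex combination.
        target = min(max(x * x, lo), hi)
        new_gamma = gamma + (target - gamma) / (t + 1)
        increments.append(abs(new_gamma - gamma))
        gamma = new_gamma

        # Truncated Gaussian increment, drawn by rejection sampling.
        z = rng.gauss(0.0, 1.0)
        while abs(z) > radius:
            z = rng.gauss(0.0, 1.0)
        y = x + math.sqrt(gamma) * z

        # Metropolis accept/reject; the proposal is symmetric in y - x.
        log_ratio = abs(x) ** m - abs(y) ** m
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y

        xs.append(x)
        gammas.append(gamma)
    return xs, gammas, increments


if __name__ == "__main__":
    xs, gammas, incs = adaptive_rwm()
    print("final gamma:", gammas[-1])
    print("largest value of (t+1) * increment:",
          max((t + 1) * inc for t, inc in enumerate(incs, start=1)))
```

The printed quantity stays below `hi - lo`, so the increments satisfy the assumed bound $G(t)$ for this particular adaptation rule; other adaptation strategies would need their own bound.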