Extremal bounds for Gaussian trace estimation

Eric Hallman
(November 23, 2024)
  • Abstract: This work derives extremal tail bounds for the Gaussian trace estimator applied to a real symmetric matrix. We define a partial ordering on the eigenvalues, so that when a matrix has greater spectrum under this ordering, its estimator will have worse tail bounds. This is done for two families of matrices: positive semidefinite matrices with bounded effective rank, and indefinite matrices with bounded 2-norm and fixed Frobenius norm. In each case, the tail region is defined rigorously and is constant for a given family.

1 Introduction

Let 𝐀n×n𝐀superscript𝑛𝑛{\bf A}\in\mathbb{R}^{n\times n}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT be a symmetric matrix with eigenvalues λ1λnsubscript𝜆1subscript𝜆𝑛\lambda_{1}\geq\ldots\geq\lambda_{n}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≥ … ≥ italic_λ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, and let 𝐳1,,𝐳mnsubscript𝐳1subscript𝐳𝑚superscript𝑛{\bf z}_{1},\ldots,{\bf z}_{m}\in\mathbb{R}^{n}bold_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_z start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT be i.i.d standard Gaussian vectors. The Gaussian trace estimator is given by

tr(𝐀)trmG(𝐀):=1mj=1m𝐳jT𝐀𝐳j.tr𝐀superscriptsubscripttr𝑚𝐺𝐀assign1𝑚superscriptsubscript𝑗1𝑚superscriptsubscript𝐳𝑗𝑇subscript𝐀𝐳𝑗\operatorname{tr}({\bf A})\approx\operatorname{tr}_{m}^{G}({\bf A}):=\frac{1}{% m}\sum_{j=1}^{m}{\bf z}_{j}^{T}{\bf A}{\bf z}_{j}.roman_tr ( bold_A ) ≈ roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) := divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT bold_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_Az start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT . (1)

It is well known that this estimator is unbiased and has variance 2m𝐀F22𝑚superscriptsubscriptnorm𝐀𝐹2\tfrac{2}{m}\|{\bf A}\|_{F}^{2}divide start_ARG 2 end_ARG start_ARG italic_m end_ARG ∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

Often, it is useful to know how many samples m𝑚mitalic_m are needed to produce an estimate that satisfies a given error tolerance. Cortinovis and Kressner [1] have used concentration inequalities to derive good results on this problem, with slight improvements by Persson, Cortinovis, and Kressner in [6].

Theorem 1 ([1], Thm. 1).

Let 𝐀n×n𝐀superscript𝑛𝑛{\bf A}\in\mathbb{R}^{n\times n}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT be symmetric. Then for all ε>0𝜀0\varepsilon>0italic_ε > 0,

Pr(|trmG(𝐀)tr(𝐀)|ε)2exp(mε24𝐀F2+4ε𝐀2).Prsuperscriptsubscripttr𝑚𝐺𝐀tr𝐀𝜀2𝑚superscript𝜀24superscriptsubscriptnorm𝐀𝐹24𝜀subscriptnorm𝐀2\mathrm{Pr}\left(|\operatorname{tr}_{m}^{G}({\bf A})-\operatorname{tr}({\bf A}% )|\geq\varepsilon\right)\leq 2\exp\left(-\frac{m\varepsilon^{2}}{4\|{\bf A}\|_% {F}^{2}+4\varepsilon\|{\bf A}\|_{2}}\right).roman_Pr ( | roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) - roman_tr ( bold_A ) | ≥ italic_ε ) ≤ 2 roman_exp ( - divide start_ARG italic_m italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 4 ∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 4 italic_ε ∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ) .

For nonzero symmetric positive semidefinite (SPSD) matrices, this result can be turned into a relative error estimate. The trace estimator will generally be more accurate on matrices with tightly clustered eigenvalues.

Definition 1.

The effective rank of a nonzero SPSD matrix 𝐀𝐀{\bf A}bold_A is

reff(𝐀):=tr(𝐀)𝐀2.assignsubscript𝑟eff𝐀tr𝐀subscriptnorm𝐀2r_{\mathrm{eff}}({\bf A}):=\frac{\operatorname{tr}({\bf A})}{\|{\bf A}\|_{2}}.italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ) := divide start_ARG roman_tr ( bold_A ) end_ARG start_ARG ∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG .
Corollary 1 ([1], Remark 2).

For nonzero SPSD 𝐀n×n𝐀superscript𝑛𝑛{\bf A}\in\mathbb{R}^{n\times n}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT, replace ε𝜀\varepsilonitalic_ε by εtr(𝐀)𝜀tr𝐀\varepsilon\cdot\operatorname{tr}({\bf A})italic_ε ⋅ roman_tr ( bold_A ) in Theorem 1, and note that 𝐀F2/tr(𝐀)2reff(𝐀)1\|{\bf A}\|_{F}^{2}/\operatorname{tr}({\bf A})^{2}\leq r_{\mathrm{eff}}({\bf A% })^{-1}∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / roman_tr ( bold_A ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. For ε>0𝜀0\varepsilon>0italic_ε > 0,

Pr(|trmG(𝐀)tr(𝐀)|εtr(𝐀))2exp(mε2reff(𝐀)4(1+ε)).Prsuperscriptsubscripttr𝑚𝐺𝐀tr𝐀𝜀tr𝐀2𝑚superscript𝜀2subscript𝑟eff𝐀41𝜀\mathrm{Pr}\left(|\operatorname{tr}_{m}^{G}({\bf A})-\operatorname{tr}({\bf A}% )|\geq\varepsilon\cdot\operatorname{tr}({\bf A})\right)\leq 2\exp\left(-\frac{% m\varepsilon^{2}\cdot r_{\mathrm{eff}}({\bf A})}{4(1+\varepsilon)}\right).roman_Pr ( | roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) - roman_tr ( bold_A ) | ≥ italic_ε ⋅ roman_tr ( bold_A ) ) ≤ 2 roman_exp ( - divide start_ARG italic_m italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ) end_ARG start_ARG 4 ( 1 + italic_ε ) end_ARG ) .

The goal of this paper is to tighten these bounds as much as possible using techniques from [10, 9, 8]. The practical benefit may be marginal, since the above bounds are already quite good. Still, I found the techniques interesting.

1.1 Extremal bounds

Words like tight and optimal can be slippery. To be precise, I mean to find extremal tail probabilities: ones that can be expressed as the supremum or infimum over some set. The error bounds of Theorem 1 and Corollary 1 use only certain information about 𝐀𝐀{\bf A}bold_A; respectively, they use (𝐀2,𝐀F)subscriptnorm𝐀2subscriptnorm𝐀𝐹(\|{\bf A}\|_{2},\|{\bf A}\|_{F})( ∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , ∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) and reff(𝐀)subscript𝑟eff𝐀r_{\mathrm{eff}}({\bf A})italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ). It is therefore worth considering the sets of all matrices with the same summary statistics:

𝒜abs(λ,ϕ)subscript𝒜abs𝜆italic-ϕ\displaystyle\mathcal{A}_{\mathrm{abs}}(\lambda,\phi)caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) :={𝐀𝒮:𝐀2λ,𝐀F=ϕ},0<λϕ,formulae-sequenceassignabsentconditional-set𝐀subscript𝒮formulae-sequencesubscriptnorm𝐀2𝜆subscriptnorm𝐀𝐹italic-ϕ0𝜆italic-ϕ\displaystyle:=\{{\bf A}\in\mathcal{S}_{\infty}\,:\,\|{\bf A}\|_{2}\leq\lambda% ,\,\|{\bf A}\|_{F}=\phi\},\quad 0<\lambda\leq\phi,:= { bold_A ∈ caligraphic_S start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT : ∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ italic_λ , ∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT = italic_ϕ } , 0 < italic_λ ≤ italic_ϕ ,
𝒜rel(μ)subscript𝒜rel𝜇\displaystyle\mathcal{A}_{\mathrm{rel}}(\mu)caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) :={𝐀𝒮+:𝐀21μ,tr(𝐀)=1},1μ,formulae-sequenceassignabsentconditional-set𝐀superscriptsubscript𝒮formulae-sequencesubscriptnorm𝐀21𝜇tr𝐀11𝜇\displaystyle:=\{{\bf A}\in\mathcal{S}_{\infty}^{+}\,:\,\|{\bf A}\|_{2}\leq% \tfrac{1}{\mu},\,\operatorname{tr}({\bf A})=1\},\quad 1\leq\mu,:= { bold_A ∈ caligraphic_S start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT : ∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG italic_μ end_ARG , roman_tr ( bold_A ) = 1 } , 1 ≤ italic_μ ,

where 𝒮subscript𝒮\mathcal{S}_{\infty}caligraphic_S start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT is the set of all symmetric matrices of any dimension, and 𝒮+superscriptsubscript𝒮\mathcal{S}_{\infty}^{+}caligraphic_S start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT is the set of all SPSD matrices of any dimension. For 𝒜rel(μ)subscript𝒜rel𝜇\mathcal{A}_{\mathrm{rel}}(\mu)caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ), the parameter μ𝜇\muitalic_μ is a lower bound on the effective rank of 𝐀𝐀{\bf A}bold_A; we can fix tr(𝐀)tr𝐀\operatorname{tr}({\bf A})roman_tr ( bold_A ) without loss of generality since the relative error of trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) is scale invariant.

For each of the above sets 𝒜𝒜\mathcal{A}caligraphic_A, we define a partial ordering:

  • For 𝒜rel(μ)subscript𝒜rel𝜇\mathcal{A}_{\mathrm{rel}}(\mu)caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ), the partial ordering is the vector majorization order on the eigenvalues. This paper will show that 𝝀𝐀𝝀𝐁precedes-or-equalssubscript𝝀𝐀subscript𝝀𝐁\boldsymbol{\lambda}_{{\bf A}}\preceq\boldsymbol{\lambda}_{{\bf B}}bold_italic_λ start_POSTSUBSCRIPT bold_A end_POSTSUBSCRIPT ⪯ bold_italic_λ start_POSTSUBSCRIPT bold_B end_POSTSUBSCRIPT implies that trmG(𝐁)superscriptsubscripttr𝑚𝐺𝐁\operatorname{tr}_{m}^{G}({\bf B})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_B ) has worse upper and lower tail bounds than trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ).

  • For 𝒜abs(λ,ϕ)subscript𝒜abs𝜆italic-ϕ\mathcal{A}_{\mathrm{abs}}(\lambda,\phi)caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ), the partial ordering is related to the majorization order but is a little more complicated. This paper will show that 𝝀𝐀𝝀𝐁precedes-or-equalssubscript𝝀𝐀subscript𝝀𝐁\boldsymbol{\lambda}_{{\bf A}}\preceq\boldsymbol{\lambda}_{{\bf B}}bold_italic_λ start_POSTSUBSCRIPT bold_A end_POSTSUBSCRIPT ⪯ bold_italic_λ start_POSTSUBSCRIPT bold_B end_POSTSUBSCRIPT implies that trmG(𝐁)superscriptsubscripttr𝑚𝐺𝐁\operatorname{tr}_{m}^{G}({\bf B})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_B ) has worse upper tail bounds and that trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) has worse lower tail bounds.

With these partial orderings, 𝒜rel(μ)subscript𝒜rel𝜇\mathcal{A}_{\mathrm{rel}}(\mu)caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) has a maximum element and 𝒜abs(λ,ϕ)subscript𝒜abs𝜆italic-ϕ\mathcal{A}_{\mathrm{abs}}(\lambda,\phi)caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) has both maximum and minimum elements.111Strictly speaking, the maximum is unique only if we treat matrices as equivalent when they have the same nonzero eigenvalues.

1.2 The tail

It is not particularly useful to show that if 𝝀𝐀𝝀𝐁precedes-or-equalssubscript𝝀𝐀subscript𝝀𝐁\boldsymbol{\lambda}_{{\bf A}}\preceq\boldsymbol{\lambda}_{{\bf B}}bold_italic_λ start_POSTSUBSCRIPT bold_A end_POSTSUBSCRIPT ⪯ bold_italic_λ start_POSTSUBSCRIPT bold_B end_POSTSUBSCRIPT then one trace estimate eventually has worse tail bounds than the other—this much can already be deduced just by comparing the asymptotic behavior of the probability distributions.

The critical result, then, is that we can define the “tail” regions so that their value depends only on 𝒜𝒜\mathcal{A}caligraphic_A. For error tolerances within these regions, the partial ordering works as advertised on all elements of 𝒜𝒜\mathcal{A}caligraphic_A.

Unfortunately, it has so far been beyond my ability to prove exactly where these tail regions begin. This paper contains some pessimistic bounds, but otherwise is restricted to conjecture.

1.3 The extremal matrices

For the relative error, the matrix with the worst tail bounds is

𝐀rel(μ)=1μ[𝐈μ𝟎𝟎𝟎μμ𝟎𝟎𝟎𝟎].subscript𝐀rel𝜇1𝜇matrixsubscript𝐈𝜇000𝜇𝜇0000{\bf A}_{\mathrm{rel}}(\mu)=\frac{1}{\mu}\begin{bmatrix}{\bf I}_{\lfloor\mu% \rfloor}&{\bf 0}&{\bf 0}\\ {\bf 0}&\mu-\lfloor\mu\rfloor&{\bf 0}\\ {\bf 0}&{\bf 0}&{\bf 0}\end{bmatrix}.bold_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) = divide start_ARG 1 end_ARG start_ARG italic_μ end_ARG [ start_ARG start_ROW start_CELL bold_I start_POSTSUBSCRIPT ⌊ italic_μ ⌋ end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL italic_μ - ⌊ italic_μ ⌋ end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW end_ARG ] . (2)

For the absolute error, the matrix with the worst upper tail bounds is

𝐀abs(λ,ϕ)=λ[𝐈ρ𝟎𝟎𝟎ρρ𝟎𝟎𝟎𝟎],ρ:=ϕ2λ2,formulae-sequencesubscript𝐀abs𝜆italic-ϕ𝜆matrixsubscript𝐈𝜌000𝜌𝜌0000assign𝜌superscriptitalic-ϕ2superscript𝜆2{\bf A}_{\mathrm{abs}}(\lambda,\phi)=\lambda\begin{bmatrix}{\bf I}_{\lfloor% \rho\rfloor}&{\bf 0}&{\bf 0}\\ {\bf 0}&\sqrt{\rho-\lfloor\rho\rfloor}&{\bf 0}\\ {\bf 0}&{\bf 0}&{\bf 0}\end{bmatrix},\quad\rho:=\frac{\phi^{2}}{\lambda^{2}},bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) = italic_λ [ start_ARG start_ROW start_CELL bold_I start_POSTSUBSCRIPT ⌊ italic_ρ ⌋ end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL square-root start_ARG italic_ρ - ⌊ italic_ρ ⌋ end_ARG end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW end_ARG ] , italic_ρ := divide start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , (3)

where ρ𝜌\rhoitalic_ρ is an upper bound on the stable rank of 𝐀𝐀{\bf A}bold_A. By symmetry, the matrix with the worst lower tail bounds is 𝐀abs(λ,ϕ)subscript𝐀abs𝜆italic-ϕ-{\bf A}_{\mathrm{abs}}(\lambda,\phi)- bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ).

Definition 2.

The stable rank of a nonzero matrix 𝐀𝐀{\bf A}bold_A is

rstab(𝐀):=𝐀F2𝐀22.assignsubscript𝑟stab𝐀superscriptsubscriptnorm𝐀𝐹2superscriptsubscriptnorm𝐀22r_{\mathrm{stab}}({\bf A}):=\frac{\|{\bf A}\|_{F}^{2}}{\|{\bf A}\|_{2}^{2}}.italic_r start_POSTSUBSCRIPT roman_stab end_POSTSUBSCRIPT ( bold_A ) := divide start_ARG ∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG .

1.3.1 Extension to Gamma random variables

When μ𝜇\muitalic_μ and ρ𝜌\rhoitalic_ρ are not integers, the distributions of the trace estimators for (2) and (3) are not quite as elegant as I would have liked them to be. We can, however, relax the problem by considering more general linear combinations of Gamma random variables, as opposed to only those whose distribution can be expressed as the Gaussian trace estimate of some matrix as in (1).

For distribution families 𝒬relsubscript𝒬rel\mathcal{Q}_{\mathrm{rel}}caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT and 𝒬abssubscript𝒬abs\mathcal{Q}_{\mathrm{abs}}caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT to be defined later, we will show that

max𝒬relGamma(mμ2,mμ2)similar-tosubscript𝒬rel𝐺𝑎𝑚𝑚𝑎𝑚𝜇2𝑚𝜇2\max\,\mathcal{Q}_{\mathrm{rel}}\sim Gamma\left(\frac{m\mu}{2},\frac{m\mu}{2}\right)roman_max caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ∼ italic_G italic_a italic_m italic_m italic_a ( divide start_ARG italic_m italic_μ end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m italic_μ end_ARG start_ARG 2 end_ARG ) (4)

and

extrema𝒬abs=±λQ,QGamma(mρ2,m2),ρ:=ϕ2λ2.formulae-sequenceextremasubscript𝒬absplus-or-minus𝜆𝑄formulae-sequencesimilar-to𝑄𝐺𝑎𝑚𝑚𝑎𝑚𝜌2𝑚2assign𝜌superscriptitalic-ϕ2superscript𝜆2\operatorname*{extrema}\,\mathcal{Q}_{\mathrm{abs}}=\pm\lambda Q,\quad Q\sim Gamma% \left(\frac{m\rho}{2},\frac{m}{2}\right),\,\rho:=\frac{\phi^{2}}{\lambda^{2}}.roman_extrema caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT = ± italic_λ italic_Q , italic_Q ∼ italic_G italic_a italic_m italic_m italic_a ( divide start_ARG italic_m italic_ρ end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m end_ARG start_ARG 2 end_ARG ) , italic_ρ := divide start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (5)

When μ𝜇\muitalic_μ and ρ𝜌\rhoitalic_ρ are integers, these are exactly the distributions of the trace estimators trmG(𝐀rel)superscriptsubscripttr𝑚𝐺subscript𝐀rel\operatorname{tr}_{m}^{G}({\bf A}_{\mathrm{rel}})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ) and trmG(𝐀abs)superscriptsubscripttr𝑚𝐺subscript𝐀abs\operatorname{tr}_{m}^{G}({\bf A}_{\mathrm{abs}})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ) from (2) and (3).

1.4 Related work

This paper is primarily a spiritual successor to the works [10, 9, 8]. Székely [10] gives thorough tail bounds for the relative error when Q𝑄Qitalic_Q is a nonnegative sum of chi-squared r.v’s with no further restrictions (the worst case is typically Qχ12similar-to𝑄superscriptsubscript𝜒12Q\sim\chi_{1}^{2}italic_Q ∼ italic_χ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT) and provides a majorization result as a corollary. Roosta-Khorasani, Székely, and Ascher [9] extend the main result to where Q𝑄Qitalic_Q is a nonnegative sum of Gamma r.v’s with arbitrary shape and scale parameters. This allows them to consider the effects of using a larger sampling number m𝑚mitalic_m for trace estimation. Finally, Roosta-Khorasani and Székely [8] generalize the worst-case bounds of [9] by obtaining a majorization result for nonnegative sums of Gamma r.v’s. The present work’s Theorem 2, in particular, is more or less a restatement of a result from [8].

This current paper is novel in two main ways. First, it considers how the above bounds might be strengthened when the matrix in question has a large effective rank. Second, it obtains absolute error bounds for indefinite matrices —that is, when the sum of Gamma random variables is not necessarily nonnegative.

2 Majorization

In [8], Roosta-Khorasani and Székely use the majorization order on the eigenvalues of a matrix as a measure of the “skewness” of its spectrum. Their general observation is that the Gaussian trace estimator performs worse when the spectrum is highly skewed. In other words: the majorization order is the partial ordering that they (and we) use to determine for which matrices the Gaussian trace estimator yields the worst relative error bounds.

Given a nonnegative vector 𝝀n𝝀superscript𝑛\boldsymbol{\lambda}\in\mathbb{R}^{n}bold_italic_λ ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, let λ[i]subscript𝜆delimited-[]𝑖\lambda_{[i]}italic_λ start_POSTSUBSCRIPT [ italic_i ] end_POSTSUBSCRIPT denote its indices in sorted order, so that λ[1]λ[n]0subscript𝜆delimited-[]1subscript𝜆delimited-[]𝑛0\lambda_{[1]}\geq\ldots\geq\lambda_{[n]}\geq 0italic_λ start_POSTSUBSCRIPT [ 1 ] end_POSTSUBSCRIPT ≥ … ≥ italic_λ start_POSTSUBSCRIPT [ italic_n ] end_POSTSUBSCRIPT ≥ 0. We say that 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ majorizes 𝝁𝝁\boldsymbol{\mu}bold_italic_μ, denoted 𝝁𝝀precedes-or-equals𝝁𝝀\boldsymbol{\mu}\preceq\boldsymbol{\lambda}bold_italic_μ ⪯ bold_italic_λ, if

i=1kμ[i]superscriptsubscript𝑖1𝑘subscript𝜇delimited-[]𝑖\displaystyle\sum_{i=1}^{k}\mu_{[i]}∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT [ italic_i ] end_POSTSUBSCRIPT i=1kλ[i],kn,formulae-sequenceabsentsuperscriptsubscript𝑖1𝑘subscript𝜆delimited-[]𝑖for-all𝑘𝑛\displaystyle\leq\sum_{i=1}^{k}\lambda_{[i]},\quad\forall k\leq n,≤ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT [ italic_i ] end_POSTSUBSCRIPT , ∀ italic_k ≤ italic_n , (6a)
i=1nμisuperscriptsubscript𝑖1𝑛subscript𝜇𝑖\displaystyle\sum_{i=1}^{n}\mu_{i}∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT =i=1nλi.absentsuperscriptsubscript𝑖1𝑛subscript𝜆𝑖\displaystyle=\sum_{i=1}^{n}\lambda_{i}.= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT . (6b)

Similarly, we say that 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ weakly majorizes 𝝁𝝁\boldsymbol{\mu}bold_italic_μ, denoted 𝝁w𝝀subscriptprecedes-or-equals𝑤𝝁𝝀\boldsymbol{\mu}\preceq_{w}\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT bold_italic_λ, just as long as (6a) holds. If 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ and 𝝁𝝁\boldsymbol{\mu}bold_italic_μ do not have the same length, pad the shorter one with zeroes as necessary.

If 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ weakly majorizes 𝝁𝝁\boldsymbol{\mu}bold_italic_μ but does not majorize it, then i=1nμi<i=1nλisuperscriptsubscript𝑖1𝑛subscript𝜇𝑖superscriptsubscript𝑖1𝑛subscript𝜆𝑖\sum_{i=1}^{n}\mu_{i}<\sum_{i=1}^{n}\lambda_{i}∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT < ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. For our proofs, it will be useful to identify the indices for which the inequalities in (6a) are strict.

Definition 3.

Given 𝛍w𝛌subscriptprecedes-or-equals𝑤𝛍𝛌\boldsymbol{\mu}\preceq_{w}\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT bold_italic_λ, the leading slack index is the smallest number j𝑗jitalic_j such that

i=1μ[i]<i=1λ[i],{j,,n}.formulae-sequencesuperscriptsubscript𝑖1subscript𝜇delimited-[]𝑖superscriptsubscript𝑖1subscript𝜆delimited-[]𝑖for-all𝑗𝑛\sum_{i=1}^{\ell}\mu_{[i]}<\sum_{i=1}^{\ell}\lambda_{[i]},\quad\forall\ell\in% \{j,\ldots,n\}.∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT [ italic_i ] end_POSTSUBSCRIPT < ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT [ italic_i ] end_POSTSUBSCRIPT , ∀ roman_ℓ ∈ { italic_j , … , italic_n } .

This index has the property that μ[j]subscript𝜇delimited-[]𝑗\mu_{[j]}italic_μ start_POSTSUBSCRIPT [ italic_j ] end_POSTSUBSCRIPT (and no larger entry of 𝝁𝝁\boldsymbol{\mu}bold_italic_μ) can be increased by a nonzero amount without violating the majorization condition.

In deriving their main results, Roosta-Khorasani and Székely make use of the following lemma [7, 12.5a].

Lemma 1.

If 𝛍𝛌precedes-or-equals𝛍𝛌\boldsymbol{\mu}\preceq\boldsymbol{\lambda}bold_italic_μ ⪯ bold_italic_λ, then there exists a finite sequence of vectors

𝝁𝜼1𝜼r=𝝀precedes-or-equals𝝁subscript𝜼1precedes-or-equalsprecedes-or-equalssubscript𝜼𝑟𝝀\boldsymbol{\mu}\preceq\boldsymbol{\eta}_{1}\preceq\ldots\preceq\boldsymbol{% \eta}_{r}=\boldsymbol{\lambda}bold_italic_μ ⪯ bold_italic_η start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⪯ … ⪯ bold_italic_η start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT = bold_italic_λ

so that 𝛈isubscript𝛈𝑖\boldsymbol{\eta}_{i}bold_italic_η start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝛈i+1subscript𝛈𝑖1\boldsymbol{\eta}_{i+1}bold_italic_η start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT differ in two coordinates only for i=1,,r1𝑖1𝑟1i=1,\ldots,r-1italic_i = 1 , … , italic_r - 1.

This lemma lets us assume without loss of generality that 𝝁𝝁\boldsymbol{\mu}bold_italic_μ and 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ differ in two coordinates only, satisfying

0λk<μkμj<λj.0subscript𝜆𝑘subscript𝜇𝑘subscript𝜇𝑗subscript𝜆𝑗0\leq\lambda_{k}<\mu_{k}\leq\mu_{j}<\lambda_{j}.0 ≤ italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT < italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT .

2.1 Indefinite majorization

For indefinite matrices with fixed Frobenius norm as in 𝒜abs(λ,ϕ)subscript𝒜abs𝜆italic-ϕ\mathcal{A}_{\mathrm{abs}}(\lambda,\phi)caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ), the majorization condition is somewhat more complicated. Here is why: a Gamma random variable is nonnegative, so its lower tail can’t get too far from the mean. The average of many Gamma random variables, however, looks more like a normal distribution which has tails in both directions. This suggests that trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) will have worse absolute upper tail bounds when the nonnegative eigenvalues are highly skewed and when the negative eigenvalues are tightly clustered.

We will also see that trmG(|𝐀|)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}(|{\bf A}|)roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( | bold_A | ) has worse upper tail bounds than trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ), for indefinite 𝐀𝐀{\bf A}bold_A. Analogous results for lower tail bounds follow by symmetry.

Now we describe the majorization condition more precisely.

Definition 4.

For a vector 𝛌𝛌\boldsymbol{\lambda}bold_italic_λ, define

𝝀:=min(𝝀,𝟎)and𝝀+:=max(𝝀,𝟎),formulae-sequenceassignsuperscript𝝀𝝀0andassignsuperscript𝝀𝝀0\boldsymbol{\lambda}^{-}:=\min(\boldsymbol{\lambda},\boldsymbol{0})\quad\text{% and}\quad\boldsymbol{\lambda}^{+}:=\max(\boldsymbol{\lambda},\boldsymbol{0}),bold_italic_λ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT := roman_min ( bold_italic_λ , bold_0 ) and bold_italic_λ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT := roman_max ( bold_italic_λ , bold_0 ) ,

where the min and max operations are done elementwise.

Definition 5.

For vectors 𝛌𝛌\boldsymbol{\lambda}bold_italic_λ and 𝛍𝛍\boldsymbol{\mu}bold_italic_μ, we say that 𝛍F𝛌subscriptprecedes-or-equals𝐹𝛍𝛌\boldsymbol{\mu}\preceq_{F}\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT bold_italic_λ if

(𝝁+)2superscriptsuperscript𝝁2\displaystyle(\boldsymbol{\mu}^{+})^{2}( bold_italic_μ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT w(𝝀+)2,subscriptprecedes-or-equals𝑤absentsuperscriptsuperscript𝝀2\displaystyle\preceq_{w}(\boldsymbol{\lambda}^{+})^{2},⪯ start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ( bold_italic_λ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (7a)
(𝝀)2superscriptsuperscript𝝀2\displaystyle(\boldsymbol{\lambda}^{-})^{2}( bold_italic_λ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT w(𝝁)2,subscriptprecedes-or-equals𝑤absentsuperscriptsuperscript𝝁2\displaystyle\preceq_{w}(\boldsymbol{\mu}^{-})^{2},⪯ start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ( bold_italic_μ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (7b)
i=1nμi2superscriptsubscript𝑖1𝑛superscriptsubscript𝜇𝑖2\displaystyle\sum_{i=1}^{n}\mu_{i}^{2}∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT =i=1nλi2.absentsuperscriptsubscript𝑖1𝑛superscriptsubscript𝜆𝑖2\displaystyle=\sum_{i=1}^{n}\lambda_{i}^{2}.= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (7c)

Condition (7a) specifies that the positive entries of 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ are more skewed, and perhaps have more total weight, than those of 𝝁𝝁\boldsymbol{\mu}bold_italic_μ. Condition (7b) specifies the opposite for the negative entries. Condition (7c) means that the matrices associated with 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ and 𝝁𝝁\boldsymbol{\mu}bold_italic_μ have the same Frobenius norm. This last condition also means that the associated trace estimators have the same variance.

For the main result, we propose a lemma analogous to Lemma 1 (see Appendix A) for the proof).

Lemma 2.

If 𝛍F𝛌subscriptprecedes-or-equals𝐹𝛍𝛌\boldsymbol{\mu}\preceq_{F}\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT bold_italic_λ, then there exists a finite sequence of vectors

𝝁F𝜼1FF𝜼r=𝝀subscriptprecedes-or-equals𝐹𝝁subscript𝜼1subscriptprecedes-or-equals𝐹subscriptprecedes-or-equals𝐹subscript𝜼𝑟𝝀\boldsymbol{\mu}\preceq_{F}\boldsymbol{\eta}_{1}\preceq_{F}\ldots\preceq_{F}% \boldsymbol{\eta}_{r}=\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT bold_italic_η start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT … ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT bold_italic_η start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT = bold_italic_λ

so that 𝛈isubscript𝛈𝑖\boldsymbol{\eta}_{i}bold_italic_η start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝛈i+1subscript𝛈𝑖1\boldsymbol{\eta}_{i+1}bold_italic_η start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT differ in two coordinates only for i=1,,r1𝑖1𝑟1i=1,\ldots,r-1italic_i = 1 , … , italic_r - 1. Furthermore, the difference between consecutive vectors may be assumed to take one of the following forms:

  1. 1.

    μk<λk0μj<λjsubscript𝜇𝑘subscript𝜆𝑘0subscript𝜇𝑗subscript𝜆𝑗\mu_{k}<\lambda_{k}\leq 0\leq\mu_{j}<\lambda_{j}italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ 0 ≤ italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT < italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT;

  2. 2.

    0λk<μkμj<λj0subscript𝜆𝑘subscript𝜇𝑘subscript𝜇𝑗subscript𝜆𝑗0\leq\lambda_{k}<\mu_{k}\leq\mu_{j}<\lambda_{j}0 ≤ italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT < italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT;

  3. 3.

    μk<λkλj<μj0subscript𝜇𝑘subscript𝜆𝑘subscript𝜆𝑗subscript𝜇𝑗0\mu_{k}<\lambda_{k}\leq\lambda_{j}<\mu_{j}\leq 0italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT < italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ≤ 0.

Once again, we may assume without loss of generality that 𝝁𝝁\boldsymbol{\mu}bold_italic_μ and 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ differ in two coordinates only, this time satisfying λj2μj2=μk2λk2>0superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗2superscriptsubscript𝜇𝑘2superscriptsubscript𝜆𝑘20\lambda_{j}^{2}-\mu_{j}^{2}=\mu_{k}^{2}-\lambda_{k}^{2}>0italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0.

3 Gamma random variables

In this section we cover some relevant properties of Gamma random variables, and explain how to reformulate the original trace estimation problem in terms of such variables.

Since real symmetric matrices are orthogonally diagonalizable and Gaussian vectors are rotationally invariant, the Gaussian trace estimator (1) can be written in the form

trmG(𝐀)=i=1nλiXi,Xii.i.d.1mχm2,formulae-sequencesuperscriptsubscripttr𝑚𝐺𝐀superscriptsubscript𝑖1𝑛subscript𝜆𝑖subscript𝑋𝑖superscriptsimilar-toi.i.d.subscript𝑋𝑖1𝑚superscriptsubscript𝜒𝑚2\operatorname{tr}_{m}^{G}({\bf A})=\sum_{i=1}^{n}\lambda_{i}X_{i},\quad X_{i}% \stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}\tfrac{1}{m}\chi_{m}^{2},roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP divide start_ARG 1 end_ARG start_ARG italic_m end_ARG italic_χ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (8)

where χm2superscriptsubscript𝜒𝑚2\chi_{m}^{2}italic_χ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT is a chi-squared variable with m𝑚mitalic_m degrees of freedom.

The chi-squared distribution is a special case of the Gamma distribution, and so we can generalize (8) as follows:

Definition 6.

Given a vector 𝛌=(λ1,,λn)n𝛌subscript𝜆1subscript𝜆𝑛superscript𝑛\boldsymbol{\lambda}=(\lambda_{1},\ldots,\lambda_{n})\in\mathbb{R}^{n}bold_italic_λ = ( italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_λ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, denote

Q(𝝀;α,β):=i=1nλiXi,Xii.i.d.Gamma(α,β).formulae-sequenceassign𝑄𝝀𝛼𝛽superscriptsubscript𝑖1𝑛subscript𝜆𝑖subscript𝑋𝑖superscriptsimilar-toi.i.d.subscript𝑋𝑖𝐺𝑎𝑚𝑚𝑎𝛼𝛽Q(\boldsymbol{\lambda};\alpha,\beta):=\sum_{i=1}^{n}\lambda_{i}X_{i},\quad X_{% i}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma\left(\alpha,\beta\right).italic_Q ( bold_italic_λ ; italic_α , italic_β ) := ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β ) .

When α𝛼\alphaitalic_α and β𝛽\betaitalic_β are clear from context, we will use Q𝛌subscript𝑄𝛌Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT for short.

With the notation above, trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) has the same distribution as Q(𝝀𝐀;m2,m2)𝑄subscript𝝀𝐀𝑚2𝑚2Q\left(\boldsymbol{\lambda}_{{\bf A}};\tfrac{m}{2},\tfrac{m}{2}\right)italic_Q ( bold_italic_λ start_POSTSUBSCRIPT bold_A end_POSTSUBSCRIPT ; divide start_ARG italic_m end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m end_ARG start_ARG 2 end_ARG ).

3.1 Basic properties

Here are a few other facts about the Gamma distribution:

  • A Gamma random variable with shape α>0𝛼0\alpha>0italic_α > 0 and rate β>0𝛽0\beta>0italic_β > 0 has the PDF

    f(x)={βαΓ(α)xα1eβxx0,0x0.𝑓𝑥casessuperscript𝛽𝛼Γ𝛼superscript𝑥𝛼1superscript𝑒𝛽𝑥𝑥00𝑥0f(x)=\begin{cases}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x% }&x\geq 0,\\ 0&x\leq 0.\end{cases}italic_f ( italic_x ) = { start_ROW start_CELL divide start_ARG italic_β start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_α ) end_ARG italic_x start_POSTSUPERSCRIPT italic_α - 1 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_β italic_x end_POSTSUPERSCRIPT end_CELL start_CELL italic_x ≥ 0 , end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_x ≤ 0 . end_CELL end_ROW (9)

    The tail decays more slowly than the tail of a normal distribution.

  • If XGamma(α,β)similar-to𝑋𝐺𝑎𝑚𝑚𝑎𝛼𝛽X\sim Gamma(\alpha,\beta)italic_X ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β ) then 𝔼[X]=α/β𝔼delimited-[]𝑋𝛼𝛽\mathbb{E}[X]=\alpha/\betablackboard_E [ italic_X ] = italic_α / italic_β and Var[X]=α/β2Var𝑋𝛼superscript𝛽2\operatorname{Var}[X]=\alpha/\beta^{2}roman_Var [ italic_X ] = italic_α / italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Furthermore, X𝑋Xitalic_X is unimodal with mode (α1)/β𝛼1𝛽(\alpha-1)/\beta( italic_α - 1 ) / italic_β if α1𝛼1\alpha\geq 1italic_α ≥ 1 and 0 otherwise.

  • Gamma random variables follow a scaling law: if XGamma(α,β)similar-to𝑋𝐺𝑎𝑚𝑚𝑎𝛼𝛽{X\sim Gamma(\alpha,\beta)}italic_X ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β ) and λ>0𝜆0\lambda>0italic_λ > 0, then λXGamma(α,β/λ)similar-to𝜆𝑋𝐺𝑎𝑚𝑚𝑎𝛼𝛽𝜆\lambda X\sim Gamma(\alpha,\beta/\lambda)italic_λ italic_X ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β / italic_λ ).

  • Gamma random variables with the same rate parameter have an additive property: if X1Gamma(α1,β)similar-tosubscript𝑋1𝐺𝑎𝑚𝑚𝑎subscript𝛼1𝛽X_{1}\sim Gamma(\alpha_{1},\beta)italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_β ) and X2Gamma(α2,β)similar-tosubscript𝑋2𝐺𝑎𝑚𝑚𝑎subscript𝛼2𝛽X_{2}\sim Gamma(\alpha_{2},\beta)italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_β ), then X1+X2Gamma(α1+α2,β)similar-tosubscript𝑋1subscript𝑋2𝐺𝑎𝑚𝑚𝑎subscript𝛼1subscript𝛼2𝛽{X_{1}+X_{2}\sim Gamma(\alpha_{1}+\alpha_{2},\beta)}italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_β ).

  • Conversely, Gamma random variables are infinitely divisible: if XGamma(α,β)similar-to𝑋𝐺𝑎𝑚𝑚𝑎𝛼𝛽X\sim Gamma(\alpha,\beta)italic_X ∼ italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β ) then for any positive integer T𝑇Titalic_T, X𝑋Xitalic_X has the same distribution as i=1TXisuperscriptsubscript𝑖1𝑇subscript𝑋𝑖\sum_{i=1}^{T}X_{i}∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, where X1,,XTi.i.d.Gamma(α/T,β)superscriptsimilar-toi.i.d.subscript𝑋1subscript𝑋𝑇𝐺𝑎𝑚𝑚𝑎𝛼𝑇𝛽X_{1},\ldots,X_{T}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma(\alpha/T% ,\beta)italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( italic_α / italic_T , italic_β ).

3.2 Generalizing the distribution families

In order to define 𝒬relsubscript𝒬rel\mathcal{Q}_{\mathrm{rel}}caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT and 𝒬abssubscript𝒬abs\mathcal{Q}_{\mathrm{abs}}caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT as extensions of 𝒜relsubscript𝒜rel\mathcal{A}_{\mathrm{rel}}caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT and 𝒜abssubscript𝒜abs\mathcal{A}_{\mathrm{abs}}caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT, we first generalize the notions of 2-norm and effective rank.

Definition 7.

For a random variable Q𝛌=Q(𝛌;α,β)subscript𝑄𝛌𝑄𝛌𝛼𝛽Q_{\boldsymbol{\lambda}}=Q(\boldsymbol{\lambda};\alpha,\beta)italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT = italic_Q ( bold_italic_λ ; italic_α , italic_β ), we define the scale of Q𝛌subscript𝑄𝛌Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT as

scale(Q𝝀)=maxi|λi|β.scalesubscript𝑄𝝀subscript𝑖subscript𝜆𝑖𝛽\operatorname{scale}(Q_{\boldsymbol{\lambda}})=\frac{\max_{i}|\lambda_{i}|}{% \beta}.roman_scale ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) = divide start_ARG roman_max start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | end_ARG start_ARG italic_β end_ARG .

It is worth mentioning that scale(Q𝝀)scalesubscript𝑄𝝀\operatorname{scale}(Q_{\boldsymbol{\lambda}})roman_scale ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) is a property of the distribution itself, not just its representation in terms of (𝝀,α,β)𝝀𝛼𝛽(\boldsymbol{\lambda},\alpha,\beta)( bold_italic_λ , italic_α , italic_β ).

Definition 8.

For a random variable Q𝛌=Q(𝛌;α,β)subscript𝑄𝛌𝑄𝛌𝛼𝛽Q_{\boldsymbol{\lambda}}=Q(\boldsymbol{\lambda};\alpha,\beta)italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT = italic_Q ( bold_italic_λ ; italic_α , italic_β ) with nonnegative weights 𝛌𝛌\boldsymbol{\lambda}bold_italic_λ, we define the effective shape of Q𝛌subscript𝑄𝛌Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT as

αeff(Q𝝀):=𝔼[Q𝝀]scale(Q𝝀)=αi=1nλimaxiλi.assignsubscript𝛼effsubscript𝑄𝝀𝔼delimited-[]subscript𝑄𝝀scalesubscript𝑄𝝀𝛼superscriptsubscript𝑖1𝑛subscript𝜆𝑖subscript𝑖subscript𝜆𝑖\alpha_{\mathrm{eff}}(Q_{\boldsymbol{\lambda}}):=\frac{\mathbb{E}[Q_{% \boldsymbol{\lambda}}]}{\operatorname{scale}(Q_{\boldsymbol{\lambda}})}=\frac{% \alpha\sum_{i=1}^{n}\lambda_{i}}{\max_{i}\lambda_{i}}.italic_α start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) := divide start_ARG blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] end_ARG start_ARG roman_scale ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) end_ARG = divide start_ARG italic_α ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG roman_max start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG .

Now we can extend the definition of 𝒜rel(μ)subscript𝒜rel𝜇\mathcal{A}_{\mathrm{rel}}(\mu)caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ):

Definition 9.

For fixed α>0𝛼0\alpha>0italic_α > 0 and β>0𝛽0\beta>0italic_β > 0, define

𝒬rel(μ;α,β):={Q𝝀:scale(Q𝝀)1μ,𝔼[Q𝝀]=1},αμ,formulae-sequenceassignsubscript𝒬rel𝜇𝛼𝛽conditional-setsubscript𝑄𝝀formulae-sequencescalesubscript𝑄𝝀1𝜇𝔼delimited-[]subscript𝑄𝝀1𝛼𝜇\mathcal{Q}_{\mathrm{rel}}(\mu;\alpha,\beta):=\{Q_{\boldsymbol{\lambda}}\,:\,% \operatorname{scale}(Q_{\boldsymbol{\lambda}})\leq\tfrac{1}{\mu},\,\mathbb{E}[% Q_{\boldsymbol{\lambda}}]=1\},\quad\alpha\leq\mu,caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; italic_α , italic_β ) := { italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT : roman_scale ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) ≤ divide start_ARG 1 end_ARG start_ARG italic_μ end_ARG , blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] = 1 } , italic_α ≤ italic_μ ,

where the weights 𝛌n𝛌superscript𝑛\boldsymbol{\lambda}\in\mathbb{R}^{n}bold_italic_λ ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT are nonnegative.

The parameter μ𝜇\muitalic_μ in 𝒬rel(μ;α,β)subscript𝒬rel𝜇𝛼𝛽\mathcal{Q}_{\mathrm{rel}}(\mu;\alpha,\beta)caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; italic_α , italic_β ) is a lower bound on the effective shape of Q𝝀subscript𝑄𝝀Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT. This definition generalizes the case of matrix trace estimation since

trmG(𝒜rel(μ))=𝒬rel(mμ2;m2,m2).superscriptsubscripttr𝑚𝐺subscript𝒜rel𝜇subscript𝒬rel𝑚𝜇2𝑚2𝑚2\operatorname{tr}_{m}^{G}\left(\mathcal{A}_{\mathrm{rel}}(\mu)\right)=\mathcal% {Q}_{\mathrm{rel}}\left(\tfrac{m\mu}{2};\tfrac{m}{2},\tfrac{m}{2}\right).roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) ) = caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( divide start_ARG italic_m italic_μ end_ARG start_ARG 2 end_ARG ; divide start_ARG italic_m end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m end_ARG start_ARG 2 end_ARG ) .

The definition of 𝒜abs(λ,ϕ)subscript𝒜abs𝜆italic-ϕ\mathcal{A}_{\mathrm{abs}}(\lambda,\phi)caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) can be extended similarly:

Definition 10.

For fixed α>0𝛼0\alpha>0italic_α > 0 and β>0𝛽0\beta>0italic_β > 0, define

𝒬abs(λ,ϕ;α,β):={Q𝝀:scale(Q𝝀)λ,Var[Q𝝀]=ϕ2},0<λ1αϕ.formulae-sequenceassignsubscript𝒬abs𝜆italic-ϕ𝛼𝛽conditional-setsubscript𝑄𝝀formulae-sequencescalesubscript𝑄𝝀𝜆Varsubscript𝑄𝝀superscriptitalic-ϕ20𝜆1𝛼italic-ϕ\mathcal{Q}_{\mathrm{abs}}(\lambda,\phi;\alpha,\beta):=\{Q_{\boldsymbol{% \lambda}}\,:\,\operatorname{scale}(Q_{\boldsymbol{\lambda}})\leq\lambda,% \operatorname{Var}[Q_{\boldsymbol{\lambda}}]=\phi^{2}\},\quad 0<\lambda\leq% \tfrac{1}{\sqrt{\alpha}}\phi.caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ; italic_α , italic_β ) := { italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT : roman_scale ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) ≤ italic_λ , roman_Var [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] = italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } , 0 < italic_λ ≤ divide start_ARG 1 end_ARG start_ARG square-root start_ARG italic_α end_ARG end_ARG italic_ϕ .

In this case, we have

trmG(𝒜abs(λ,ϕ))=𝒬abs(2λm,ϕ2m;m2,m2).superscriptsubscripttr𝑚𝐺subscript𝒜abs𝜆italic-ϕsubscript𝒬abs2𝜆𝑚italic-ϕ2𝑚𝑚2𝑚2\operatorname{tr}_{m}^{G}(\mathcal{A}_{\mathrm{abs}}(\lambda,\phi))=\mathcal{Q% }_{\mathrm{abs}}\left(\tfrac{2\lambda}{m},\phi\sqrt{\tfrac{2}{m}};\tfrac{m}{2}% ,\tfrac{m}{2}\right).roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) ) = caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( divide start_ARG 2 italic_λ end_ARG start_ARG italic_m end_ARG , italic_ϕ square-root start_ARG divide start_ARG 2 end_ARG start_ARG italic_m end_ARG end_ARG ; divide start_ARG italic_m end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m end_ARG start_ARG 2 end_ARG ) .

3.3 Infinite division

Our strategy for relaxing the original trace estimation problem is to use the fact that Gamma random variables are infinitely divisible (see Section 3.1). Any Gamma random variable may be rewritten as a sum of arbitrarily many Gamma random variables with smaller shape parameters α𝛼\alphaitalic_α, and this fact enables us to nest some distribution families within others.

Lemma 3.

For any integer T1𝑇1T\geq 1italic_T ≥ 1, it holds that Q(𝛌;α,β)𝑄𝛌𝛼𝛽Q(\boldsymbol{\lambda};\alpha,\beta)italic_Q ( bold_italic_λ ; italic_α , italic_β ) has the same distribution as Q((𝛌,,𝛌T times);αT,β)𝑄subscript𝛌𝛌𝑇 times𝛼𝑇𝛽Q\big{(}(\underbrace{\boldsymbol{\lambda},\ldots,\boldsymbol{\lambda}}_{T\text% { times}});\tfrac{\alpha}{T},\beta\big{)}italic_Q ( ( under⏟ start_ARG bold_italic_λ , … , bold_italic_λ end_ARG start_POSTSUBSCRIPT italic_T times end_POSTSUBSCRIPT ) ; divide start_ARG italic_α end_ARG start_ARG italic_T end_ARG , italic_β ), and consequently that

𝒬rel(μ;α,β)𝒬rel(μ;αT,β)and𝒬abs(λ,ϕ;α,β)𝒬abs(λ,ϕ;αT,β).formulae-sequencesubscript𝒬rel𝜇𝛼𝛽subscript𝒬rel𝜇𝛼𝑇𝛽andsubscript𝒬abs𝜆italic-ϕ𝛼𝛽subscript𝒬abs𝜆italic-ϕ𝛼𝑇𝛽\mathcal{Q}_{\mathrm{rel}}(\mu;\alpha,\beta)\subseteq\mathcal{Q}_{\mathrm{rel}% }\left(\mu;\tfrac{\alpha}{T},\beta\right)\quad\text{and}\quad\mathcal{Q}_{% \mathrm{abs}}(\lambda,\phi;\alpha,\beta)\subseteq\mathcal{Q}_{\mathrm{abs}}% \left(\lambda,\phi;\tfrac{\alpha}{T},\beta\right).caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; italic_α , italic_β ) ⊆ caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; divide start_ARG italic_α end_ARG start_ARG italic_T end_ARG , italic_β ) and caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ; italic_α , italic_β ) ⊆ caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ; divide start_ARG italic_α end_ARG start_ARG italic_T end_ARG , italic_β ) .

By considering the limit as T𝑇T\rightarrow\inftyitalic_T → ∞, we can effectively do away with the constraint of sharing a single scale parameter, and in doing so obtain the more general sets 𝒬rel(μ)subscript𝒬rel𝜇\mathcal{Q}_{\mathrm{rel}}(\mu)caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) and 𝒬abs(λ,ϕ)subscript𝒬abs𝜆italic-ϕ\mathcal{Q}_{\mathrm{abs}}(\lambda,\phi)caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) promised in (4) and (5).

Definition 11.

For μ>0𝜇0\mu>0italic_μ > 0, let 𝒬rel(μ)subscript𝒬rel𝜇\mathcal{Q}_{\mathrm{rel}}(\mu)caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) be the set of all finite linear combinations of Gamma random variables

Q=i=1nλiXi,Xii.i.d.Gamma(αi,βi),λi0,αi>0,βi>0,formulae-sequence𝑄superscriptsubscript𝑖1𝑛subscript𝜆𝑖subscript𝑋𝑖formulae-sequencesuperscriptsimilar-toi.i.d.subscript𝑋𝑖𝐺𝑎𝑚𝑚𝑎subscript𝛼𝑖subscript𝛽𝑖formulae-sequencesubscript𝜆𝑖0formulae-sequencesubscript𝛼𝑖0subscript𝛽𝑖0Q=\sum_{i=1}^{n}\lambda_{i}X_{i},\quad X_{i}\stackrel{{\scriptstyle\text{i.i.d% .}}}{{\sim}}Gamma(\alpha_{i},\beta_{i}),\,\lambda_{i}\geq 0,\,\alpha_{i}>0,\,% \beta_{i}>0,italic_Q = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) , italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≥ 0 , italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 , italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 ,

subject to the constraints scale[Q]1μscale𝑄1𝜇\operatorname{scale}[Q]\leq\tfrac{1}{\mu}roman_scale [ italic_Q ] ≤ divide start_ARG 1 end_ARG start_ARG italic_μ end_ARG and 𝔼[Q]=1𝔼delimited-[]𝑄1\mathbb{E}[Q]=1blackboard_E [ italic_Q ] = 1.

Definition 12.

For λ>0𝜆0\lambda>0italic_λ > 0 and ϕ>0italic-ϕ0\phi>0italic_ϕ > 0, let 𝒬abs(λ,ϕ)subscript𝒬abs𝜆italic-ϕ\mathcal{Q}_{\mathrm{abs}}(\lambda,\phi)caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) be the set of all finite linear combinations of Gamma random variables

Q=i=1nλiXi,Xii.i.d.Gamma(αi,βi),αi>0,βi>0,formulae-sequence𝑄superscriptsubscript𝑖1𝑛subscript𝜆𝑖subscript𝑋𝑖formulae-sequencesuperscriptsimilar-toi.i.d.subscript𝑋𝑖𝐺𝑎𝑚𝑚𝑎subscript𝛼𝑖subscript𝛽𝑖formulae-sequencesubscript𝛼𝑖0subscript𝛽𝑖0Q=\sum_{i=1}^{n}\lambda_{i}X_{i},\quad X_{i}\stackrel{{\scriptstyle\text{i.i.d% .}}}{{\sim}}Gamma(\alpha_{i},\beta_{i}),\,\alpha_{i}>0,\,\beta_{i}>0,italic_Q = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) , italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 , italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 ,

subject to the constraints scale[Q]λscale𝑄𝜆\operatorname{scale}[Q]\leq\lambdaroman_scale [ italic_Q ] ≤ italic_λ and Var[Q]=ϕ2Var𝑄superscriptitalic-ϕ2\operatorname{Var}[Q]=\phi^{2}roman_Var [ italic_Q ] = italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

For any fixed α>0𝛼0\alpha>0italic_α > 0 and β>0𝛽0\beta>0italic_β > 0, for sufficiently large T𝑇Titalic_T the set 𝒬rel(μ;αT,β)subscript𝒬rel𝜇𝛼𝑇𝛽\mathcal{Q}_{\mathrm{rel}}(\mu;\tfrac{\alpha}{T},\beta)caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; divide start_ARG italic_α end_ARG start_ARG italic_T end_ARG , italic_β ) contains distributions arbitrarily close to any given distribution in 𝒬rel(μ)subscript𝒬rel𝜇\mathcal{Q}_{\mathrm{rel}}(\mu)caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ); the same goes for 𝒬abs(λ,ϕ;αT,β)subscript𝒬abs𝜆italic-ϕ𝛼𝑇𝛽\mathcal{Q}_{\mathrm{abs}}(\lambda,\phi;\tfrac{\alpha}{T},\beta)caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ; divide start_ARG italic_α end_ARG start_ARG italic_T end_ARG , italic_β ) and 𝒬abs(λ,ϕ)subscript𝒬abs𝜆italic-ϕ\mathcal{Q}_{\mathrm{abs}}(\lambda,\phi)caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ). Thus any bounds we derive for the former set will also apply to the latter.

3.4 Fixed shape and scale parameters

The main results assume that parameters α𝛼\alphaitalic_α and β𝛽\betaitalic_β are fixed in order to keep the majorization definitions as simple as possible. If αisubscript𝛼𝑖\alpha_{i}italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and βisubscript𝛽𝑖\beta_{i}italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT were allowed to differ for each random variable Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in the mixture Q𝑄Qitalic_Q, we could still map Q𝑄Qitalic_Q to a random variable that takes on values λi/βisubscript𝜆𝑖subscript𝛽𝑖\lambda_{i}/\beta_{i}italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT / italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT with probability αisubscript𝛼𝑖\alpha_{i}italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, then make comparisons using the stochastic ordering. I don’t feel that the added complexity is worth it, so won’t pursue this approach further.

4 Majorization theorems

This section presents the main majorization results. The relative error result is essentially a restatement of results from [8]; to the best of my knowledge the absolute error result is novel.

4.1 Relative error

As mentioned above, this first result was essentially proved as part of [8, Thm. 1]. I’ve still included the proof (Appendix A.1) for the sake of completeness.

Theorem 2.

Fix μα>0𝜇𝛼0\mu\geq\alpha>0italic_μ ≥ italic_α > 0, and β>0𝛽0\beta>0italic_β > 0. Define x¯uppersubscript¯𝑥upper\overline{x}_{\mathrm{upper}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT and x¯lowersubscript¯𝑥lower\overline{x}_{\mathrm{lower}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT to be the extreme values of the mode of the PDF of the distribution

Q𝝀+λjψ+λkψ,Q𝝀𝒬rel(μ;α,β),jk,formulae-sequencesubscript𝑄𝝀subscript𝜆𝑗𝜓subscript𝜆𝑘superscript𝜓subscript𝑄𝝀subscript𝒬rel𝜇𝛼𝛽𝑗𝑘Q_{\boldsymbol{\lambda}}+\lambda_{j}\psi+\lambda_{k}\psi^{\prime},\quad Q_{% \boldsymbol{\lambda}}\in\mathcal{Q}_{\mathrm{rel}}(\mu;\alpha,\beta),\ j\neq k,italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_ψ + italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ∈ caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; italic_α , italic_β ) , italic_j ≠ italic_k ,

where ψ,ψi.i.d.Gamma(1,β)superscriptsimilar-toi.i.d.𝜓superscript𝜓𝐺𝑎𝑚𝑚𝑎1𝛽\psi,\psi^{\prime}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma\left(1,% \beta\right)italic_ψ , italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( 1 , italic_β ) are exponential r.v’s independent of Q𝛌subscript𝑄𝛌Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT. If Q𝛌,Q𝛍𝒬rel(μ;α,β)subscript𝑄𝛌subscript𝑄𝛍subscript𝒬rel𝜇𝛼𝛽Q_{\boldsymbol{\lambda}},Q_{\boldsymbol{\mu}}\in\mathcal{Q}_{\mathrm{rel}}(\mu% ;\alpha,\beta)italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT , italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT ∈ caligraphic_Q start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ; italic_α , italic_β ) satisfy 𝛍𝛌precedes-or-equals𝛍𝛌\boldsymbol{\mu}\preceq\boldsymbol{\lambda}bold_italic_μ ⪯ bold_italic_λ, then

Pr(Q𝝁x)Prsubscript𝑄𝝁𝑥\displaystyle\mathrm{Pr}\left(Q_{\boldsymbol{\mu}}\leq x\right)roman_Pr ( italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT ≤ italic_x ) Pr(Q𝝀x),xx¯upper,formulae-sequenceabsentPrsubscript𝑄𝝀𝑥for-all𝑥subscript¯𝑥upper\displaystyle\geq\mathrm{Pr}\left(Q_{\boldsymbol{\lambda}}\leq x\right),\quad% \forall x\geq\overline{x}_{\mathrm{upper}},≥ roman_Pr ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ≤ italic_x ) , ∀ italic_x ≥ over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT ,
Pr(Q𝝁x)Prsubscript𝑄𝝁𝑥\displaystyle\mathrm{Pr}\left(Q_{\boldsymbol{\mu}}\leq x\right)roman_Pr ( italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT ≤ italic_x ) Pr(Q𝝀x),xx¯lower.formulae-sequenceabsentPrsubscript𝑄𝝀𝑥for-all𝑥subscript¯𝑥lower\displaystyle\leq\mathrm{Pr}\left(Q_{\boldsymbol{\lambda}}\leq x\right),\quad% \forall x\leq\overline{x}_{\mathrm{lower}}.≤ roman_Pr ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ≤ italic_x ) , ∀ italic_x ≤ over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT .

That is, Q𝝀subscript𝑄𝝀Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT has worse upper and lower tail bounds.

4.1.1 Locating the tail

But what are the values of x¯uppersubscript¯𝑥upper\overline{x}_{\mathrm{upper}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT and x¯lowersubscript¯𝑥lower\overline{x}_{\mathrm{lower}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT? In the most general case μ=α𝜇𝛼\mu=\alphaitalic_μ = italic_α, [8, Thm. 1] shows that x¯upper=1+(2α)1subscript¯𝑥upper1superscript2𝛼1\overline{x}_{\mathrm{upper}}=1+(2\alpha)^{-1}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT = 1 + ( 2 italic_α ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and x¯lower=1α1subscript¯𝑥lower1superscript𝛼1\overline{x}_{\mathrm{lower}}=1-\alpha^{-1}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT = 1 - italic_α start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. These values necessarily serve as bounds whenever μα𝜇𝛼\mu\geq\alphaitalic_μ ≥ italic_α, but I would like to obtain stronger results for distributions with greater effective shape.

Theorem 3 gives a pessimistic result: it tells us where the tails are not, rather than where they are.

Theorem 3.

If μ>α𝜇𝛼\mu>\alphaitalic_μ > italic_α, the points x¯uppersubscript¯𝑥upper\overline{x}_{\mathrm{upper}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT and x¯lowersubscript¯𝑥lower\overline{x}_{\mathrm{lower}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT as defined in Theorem 2 satisfy

x¯upper1+(αμ/α)1andx¯lower1(αμ/α)1.formulae-sequencesubscript¯𝑥upper1superscript𝛼𝜇𝛼1andsubscript¯𝑥lower1superscript𝛼𝜇𝛼1\overline{x}_{\mathrm{upper}}\geq 1+(\alpha\lceil\mu/\alpha\rceil)^{-1}\quad% \text{and}\quad\overline{x}_{\mathrm{lower}}\leq 1-(\alpha\lceil\mu/\alpha% \rceil)^{-1}.over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT ≥ 1 + ( italic_α ⌈ italic_μ / italic_α ⌉ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT ≤ 1 - ( italic_α ⌈ italic_μ / italic_α ⌉ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT .
Proof.

For the bound on x¯uppersubscript¯𝑥upper\overline{x}_{\mathrm{upper}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT, set Q𝝀=i=1μ/αλXisubscript𝑄𝝀superscriptsubscript𝑖1𝜇𝛼𝜆subscript𝑋𝑖Q_{\boldsymbol{\lambda}}=\sum_{i=1}^{\lceil\mu/\alpha\rceil}\lambda X_{i}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⌈ italic_μ / italic_α ⌉ end_POSTSUPERSCRIPT italic_λ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT with λ=βαμ/α𝜆𝛽𝛼𝜇𝛼\lambda=\tfrac{\beta}{\alpha\lceil\mu/\alpha\rceil}italic_λ = divide start_ARG italic_β end_ARG start_ARG italic_α ⌈ italic_μ / italic_α ⌉ end_ARG. The distribution Q𝝀+λ1ψ+λ2ψsubscript𝑄𝝀subscript𝜆1𝜓subscript𝜆2superscript𝜓Q_{\boldsymbol{\lambda}}+\lambda_{1}\psi+\lambda_{2}\psi^{\prime}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ψ + italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is Gamma(αμ/α+2,λ1β)𝐺𝑎𝑚𝑚𝑎𝛼𝜇𝛼2superscript𝜆1𝛽Gamma(\alpha\lceil\mu/\alpha\rceil+2,\lambda^{-1}\beta)italic_G italic_a italic_m italic_m italic_a ( italic_α ⌈ italic_μ / italic_α ⌉ + 2 , italic_λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_β ), so its mode is 1+(αμ/α)11superscript𝛼𝜇𝛼1{1+(\alpha\lceil\mu/\alpha\rceil)^{-1}}1 + ( italic_α ⌈ italic_μ / italic_α ⌉ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT as desired while 𝔼[Q𝝀]=1𝔼delimited-[]subscript𝑄𝝀1\mathbb{E}[Q_{\boldsymbol{\lambda}}]=1blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] = 1.

For the bound on x¯lowersubscript¯𝑥lower\overline{x}_{\mathrm{lower}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT, take the same Q𝝀subscript𝑄𝝀Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT, but pad it with eigenvalues λj=λk=0subscript𝜆𝑗subscript𝜆𝑘0\lambda_{j}=\lambda_{k}=0italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 0 instead. ∎

Unfortunately, I have not so far been able to establish bounds in the opposite direction. I will therefore leave them as a conjecture:

Conjecture 1.

The points x¯uppersubscript¯𝑥upper\overline{x}_{\mathrm{upper}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT and x¯lowersubscript¯𝑥lower\overline{x}_{\mathrm{lower}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT as defined in Theorem 2 satisfy

x¯upper1+μ1andx¯lower1μ1.formulae-sequencesubscript¯𝑥upper1superscript𝜇1andsubscript¯𝑥lower1superscript𝜇1\overline{x}_{\mathrm{upper}}\leq 1+\mu^{-1}\quad\text{and}\quad\overline{x}_{% \mathrm{lower}}\geq 1-\mu^{-1}.over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT ≤ 1 + italic_μ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT ≥ 1 - italic_μ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT .

4.2 Absolute error

The relative error analysis in the previous section has two significant limitations: first, it only applies when 𝐀𝐀{\bf A}bold_A is SPSD. Second, reff(𝐀)subscript𝑟eff𝐀r_{\mathrm{eff}}({\bf A})italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ) may sometimes be significantly larger than rstab(𝐀)subscript𝑟stab𝐀r_{\mathrm{stab}}({\bf A})italic_r start_POSTSUBSCRIPT roman_stab end_POSTSUBSCRIPT ( bold_A ), which is a better indicator of the quality of the Gaussian trace estimator since the variance of trmG(𝐀)superscriptsubscripttr𝑚𝐺𝐀\operatorname{tr}_{m}^{G}({\bf A})roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) is 2m𝐀F22𝑚superscriptsubscriptnorm𝐀𝐹2\tfrac{2}{m}\|{\bf A}\|_{F}^{2}divide start_ARG 2 end_ARG start_ARG italic_m end_ARG ∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

We therefore turn to the case where 𝐀𝐀{\bf A}bold_A may be indefinite but bounds on 𝐀Fsubscriptnorm𝐀𝐹\|{\bf A}\|_{F}∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT and 𝐀2subscriptnorm𝐀2\|{\bf A}\|_{2}∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are known. As before, it is useful to present the results in terms of more general Gamma random variables.

Here is the main result on the absolute error of the estimator (proof in Appendix A.2).

Theorem 4.

Fix α>0𝛼0\alpha>0italic_α > 0, β>0𝛽0\beta>0italic_β > 0, and ϕαλ>0italic-ϕ𝛼𝜆0\frac{\phi}{\sqrt{\alpha}}\geq\lambda>0divide start_ARG italic_ϕ end_ARG start_ARG square-root start_ARG italic_α end_ARG end_ARG ≥ italic_λ > 0. Define x^uppersubscript^𝑥upper\hat{x}_{\mathrm{upper}}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT to be the supremum over all inflection points of the PDF of the distribution

Q𝝀+λjψ+λkψ𝔼[Q𝝀],Q𝝀𝒬abs(λ,ϕ;α,β),jk,formulae-sequencesubscript𝑄𝝀subscript𝜆𝑗𝜓subscript𝜆𝑘superscript𝜓𝔼delimited-[]subscript𝑄𝝀subscript𝑄𝝀subscript𝒬abs𝜆italic-ϕ𝛼𝛽𝑗𝑘Q_{\boldsymbol{\lambda}}+\lambda_{j}\psi+\lambda_{k}\psi^{\prime}-\mathbb{E}[Q% _{\boldsymbol{\lambda}}],\quad Q_{\boldsymbol{\lambda}}\in\mathcal{Q}_{\mathrm% {abs}}(\lambda,\phi;\alpha,\beta),\,j\neq k,italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_ψ + italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] , italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ∈ caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ; italic_α , italic_β ) , italic_j ≠ italic_k , (10)

where ψ,ψi.i.d.Gamma(1,β)superscriptsimilar-toi.i.d.𝜓superscript𝜓𝐺𝑎𝑚𝑚𝑎1𝛽\psi,\psi^{\prime}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma(1,\beta)italic_ψ , italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( 1 , italic_β ) are exponential r.v’s independent of Q𝛌subscript𝑄𝛌Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT. If Q𝛌,Q𝛍𝒬abs(λ,ϕ;α,β)subscript𝑄𝛌subscript𝑄𝛍subscript𝒬abs𝜆italic-ϕ𝛼𝛽Q_{\boldsymbol{\lambda}},Q_{\boldsymbol{\mu}}\in\mathcal{Q}_{\mathrm{abs}}(% \lambda,\phi;\alpha,\beta)italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT , italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT ∈ caligraphic_Q start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ; italic_α , italic_β ) satisfy 𝛍F𝛌subscriptprecedes-or-equals𝐹𝛍𝛌\boldsymbol{\mu}\preceq_{F}\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT bold_italic_λ, then

Pr(Q𝝁𝔼[Q𝝁]x)Pr(Q𝝀𝔼[Q𝝀]x),|x|>x^upper.formulae-sequencePrsubscript𝑄𝝁𝔼delimited-[]subscript𝑄𝝁𝑥Prsubscript𝑄𝝀𝔼delimited-[]subscript𝑄𝝀𝑥for-all𝑥subscript^𝑥upper\mathrm{Pr}(Q_{\boldsymbol{\mu}}-\mathbb{E}[Q_{\boldsymbol{\mu}}]\leq x)\geq% \mathrm{Pr}(Q_{\boldsymbol{\lambda}}-\mathbb{E}[Q_{\boldsymbol{\lambda}}]\leq x% ),\quad\forall|x|>\hat{x}_{\mathrm{upper}}.roman_Pr ( italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT - blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT ] ≤ italic_x ) ≥ roman_Pr ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT - blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] ≤ italic_x ) , ∀ | italic_x | > over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT .

This single equation (note the use of the absolute value |x|𝑥|x|| italic_x | in the condition) means that Q𝝀subscript𝑄𝝀Q_{\boldsymbol{\lambda}}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT has worse upper tail bounds and that Q𝝁subscript𝑄𝝁Q_{\boldsymbol{\mu}}italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT has worse lower tail bounds.

4.2.1 Locating the tail

Theorem 4 implies that the “tails” are the regions where all density functions of the form (10) are convex. For reference, a normal distribution has inflection points equal to the mean plus or minus one standard deviation.

How much further away could the tails begin? Theorem 5 gives a pessimistic result.

Theorem 5.

The point x^uppersubscript^𝑥upper\hat{x}_{\mathrm{upper}}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT as defined in Theorem 4 satisfies

x^upperϕ1+rα+1rα,wherer=ϕ2λ2α.formulae-sequencesubscript^𝑥upperitalic-ϕ1𝑟𝛼1𝑟𝛼where𝑟superscriptitalic-ϕ2superscript𝜆2𝛼\hat{x}_{\mathrm{upper}}\geq\phi\frac{1+\sqrt{r\alpha+1}}{\sqrt{r\alpha}},% \quad\text{where}\ r=\left\lceil\frac{\phi^{2}}{\lambda^{2}\alpha}\right\rceil.over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT ≥ italic_ϕ divide start_ARG 1 + square-root start_ARG italic_r italic_α + 1 end_ARG end_ARG start_ARG square-root start_ARG italic_r italic_α end_ARG end_ARG , where italic_r = ⌈ divide start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_α end_ARG ⌉ .

See Appendix A.3 for the proof. I’ll leave an upper bound as a conjecture.

Conjecture 2.

The point x^uppersubscript^𝑥upper\hat{x}_{\mathrm{upper}}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT as defined in Theorem 4 satisfies

x^upperλ+ϕ2+λ2.subscript^𝑥upper𝜆superscriptitalic-ϕ2superscript𝜆2\hat{x}_{\mathrm{upper}}\leq\lambda+\sqrt{\phi^{2}+\lambda^{2}}.over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT ≤ italic_λ + square-root start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG .

5 Application to trace estimation

In this section I’ll rephrase the main theorems (Theorems 2 and 4) more directly in terms of Gaussian trace estimation.

Here is the basic idea: the matrix 𝐀rel(μ)subscript𝐀rel𝜇{\bf A}_{\mathrm{rel}}(\mu)bold_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) from (2) (resp. 𝐀abs(λ,ϕ)subscript𝐀abs𝜆italic-ϕ{\bf A}_{\mathrm{abs}}(\lambda,\phi)bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) from (3)) majorizes every other element of 𝒜rel(μ)subscript𝒜rel𝜇\mathcal{A}_{\mathrm{rel}}(\mu)caligraphic_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) (resp. 𝒜abs(λ,ϕ)subscript𝒜abs𝜆italic-ϕ\mathcal{A}_{\mathrm{abs}}(\lambda,\phi)caligraphic_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ )). Thus by the main theorems, these two matrices have the worst tail bounds. Employing the infinite division strategy of Section (3.3) shows that the Gaussian trace estimators for these matrices are in turn tail-bounded by the Gamma distributions in (4) and (5), respectively. Finally, increasing the sampling number m𝑚mitalic_m will make the tails take up a larger portion of the distribution, increasing the domain over which the bounds in this paper are effective.

First up is the relative error bound. Note that a little unit conversion is used here: if reff(𝐀)=μsubscript𝑟eff𝐀𝜇r_{\mathrm{eff}}({\bf A})=\muitalic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ) = italic_μ, then αeff(trmG(𝐀))=mμ2subscript𝛼effsuperscriptsubscripttr𝑚𝐺𝐀𝑚𝜇2\alpha_{\mathrm{eff}}(\operatorname{tr}_{m}^{G}({\bf A}))=\frac{m\mu}{2}italic_α start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) ) = divide start_ARG italic_m italic_μ end_ARG start_ARG 2 end_ARG.

Theorem 6.

For nonzero SPSD 𝐀n×n𝐀superscript𝑛𝑛{\bf A}\in\mathbb{R}^{n\times n}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT and sampling number m𝑚mitalic_m, let μ=reff(𝐀)𝜇subscript𝑟eff𝐀\mu=r_{\mathrm{eff}}({\bf A})italic_μ = italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ). Then there exists εrelsubscript𝜀rel\varepsilon_{\mathrm{rel}}italic_ε start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT such that for εεrel𝜀subscript𝜀rel\varepsilon\geq\varepsilon_{\mathrm{rel}}italic_ε ≥ italic_ε start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT,

Pr(|trmG(𝐀)tr(𝐀)|εtr(𝐀))Prsuperscriptsubscripttr𝑚𝐺𝐀tr𝐀𝜀tr𝐀\displaystyle\mathrm{Pr}\left(|\operatorname{tr}_{m}^{G}({\bf A})-% \operatorname{tr}({\bf A})|\geq\varepsilon\cdot\operatorname{tr}({\bf A})\right)roman_Pr ( | roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) - roman_tr ( bold_A ) | ≥ italic_ε ⋅ roman_tr ( bold_A ) ) Pr(|trmG(𝐀rel)1|ε)absentPrsuperscriptsubscripttr𝑚𝐺subscript𝐀rel1𝜀\displaystyle\leq\mathrm{Pr}\left(|\operatorname{tr}_{m}^{G}({\bf A}_{\mathrm{% rel}})-1|\geq\varepsilon\right)≤ roman_Pr ( | roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ) - 1 | ≥ italic_ε )
Pr(|X1|ε),absentPr𝑋1𝜀\displaystyle\leq\mathrm{Pr}(|X-1|\geq\varepsilon),≤ roman_Pr ( | italic_X - 1 | ≥ italic_ε ) ,

where 𝐀rel=𝐀rel(μ)subscript𝐀relsubscript𝐀rel𝜇{\bf A}_{\mathrm{rel}}={\bf A}_{\mathrm{rel}}(\mu)bold_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT = bold_A start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ( italic_μ ) is defined in (2) and XGamma(mμ2,mμ2)similar-to𝑋𝐺𝑎𝑚𝑚𝑎𝑚𝜇2𝑚𝜇2X\sim Gamma\left(\frac{m\mu}{2},\frac{m\mu}{2}\right)italic_X ∼ italic_G italic_a italic_m italic_m italic_a ( divide start_ARG italic_m italic_μ end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m italic_μ end_ARG start_ARG 2 end_ARG ).

Conjecture 3.

εrel2/(mμ)subscript𝜀rel2𝑚𝜇\varepsilon_{\mathrm{rel}}\leq 2/(m\mu)italic_ε start_POSTSUBSCRIPT roman_rel end_POSTSUBSCRIPT ≤ 2 / ( italic_m italic_μ ).

For comparison, Corollary 1 holds for all ε>0𝜀0\varepsilon>0italic_ε > 0, but with marginally weaker bounds.

Next is the absolute error bound. Again there is a bit of unit conversion: if 𝐀2=λsubscriptnorm𝐀2𝜆\|{\bf A}\|_{2}=\lambda∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_λ and 𝐀F=ϕsubscriptnorm𝐀𝐹italic-ϕ\|{\bf A}\|_{F}=\phi∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT = italic_ϕ, then scale(trmG(𝐀))=2λmscalesuperscriptsubscripttr𝑚𝐺𝐀2𝜆𝑚\operatorname{scale}(\operatorname{tr}_{m}^{G}({\bf A}))=\frac{2\lambda}{m}roman_scale ( roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) ) = divide start_ARG 2 italic_λ end_ARG start_ARG italic_m end_ARG and Var[trmG(𝐀)]=2mϕ2Varsuperscriptsubscripttr𝑚𝐺𝐀2𝑚superscriptitalic-ϕ2\operatorname{Var}[\operatorname{tr}_{m}^{G}({\bf A})]=\frac{2}{m}\phi^{2}roman_Var [ roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) ] = divide start_ARG 2 end_ARG start_ARG italic_m end_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

Theorem 7.

Let 𝐀n×n𝐀superscript𝑛𝑛{\bf A}\in\mathbb{R}^{n\times n}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT be symmetric with 𝐀2=λsubscriptnorm𝐀2𝜆\|{\bf A}\|_{2}=\lambda∥ bold_A ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_λ and 𝐀F=ϕsubscriptnorm𝐀𝐹italic-ϕ\|{\bf A}\|_{F}=\phi∥ bold_A ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT = italic_ϕ. For fixed sampling number m𝑚mitalic_m, there exists εabssubscript𝜀abs\varepsilon_{\mathrm{abs}}italic_ε start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT such that for εεabs𝜀subscript𝜀abs\varepsilon\geq\varepsilon_{\mathrm{abs}}italic_ε ≥ italic_ε start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT,

Pr(|trmG(𝐀)tr(𝐀)|ε)Prsuperscriptsubscripttr𝑚𝐺𝐀tr𝐀𝜀\displaystyle\mathrm{Pr}\left(|\operatorname{tr}_{m}^{G}({\bf A})-% \operatorname{tr}({\bf A})|\geq\varepsilon\right)roman_Pr ( | roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A ) - roman_tr ( bold_A ) | ≥ italic_ε ) 2Pr(trmG(𝐀abs)tr(𝐀abs)ε)absent2Prsuperscriptsubscripttr𝑚𝐺subscript𝐀abstrsubscript𝐀abs𝜀\displaystyle\leq 2\,\mathrm{Pr}\left(\operatorname{tr}_{m}^{G}({\bf A}_{% \mathrm{abs}})-\operatorname{tr}({\bf A}_{\mathrm{abs}})\geq\varepsilon\right)≤ 2 roman_Pr ( roman_tr start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ) - roman_tr ( bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ) ≥ italic_ε )
2Pr(X𝔼[X]ε),absent2Pr𝑋𝔼delimited-[]𝑋𝜀\displaystyle\leq 2\,\mathrm{Pr}(X-\mathbb{E}[X]\geq\varepsilon),≤ 2 roman_Pr ( italic_X - blackboard_E [ italic_X ] ≥ italic_ε ) ,

where 𝐀abs=𝐀abs(λ,ϕ)subscript𝐀abssubscript𝐀abs𝜆italic-ϕ{\bf A}_{\mathrm{abs}}={\bf A}_{\mathrm{abs}}(\lambda,\phi)bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT = bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ( italic_λ , italic_ϕ ) is defined in (3) and XGamma(mρ2,m2λ)similar-to𝑋𝐺𝑎𝑚𝑚𝑎𝑚𝜌2𝑚2𝜆X\sim Gamma\left(\frac{m\rho}{2},\frac{m}{2\lambda}\right)italic_X ∼ italic_G italic_a italic_m italic_m italic_a ( divide start_ARG italic_m italic_ρ end_ARG start_ARG 2 end_ARG , divide start_ARG italic_m end_ARG start_ARG 2 italic_λ end_ARG ) with ρ=rstab(𝐀)=ϕ2/λ2𝜌subscript𝑟stab𝐀superscriptitalic-ϕ2superscript𝜆2\rho=r_{\mathrm{stab}}({\bf A})=\phi^{2}/\lambda^{2}italic_ρ = italic_r start_POSTSUBSCRIPT roman_stab end_POSTSUBSCRIPT ( bold_A ) = italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

Conjecture 4.

εabs2λm+2mϕ2+(2λm)2subscript𝜀abs2𝜆𝑚2𝑚superscriptitalic-ϕ2superscript2𝜆𝑚2\varepsilon_{\mathrm{abs}}\leq\frac{2\lambda}{m}+\sqrt{\frac{2}{m}\phi^{2}+% \left(\frac{2\lambda}{m}\right)^{2}}italic_ε start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT ≤ divide start_ARG 2 italic_λ end_ARG start_ARG italic_m end_ARG + square-root start_ARG divide start_ARG 2 end_ARG start_ARG italic_m end_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG 2 italic_λ end_ARG start_ARG italic_m end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG.

For comparison, Theorem 1 holds for all ε>0𝜀0\varepsilon>0italic_ε > 0, but with marginally weaker bounds.

The factor of 2 appearing in the bounds in Theorem 7 is due to the fact that the result uses 𝐀abssubscript𝐀abs{\bf A}_{\mathrm{abs}}bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT for the upper tail bound and 𝐀abssubscript𝐀abs-{\bf A}_{\mathrm{abs}}- bold_A start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT for the lower tail bound (resp. X𝑋Xitalic_X and X𝑋-X- italic_X). Note also that in Conjecture 4 the terms ϕitalic-ϕ\phiitalic_ϕ and λ𝜆\lambdaitalic_λ scale differently as the sampling number m𝑚mitalic_m increases. If true, the conjecture implies that εabssubscript𝜀abs\varepsilon_{\mathrm{abs}}italic_ε start_POSTSUBSCRIPT roman_abs end_POSTSUBSCRIPT converges to ϕitalic-ϕ\phiitalic_ϕ as m𝑚m\rightarrow\inftyitalic_m → ∞.

6 Conclusion

So what exactly are the practical implications of all of this? One takeaway, following from Theorem 6, is that the relative error bound for SPSD matrices depends on mreff(𝐀)𝑚subscript𝑟eff𝐀m\cdot r_{\mathrm{eff}}({\bf A})italic_m ⋅ italic_r start_POSTSUBSCRIPT roman_eff end_POSTSUBSCRIPT ( bold_A ), so having a large effective rank is just as beneficial as using a large sampling number. One particular application is in estimating the Frobenius norm of a matrix. The worst-case bounds of [9] hold when the matrix has rank one, but with a lower bound on the stable rank of 𝐀𝐀{\bf A}bold_A these bounds can be tightened. This could in turn reduce the number of samples required to estimate the Frobenius norm to a given tolerance, as in [5, Lemma 2.2].

Another takeaway is that the bounds by Cortinovis and Kressner in Theorem 1 and Corollary 1 are already quite good. The improvements made in this paper, which establishes bounds that are tight given certain information about 𝐀𝐀{\bf A}bold_A, are fairly minor. If you want any further improvements to these tail bounds, you will have to use more information about the spectrum of 𝐀𝐀{\bf A}bold_A.

In practice, you probably shouldn’t use the Gaussian trace estimator on its own. If your matrix has a small stable rank or effective rank, you’d do better to combine the trace estimator with a low-rank approximation such as in [4, 5, 3]. Furthermore, you can reduce the variance by using the Hutchinson estimator (sampling vectors with random ±1plus-or-minus1\pm 1± 1 entries), or by normalizing the Gaussian vectors to have unit length [2]. The main proof techniques in this paper do not work on the Hutchinson estimator because it has a discrete distribution. The techniques could potentially be applied to the normalized Gaussian estimator, but the proofs will be more complicated.

Finally, this paper successfully applies the proof techniques of [10, 9, 8] to obtain a majorization result and tight tail bounds for matrices with fixed 2222-norm and Frobenius norm. The application to trace estimation may not yield results significantly better than those that can be obtained through concentration inequalities, but the approach—comparing the CDFs of distributions to solve an optimization problem directly—is markedly different.

Appendix A Proofs

Proof of Lemma 2.

Recall from Definition 5 that 𝝁F𝝀subscriptprecedes-or-equals𝐹𝝁𝝀\boldsymbol{\mu}\preceq_{F}\boldsymbol{\lambda}bold_italic_μ ⪯ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT bold_italic_λ if

(𝝁+)2w(𝝀+)2,subscriptprecedes-or-equals𝑤superscriptsuperscript𝝁2superscriptsuperscript𝝀2\displaystyle(\boldsymbol{\mu}^{+})^{2}\preceq_{w}(\boldsymbol{\lambda}^{+})^{% 2},( bold_italic_μ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⪯ start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ( bold_italic_λ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (7a revisited)
(𝝀)2w(𝝁)2,subscriptprecedes-or-equals𝑤superscriptsuperscript𝝀2superscriptsuperscript𝝁2\displaystyle(\boldsymbol{\lambda}^{-})^{2}\preceq_{w}(\boldsymbol{\mu}^{-})^{% 2},( bold_italic_λ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⪯ start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ( bold_italic_μ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (7b revisited)
i=1nμi2=i=1nλi2.superscriptsubscript𝑖1𝑛superscriptsubscript𝜇𝑖2superscriptsubscript𝑖1𝑛superscriptsubscript𝜆𝑖2\displaystyle\sum_{i=1}^{n}\mu_{i}^{2}=\sum_{i=1}^{n}\lambda_{i}^{2}.∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (7c revisited)

First suppose that the inequalities in (7a) and (7b) are strict (either both are, or neither). Find the leading slack indices (see Definition 3) j+superscript𝑗j^{+}italic_j start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT and jsuperscript𝑗j^{-}italic_j start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT for each inequality. Increase the corresponding nonnegative coordinate222If no such coordinate exists, pad 𝝁𝝁\boldsymbol{\mu}bold_italic_μ with zeros as needed. For example, if 𝝀=(2,2)𝝀22\boldsymbol{\lambda}=(2,2)bold_italic_λ = ( 2 , 2 ) and 𝝁=(2,2)𝝁22\boldsymbol{\mu}=(-2,-2)bold_italic_μ = ( - 2 , - 2 ), we would get the sequence (2,2,0)(2,0,2)(0,2,2)maps-to220202maps-to022(-2,-2,0)\mapsto(-2,0,2)\mapsto(0,2,2)( - 2 , - 2 , 0 ) ↦ ( - 2 , 0 , 2 ) ↦ ( 0 , 2 , 2 ). Such padding allows us to keep the sum of squares constant while changing the vectors continuously. of 𝝁𝝁\boldsymbol{\mu}bold_italic_μ while shrinking the negative coordinate toward zero, keeping their sum-of-squares constant, until one of the slack inequalities becomes an equality. Repeat a finite number of times to eliminate all slack.

Then, apply Lemma (1) to the cases (𝝁+)2(𝝀+)2precedes-or-equalssuperscriptsuperscript𝝁2superscriptsuperscript𝝀2(\boldsymbol{\mu}^{+})^{2}\preceq(\boldsymbol{\lambda}^{+})^{2}( bold_italic_μ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⪯ ( bold_italic_λ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and (𝝀)2(𝝁)2precedes-or-equalssuperscriptsuperscript𝝀2superscriptsuperscript𝝁2(\boldsymbol{\lambda}^{-})^{2}\preceq(\boldsymbol{\mu}^{-})^{2}( bold_italic_λ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⪯ ( bold_italic_μ start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT separately. ∎

A.1 Relative error majorization theorem

As a reminder, most of this proof is substantially the same as the one given in [8]. I provide it here in part to show how the proof techniques relate to those used for the absolute error case.

Proof of Theorem 2.

Lemma 1 implies that we can without loss of generality assume that 𝝁𝝁\boldsymbol{\mu}bold_italic_μ and 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ differ in two coordinates only, satisfying 0λk<μkμj<λj0subscript𝜆𝑘subscript𝜇𝑘subscript𝜇𝑗subscript𝜆𝑗0\leq\lambda_{k}<\mu_{k}\leq\mu_{j}<\lambda_{j}0 ≤ italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT < italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. For t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ], define

νi(t)subscript𝜈𝑖𝑡\displaystyle\nu_{i}(t)italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) :=tλi+(1t)μi,i{j,k},formulae-sequenceassignabsent𝑡subscript𝜆𝑖1𝑡subscript𝜇𝑖𝑖𝑗𝑘\displaystyle:=t\lambda_{i}+(1-t)\mu_{i},\quad i\in\{j,k\},:= italic_t italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ( 1 - italic_t ) italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_i ∈ { italic_j , italic_k } ,
νi(t)subscript𝜈𝑖𝑡\displaystyle\nu_{i}(t)italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) :=λi,ij,formulae-sequenceassignabsentsubscript𝜆𝑖𝑖𝑗\displaystyle:=\lambda_{i},\quad i\neq j,:= italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_i ≠ italic_j ,
Y(t)𝑌𝑡\displaystyle Y(t)italic_Y ( italic_t ) :=i=1nνi(t)Xi.assignabsentsuperscriptsubscript𝑖1𝑛subscript𝜈𝑖𝑡subscript𝑋𝑖\displaystyle:=\sum_{i=1}^{n}\nu_{i}(t)X_{i}.:= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT .

This interpolation satisfies Y(0)=Q𝝁𝑌0subscript𝑄𝝁Y(0)=Q_{\boldsymbol{\mu}}italic_Y ( 0 ) = italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT and Y(1)=Q𝝀𝑌1subscript𝑄𝝀Y(1)=Q_{\boldsymbol{\lambda}}italic_Y ( 1 ) = italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT. Our goal is to show that for certain fixed values of x𝑥xitalic_x, the CDF FY(t)(x)subscript𝐹𝑌𝑡𝑥F_{Y(t)}(x)italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) is monotonic for t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ].

Take the Laplace transform of FY(t)subscript𝐹𝑌𝑡F_{Y(t)}italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT as

J(t,z):=[FY(t)](z)=0ezxFY(t)(x)𝑑x=1z0FY(t)(x)d(ezx)=1z0ezx𝑑FY(t)(x)=1z[Y(t)](z),assign𝐽𝑡𝑧delimited-[]subscript𝐹𝑌𝑡𝑧superscriptsubscript0superscript𝑒𝑧𝑥subscript𝐹𝑌𝑡𝑥differential-d𝑥1𝑧superscriptsubscript0subscript𝐹𝑌𝑡𝑥𝑑superscript𝑒𝑧𝑥1𝑧superscriptsubscript0superscript𝑒𝑧𝑥differential-dsubscript𝐹𝑌𝑡𝑥1𝑧delimited-[]𝑌𝑡𝑧\displaystyle\begin{split}J(t,z):=\mathcal{L}[F_{Y(t)}](z)&=\int_{0}^{\infty}e% ^{-zx}F_{Y(t)}(x)\,dx\\ &=\frac{-1}{z}\int_{0}^{\infty}F_{Y(t)}(x)\,d\left(e^{-zx}\right)\\ &=\frac{1}{z}\int_{0}^{\infty}e^{-zx}\,dF_{Y(t)}(x)\\ &=\frac{1}{z}\mathcal{L}[Y(t)](z),\end{split}start_ROW start_CELL italic_J ( italic_t , italic_z ) := caligraphic_L [ italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ] ( italic_z ) end_CELL start_CELL = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_z italic_x end_POSTSUPERSCRIPT italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) italic_d italic_x end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = divide start_ARG - 1 end_ARG start_ARG italic_z end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) italic_d ( italic_e start_POSTSUPERSCRIPT - italic_z italic_x end_POSTSUPERSCRIPT ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = divide start_ARG 1 end_ARG start_ARG italic_z end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_z italic_x end_POSTSUPERSCRIPT italic_d italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = divide start_ARG 1 end_ARG start_ARG italic_z end_ARG caligraphic_L [ italic_Y ( italic_t ) ] ( italic_z ) , end_CELL end_ROW

where [Y(t)](z):=𝔼[ezY(t)]assigndelimited-[]𝑌𝑡𝑧𝔼delimited-[]superscript𝑒𝑧𝑌𝑡\mathcal{L}[Y(t)](z):=\mathbb{E}[e^{-zY(t)}]caligraphic_L [ italic_Y ( italic_t ) ] ( italic_z ) := blackboard_E [ italic_e start_POSTSUPERSCRIPT - italic_z italic_Y ( italic_t ) end_POSTSUPERSCRIPT ], the Laplace transform of Y(t)𝑌𝑡Y(t)italic_Y ( italic_t ), satisfies

[Y(t)](z)=i=1n(1+νi(t)zβ)α,z,(z)>min1inβνi(t).formulae-sequencedelimited-[]𝑌𝑡𝑧superscriptsubscriptproduct𝑖1𝑛superscript1subscript𝜈𝑖𝑡𝑧𝛽𝛼formulae-sequence𝑧𝑧subscript1𝑖𝑛𝛽subscript𝜈𝑖𝑡\mathcal{L}[Y(t)](z)=\prod_{i=1}^{n}\left(1+\frac{\nu_{i}(t)z}{\beta}\right)^{% -\alpha},\quad z\in\mathbb{C},\ \Re(z)>-\min_{1\leq i\leq n}\frac{\beta}{\nu_{% i}(t)}.caligraphic_L [ italic_Y ( italic_t ) ] ( italic_z ) = ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) start_POSTSUPERSCRIPT - italic_α end_POSTSUPERSCRIPT , italic_z ∈ blackboard_C , roman_ℜ ( italic_z ) > - roman_min start_POSTSUBSCRIPT 1 ≤ italic_i ≤ italic_n end_POSTSUBSCRIPT divide start_ARG italic_β end_ARG start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) end_ARG . (11)

Differentiating with respect to t𝑡titalic_t yields

tJ(t,z)𝑡𝐽𝑡𝑧\displaystyle\frac{\partial}{\partial t}J(t,z)divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_J ( italic_t , italic_z ) =J(t,z)tlnJ(t,z)absent𝐽𝑡𝑧𝑡𝐽𝑡𝑧\displaystyle=J(t,z)\frac{\partial}{\partial t}\ln J(t,z)= italic_J ( italic_t , italic_z ) divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG roman_ln italic_J ( italic_t , italic_z )
=J(t,z)ti{j,k}(αln(1+νi(t)zβ))absent𝐽𝑡𝑧𝑡subscript𝑖𝑗𝑘𝛼1subscript𝜈𝑖𝑡𝑧𝛽\displaystyle=J(t,z)\frac{\partial}{\partial t}\sum_{i\in\{j,k\}}\left(-\alpha% \ln\left(1+\frac{\nu_{i}(t)z}{\beta}\right)\right)= italic_J ( italic_t , italic_z ) divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG ∑ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT ( - italic_α roman_ln ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) )
=J(t,z)zαβi{j,k}λiμi1+νi(t)zβ.absent𝐽𝑡𝑧𝑧𝛼𝛽subscript𝑖𝑗𝑘subscript𝜆𝑖subscript𝜇𝑖1subscript𝜈𝑖𝑡𝑧𝛽\displaystyle=-J(t,z)z\frac{\alpha}{\beta}\sum_{i\in\{j,k\}}\frac{\lambda_{i}-% \mu_{i}}{1+\frac{\nu_{i}(t)z}{\beta}}.= - italic_J ( italic_t , italic_z ) italic_z divide start_ARG italic_α end_ARG start_ARG italic_β end_ARG ∑ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT divide start_ARG italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG end_ARG .

Recall that λjμj=μkλk>0subscript𝜆𝑗subscript𝜇𝑗subscript𝜇𝑘subscript𝜆𝑘0\lambda_{j}-\mu_{j}=\mu_{k}-\lambda_{k}>0italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > 0 (since 𝝁𝝀precedes-or-equals𝝁𝝀\boldsymbol{\mu}\preceq\boldsymbol{\lambda}bold_italic_μ ⪯ bold_italic_λ), and see that

11+νj(t)zβ11+νk(t)zβ=(νk(t)νj(t))zβi{j,k}(1+νi(t)zβ)1.11subscript𝜈𝑗𝑡𝑧𝛽11subscript𝜈𝑘𝑡𝑧𝛽subscript𝜈𝑘𝑡subscript𝜈𝑗𝑡𝑧𝛽subscriptproduct𝑖𝑗𝑘superscript1subscript𝜈𝑖𝑡𝑧𝛽1\frac{1}{1+\frac{\nu_{j}(t)z}{\beta}}-\frac{1}{1+\frac{\nu_{k}(t)z}{\beta}}=(% \nu_{k}(t)-\nu_{j}(t))\frac{z}{\beta}\prod_{i\in\{j,k\}}\left(1+\frac{\nu_{i}(% t)z}{\beta}\right)^{-1}.divide start_ARG 1 end_ARG start_ARG 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG end_ARG - divide start_ARG 1 end_ARG start_ARG 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG end_ARG = ( italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) divide start_ARG italic_z end_ARG start_ARG italic_β end_ARG ∏ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT . (12)

It follows that

Jt(t,z)=J(t,z)z2αβ2(λjμj)(νj(t)νk(t))i{j,k}(1+νi(t)zβ).𝐽𝑡𝑡𝑧𝐽𝑡𝑧superscript𝑧2𝛼superscript𝛽2subscript𝜆𝑗subscript𝜇𝑗subscript𝜈𝑗𝑡subscript𝜈𝑘𝑡subscriptproduct𝑖𝑗𝑘1subscript𝜈𝑖𝑡𝑧𝛽\frac{\partial J}{\partial t}(t,z)=J(t,z)z^{2}\frac{\alpha}{\beta^{2}}\frac{(% \lambda_{j}-\mu_{j})(\nu_{j}(t)-\nu_{k}(t))}{\prod_{i\in\{j,k\}}\left(1+\frac{% \nu_{i}(t)z}{\beta}\right)}.divide start_ARG ∂ italic_J end_ARG start_ARG ∂ italic_t end_ARG ( italic_t , italic_z ) = italic_J ( italic_t , italic_z ) italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG italic_α end_ARG start_ARG italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG divide start_ARG ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ( italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) ) end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) end_ARG .

Substitute J(t,z):=[FY(t)](z)assign𝐽𝑡𝑧delimited-[]subscript𝐹𝑌𝑡𝑧J(t,z):=\mathcal{L}[F_{Y(t)}](z)italic_J ( italic_t , italic_z ) := caligraphic_L [ italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ] ( italic_z ) and apply the inverse Laplace transform to get

tFY(t)(x)𝑡subscript𝐹𝑌𝑡𝑥\displaystyle\frac{\partial}{\partial t}F_{Y(t)}(x)divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) =αβ2(λjμj)(νj(t)νk(t))2x2Pr(Y(t)+νj(t)ψ+νkψx)absent𝛼superscript𝛽2subscript𝜆𝑗subscript𝜇𝑗subscript𝜈𝑗𝑡subscript𝜈𝑘𝑡superscript2superscript𝑥2Pr𝑌𝑡subscript𝜈𝑗𝑡𝜓subscript𝜈𝑘superscript𝜓𝑥\displaystyle=\frac{\alpha}{\beta^{2}}(\lambda_{j}-\mu_{j})(\nu_{j}(t)-\nu_{k}% (t))\frac{\partial^{2}}{\partial x^{2}}\mathrm{Pr}\left(Y(t)+\nu_{j}(t)\psi+% \nu_{k}\psi^{\prime}\leq x\right)= divide start_ARG italic_α end_ARG start_ARG italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ( italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) ) divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG roman_Pr ( italic_Y ( italic_t ) + italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_ψ + italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≤ italic_x )
=αβ2(λjμj)(νj(t)νk(t))xfY(t)+νj(t)ψ+νkψ(x),absent𝛼superscript𝛽2subscript𝜆𝑗subscript𝜇𝑗subscript𝜈𝑗𝑡subscript𝜈𝑘𝑡𝑥subscript𝑓𝑌𝑡subscript𝜈𝑗𝑡𝜓subscript𝜈𝑘superscript𝜓𝑥\displaystyle=\frac{\alpha}{\beta^{2}}(\lambda_{j}-\mu_{j})(\nu_{j}(t)-\nu_{k}% (t))\frac{\partial}{\partial x}f_{Y(t)+\nu_{j}(t)\psi+\nu_{k}\psi^{\prime}}(x),= divide start_ARG italic_α end_ARG start_ARG italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ( italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) ) divide start_ARG ∂ end_ARG start_ARG ∂ italic_x end_ARG italic_f start_POSTSUBSCRIPT italic_Y ( italic_t ) + italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_ψ + italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x ) ,

where ψ,ψi.i.d.Gamma(1,β)superscriptsimilar-toi.i.d.𝜓superscript𝜓𝐺𝑎𝑚𝑚𝑎1𝛽\psi,\psi^{\prime}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma(1,\beta)italic_ψ , italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( 1 , italic_β ) are i.i.d exponential r.v’s which are also independent of Y(t)𝑌𝑡Y(t)italic_Y ( italic_t ), and f(x)𝑓𝑥f(x)italic_f ( italic_x ) is the probability density function (PDF). Now λj>μjsubscript𝜆𝑗subscript𝜇𝑗\lambda_{j}>\mu_{j}italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT > italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT by assumption, and for any t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ] it holds that νj(t)νk(t)subscript𝜈𝑗𝑡subscript𝜈𝑘𝑡\nu_{j}(t)\geq\nu_{k}(t)italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ≥ italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) as well. Thus for each t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ] the left-hand side tFY(t)(x)𝑡subscript𝐹𝑌𝑡𝑥\frac{\partial}{\partial t}F_{Y(t)}(x)divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) has the same sign as xfY(t)+νj(t)ψ+νkψ(x)𝑥subscript𝑓𝑌𝑡subscript𝜈𝑗𝑡𝜓subscript𝜈𝑘superscript𝜓𝑥\frac{\partial}{\partial x}f_{Y(t)+\nu_{j}(t)\psi+\nu_{k}\psi^{\prime}}(x)divide start_ARG ∂ end_ARG start_ARG ∂ italic_x end_ARG italic_f start_POSTSUBSCRIPT italic_Y ( italic_t ) + italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_ψ + italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x ), the derivative of the density function.

It is known that the distribution of any linear combination of Gamma random variables is unimodal (see [10, Thm. 4]), so by the definition of x¯uppersubscript¯𝑥upper\overline{x}_{\mathrm{upper}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT and x¯lowersubscript¯𝑥lower\overline{x}_{\mathrm{lower}}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT, the density function is increasing on (0,x¯lower)0subscript¯𝑥lower(0,\overline{x}_{\mathrm{lower}})( 0 , over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_lower end_POSTSUBSCRIPT ) and decreasing on (x¯upper,)subscript¯𝑥upper(\overline{x}_{\mathrm{upper}},\infty)( over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT , ∞ ). Therefore, for any x𝑥xitalic_x in either of these regions, FY(t)(x)subscript𝐹𝑌𝑡𝑥F_{Y(t)}(x)italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) is monotonic with respect to t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ]. Since Y(0)=Q𝝁𝑌0subscript𝑄𝝁Y(0)=Q_{\boldsymbol{\mu}}italic_Y ( 0 ) = italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT and Y(1)=Q𝝀𝑌1subscript𝑄𝝀Y(1)=Q_{\boldsymbol{\lambda}}italic_Y ( 1 ) = italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT, the desired inequalities follow. ∎

A.2 Absolute error majorization theorem

Proof of Theorem 4.

Lemma 2 implies that we can without loss of generality assume that 𝝀𝝀\boldsymbol{\lambda}bold_italic_λ and 𝝁𝝁\boldsymbol{\mu}bold_italic_μ differ in two coordinates, satisfying λj2μj2=μk2λk2>0superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗2superscriptsubscript𝜇𝑘2superscriptsubscript𝜆𝑘20\lambda_{j}^{2}-\mu_{j}^{2}=\mu_{k}^{2}-\lambda_{k}^{2}>0italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0. For t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ], define

νi(t):=sgn(λi+μi)tλi2+(1t)μi2,i{j,k},νi(t):=λi,ij,k,Y(t):=i=1nνi(t)(Xiα/β).\displaystyle\begin{split}\nu_{i}(t)&:=\mathrm{sgn}(\lambda_{i}+\mu_{i})\sqrt{% t\lambda_{i}^{2}+(1-t)\mu_{i}^{2}},\quad i\in\{j,k\},\\ \nu_{i}(t)&:=\lambda_{i},\quad i\neq j,\,k,\\ Y(t)&:=\sum_{i=1}^{n}\nu_{i}(t)(X_{i}-\alpha/\beta).\end{split}start_ROW start_CELL italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) end_CELL start_CELL := roman_sgn ( italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) square-root start_ARG italic_t italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( 1 - italic_t ) italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , italic_i ∈ { italic_j , italic_k } , end_CELL end_ROW start_ROW start_CELL italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) end_CELL start_CELL := italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_i ≠ italic_j , italic_k , end_CELL end_ROW start_ROW start_CELL italic_Y ( italic_t ) end_CELL start_CELL := ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_α / italic_β ) . end_CELL end_ROW

Note that 𝔼[Y(t)]=0𝔼delimited-[]𝑌𝑡0\mathbb{E}[Y(t)]=0blackboard_E [ italic_Y ( italic_t ) ] = 0 by design. As for the term sgn(λi+μi)sgnsubscript𝜆𝑖subscript𝜇𝑖\mathrm{sgn}(\lambda_{i}+\mu_{i})roman_sgn ( italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ), Lemma 2 implies that we need only consider cases where λisubscript𝜆𝑖\lambda_{i}italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are both nonpositive or both nonnegative. This slightly awkward term therefore ensures that Y(0)=Q𝝁𝑌0subscript𝑄𝝁Y(0)=Q_{\boldsymbol{\mu}}italic_Y ( 0 ) = italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT and Y(1)=Q𝝀𝑌1subscript𝑄𝝀Y(1)=Q_{\boldsymbol{\lambda}}italic_Y ( 1 ) = italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT. We also note that for i{j,k}𝑖𝑗𝑘i\in\{j,k\}italic_i ∈ { italic_j , italic_k } and νi(t)0subscript𝜈𝑖𝑡0\nu_{i}(t)\neq 0italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) ≠ 0, we have

ddtνi(t)=λi2μi22νi(t).𝑑𝑑𝑡subscript𝜈𝑖𝑡superscriptsubscript𝜆𝑖2superscriptsubscript𝜇𝑖22subscript𝜈𝑖𝑡\frac{d}{dt}\nu_{i}(t)=\frac{\lambda_{i}^{2}-\mu_{i}^{2}}{2\nu_{i}(t)}.divide start_ARG italic_d end_ARG start_ARG italic_d italic_t end_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) = divide start_ARG italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) end_ARG . (13)

Our goal is again to show that for certain fixed values of x𝑥xitalic_x, the CDF FY(t)subscript𝐹𝑌𝑡F_{Y(t)}italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT is monotonic for t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ].

Take the Laplace transform of FY(t)subscript𝐹𝑌𝑡F_{Y(t)}italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT as

J(t,q):=[FY(t)](z)=1z[Y(t)](z),assign𝐽𝑡𝑞delimited-[]subscript𝐹𝑌𝑡𝑧1𝑧delimited-[]𝑌𝑡𝑧J(t,q):=\mathcal{L}[F_{Y(t)}](z)=\frac{1}{z}\mathcal{L}[Y(t)](z),italic_J ( italic_t , italic_q ) := caligraphic_L [ italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ] ( italic_z ) = divide start_ARG 1 end_ARG start_ARG italic_z end_ARG caligraphic_L [ italic_Y ( italic_t ) ] ( italic_z ) ,

where [Y(t)](z):=𝔼[ezY(t)]assigndelimited-[]𝑌𝑡𝑧𝔼delimited-[]superscript𝑒𝑧𝑌𝑡\mathcal{L}[Y(t)](z):=\mathbb{E}[e^{-zY(t)}]caligraphic_L [ italic_Y ( italic_t ) ] ( italic_z ) := blackboard_E [ italic_e start_POSTSUPERSCRIPT - italic_z italic_Y ( italic_t ) end_POSTSUPERSCRIPT ], the Laplace transform of Y(t)𝑌𝑡Y(t)italic_Y ( italic_t ), satisfies

[Y(t)](z)=i=1nezνi(t)α/β(1+νi(t)zβ)α,delimited-[]𝑌𝑡𝑧superscriptsubscriptproduct𝑖1𝑛superscript𝑒𝑧subscript𝜈𝑖𝑡𝛼𝛽superscript1subscript𝜈𝑖𝑡𝑧𝛽𝛼\mathcal{L}[Y(t)](z)=\prod_{i=1}^{n}e^{z\nu_{i}(t)\alpha/\beta}\left(1+\frac{% \nu_{i}(t)z}{\beta}\right)^{-\alpha},caligraphic_L [ italic_Y ( italic_t ) ] ( italic_z ) = ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT italic_z italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_α / italic_β end_POSTSUPERSCRIPT ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) start_POSTSUPERSCRIPT - italic_α end_POSTSUPERSCRIPT ,

defined over the strip

z,βmaxνi(t)>0νi(t)<(z)<βmaxνi(t)<0|νi(t)|.formulae-sequence𝑧𝛽subscriptsubscript𝜈𝑖𝑡0subscript𝜈𝑖𝑡𝑧𝛽subscriptsubscript𝜈𝑖𝑡0subscript𝜈𝑖𝑡z\in\mathbb{C},\quad\frac{-\beta}{\max_{\nu_{i}(t)>0}\nu_{i}(t)}<\Re(z)<\frac{% \beta}{\max_{\nu_{i}(t)<0}|\nu_{i}(t)|}.italic_z ∈ blackboard_C , divide start_ARG - italic_β end_ARG start_ARG roman_max start_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) > 0 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) end_ARG < roman_ℜ ( italic_z ) < divide start_ARG italic_β end_ARG start_ARG roman_max start_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) < 0 end_POSTSUBSCRIPT | italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) | end_ARG . (14)

As compared with (11), the extra term ezνi(t)α/βsuperscript𝑒𝑧subscript𝜈𝑖𝑡𝛼𝛽e^{z\nu_{i}(t)\alpha/\beta}italic_e start_POSTSUPERSCRIPT italic_z italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_α / italic_β end_POSTSUPERSCRIPT comes from our using the centered variables Xiα/βsubscript𝑋𝑖𝛼𝛽X_{i}-\alpha/\betaitalic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_α / italic_β. The region of convergence in (14) is a strip rather than a half-plane because Y(t)𝑌𝑡Y(t)italic_Y ( italic_t ) may include both positive and negative weights νi(t)subscript𝜈𝑖𝑡\nu_{i}(t)italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ).

Differentiating J(t,z)𝐽𝑡𝑧J(t,z)italic_J ( italic_t , italic_z ) with respect to t𝑡titalic_t and using (13) yields

tJ(t,z)𝑡𝐽𝑡𝑧\displaystyle\frac{\partial}{\partial t}J(t,z)divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_J ( italic_t , italic_z ) =J(t,z)tlnJ(t,z)absent𝐽𝑡𝑧𝑡𝐽𝑡𝑧\displaystyle=J(t,z)\frac{\partial}{\partial t}\ln J(t,z)= italic_J ( italic_t , italic_z ) divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG roman_ln italic_J ( italic_t , italic_z )
=J(t,z)i{j,k}t(zνi(t)αβαln(1+νi(t)zβ))absent𝐽𝑡𝑧subscript𝑖𝑗𝑘𝑡𝑧subscript𝜈𝑖𝑡𝛼𝛽𝛼1subscript𝜈𝑖𝑡𝑧𝛽\displaystyle=J(t,z)\sum_{i\in\{j,k\}}\frac{\partial}{\partial t}\left(z\nu_{i% }(t)\frac{\alpha}{\beta}-\alpha\ln\left(1+\frac{\nu_{i}(t)z}{\beta}\right)\right)= italic_J ( italic_t , italic_z ) ∑ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG ( italic_z italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) divide start_ARG italic_α end_ARG start_ARG italic_β end_ARG - italic_α roman_ln ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) )
=J(t,z)i{j,k}zαβ(111+νi(t)zβ)ddtνi(t)absent𝐽𝑡𝑧subscript𝑖𝑗𝑘𝑧𝛼𝛽111subscript𝜈𝑖𝑡𝑧𝛽𝑑𝑑𝑡subscript𝜈𝑖𝑡\displaystyle=J(t,z)\sum_{i\in\{j,k\}}z\frac{\alpha}{\beta}\left(1-\frac{1}{1+% \tfrac{\nu_{i}(t)z}{\beta}}\right)\frac{d}{dt}\nu_{i}(t)= italic_J ( italic_t , italic_z ) ∑ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT italic_z divide start_ARG italic_α end_ARG start_ARG italic_β end_ARG ( 1 - divide start_ARG 1 end_ARG start_ARG 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG end_ARG ) divide start_ARG italic_d end_ARG start_ARG italic_d italic_t end_ARG italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t )
=J(t,z)z2α2β2i{j,k}λi2μi21+νi(t)z/β.absent𝐽𝑡𝑧superscript𝑧2𝛼2superscript𝛽2subscript𝑖𝑗𝑘superscriptsubscript𝜆𝑖2superscriptsubscript𝜇𝑖21subscript𝜈𝑖𝑡𝑧𝛽\displaystyle=J(t,z)z^{2}\frac{\alpha}{2\beta^{2}}\sum_{i\in\{j,k\}}\frac{% \lambda_{i}^{2}-\mu_{i}^{2}}{1+\nu_{i}(t)z/\beta}.= italic_J ( italic_t , italic_z ) italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG italic_α end_ARG start_ARG 2 italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i ∈ { italic_j , italic_k } end_POSTSUBSCRIPT divide start_ARG italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 + italic_ν start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) italic_z / italic_β end_ARG .

Recall that λj2μj2=μk2λk2>0superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗2superscriptsubscript𝜇𝑘2superscriptsubscript𝜆𝑘20\lambda_{j}^{2}-\mu_{j}^{2}=\mu_{k}^{2}-\lambda_{k}^{2}>0italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0, and again use (12) to get

tJ(t,z)=J(t,z)z3α2β3(λj2μj2)(νk(t)νj(t))(1+νj(t)zβ)(1+νk(t)zβ).𝑡𝐽𝑡𝑧𝐽𝑡𝑧superscript𝑧3𝛼2superscript𝛽3superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗2subscript𝜈𝑘𝑡subscript𝜈𝑗𝑡1subscript𝜈𝑗𝑡𝑧𝛽1subscript𝜈𝑘𝑡𝑧𝛽\frac{\partial}{\partial t}J(t,z)=J(t,z)z^{3}\frac{\alpha}{2\beta^{3}}\frac{(% \lambda_{j}^{2}-\mu_{j}^{2})(\nu_{k}(t)-\nu_{j}(t))}{\left(1+\tfrac{\nu_{j}(t)% z}{\beta}\right)\left(1+\tfrac{\nu_{k}(t)z}{\beta}\right)}.divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_J ( italic_t , italic_z ) = italic_J ( italic_t , italic_z ) italic_z start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT divide start_ARG italic_α end_ARG start_ARG 2 italic_β start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG divide start_ARG ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ( italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) end_ARG start_ARG ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) ( 1 + divide start_ARG italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) italic_z end_ARG start_ARG italic_β end_ARG ) end_ARG .

Taking the inverse transform then gives

tFY(t)(x)𝑡subscript𝐹𝑌𝑡𝑥\displaystyle\frac{\partial}{\partial t}F_{Y(t)}(x)divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) =α2β3(λj2μj2)(νk(t)νj(t))3x3Pr(Y(t)+νj(t)ψ+νk(t)ψx)absent𝛼2superscript𝛽3superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗2subscript𝜈𝑘𝑡subscript𝜈𝑗𝑡superscript3superscript𝑥3Pr𝑌𝑡subscript𝜈𝑗𝑡𝜓subscript𝜈𝑘𝑡superscript𝜓𝑥\displaystyle=\frac{\alpha}{2\beta^{3}}(\lambda_{j}^{2}-\mu_{j}^{2})(\nu_{k}(t% )-\nu_{j}(t))\frac{\partial^{3}}{\partial x^{3}}\mathrm{Pr}\left(Y(t)+\nu_{j}(% t)\psi+\nu_{k}(t)\psi^{\prime}\leq x\right)= divide start_ARG italic_α end_ARG start_ARG 2 italic_β start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ( italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) divide start_ARG ∂ start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG roman_Pr ( italic_Y ( italic_t ) + italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_ψ + italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≤ italic_x )
=α2β3(λj2μj2)(νk(t)νj(t))2x2fY(t)+νj(t)ψ+νk(t)ψ(x),absent𝛼2superscript𝛽3superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗2subscript𝜈𝑘𝑡subscript𝜈𝑗𝑡superscript2superscript𝑥2subscript𝑓𝑌𝑡subscript𝜈𝑗𝑡𝜓subscript𝜈𝑘𝑡superscript𝜓𝑥\displaystyle=\frac{\alpha}{2\beta^{3}}(\lambda_{j}^{2}-\mu_{j}^{2})(\nu_{k}(t% )-\nu_{j}(t))\frac{\partial^{2}}{\partial x^{2}}f_{Y(t)+\nu_{j}(t)\psi+\nu_{k}% (t)\psi^{\prime}}(x),= divide start_ARG italic_α end_ARG start_ARG 2 italic_β start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ( italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ( italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ) divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_f start_POSTSUBSCRIPT italic_Y ( italic_t ) + italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_ψ + italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x ) ,

where ψ,ψi.i.d.Gamma(1,β)superscriptsimilar-toi.i.d.𝜓superscript𝜓𝐺𝑎𝑚𝑚𝑎1𝛽\psi,\psi^{\prime}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma(1,\beta)italic_ψ , italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( 1 , italic_β ) are i.i.d exponential r.v’s which are also independent of Y(t)𝑌𝑡Y(t)italic_Y ( italic_t ), and f(x)𝑓𝑥f(x)italic_f ( italic_x ) is the probability density function (PDF).

By design, λj2μj2>0superscriptsubscript𝜆𝑗2superscriptsubscript𝜇𝑗20\lambda_{j}^{2}-\mu_{j}^{2}>0italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0. Checking the three cases considered in Lemma 2 shows also that νk(t)νj(t)0subscript𝜈𝑘𝑡subscript𝜈𝑗𝑡0\nu_{k}(t)-\nu_{j}(t)\leq 0italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) - italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) ≤ 0 for t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ]. Thus the sign of tFY(t)(x)𝑡subscript𝐹𝑌𝑡𝑥\frac{\partial}{\partial t}F_{Y(t)}(x)divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) is always opposite that of 2x2fY(t)+νj(t)ψj+νk(t)ψk(x)superscript2superscript𝑥2subscript𝑓𝑌𝑡subscript𝜈𝑗𝑡subscript𝜓𝑗subscript𝜈𝑘𝑡subscript𝜓𝑘𝑥\frac{\partial^{2}}{\partial x^{2}}f_{Y(t)+\nu_{j}(t)\psi_{j}+\nu_{k}(t)\psi_{% k}}(x)divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_f start_POSTSUBSCRIPT italic_Y ( italic_t ) + italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) italic_ψ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_ν start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) italic_ψ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x ), which is the convexity of the density function. By the definition of x^uppersubscript^𝑥upper\hat{x}_{\mathrm{upper}}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT, the density function is convex on (x^upper,)subscript^𝑥upper(\hat{x}_{\mathrm{upper}},\infty)( over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT , ∞ ), and so for any x𝑥xitalic_x in this region, FY(t)(x)subscript𝐹𝑌𝑡𝑥F_{Y(t)}(x)italic_F start_POSTSUBSCRIPT italic_Y ( italic_t ) end_POSTSUBSCRIPT ( italic_x ) decreases monotonically for t[0,1]𝑡01t\in[0,1]italic_t ∈ [ 0 , 1 ]. Since Y(0)=Q𝝁𝔼[Q𝝁]𝑌0subscript𝑄𝝁𝔼delimited-[]subscript𝑄𝝁Y(0)=Q_{\boldsymbol{\mu}}-\mathbb{E}[Q_{\boldsymbol{\mu}}]italic_Y ( 0 ) = italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT - blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_μ end_POSTSUBSCRIPT ] and Y(1)=Q𝝀𝔼[Q𝝀]𝑌1subscript𝑄𝝀𝔼delimited-[]subscript𝑄𝝀Y(1)=Q_{\boldsymbol{\lambda}}-\mathbb{E}[Q_{\boldsymbol{\lambda}}]italic_Y ( 1 ) = italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT - blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ], the desired upper-tail inequality follows.

The lower-tail bound follows by symmetry. ∎

A.3 Absolute error tail results

Proof of Theorem 5.

Let Q𝝀=i=1rλ^Xisubscript𝑄𝝀superscriptsubscript𝑖1𝑟^𝜆subscript𝑋𝑖Q_{\boldsymbol{\lambda}}=\sum_{i=1}^{r}\hat{\lambda}X_{i}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT over^ start_ARG italic_λ end_ARG italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, where Xii.i.d.Gamma(α,β)superscriptsimilar-toi.i.d.subscript𝑋𝑖𝐺𝑎𝑚𝑚𝑎𝛼𝛽X_{i}\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}Gamma(\alpha,\beta)italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG i.i.d. end_ARG end_RELOP italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β ), r=ϕ2/(λ2α)𝑟superscriptitalic-ϕ2superscript𝜆2𝛼r=\lceil\phi^{2}/(\lambda^{2}\alpha)\rceilitalic_r = ⌈ italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_α ) ⌉ and λ^=βϕrα^𝜆𝛽italic-ϕ𝑟𝛼\hat{\lambda}=\frac{\beta\phi}{\sqrt{r\alpha}}over^ start_ARG italic_λ end_ARG = divide start_ARG italic_β italic_ϕ end_ARG start_ARG square-root start_ARG italic_r italic_α end_ARG end_ARG. This is a valid choice of distribution since

Var[Q𝝀]=rλ^2Var[X1]=r(β2ϕ2rα)αβ2=ϕ2Varsubscript𝑄𝝀𝑟superscript^𝜆2Varsubscript𝑋1𝑟superscript𝛽2superscriptitalic-ϕ2𝑟𝛼𝛼superscript𝛽2superscriptitalic-ϕ2\operatorname{Var}[Q_{\boldsymbol{\lambda}}]=r\hat{\lambda}^{2}\operatorname{% Var}[X_{1}]=r\left(\frac{\beta^{2}\phi^{2}}{r\alpha}\right)\frac{\alpha}{\beta% ^{2}}=\phi^{2}roman_Var [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] = italic_r over^ start_ARG italic_λ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Var [ italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] = italic_r ( divide start_ARG italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_r italic_α end_ARG ) divide start_ARG italic_α end_ARG start_ARG italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT

and

scale(Q𝝀)=λ^β=ϕrαϕϕ2/λ2=λ.scalesubscript𝑄𝝀^𝜆𝛽italic-ϕ𝑟𝛼italic-ϕsuperscriptitalic-ϕ2superscript𝜆2𝜆\operatorname{scale}(Q_{\boldsymbol{\lambda}})=\frac{\hat{\lambda}}{\beta}=% \frac{\phi}{\sqrt{r\alpha}}\leq\frac{\phi}{\sqrt{\phi^{2}/\lambda^{2}}}=\lambda.roman_scale ( italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ) = divide start_ARG over^ start_ARG italic_λ end_ARG end_ARG start_ARG italic_β end_ARG = divide start_ARG italic_ϕ end_ARG start_ARG square-root start_ARG italic_r italic_α end_ARG end_ARG ≤ divide start_ARG italic_ϕ end_ARG start_ARG square-root start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG = italic_λ .

Then 𝔼[Q𝝀]=rλ^αβ=ϕrα𝔼delimited-[]subscript𝑄𝝀𝑟^𝜆𝛼𝛽italic-ϕ𝑟𝛼\mathbb{E}[Q_{\boldsymbol{\lambda}}]=r\hat{\lambda}\frac{\alpha}{\beta}=\phi% \sqrt{r\alpha}blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] = italic_r over^ start_ARG italic_λ end_ARG divide start_ARG italic_α end_ARG start_ARG italic_β end_ARG = italic_ϕ square-root start_ARG italic_r italic_α end_ARG, and the r.v Q𝝀+λ1ψ+λ2ψsubscript𝑄𝝀subscript𝜆1𝜓subscript𝜆2superscript𝜓Q_{\boldsymbol{\lambda}}+\lambda_{1}\psi+\lambda_{2}\psi^{\prime}italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ψ + italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_ψ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT has distribution Gamma(rα+2,βλ^1)𝐺𝑎𝑚𝑚𝑎𝑟𝛼2𝛽superscript^𝜆1Gamma(r\alpha+2,\beta\hat{\lambda}^{-1})italic_G italic_a italic_m italic_m italic_a ( italic_r italic_α + 2 , italic_β over^ start_ARG italic_λ end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ), or equivalently, Gamma(rα+2,rα/ϕ)𝐺𝑎𝑚𝑚𝑎𝑟𝛼2𝑟𝛼italic-ϕGamma(r\alpha+2,\sqrt{r\alpha}/\phi)italic_G italic_a italic_m italic_m italic_a ( italic_r italic_α + 2 , square-root start_ARG italic_r italic_α end_ARG / italic_ϕ ).

Now in general, a Gamma(α,β)𝐺𝑎𝑚𝑚𝑎𝛼𝛽Gamma(\alpha,\beta)italic_G italic_a italic_m italic_m italic_a ( italic_α , italic_β ) random variable has density proportional to xα1eβxsuperscript𝑥𝛼1superscript𝑒𝛽𝑥x^{\alpha-1}e^{-\beta x}italic_x start_POSTSUPERSCRIPT italic_α - 1 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_β italic_x end_POSTSUPERSCRIPT for x0𝑥0x\geq 0italic_x ≥ 0 (see (9)). The inflection points are where the second derivative is equal to zero. Since

d2dx2xα1eβxsuperscriptd2dsuperscript𝑥2superscript𝑥𝛼1superscript𝑒𝛽𝑥\displaystyle\frac{\mathrm{d}^{2}}{\mathrm{d}x^{2}}\,x^{\alpha-1}e^{-\beta x}divide start_ARG roman_d start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG roman_d italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_x start_POSTSUPERSCRIPT italic_α - 1 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_β italic_x end_POSTSUPERSCRIPT =ddx(βxα1+(α1)xα2)eβxabsentdd𝑥𝛽superscript𝑥𝛼1𝛼1superscript𝑥𝛼2superscript𝑒𝛽𝑥\displaystyle=\frac{\mathrm{d}}{\mathrm{d}x}\,(-\beta x^{\alpha-1}+(\alpha-1)x% ^{\alpha-2})e^{-\beta x}= divide start_ARG roman_d end_ARG start_ARG roman_d italic_x end_ARG ( - italic_β italic_x start_POSTSUPERSCRIPT italic_α - 1 end_POSTSUPERSCRIPT + ( italic_α - 1 ) italic_x start_POSTSUPERSCRIPT italic_α - 2 end_POSTSUPERSCRIPT ) italic_e start_POSTSUPERSCRIPT - italic_β italic_x end_POSTSUPERSCRIPT
=(β2xα12β(α1)xα2+(α1)(α2)xα3)eβxabsentsuperscript𝛽2superscript𝑥𝛼12𝛽𝛼1superscript𝑥𝛼2𝛼1𝛼2superscript𝑥𝛼3superscript𝑒𝛽𝑥\displaystyle=(\beta^{2}x^{\alpha-1}-2\beta(\alpha-1)x^{\alpha-2}+(\alpha-1)(% \alpha-2)x^{\alpha-3})e^{-\beta x}= ( italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_α - 1 end_POSTSUPERSCRIPT - 2 italic_β ( italic_α - 1 ) italic_x start_POSTSUPERSCRIPT italic_α - 2 end_POSTSUPERSCRIPT + ( italic_α - 1 ) ( italic_α - 2 ) italic_x start_POSTSUPERSCRIPT italic_α - 3 end_POSTSUPERSCRIPT ) italic_e start_POSTSUPERSCRIPT - italic_β italic_x end_POSTSUPERSCRIPT
=((βx)22(α1)(βx)+(α1)(α2))xα3eβx,absentsuperscript𝛽𝑥22𝛼1𝛽𝑥𝛼1𝛼2superscript𝑥𝛼3superscript𝑒𝛽𝑥\displaystyle=((\beta x)^{2}-2(\alpha-1)(\beta x)+(\alpha-1)(\alpha-2))x^{% \alpha-3}e^{-\beta x},= ( ( italic_β italic_x ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 ( italic_α - 1 ) ( italic_β italic_x ) + ( italic_α - 1 ) ( italic_α - 2 ) ) italic_x start_POSTSUPERSCRIPT italic_α - 3 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_β italic_x end_POSTSUPERSCRIPT ,

the nontrivial inflection points are the two zeros of the quadratic term. The larger of the two zeros is given by

x^+=β12(α1)+4(α1)24(α1)(α2)2=α1+α1β.subscript^𝑥superscript𝛽12𝛼14superscript𝛼124𝛼1𝛼22𝛼1𝛼1𝛽\hat{x}_{+}=\beta^{-1}\frac{2(\alpha-1)+\sqrt{4(\alpha-1)^{2}-4(\alpha-1)(% \alpha-2)}}{2}=\frac{\alpha-1+\sqrt{\alpha-1}}{\beta}.over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = italic_β start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT divide start_ARG 2 ( italic_α - 1 ) + square-root start_ARG 4 ( italic_α - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 4 ( italic_α - 1 ) ( italic_α - 2 ) end_ARG end_ARG start_ARG 2 end_ARG = divide start_ARG italic_α - 1 + square-root start_ARG italic_α - 1 end_ARG end_ARG start_ARG italic_β end_ARG .

Substitute for α𝛼\alphaitalic_α and β𝛽\betaitalic_β the values rα+2𝑟𝛼2r\alpha+2italic_r italic_α + 2 and rα/ϕ𝑟𝛼italic-ϕ\sqrt{r\alpha}/\phisquare-root start_ARG italic_r italic_α end_ARG / italic_ϕ, and subtract 𝔼[Q𝝀]𝔼delimited-[]subscript𝑄𝝀\mathbb{E}[Q_{\boldsymbol{\lambda}}]blackboard_E [ italic_Q start_POSTSUBSCRIPT bold_italic_λ end_POSTSUBSCRIPT ] to get

x^upperrα+1+rα+1rα/ϕϕrα=ϕ1+rα+1rα.subscript^𝑥upper𝑟𝛼1𝑟𝛼1𝑟𝛼italic-ϕitalic-ϕ𝑟𝛼italic-ϕ1𝑟𝛼1𝑟𝛼\hat{x}_{\mathrm{upper}}\geq\frac{r\alpha+1+\sqrt{r\alpha+1}}{\sqrt{r\alpha}/% \phi}-\phi\sqrt{r\alpha}=\phi\frac{1+\sqrt{r\alpha+1}}{\sqrt{r\alpha}}.over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT ≥ divide start_ARG italic_r italic_α + 1 + square-root start_ARG italic_r italic_α + 1 end_ARG end_ARG start_ARG square-root start_ARG italic_r italic_α end_ARG / italic_ϕ end_ARG - italic_ϕ square-root start_ARG italic_r italic_α end_ARG = italic_ϕ divide start_ARG 1 + square-root start_ARG italic_r italic_α + 1 end_ARG end_ARG start_ARG square-root start_ARG italic_r italic_α end_ARG end_ARG . (15)

Conjecture 2 is motivated by the fact that the right-hand-side of (15) is bounded above as

ϕ1+rα+1rαϕ1+ϕ2/λ2+1ϕ2/λ2=λ+ϕ2+λ2.italic-ϕ1𝑟𝛼1𝑟𝛼italic-ϕ1superscriptitalic-ϕ2superscript𝜆21superscriptitalic-ϕ2superscript𝜆2𝜆superscriptitalic-ϕ2superscript𝜆2\phi\frac{1+\sqrt{r\alpha+1}}{\sqrt{r\alpha}}\leq\phi\frac{1+\sqrt{\phi^{2}/% \lambda^{2}+1}}{\sqrt{\phi^{2}/\lambda^{2}}}=\lambda+\sqrt{\phi^{2}+\lambda^{2% }}.italic_ϕ divide start_ARG 1 + square-root start_ARG italic_r italic_α + 1 end_ARG end_ARG start_ARG square-root start_ARG italic_r italic_α end_ARG end_ARG ≤ italic_ϕ divide start_ARG 1 + square-root start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 1 end_ARG end_ARG start_ARG square-root start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG = italic_λ + square-root start_ARG italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG .

References

  • [1] Alice Cortinovis and Daniel Kressner. On randomized trace estimates for indefinite matrices with an application to determinants. Foundations of Computational Mathematics, 22(3):875–903, 2022.
  • [2] Ethan N Epperly. Don’t use gaussians in stochastic trace estimation. https://www.ethanepperly.com/index.php/2024/01/28/, Jan 2024. Accessed: 2024-11-01.
  • [3] Ethan N. Epperly, Joel A. Tropp, and Robert J. Webber. Xtrace: Making the most of every sample in stochastic trace estimation. SIAM Journal on Matrix Analysis and Applications, 45(1):1–23, 2024.
  • [4] Raphael A Meyer, Cameron Musco, Christopher Musco, and David P Woodruff. Hutch++: Optimal stochastic trace estimation. In Symposium on Simplicity in Algorithms (SOSA), pages 142–155. SIAM, 2021.
  • [5] David Persson, Alice Cortinovis, and Daniel Kressner. Improved variants of the hutch++ algorithm for trace estimation. SIAM Journal on Matrix Analysis and Applications, 43(3):1162–1185, 2022.
  • [6] David Persson, Alice Cortinovis, and Daniel Kressner. Improved variants of the hutch++ algorithm for trace estimation (preprint), 2022.
  • [7] Josip E Pečarić and Yung Liang Tong. Convex functions, partial orderings, and statistical applications. Academic Press, 1992.
  • [8] Farbod Roosta-Khorasani and Gábor J. Székely. Schur properties of convolutions of gamma random variables. Metrika, 78(8):997–1014, 2015.
  • [9] Farbod Roosta-Khorasani, Gábor J. Székely, and Uri M. Ascher. Assessing stochastic algorithms for large scale nonlinear least squares problems using extremal probabilities of linear combinations of gamma random variables. SIAM/ASA Journal on Uncertainty Quantification, 3(1):61–90, 2015.
  • [10] Gábor J. Székely and Nail K. Bakirov. Extremal probabilities for gaussian quadratic forms. Probability Theory and Related Fields, 126(2):184–202, 2003.