Fast, robust approximate message passing

Misha Ivkov, Stanford University. Supported by NSF Graduate Research Fellowship.
Tselil Schramm, Stanford University. Supported by NSF CAREER award # 2143246.
(November 5, 2024)
Abstract

We give a fast, spectral procedure for implementing approximate message passing (AMP) algorithms robustly. For any quadratic optimization problem over symmetric matrices $X$ with independent subgaussian entries, and any separable AMP algorithm $\mathcal{A}$, our algorithm performs a spectral pre-processing step and then mildly modifies the iterates of $\mathcal{A}$. If given the perturbed input $X+E\in\mathbb{R}^{n\times n}$ for any $E$ supported on an $\varepsilon n\times\varepsilon n$ principal minor, our algorithm outputs a solution $\hat{v}$ which is guaranteed to be close to the output of $\mathcal{A}$ on the uncorrupted $X$, with $\|\mathcal{A}(X)-\hat{v}\|_{2}\leqslant f(\varepsilon)\|\mathcal{A}(X)\|_{2}$ where $f(\varepsilon)\to 0$ as $\varepsilon\to 0$ depending only on $\varepsilon$.

1 Introduction

Approximate Message Passing (AMP) is a family of algorithmic methods which generalize matrix power iteration. Suppose we are given a symmetric matrix $X\in\mathbb{R}^{n\times n}$, and our goal is to maximize the quadratic form $v^{\top}Xv$ over vectors $v$ in some constraint set $K$. The basic AMP algorithm starts from some initialization $x^{(0)}\in\mathbb{R}^{n}$ and computes iterates $x^{(1)},x^{(2)},\ldots$ by setting $x^{(t+1)}\approx Xf(x^{(t)})$, where the “denoiser” $f$ is a function (of the algorithm designers’ choosing) from $\mathbb{R}\to\mathbb{R}$ applied coordinate-wise. (Footnote 1: The $\approx$ relation hides a lower-order additive term, the “Onsager correction,” which depends on $x^{(t)}$. For the sake of simplicity we ignore this in the present discussion.) The goal of the “powering” action, $Xx^{(t)}$, is to increase the quadratic form, while the denoiser $f$ is chosen to bring $f(x^{(t)})$ close to the constraint set $K$.

AMP algorithms are extremely popular in high-dimensional statistics. In this context, given a prior distribution over the matrix $X$, it is often possible to optimize the design of the denoisers $f$ in such a way that AMP gives an FPTAS, in that $x^{(t)}$ obtains a $(1-\varepsilon)$-optimal solution for $t$ large enough as a function of $\varepsilon$. Introduced initially as a generalization of Belief Propagation methods from statistical physics [Bol14, DMM09, BM11], AMP algorithms are now state-of-the-art for a variety of average-case optimization problems, including compressed sensing [DMM09], sparse Principal Components Analysis (PCA) [DM14], linear regression [DMM09, BM11, KMS+12], non-negative PCA [MR15], and more (many additional examples may be found in the surveys [Mon12, FVRS22]). One especially notable recent application is the breakthrough work of Montanari for optimizing the Sherrington-Kirkpatrick Hamiltonian, an average-case version of max-cut [Mon21].

One major drawback of AMP algorithms is that they are not robust. The NP-hardness of quadratic optimization means that, obviously, one cannot hope for the optimality of AMP on average-case inputs to generalize to arbitrary inputs $X$. But even structured perturbations can throw AMP off [CZK14, RSFS19]; for example, an additive perturbation to $X$ by a rank-$1$ matrix of large norm, or planting a principal minor of uniform sign (as described in [IS24]).

Our prior work addressing this issue [IS24] shows that for a certain class of adversarial corruptions, AMP can be simulated robustly by polynomial-sized semidefinite programming relaxations in the “local statistics hierarchy.” While this result is a proof of concept that a robust version of AMP is possible, it is perhaps more interesting from a complexity-theoretic perspective than an algorithmic one: the semidefinite programs are of size $n^{\exp(t)}$, where $t$ is the number of AMP iterations. When AMP is an FPTAS, the algorithm of [IS24] gives a robust PTAS, but the running time is too slow to feasibly implement on any computer.

In the present work, we obtain simple and fast spectral algorithms which run in time $O(n^{3})$, while not just matching but even improving on the robustness guarantees of [IS24]. In the “spectral algorithms from sum-of-squares analyses” line of work (initiated in [HSSS16]), our result stands out as giving a particularly dramatic reduction in running time, as well as in yielding a significantly simpler analysis.

1.1 Setup and definitions

We give some necessary definitions of AMP and the noise model that we consider.

Definition 1.1 (AMP algorithm).

An Approximate Message Passing algorithm is specified by a sequence of denoiser functions $\mathcal{F}=f_{0},f_{1},f_{2},\ldots$, with $f_{t}:\mathbb{R}^{t+1}\to\mathbb{R}$ for each $t\in\mathbb{N}$. It takes as input a symmetric $n\times n$ matrix $X$, a number of iterations $T\in\mathbb{N}$, and produces a sequence of iterates $x^{(0)},x^{(1)},\ldots,x^{(T)}$, with $x^{(0)}=\vec{1}$ and

$$x^{(t+1)}=Xf_{t}(x^{(t)},x^{(t-1)},\ldots,x^{(0)})-\Delta_{t}(x^{(t)},x^{(t-1)},\ldots,x^{(0)}),$$

where $f_{t}$ is applied coordinate-wise, and $\Delta_{t}$ is the Onsager correction term for decreasing correlations between iterates and is fully determined by $\mathcal{F}$ (see Definition 2.1). AMP algorithms often also come with a rounding procedure which is applied to the final iterate, in order to ensure it satisfies the optimization constraints.

We note that we are considering separable AMP algorithms (where the denoisers are applied coordinate-wise) with fixed starting point $x^{(0)}=\vec{1}$. In full generality AMP may relax both of these criteria, but the majority of AMP analyses are compatible with these assumptions.

Example 1.2 (non-negative PCA).

In the non-negative principal components analysis (PCA) problem, one is given a matrix $X\in\mathbb{R}^{n\times n}$ and asked to maximize $v^{\top}Xv$ over non-negative unit vectors $v\geqslant 0$. The AMP algorithm which starts from $x^{(0)}=\vec{1}$ and uniformly chooses the separable denoiser $f_{s}(x^{(s)},\ldots,x^{(0)})=f(x^{(s)})$, with $f(x)=\max(x,0)$, is an FPTAS for non-negative PCA on $X$ with i.i.d. subgaussian entries [MR15]. (Footnote 2: Technically $x^{(t)}$ may be neither a unit vector nor non-negative, but AMP algorithms such as this one usually include a final “rounding” step; in this case, the rounding is just applying $f(x)=\max(x,0)$ followed by projection to the unit ball.) In this case, up to the Onsager correction, AMP coincides with projected gradient ascent with “infinite” step size.
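As an illustration of Example 1.2, the following is a minimal sketch of the iteration in NumPy. It is not the authors' implementation. It assumes the standard memory-one form of the Onsager correction, $\Delta_t = \big(\tfrac{1}{n}\sum_i f'(x^{(t)}_i)\big)\, f(x^{(t-1)})$, which may differ in detail from Definition 2.1, and it rounds by normalizing to unit length. The names `amp_nonneg_pca` and `sample_symmetric_gaussian` are hypothetical.

```python
import numpy as np

def relu(x):
    """The denoiser f(x) = max(x, 0), applied coordinate-wise."""
    return np.maximum(x, 0.0)

def sample_symmetric_gaussian(n, rng):
    """Symmetric matrix whose off-diagonal entries are N(0, 1/n).

    (The diagonal has variance 2/n; this is an illustrative choice.)
    """
    G = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    return (G + G.T) / np.sqrt(2.0)

def amp_nonneg_pca(X, T):
    """T steps of AMP with f = relu from x^(0) = all-ones, then rounding.

    Assumption: the Onsager term is b_t * f(x^(t-1)) with
    b_t = (1/n) * sum_i f'(x_i^(t)), the usual memory-one form.
    """
    n = X.shape[0]
    x = np.ones(n)
    f_prev = np.zeros(n)  # convention: no correction on the first step
    for _ in range(T):
        f_cur = relu(x)
        b = np.mean(x > 0)  # f'(x) = 1 if x > 0, else 0
        x, f_prev = X @ f_cur - b * f_prev, f_cur
    v = relu(x)  # rounding: make non-negative ...
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # ... and rescale to unit length
```

For example, `amp_nonneg_pca(sample_symmetric_gaussian(3000, np.random.default_rng(0)), T=20)` returns a non-negative unit vector whose objective value can be evaluated as `v @ X @ v`.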

We will allow adversarially-chosen perturbations in the following model.

Definition 1.3 ($\varepsilon$-principal minor corruption).

Given matrices $X,Y\in\mathbb{R}^{n\times n}$, we say $Y$ is an $\varepsilon$-principal minor corruption of $X$ if $Y-X$ is supported on an $\varepsilon n\times\varepsilon n$ principal minor.
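To make Definition 1.3 concrete, here is a small sketch that checks whether $Y$ is an $\varepsilon$-principal minor corruption of $X$ when both matrices are known (which is of course not the algorithmic setting, where only $Y$ is observed). The smallest index set $S$ with $Y-X$ supported on $S\times S$ is the union of the nonzero rows and nonzero columns of $Y-X$. The function names are hypothetical.

```python
import numpy as np

def corruption_support(X, Y, tol=0.0):
    """Smallest index set S such that Y - X is supported on S x S."""
    D = np.abs(Y - X) > tol
    rows = np.flatnonzero(D.any(axis=1))
    cols = np.flatnonzero(D.any(axis=0))
    return np.union1d(rows, cols)

def is_principal_minor_corruption(X, Y, eps, tol=0.0):
    """True if Y - X fits inside an (eps*n) x (eps*n) principal minor."""
    n = X.shape[0]
    return len(corruption_support(X, Y, tol)) <= eps * n
```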

A mean-$0$ random variable $\bm{X}$ is said to be $\sigma$-subgaussian if for each integer $k\in\mathbb{N}$, $\operatorname*{\mathbf{E}}[|\bm{X}|^{k}]\leqslant\sigma^{k}k^{k/2}$. For example, a mean-$0$ Gaussian with variance $\sigma^{2}$ is $\sigma$-subgaussian, and a uniformly random sign $\in\{\pm 1\}$ is $1$-subgaussian. Note that rescaling a $\sigma$-subgaussian variable $\bm{X}$ to $C\bm{X}$ for constant $C$ rescales the subgaussian parameter to $C\sigma$.
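The two examples can be checked numerically for small $k$. The sketch below uses the closed form $\mathbf{E}|g|^{k}=\sigma^{k}2^{k/2}\Gamma(\tfrac{k+1}{2})/\sqrt{\pi}$ for $g\sim\mathcal{N}(0,\sigma^{2})$ and compares it against the bound $\sigma^{k}k^{k/2}$; it verifies the inequality only up to a finite `max_k`, so it is a sanity check rather than a proof. The helper names are hypothetical.

```python
import math

def gaussian_abs_moment(k, sigma=1.0):
    """E|g|^k for g ~ N(0, sigma^2), via the Gamma-function formula."""
    return sigma**k * 2 ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)

def satisfies_moment_bound(moment_fn, sigma, max_k=30):
    """Check E|X|^k <= sigma^k * k^(k/2) for k = 1, ..., max_k."""
    return all(
        moment_fn(k) <= sigma**k * k ** (k / 2) for k in range(1, max_k + 1)
    )

# A variance-4 Gaussian passes with sigma = 2; a random sign has E|X|^k = 1.
gaussian_ok = satisfies_moment_bound(lambda k: gaussian_abs_moment(k, 2.0), 2.0)
sign_ok = satisfies_moment_bound(lambda k: 1.0, 1.0)
```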

1.2 Results

Our main theorem is the following.

Theorem 1.4 (Informal version of Theorem 3.1).

Suppose $\mathcal{A}$ is a $T$-step AMP algorithm with $O(1)$-Lipschitz or polynomial denoiser functions. Let $X$ be a symmetric $n\times n$ matrix with i.i.d. $\frac{O(1)}{\sqrt{n}}$-subgaussian entries having mean $0$ and variance $\frac{1}{n}$, and let $v_{\mathrm{AMP}}(X)$ be the output of $\mathcal{A}$ on $X$. Then there exists an algorithm which, when given access to an $\varepsilon$-principal minor corruption $Y$ of $X$, produces in time $O(\varepsilon n^{3}\log n)$ a vector $\hat{v}(Y)$ satisfying

$$\|\hat{v}(Y)-v_{\mathrm{AMP}}(X)\|^{2}\leqslant O(\varepsilon\log^{d}\tfrac{1}{\varepsilon})\cdot\|v_{\mathrm{AMP}}(X)\|^{2},$$

with probability $1-o(1)$ over the randomness of $X$, where $d=1$ if the denoisers are Lipschitz, and $d=k^{T}$ if the denoisers are polynomials of degree $\leqslant k$.

In words, given access to an adversarially corrupted matrix $Y$, our algorithm can find a vector $\hat{v}(Y)$ which is close to the output of AMP on the uncorrupted matrix $X$. (Footnote 3: Since $X$ has bounded operator norm, this implies that $\hat{v}(Y)$ has objective value $\hat{v}^{\top}X\hat{v}$ within an additive $\tilde{O}(\sqrt{\varepsilon})$ of the objective of $v_{\mathrm{AMP}}(X)$.) The result improves on that of [IS24] in that it (1) runs in time $O(\varepsilon n^{3}\log n)$ rather than $n^{\exp(T)}$, and (2) guarantees that $\|\mathcal{A}(X)-\hat{v}(Y)\|\leqslant f(\varepsilon)\|\mathcal{A}(X)\|$ for a function $f(\varepsilon)\to 0$ as $\varepsilon\to 0$ which is independent of $n$ (but does depend on $T$), whereas in [IS24] the function $f(\varepsilon)$ included a multiplicative factor of $\mathrm{poly}\log(n)$, and thus was trivial unless $\varepsilon=o(1)$.
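One way to see the claim about objective values is the following short calculation, given here as a sketch. It assumes that both $v=v_{\mathrm{AMP}}(X)$ and $\hat{v}=\hat{v}(Y)$ have norm $O(1)$ (for instance, after rounding to unit vectors), and it uses the symmetry of $X$:

$$
\begin{aligned}
\left|\hat{v}^{\top}X\hat{v}-v^{\top}Xv\right|
&=\left|(\hat{v}-v)^{\top}X(\hat{v}+v)\right| \\
&\leqslant\|X\|_{\mathsf{op}}\cdot\|\hat{v}-v\|\cdot\left(\|\hat{v}\|+\|v\|\right) \\
&\leqslant O(1)\cdot\tilde{O}(\sqrt{\varepsilon})\,\|v\|\cdot O(1)=\tilde{O}(\sqrt{\varepsilon}).
\end{aligned}
$$

The first equality holds because the cross terms $\hat{v}^{\top}Xv$ and $v^{\top}X\hat{v}$ cancel when $X$ is symmetric, and the last line takes square roots in the guarantee of Theorem 1.4.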

As noted in [IS24], an equivalent result is information-theoretically impossible under the stronger corruption model in which $X-Y$ is supported on $\varepsilon n^{2}$ arbitrary entries (unless $\varepsilon=o(n^{-1/2})$).

As a direct corollary, we can robustly simulate Montanari’s algorithm [Mon21] for finding the ground state of the Sherrington-Kirkpatrick Hamiltonian—that is, an approximately optimal solution for Max-Cut with i.i.d. Gaussian edge weights.

Corollary 1.5 (Fast, robust Sherrington Kirkpatrick).

Suppose $X$ is a symmetric matrix with entries sampled i.i.d. from $\mathcal{N}(0,\frac{1}{n})$. Then there is an algorithm which, when run on an $\varepsilon$-principal minor corruption $Y$ of $X$, with probability $1-o(1)$ produces in time $O(\varepsilon n^{3}\log n)$ a unit vector $\hat{v}(Y)\in\{\pm 1/\sqrt{n}\}^{n}$ achieving objective value $\hat{v}(Y)^{\top}X\hat{v}(Y)\geqslant\mathrm{OBJ_{AMP}}-O(\sqrt{\varepsilon\log\frac{1}{\varepsilon}})$.

The value $\mathrm{OBJ_{AMP}}$ is the objective value achieved by Montanari’s AMP algorithm; modulo a widely-believed conjecture in statistical physics, $\mathrm{OBJ_{AMP}}$ approaches $\mathrm{OPT}=\max_{v\in\{\pm 1/\sqrt{n}\}^{n}}v^{\top}Xv\approx 1.52$ as $T\to\infty$. The corollary follows from Theorem 1.4 because Montanari’s denoisers are Lipschitz, and the rounding scheme applied to place the final iterate in the hypercube is also Lipschitz.

In Section 4, we give a simple proof (along similar lines as the proof of Theorem 1.4) that AMP is robust to adversarial perturbations of small spectral norm. This fact is folklore, but we feel our proof is quite simple and may be of interest.

1.3 Experiments

Our algorithm is fast enough that it can be easily implemented and run on a laptop. We have run some experiments to demonstrate the utility of our method. We consider the non-negative PCA objective described in Example 1.2. In [MR15], it was shown that AMP with denoiser function $f(x)=\max(0,x)$ is an FPTAS for $\mathrm{OPT}=\max_{v\geqslant 0,\|v\|=1}v^{\top}Xv=\sqrt{2}$.

In Figure 1, we show the result for $n=3000$, $\varepsilon=0.02$, with the adversarial corruption given by perturbing an $\varepsilon n\times\varepsilon n$ principal minor by sampling two independent rank $50=\frac{5}{6}\varepsilon n$ Wishart matrices, each normalized to have expected Frobenius norm $100$, and adding one and subtracting the other. Without having taken pains to optimize the running time, the implementation in Python on a laptop takes less than 5 minutes. We have plotted (1) the correlation of our algorithm’s output, $\hat{v}(Y)$, with $v_{\mathrm{AMP}}(X)$, and (2) the objective value of the output for the uncorrupted matrix $X$, $\hat{v}(Y)^{\top}X\hat{v}(Y)$, as a function of the number of iterations. For comparison, we plot in Figure 1 the performance of (a) AMP on the corrupt matrix, $v_{\mathrm{AMP}}(Y)$, and (b) AMP on a “naive” spectral cleaning $\tilde{Y}$ of $Y$, given by deleting all larger-than-expected eigenvalues. Our procedure performs much better than AMP on the corrupt input. Empirically, the naive cleaning performance is comparable to ours, but unlike our algorithm, the naive procedure does not come with provable guarantees for arbitrary perturbations (and we suspect the naive procedure may be succeeding due to a small-$n$ effect).
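The “naive” spectral cleaning baseline can be sketched as follows. This is an illustrative guess at the procedure rather than the code used for Figure 1: it treats an eigenvalue as “larger than expected” when its magnitude exceeds a threshold slightly above $2$, the approximate edge of the spectrum for a symmetric matrix with variance-$\frac{1}{n}$ entries. The function name and the default `threshold=2.1` are assumptions.

```python
import numpy as np

def naive_spectral_clean(Y, threshold=2.1):
    """Zero out every eigenvalue of Y whose magnitude exceeds `threshold`.

    Assumption: `threshold` sits just above the bulk edge (about 2) of a
    symmetric matrix with i.i.d. mean-0, variance-1/n entries.
    """
    Y = (Y + Y.T) / 2.0  # symmetrize against round-off
    evals, evecs = np.linalg.eigh(Y)
    kept = np.where(np.abs(evals) <= threshold, evals, 0.0)
    # Reassemble sum_i kept_i * u_i u_i^T.
    return (evecs * kept) @ evecs.T
```

By construction the cleaned matrix has operator norm at most `threshold`, but nothing here controls how far it is from the uncorrupted $X$.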

Figure 1: Plot of the correlation of the vector $\hat{v}(Y)$ with the output of AMP on the “clean” matrix $X$, and of the objective value attained by $\hat{v}(Y)$ on the clean matrix $X$.

1.4 Discussion

We give a fast spectral algorithm for simulating AMP under adversarial principal minor corruptions. Our algorithm is an implementation of the “spectral algorithms from sum-of-squares (SoS) analyses” strategy introduced in [HSSS16]. We find it to be a particularly striking example of this strategy: not only was the running time reduced from $n^{\exp(T)}$ to $O(n^{3})$, but the analysis also very transparently distills that of [IS24] to yield a much cleaner argument. We draw a comparison to previous spectral-to-SoS analyses in robust statistics, most of which have been based on a “filtering” approach (e.g. [JLST21, DKK+19]); in the filtering algorithms, the non-SoS analysis required significant additional tools. Another fitting comparison is to recent works obtaining robust spectral algorithms for community recovery in the stochastic block model [MRW24, DdHS23, DdNS22], where it was important to have a very fine-grained understanding of the spectrum of specific matrices. In our case, we are able to get away with a much simpler analysis.

Though we have improved on the result in [IS24] in terms of running time and the robustness-accuracy tradeoff, we differ from our prior work in one aspect: we require a description of the denoisers $\mathcal{F}$ used in the AMP algorithm $\mathcal{A}$, whereas the algorithm in [IS24] has access only to the low-degree moments of the joint distribution over $X,\mathcal{A}(X)$. We find it unlikely that a fast algorithm could succeed without a description of $\mathcal{F}$, but we pose this as a question nonetheless.

Another question is whether our error guarantees are optimal as a function of the number of AMP iterations $T$. In our theorem, the $\tilde{O}(\sqrt{\varepsilon})$ hides factors that grow with the number of AMP iterations; however, our experiments (Figure 1) seem to suggest that the error stabilizes. Is this a small-$n$ effect? Or perhaps an artifact of the specific perturbation from our experiments?

One clear direction for future work is making AMP robust when the input matrix $X$ has planted structure, rather than just having i.i.d. subgaussian entries. For example, AMP has been a successful algorithm for “spiked matrix models” in which $X=G+\lambda uu^{\top}$ with $G$ a Gaussian matrix and $uu^{\top}$ a rank-1 spike, the goal often being to find $u$ given $X$. In this case, it is not completely clear which noise model to study. In some cases (e.g. when $u$ is sparse) a principal minor corruption could simply erase the spike $uu^{\top}$. However, it is an interesting question whether our techniques can be extended to this case. Currently, our algorithm incorporates information about i.i.d. subgaussian variables, which makes it inappropriate for planted models (the same is true of [IS24]).

Finally, it is interesting to consider alternative corruption models. The principal minor corruption is tractable to study, and the fact that it is adversarial makes it a powerful model. We know from [IS24] that a similar result is information-theoretically impossible under the strongest sparse adversarial corruption model, in which an arbitrary subset of $\varepsilon n^{2}$ entries is perturbed. However, it would be interesting to consider alternative corruption models that more faithfully model the distribution shift one expects to see in practice, for example in the application of compressed sensing.

1.5 Technical overview

Though the proof of Theorem 1.4 is not long, we briefly summarize the main ideas here. For the sake of simplicity, in this technical overview we pretend that the AMP iteration has the form $x^{(t)}=Xf(x^{(t-1)})$, ignoring the Onsager correction and the dependence on more than one prior iterate.

Recall that we are given an $\varepsilon$-principal minor corruption $Y$ of $X$. The fact that $X$ has i.i.d. subgaussian entries of variance $\frac{1}{n}$ implies that with high probability, $\|X\|_{\mathsf{op}}=O(1)$. The first step of our algorithm is a spectral procedure which removes $O(\varepsilon n)$ rows and columns of $Y$, producing a matrix $\hat{Y}$ with $\|\hat{Y}\|_{\mathsf{op}}=O(1)$. Then, we run a modified version of the AMP algorithm on the cleaned input matrix $\hat{Y}$, producing iterates $y^{(1)},y^{(2)},\ldots$ just as the original AMP algorithm would have, except that at each iteration we clip the entries, setting $y^{(t)}=\hat{Y}f(\mathsf{clip}(y^{(t-1)}))$, so that the magnitude of every entry of $\mathsf{clip}(y^{(t-1)})$ does not exceed the $\varepsilon$-quantile value $O(\mathrm{poly}\log\frac{1}{\varepsilon})$ of the entries in a typical iterate $x^{(t-1)}$ from a clean input matrix. (Footnote 4: In the proof we choose the threshold to not exactly correspond to the $\varepsilon$-quantile, but this choice would have also worked and is simpler for the sake of this overview.)
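The clipped iteration from this overview can be sketched in a few lines. The sketch takes the cleaned matrix $\hat{Y}$ as given (the spectral row-and-column removal step is not reproduced here), uses the simplified update without the Onsager correction, and treats the clipping threshold `tau` as a free parameter standing in for the quantile value. The function name is hypothetical.

```python
import numpy as np

def clipped_iteration(Y_hat, f, tau, T):
    """Simplified robust iteration y^(t) = Y_hat @ f(clip(y^(t-1))).

    Assumptions: Y_hat is already spectrally cleaned, the Onsager term is
    ignored, and entries are clipped to the interval [-tau, tau].
    """
    y = np.ones(Y_hat.shape[0])  # y^(0) = all-ones
    for _ in range(T):
        y = Y_hat @ f(np.clip(y, -tau, tau))
    return y
```

For the denoiser $f(x)=x^{2}$, clipping gives $\|f(\mathsf{clip}(y))\|\leqslant\tau^{2}\sqrt{n}$, so every iterate satisfies $\|y^{(t)}\|\leqslant\|\hat{Y}\|_{\mathsf{op}}\,\tau^{2}\sqrt{n}$ no matter what the corruption did to earlier iterates.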

We argue that $\|y^{(t)}-x^{(t)}\|\leqslant\tilde{O}(\sqrt{\varepsilon})\|x^{(t)}\|$ by induction on $t$. In the base case, $t=0$, the iterates are identical as $x^{(t)}=\vec{1}=y^{(t)}$. Now for $t\geqslant 1$, suppose that $x^{(t)}$ is the (unobserved) iterate AMP would have produced on $X$. Then

$$
\begin{aligned}
\left\|y^{(t)}-x^{(t)}\right\|
&=\left\|\hat{Y}f(\mathsf{clip}(y^{(t-1)}))-Xf(x^{(t-1)})\right\| \\
&\leqslant\left\|\hat{Y}\big(f(\mathsf{clip}(y^{(t-1)}))-f(x^{(t-1)})\big)\right\|+\left\|(\hat{Y}-X)f(x^{(t-1)})\right\| \\
&\leqslant\left\|\hat{Y}\right\|_{\mathsf{op}}\left\|f(\mathsf{clip}(y^{(t-1)}))-f(x^{(t-1)})\right\|+\left\|(\hat{Y}-X)f(x^{(t-1)})\right\|
\end{aligned}
\tag{1}
$$

The spectral cleaning ensures that $\|\hat{Y}\|_{\operatorname{\mathsf{op}}}=O(1)$. To further bound the first term in (1), consider the illustrative case of the denoiser $f(x)=x^{2}$. Then for any vectors $a,b$, $f(a)-f(b)=(a+b)\circ(a-b)$, for $\circ$ the entrywise product. Thus we have

\begin{align}
\left\|f(\operatorname{\mathsf{clip}}(y^{(t-1)}))-f(x^{(t-1)})\right\| &= \left\|\left(\operatorname{\mathsf{clip}}(y^{(t-1)})+x^{(t-1)}\right)\circ\left(\operatorname{\mathsf{clip}}(y^{(t-1)})-x^{(t-1)}\right)\right\| \notag\\
&\leqslant \left\|\operatorname{\mathsf{clip}}(y^{(t-1)})\right\|_{\infty}\cdot\left\|\operatorname{\mathsf{clip}}(y^{(t-1)})-x^{(t-1)}\right\|+\left\|x^{(t-1)}\circ\left(\operatorname{\mathsf{clip}}(y^{(t-1)})-x^{(t-1)}\right)\right\| \tag{2}
\end{align}

The first and second terms of (2) are bounded in a similar manner; we begin by explaining the first. Because of the clipping procedure, $\|\operatorname{\mathsf{clip}}(y^{(t-1)})\|_{\infty}=O(\mathrm{poly}\log\frac{1}{\varepsilon})$. Further, by the triangle inequality,

\begin{equation}
\|\operatorname{\mathsf{clip}}(y^{(t-1)})-x^{(t-1)}\|\leqslant\|\operatorname{\mathsf{clip}}(y^{(t-1)})-\operatorname{\mathsf{clip}}(x^{(t-1)})\|+\|\operatorname{\mathsf{clip}}(x^{(t-1)})-x^{(t-1)}\|. \tag{3}
\end{equation}

The first term on the right of (3) can be bounded by $\tilde{O}(\sqrt{\varepsilon})\cdot\|x^{(t-1)}\|$ from the inductive hypothesis, because the $\operatorname{\mathsf{clip}}$ function is $1$-Lipschitz. The second term in (3) can be bounded by $\tilde{O}(\sqrt{\varepsilon})\cdot\|x^{(t-1)}\|$, because the distribution of the entries of $x^{(t-1)}$ is known, and is roughly that of independent polynomials in Gaussian random variables. To bound the second term from (2), we separate the contribution of the entries of $x^{(t-1)}$ which are bounded by $O(\mathrm{poly}\log\frac{1}{\varepsilon})$, to which we can apply an identical argument, from the entries which exceed this threshold, and then appeal to the fact that the latter integrate to a small total. A similar argument can be used for arbitrary polynomial $f$ (for Lipschitz $f$, (1) can be bounded directly and the clipping is not necessary).

To bound the second term in (1), we use the fact that $\hat{Y}-X$ can be written as the sum of a matrix $E$, supported on an $\varepsilon n\times\varepsilon n$ principal minor, and a matrix $F$ which agrees with $-X$ on at most $O(\varepsilon n)$ rows/columns and is zero elsewhere; these are precisely the rows/columns of $Y$ which were removed to form $\hat{Y}$, but were not involved in the initial principal minor corruption. So, $\|(\hat{Y}-X)f(x^{(t-1)})\|\leqslant\|Ef(x^{(t-1)})\|+\|Ff(x^{(t-1)})\|$. Since $E$ is supported on $\varepsilon n$ columns,

\[
\|Ef(x^{(t-1)})\|\leqslant\|E\|_{\operatorname{\mathsf{op}}}\cdot\max_{I\subset[n],\,|I|=\varepsilon n}\Big(\sum_{i\in I}f(x^{(t-1)})_{i}^{2}\Big)^{1/2}.
\]

Here again, because we know the order statistics of $x^{(t-1)}$, and because $f$ is required to be a well-behaved function, the maximum norm of $f(x^{(t-1)})$ when restricted to a subset of $\varepsilon n$ coordinates is on the order of $\tilde{O}(\sqrt{\varepsilon})\|x^{(t-1)}\|$. Also, since $E$ is a submatrix of $\hat{Y}-X$, $\|E\|_{\operatorname{\mathsf{op}}}\leqslant\|\hat{Y}\|_{\operatorname{\mathsf{op}}}+\|X\|_{\operatorname{\mathsf{op}}}\leqslant 12$.

The matrix $F$ can be split into two parts. The first part, $F_{1}$, is supported on $O(\varepsilon n)$ columns, and the argument for it is identical to the case of $Ef(x^{(t-1)})$ above. The second part, $F_{2}$, is supported on $O(\varepsilon n)$ rows. Here, we have to take a different perspective: since $F_{2}$ is a restriction of $-X$ to the rows indexed by some set $T\subset[n]$ with $|T|=\varepsilon n$, we have that $F_{2}f(x^{(t-1)})=(-Xf(x^{(t-1)}))_{T}$, which is an $\varepsilon n$-sparse restriction of the vector $-Xf(x^{(t-1)})$. But we understand the order statistics of this vector too! Hence we have that $\|F_{2}f(x^{(t-1)})\|=\tilde{O}(\sqrt{\varepsilon})\|x^{(t-1)}\|$ as desired.

Putting everything together, we have that $\|y^{(t)}-x^{(t)}\|\leqslant\tilde{O}(\sqrt{\varepsilon})\cdot\|x^{(t-1)}\|$. The argument is now finished by again using our knowledge of the distribution of $x^{(t-1)}$ to conclude that $\|x^{(t-1)}\|$ and $\|x^{(t)}\|$ are within constant scalings of each other.

Much of this analysis mirrors and simplifies the analysis in [IS24]. There, a semidefinite program is used to obtain a pseudoexpectation of a “cleaned” version $\hat{X}$ of $Y$. The semidefinite program has formal variables for low-degree symmetric polynomials of $\hat{X}$. It adds constraints to try to enforce that $\|\hat{X}\|_{\operatorname{\mathsf{op}}}=O(1)$, that $\hat{X}-Y$ be supported on a principal minor (by introducing indicator variables for “clean” rows and columns), as well as the constraint that some symmetric vector-valued polynomials in the entries of $\hat{X}$ have entries which are no larger than corresponding polynomials in $X$.

The high-level sequence of arguments mirrors those outlined in (1) and the subsequent lines. We introduce some additional structure/arguments because our spectral cleaning step (for which we design a natural-in-hindsight spectral cleaning algorithm) deletes rows and columns. One advantage of the present argument over that in [IS24] is that it is unclear how to make a semidefinite program leverage the order statistics of vector-valued polynomials, so in our prior work we crudely enforce a bound on the infinity norm of the vectors, which gives rise to $\mathrm{poly}\log n$ factors. Here we are able to circumvent this because we clip our iterates by hand.

2 AMP preliminaries

To complete Definition 1.1 from the introduction, we must define the Onsager correction term.

Definition 2.1 (Onsager correction).

The Onsager correction term for the AMP algorithm defined by denoisers $\mathcal{F}=f_{1},\ldots$ on input $X$ with iterates $x^{(0)},x^{(1)},\ldots$ is the quantity

\[
\Delta_{t}(v_{t},\ldots,v_{0})=\sum_{j=1}^{t}B_{t,j}\cdot f_{j-1}(x^{(j-1)},\ldots,x^{(0)})
\]

where $B_{t,j}=\operatorname*{\mathbf{E}}_{X}[b_{t,j}]$ and $b_{t,j}$ is the normalized divergence of $f_{t}$ with respect to $x^{(j)}$:

\[
b_{t,j}=\frac{1}{n}\sum_{i=1}^{n}\left.\frac{\partial f_{t}(x_{i}^{t},\ldots,u_{i}^{j},\ldots,x_{i}^{0})}{\partial u_{i}^{j}}\right|_{u^{j}\rightarrow x^{j}}.
\]

We remark that the Onsager correction is usually defined with the function $b_{t,j}$ in place of the constant $B_{t,j}$ (and in fact, generally one would estimate $B_{t,j}$ from data by computing $b_{t,j}$). For technical reasons it is easier for us to work with $B_{t,j}$. As was previously noted in the literature [FVRS22, Remark 2.4], when the denoisers are well-behaved this is effectively without loss of generality because the iterates produced by using $b_{t,j}$ vs. $B_{t,j}$ are $o(1)$-close; we discuss this further in Appendix A.
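To make the normalized divergence concrete, here is a minimal sketch for the special case of a memoryless denoiser, where $f_t$ depends only on the most recent iterate and the divergence reduces to the average derivative $\frac{1}{n}\sum_i f'(x_i)$. The function name `empirical_divergence` and the example denoiser are illustrative assumptions, not part of the paper.

```python
import numpy as np

def empirical_divergence(f_prime, x):
    """Normalized divergence (1/n) * sum_i f'(x_i) of a memoryless,
    coordinate-wise denoiser f evaluated at the iterate x.

    Assumption: f depends only on the latest iterate, so only one
    partial derivative is nonzero.
    """
    x = np.asarray(x, dtype=float)
    return float(np.mean(f_prime(x)))

# Hypothetical example: f(x) = x^2, whose derivative is f'(x) = 2x.
x = np.array([1.0, 2.0, 3.0])
b = empirical_divergence(lambda u: 2.0 * u, x)
```

Under this simplification, one way to approximate the constant $B_{t,j}$ would be to average such empirical values over independent draws of $X$.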

Definition 2.2 (Pseudo-Lipschitz Functions).

A function $\varphi:\mathbb{R}^{t}\rightarrow\mathbb{R}$ is called Pseudo Lipschitz of order $k$ (or $\operatorname{PL}(k)$) if

\[
|\varphi(x)-\varphi(y)|\leqslant L\left(1+\|x\|_{2}^{k-1}+\|y\|_{2}^{k-1}\right)\|x-y\|_{2}
\]

for all $x,y\in\mathbb{R}^{t}$.

Note that a function is Lipschitz exactly when it is $\operatorname{PL}(1)$, and a polynomial of degree $k$ lies in $\operatorname{PL}(k)$. By a slight abuse of notation, we will say that constants lie in $\operatorname{PL}(0)$.
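As a quick sanity check of the definition, the one-variable polynomial $\varphi(x)=x^{2}$ satisfies the $\operatorname{PL}(2)$ condition; the constant $L=1$ below is our choice for illustration:

\[
|x^{2}-y^{2}|=|x+y|\,|x-y|\leqslant\left(1+|x|+|y|\right)|x-y|\qquad\text{for all }x,y\in\mathbb{R}.
\]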

We will need information about the order statistics of the entries of our iterates, $x^{(t)}$. When we run AMP with polynomial denoiser functions, each iterate $x^{(t)}$ is a symmetric (fixed by coordinate relabeling), vector-valued polynomial in the entries of $X$. So each entry is a bounded-degree polynomial of independent subgaussian random variables.

While the entries of $x^{(t)}$ are not independent, they are sufficiently close to independent that for simple functions $g:\mathbb{R}\to\mathbb{R}$, the average $\frac{1}{n}\sum_{i=1}^{n}g(x^{(t)}_{i})$ concentrates fairly well around the expectation of $g$ on a polynomial of Gaussians. The same is true when the denoiser functions are Lipschitz. This fact is known as “state evolution” in the AMP literature. In the next corollary, we state a useful consequence that will allow us to control the order statistics of our iterates.

Corollary 2.3.

Suppose that $f:\mathbb{R}^{t+1}\rightarrow\mathbb{R}$ is $\operatorname{PL}(k)$ for $k\geqslant 0$ and $g:\mathbb{R}^{t+1}\rightarrow\mathbb{R}$ is $\operatorname{PL}(\ell)$ with $g(\vec{0})=0$ and $\ell\geqslant 1$. Suppose $\vec{x}=x^{(t)}$ is an AMP iterate resulting from the application of Pseudo Lipschitz denoisers on input $X$, a symmetric matrix with i.i.d. $\frac{O(1)}{\sqrt{n}}$-subgaussian entries having mean $0$ and variance $\frac{1}{n}$. Furthermore, let $C>0$ be a constant (possibly depending on $t$). Then, the following hold:

  • For any $r\gg\max(t,k)$,

    \[
    \operatorname*{p-lim}_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\,\bm{1}[g(\vec{x}_{i})^{2}>\theta]\leqslant\frac{1}{\theta^{r}}\cdot C^{2r}\cdot(3\ell r)^{\ell r}.
    \]

  • For every $I\subseteq[n]$ with $|I|\leqslant\varepsilon n$,

    \[
    \operatorname*{p-lim}_{n\rightarrow\infty}\frac{1}{n}\sum_{i\in I}f(\vec{x}_{i})^{2}\leqslant C\varepsilon\log^{k}\frac{1}{\varepsilon}.
    \]

We prove this corollary in Appendix A.
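The second bound can be illustrated numerically in a toy setting. The sketch below is only a stand-in: it uses i.i.d. standard Gaussians in place of a true AMP iterate, takes $f(x)=x$ (so $k=1$), and measures the worst-case subset $I$, which is the set of the $\varepsilon n$ largest squared entries. The helper name `top_mass`, the seed, and the sample size are illustrative assumptions.

```python
import numpy as np

def top_mass(x, eps):
    """(1/n) * sum of the floor(eps*n) largest values of x_i^2.

    This is the maximum of (1/n) * sum_{i in I} x_i^2 over all index
    sets I with |I| <= eps * n.
    """
    n = x.size
    m = int(eps * n)
    sq = np.sort(x ** 2)[::-1]  # squared entries, largest first
    return float(sq[:m].sum() / n)

# Toy stand-in for an AMP iterate: i.i.d. standard Gaussian entries.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
eps = 0.01

# Compare the worst-case mass against eps * log(1/eps).
ratio = top_mass(x, eps) / (eps * np.log(1.0 / eps))
```

In this Gaussian toy case the ratio stays bounded by a small constant, consistent with the $C\varepsilon\log^{k}\frac{1}{\varepsilon}$ scaling for $k=1$.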

Sometimes we will use the phrase “Almost-Triangle Inequality” to refer to the inequality $(a+b)^{2}\leqslant 2a^{2}+2b^{2}$.
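For completeness, this inequality follows from expanding a square:

\[
2a^{2}+2b^{2}-(a+b)^{2}=a^{2}-2ab+b^{2}=(a-b)^{2}\geqslant 0.
\]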

3 Making AMP robust to principal minor corruptions

In this section, we prove our main theorem.

Theorem 3.1 (Main Theorem).

Let $\mathcal{F}$ be an AMP iteration consisting of either Lipschitz or polynomial denoiser functions. Suppose that $X$ is a symmetric matrix with i.i.d. entries of mean $0$, variance $\frac{1}{n}$, and subgaussian parameter $\frac{O(1)}{\sqrt{n}}$. Let $v_{\mathrm{AMP}}(X)$ denote the output of the $T$-step AMP algorithm on input $X$, and set $d$ to be the degree of $v_{\mathrm{AMP}}(X)$ as a polynomial, or $1$ if the denoisers are Lipschitz. (Footnote 5: This aligns with the pseudo-Lipschitz degree of $v_{\mathrm{AMP}}(X)$, which functions similarly to the degree as a polynomial.) Then, with probability $1-o(1)$ over the choice of $X$, Algorithm 3.4 run on any $\varepsilon$-principal minor corruption $Y$ of $X$ produces, in time $O(\varepsilon n^{3}\log n)$, a vector $\hat{v}(Y)$ which satisfies

\[
\left\|\hat{v}(Y)-v_{\mathrm{AMP}}(X)\right\|_{2}^{2}\leqslant O\left(\varepsilon\log^{d}\frac{1}{\varepsilon}\right)\cdot\|v_{\mathrm{AMP}}(X)\|_{2}^{2}.
\]

Our algorithm consists of a pre-processing step, followed by a “robust” simulation of AMP:

  1. In the pre-processing step, we spectrally clean $Y$ by removing rows and columns to produce a matrix $\hat{Y}$ with $\|\hat{Y}\|_{\operatorname{\mathsf{op}}}=O(1)$.

  2. Then, we run AMP on $\hat{Y}$, but with the following modification: after each iteration, we clip the iterate (coordinate-wise) to ensure all coordinates have not-too-large an absolute value.

The following definitions will help us to describe our algorithm.

Definition 3.2.

For $\varepsilon>0$, define $\operatorname{\mathsf{cutoff}}(\varepsilon)=\sqrt{C_{T}\log\frac{1}{\varepsilon}}$ for an appropriately large $C_{T}$ depending on $T$, the total number of AMP iterations. (Footnote 6: In practice, $C_{T}=16$ is a reasonable value.) The “$\varepsilon$-clip” of $y\in\mathbb{R}$ is now defined to be

\[
\operatorname{\mathsf{clip}}^{\varepsilon}(y)=\begin{cases}y&|y|\leqslant\operatorname{\mathsf{cutoff}}(\varepsilon)\\ \mathsf{sign}(y)\cdot\operatorname{\mathsf{cutoff}}(\varepsilon)&|y|>\operatorname{\mathsf{cutoff}}(\varepsilon)\end{cases}
\]

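The clipping operation of Definition 3.2 can be sketched in a few lines of NumPy. The function names `cutoff` and `clip_eps` are ours, and the default $C_T=16$ is the value the text suggests as reasonable in practice; treat this as an illustration rather than a reference implementation.

```python
import numpy as np

def cutoff(eps, C_T=16.0):
    """cutoff(eps) = sqrt(C_T * log(1/eps))."""
    return np.sqrt(C_T * np.log(1.0 / eps))

def clip_eps(y, eps, C_T=16.0):
    """Coordinate-wise eps-clip: entries with |y_i| <= cutoff are kept,
    larger entries are replaced by sign(y_i) * cutoff."""
    c = cutoff(eps, C_T)
    return np.clip(y, -c, c)

# With eps = 1/e the cutoff is sqrt(16 * 1) = 4.
y = np.array([5.0, -7.0, 1.0])
clipped = clip_eps(y, eps=np.exp(-1.0))
```

Because the map is a projection onto an interval, it is 1-Lipschitz in each coordinate.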
Definition 3.3 (Matrix restriction).

Given a matrix $Y\in\mathbb{R}^{n\times n}$, $\hat{Y}$ is an $\varepsilon$-restriction if there exists a set $S\subseteq[n]$ with $|S|\leqslant\varepsilon n$ such that zeroing out the rows and columns of $Y$ with indices in $S$ yields $\hat{Y}$.

Pictorially, this is as follows:

\[
Y=\begin{bmatrix}Y_{S,S}&Y_{S,\overline{S}}\\ Y_{\overline{S},S}&Y_{\overline{S},\overline{S}}\end{bmatrix}\longrightarrow\hat{Y}=\begin{bmatrix}\mathbf{0}_{S,S}&\mathbf{0}_{S,\overline{S}}\\ \mathbf{0}_{\overline{S},S}&Y_{\overline{S},\overline{S}}\end{bmatrix}.
\]

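A restriction in the sense of Definition 3.3 amounts to zeroing the rows and columns indexed by $S$. The following is a small sketch; the helper name `restrict` is hypothetical, and how the set $S$ is chosen is not addressed here.

```python
import numpy as np

def restrict(Y, S):
    """Return a copy of Y with the rows and columns indexed by S zeroed out."""
    Y_hat = np.array(Y, dtype=float, copy=True)
    idx = np.asarray(list(S), dtype=int)
    Y_hat[idx, :] = 0.0  # zero the rows in S
    Y_hat[:, idx] = 0.0  # zero the columns in S
    return Y_hat

# Small example: remove index 0 from a 3 x 3 matrix.
Y = np.arange(9, dtype=float).reshape(3, 3)
Y_hat = restrict(Y, S=[0])
```

The input matrix is left untouched, so the original $Y$ remains available for comparison.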
Algorithm 3.4 (Robust AMP)

Input: A symmetric $n\times n$ matrix $Y$.

Operation:

  1. Compute a restriction $\hat{Y}$ of $Y$ satisfying $\|\hat{Y}\|_{\operatorname{\mathsf{op}}}\leqslant 5\cdot\operatorname*{\mathbf{E}}[\|X\|_{\operatorname{\mathsf{op}}}]$ using Algorithm 3.7.

  2. For $t=1,\ldots,T$, set $y^{(t)}$ to be the clipped AMP iteration

    \[
    y^{(t)}=\operatorname{\mathsf{clip}}^{\varepsilon}\left(\hat{Y}f_{t}(y^{(t-1)},\ldots,y^{(0)})-\sum_{j=1}^{t}B_{t,j}\cdot f_{j-1}(y^{(j-1)},\ldots,y^{(0)})\right).
    \]

Output: The vector $\hat{v}=y^{(T)}$.

Theorem 3.1 is a consequence of the following two lemmas, one for each step of Algorithm 3.4.

Lemma 3.5 (Efficient spectral cleaning).

Suppose $X$ is a symmetric $n\times n$ matrix with i.i.d. entries of mean zero, variance $\frac{1}{n}$, and subgaussian parameter $\frac{O(1)}{\sqrt{n}}$. With probability $1-o(1)$ over $X$, Algorithm 3.7 run on any $\varepsilon$-principal minor corruption $Y$ of $X$ with threshold value $K=5\,\mathbf{E}[\|X\|_{\mathsf{op}}]$ outputs in time $O(\varepsilon n^{3}\log n)$ a matrix $\hat{Y}$ which is a $4\varepsilon$-restriction of $Y$ and satisfies $\|\hat{Y}\|_{\mathsf{op}}=O(1)$.

Lemma 3.6 (Success of AMP on restrictions).

Suppose $X$ is an $n\times n$ matrix with i.i.d. entries of mean zero, variance $\frac{1}{n}$, and subgaussian parameter $\frac{O(1)}{\sqrt{n}}$. Suppose that $Y$ is an $\varepsilon$-principal minor corruption of $X$ and $\hat{Y}$ is a $4\varepsilon$-restriction of $Y$ with $\|\hat{Y}\|_{\mathsf{op}}=O(1)$. Then the clipped AMP iteration from Algorithm 3.4 on $\hat{Y}$ produces a vector $\hat{v}$ such that $\|\hat{v}-v_{\mathrm{AMP}}(X)\|_{2}^{2}\leqslant O(\varepsilon\log^{d}\frac{1}{\varepsilon})\,\|v_{\mathrm{AMP}}(X)\|_{2}^{2}$ with probability $1-o_{n}(1)$ over the choice of $X$.

When combined, these two lemmas immediately imply Theorem 3.1.

3.1 Spectral cleaning

The goal of this section is to prove Lemma 3.5. Here we present the algorithm that constructs $\hat{Y}$, a $4\varepsilon$-restriction of $Y$ with $\|\hat{Y}\|_{\mathsf{op}}=O(1)$.

Algorithm 3.7 (Spectral cleaning of principal minor corruptions)

Input: A symmetric $n\times n$ matrix $Y$, and a threshold value $K\geqslant 0$.

Operation:

  1. Let $\hat{Y}=Y$.

  2. While $\|\hat{Y}\|_{\mathsf{op}}>K$:

    (a) Let $v$ be the eigenvector of $\hat{Y}$ with eigenvalue of largest magnitude.

    (b) Sample $i\in[n]$ with probability $v_{i}^{2}$.

    (c) Zero out the $i$-th row and column of $\hat{Y}$.

Output: Matrix $\hat{Y}$.

Note that critically we do not require that we exactly recover the corrupted rows and columns: all that matters is that we remove the indices that contribute the most to the spectral corruption.
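The loop in Algorithm 3.7 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: it replaces power iteration with a full symmetric eigendecomposition (`np.linalg.eigh`), and the names `spectral_clean` and `demo`, the constant-shift corruption, and the value $2$ used as a stand-in for $\mathbf{E}[\|X\|_{\mathsf{op}}]$ are hypothetical choices made for the example, not taken from the paper.

```python
import numpy as np


def spectral_clean(Y, K, rng):
    """Zero out rows/columns of Y until its operator norm is at most K."""
    Y_hat = np.array(Y, dtype=float)  # work on a copy of the input
    removed = []
    while True:
        # Full eigendecomposition of the symmetric matrix; this is a
        # stand-in for the power iteration used in the runtime analysis.
        eigvals, eigvecs = np.linalg.eigh(Y_hat)
        top = int(np.argmax(np.abs(eigvals)))
        if abs(eigvals[top]) <= K:
            break
        v = eigvecs[:, top]
        probs = v**2 / np.sum(v**2)  # renormalise against rounding error
        i = int(rng.choice(len(v), p=probs))
        Y_hat[i, :] = 0.0
        Y_hat[:, i] = 0.0
        removed.append(i)
    return Y_hat, removed


def demo(n=200, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    X = (G + G.T) / np.sqrt(2)  # symmetric, off-diagonal variance 1/n
    k = round(eps * n)
    Y = X.copy()
    Y[:k, :k] += 5.0  # hypothetical corruption on a k x k principal minor
    K = 5 * 2.0  # assumption: E||X||_op is roughly 2 at this scaling
    Y_hat, removed = spectral_clean(Y, K, rng)
    return float(np.linalg.norm(Y_hat, 2)), len(removed), k
```

Calling `demo()` returns the final operator norm, the number of removed indices, and the size of the corrupted minor, so the removal count can be compared against the $4\varepsilon n$ budget of Lemma 3.5.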

Proof of Lemma 3.5.

Certainly, the algorithm terminates since no index can be sampled more than once. We will show that with high probability, $O(\varepsilon)n$ indices are removed. The runtime bound follows by noting that we run power iteration at most once per index removal, each run taking $O(n^{2}\log n)$ time.

For convenience, let $\alpha=\mathbf{E}[\|X\|_{\mathsf{op}}]$, and recall our threshold is $K=5\alpha$. Since $X$ has independent subgaussian entries, with high probability $\|X\|_{\mathsf{op}}\leqslant\alpha+o(1)$. Let $Q$ denote the set of corrupted indices in $Y$. Furthermore, let $Y^{(0)}=Y,Y^{(1)},\ldots,Y^{(t)}$ denote the matrix $\hat{Y}$ after each iteration of the while loop. Similarly define $E^{(t)}$ and $Q^{(t)}$ (the corrupted indices that have not been zeroed out).

Note that if all indices in $Q$ are removed, the while loop will terminate (it can terminate in other instances, but this is just one stopping condition). We show that with high probability we will reach $\|Y^{(t)}\|_{\mathsf{op}}\leqslant 5\alpha$ within $4\varepsilon n$ iterations (and thus $\hat{Y}$ is a $4\varepsilon$-restriction of $Y$) using a win-win analysis: either we reach a small operator norm before removing all of $Q$, or we remove all of $Q$ (which implies the remaining matrix has norm $\leqslant\alpha+o(1)$, because it is a principal minor of $X$). In particular, the crux of the argument is the following:

Claim 3.8

Let $v$ be the top eigenvector of $Y^{(t)}$ and suppose $\|Y^{(t)}\|_{\mathsf{op}}>5\alpha$. Then with high probability over $X$,

\[\sum_{i\in Q^{(t)}}v_{i}^{2}\geqslant\frac{1}{2}-o(1).\]

Since index $i$ is sampled with probability $v_{i}^{2}$, this claim is equivalent to saying that at each iteration of the while loop there is probability at least $\frac{1}{2}-o(1)$ of removing some index from $Q$.

Proof of Claim.

Suppose that $v^{\top}Y^{(t)}v>5\alpha$ (as opposed to $v^{\top}Y^{(t)}v<-5\alpha$). Let $\tilde{v}$ be $v$ with all indices outside of $Q^{(t)}$ set to zero. Our goal is to lower bound $\|\tilde{v}\|_{2}^{2}$. Notice $v^{\top}E^{(t)}v=\tilde{v}^{\top}E^{(t)}\tilde{v}$ by definition. Since $v$ is the top eigenvector of $Y^{(t)}$,

\[v^{\top}E^{(t)}v=v^{\top}(Y^{(t)}-X^{(t)})v\geqslant\|Y^{(t)}\|_{\mathsf{op}}-\|X^{(t)}\|_{\mathsf{op}}\geqslant(\|E^{(t)}\|_{\mathsf{op}}-\|X^{(t)}\|_{\mathsf{op}})-\|X^{(t)}\|_{\mathsf{op}}\geqslant\|E^{(t)}\|_{\mathsf{op}}-2\alpha-o(1),\]

where in the second to last step we used that $Y^{(t)}=X^{(t)}+E^{(t)}$ and in the last step we used that, since $X^{(t)}$ is a principal minor of $X$, $\|X^{(t)}\|_{\mathsf{op}}\leqslant\|X\|_{\mathsf{op}}\leqslant\alpha+o(1)$ w.h.p.

However, note that $v^{\top}E^{(t)}v=\tilde{v}^{\top}E^{(t)}\tilde{v}\leqslant\|E^{(t)}\|_{\mathsf{op}}\cdot\|\tilde{v}\|_{2}^{2}$, which implies that $\|\tilde{v}\|_{2}^{2}\geqslant 1-\frac{2\alpha+o(1)}{\|E^{(t)}\|_{\mathsf{op}}}$. Since $\|E^{(t)}\|_{\mathsf{op}}\geqslant\|Y^{(t)}\|_{\mathsf{op}}-\|X^{(t)}\|_{\mathsf{op}}\geqslant 5\alpha-(\alpha+o(1))=4\alpha-o(1)$ by assumption, this implies that $\|\tilde{v}\|_{2}^{2}\geqslant\frac{1}{2}-o(1)$. The proof of the case $v^{\top}Y^{(t)}v<-5\alpha$ is identical up to a change of sign. ∎
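The chain of inequalities in the proof gives the deterministic bound $\|\tilde{v}\|_{2}^{2}\geqslant 1-2\|X\|_{\mathsf{op}}/\|E\|_{\mathsf{op}}$ whenever the top eigenvalue is positive, and this can be checked numerically on a single instance. The following is a minimal sketch assuming a hypothetical constant-valued corruption on the first `k` indices; the function name `corrupted_mass` and all parameter values are illustrative only.

```python
import numpy as np


def corrupted_mass(n=300, k=15, c=3.0, seed=1):
    """Mass of the top eigenvector of Y = X + E on the corrupted index set."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    X = (G + G.T) / np.sqrt(2)  # symmetric noise, off-diagonal variance 1/n
    E = np.zeros((n, n))
    E[:k, :k] = c  # hypothetical corruption; Q = {0, ..., k-1}
    Y = X + E
    eigvals, eigvecs = np.linalg.eigh(Y)
    top = int(np.argmax(np.abs(eigvals)))  # eigenvalue of largest magnitude
    v = eigvecs[:, top]
    mass = float(np.sum(v[:k] ** 2))  # squared norm of v restricted to Q
    norm_X = np.linalg.norm(X, 2)
    norm_E = np.linalg.norm(E, 2)
    lower = float(1.0 - 2.0 * norm_X / norm_E)  # bound from the proof
    return mass, lower, float(eigvals[top])
```

For this choice of corruption, $\|E\|_{\mathsf{op}}=ck$ is much larger than $\|X\|_{\mathsf{op}}$, so the lower bound is well above $\frac{1}{2}$ and nearly all of the eigenvector's mass sits on the corrupted indices.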

To prove that our loop terminates in $4\varepsilon n$ steps with high probability, define the stopping time $\tau=\min\{t\geqslant 0:\|Y^{(t)}\|_{\mathsf{op}}\leqslant 5\alpha\}$. Now, let $I_{t}$ denote the indicator of whether the index removed between $Y^{(t)}$ and $Y^{(t+1)}$ was in $Q$, and note that each $I_{t}$ independently stochastically dominates a $B_{t}\sim\mathsf{Ber}(\frac{1}{2}-o(1))$. Suppose that $\tau\geqslant 4\varepsilon n$. Then, it follows that

\[\sum_{t=1}^{4\varepsilon n}B_{t-1}\leqslant\varepsilon n,\]

which happens with exponentially small probability (this is equivalent to asking for the probability that $\mathsf{Binomial}(4\varepsilon n,\frac{1}{2}-o(1))\leqslant\varepsilon n$). Together, this implies that $\tau\leqslant 4\varepsilon n$ with high probability. ∎
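To get a feel for how small this binomial tail is, one can compare the exact probability with a Hoeffding bound. The sketch below makes a simplifying assumption: it uses success probability exactly $\frac{1}{2}$ rather than $\frac{1}{2}-o(1)$. The function names are hypothetical, and `m` plays the role of $4\varepsilon n$ while `k` plays the role of $\varepsilon n$.

```python
from math import comb, exp


def binomial_lower_tail(m, k):
    """Exact P[Binomial(m, 1/2) <= k], computed with integer arithmetic."""
    return sum(comb(m, j) for j in range(k + 1)) / 2**m


def hoeffding_bound(m, k):
    """Hoeffding bound exp(-2 m t^2) on the same tail, with t = 1/2 - k/m."""
    t = 0.5 - k / m
    return exp(-2 * m * t * t)


# Example: n = 1000 and eps = 0.1 give m = 4*eps*n = 400 and k = eps*n = 100.
exact = binomial_lower_tail(400, 100)
bound = hoeffding_bound(400, 100)
```

With $k=m/4$ the Hoeffding exponent is $2m\cdot\frac{1}{16}=\frac{m}{8}=\frac{\varepsilon n}{2}$, so the failure probability decays exponentially in $\varepsilon n$.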

3.2 Analysis of clipped AMP on spectrally cleaned input

In this section we will prove Lemma 3.6. To begin, we examine the effect of a combination of principal minor and restriction corruptions. Suppose $Y$ is an $\varepsilon$-principal minor corruption of $X$, and suppose $\hat{Y}$ is a $4\varepsilon$-restriction of $Y$. Let $S$ denote the set of rows in the support of $Y-X$, and let $T$ denote the set of rows in the support of $Y-\hat{Y}$. For simplicity, let $S^{\prime}=S\setminus T$ (the set of corrupted rows which are not removed by the restriction). Then, the matrix evolves as follows:

\[X=\begin{bmatrix}X_{S,S}&X_{S,\overline{S}}\\ X_{\overline{S},S}&X_{\overline{S},\overline{S}}\end{bmatrix}\longrightarrow Y=\begin{bmatrix}X_{S,S}+E_{S,S}&X_{S,\overline{S}}\\ X_{\overline{S},S}&X_{\overline{S},\overline{S}}\end{bmatrix}\longrightarrow\hat{Y}=\begin{bmatrix}\mathbf{0}_{T,T}&\mathbf{0}_{T,S^{\prime}}&\mathbf{0}_{T,\overline{T\cup S}}\\ \mathbf{0}_{S^{\prime},T}&X_{S^{\prime},S^{\prime}}+E_{S^{\prime},S^{\prime}}&X_{S^{\prime},\overline{T\cup S}}\\ \mathbf{0}_{\overline{T\cup S},T}&X_{\overline{T\cup S},S^{\prime}}&X_{\overline{T\cup S},\overline{T\cup S}}\end{bmatrix}.\]

In particular, if we let $E$ be the portion of the error matrix $Y-X$ which survives the restriction, and then let $F$ be the remainder in $\hat{Y}=X+E+F$, it follows that $F_{i,j}$ is either $-X_{i,j}$ or $0$. Furthermore, we will split $F$ into two sections: $F_{1}\in\mathbb{R}^{|T|\times n}$ consisting of all entries in rows indexed by $T$ (in other words, $F_{1}=-X_{T,[n]}$), and $F_{2}\in\mathbb{R}^{(n-|T|)\times|T|}$ consisting of all entries in columns indexed by $T$, except those covered by $F_{1}$ (in other words, $F_{2}=-X_{\overline{T},T}$). Pictorially, this can be represented via

Y^X=[(X)T,T(X)T,S(X)T,TS¯(X)S,TES,S𝟎S,TS¯(X)TS¯,T𝟎TS¯,S𝟎TS¯,TS¯]=[F1F1F1F2ES,S𝟎S,TS¯F2𝟎TS¯,S𝟎TS¯,TS¯].^𝑌𝑋matrixsubscript𝑋𝑇𝑇subscript𝑋𝑇superscript𝑆subscript𝑋𝑇¯𝑇𝑆subscript𝑋superscript𝑆𝑇subscript𝐸superscript𝑆superscript𝑆subscript0superscript𝑆¯𝑇𝑆subscript𝑋¯𝑇𝑆𝑇subscript0¯𝑇𝑆superscript𝑆subscript0¯𝑇𝑆¯𝑇𝑆matrixsubscript𝐹1subscript𝐹1subscript𝐹1subscript𝐹2subscript𝐸superscript𝑆superscript𝑆subscript0superscript𝑆¯𝑇𝑆subscript𝐹2subscript0¯𝑇𝑆superscript𝑆subscript0¯𝑇𝑆¯𝑇𝑆\hat{Y}-X=\begin{bmatrix}\mathbf{(}-X)_{T,T}&(-X)_{T,S^{\prime}}&\mathbf{(}-X)% _{T,\overline{T\cup S}}\\ \mathbf{(}-X)_{S^{\prime},T}&E_{S^{\prime},S^{\prime}}&\mathbf{0}_{S^{\prime},% \overline{T\cup S}}\\ \mathbf{(}-X)_{\overline{T\cup S},T}&\mathbf{0}_{\overline{T\cup S},S^{\prime}% }&\mathbf{0}_{\overline{T\cup S},\overline{T\cup S}}\end{bmatrix}=\begin{% bmatrix}F_{1}&F_{1}&F_{1}\\ F_{2}&E_{S^{\prime},S^{\prime}}&\mathbf{0}_{S^{\prime},\overline{T\cup S}}\\ F_{2}&\mathbf{0}_{\overline{T\cup S},S^{\prime}}&\mathbf{0}_{\overline{T\cup S% },\overline{T\cup S}}\end{bmatrix}.over^ start_ARG italic_Y end_ARG - italic_X = [ start_ARG start_ROW start_CELL ( - italic_X ) start_POSTSUBSCRIPT italic_T , italic_T end_POSTSUBSCRIPT end_CELL start_CELL ( - italic_X ) start_POSTSUBSCRIPT italic_T , italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL ( - italic_X ) start_POSTSUBSCRIPT italic_T , over¯ start_ARG italic_T ∪ italic_S end_ARG end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( - italic_X ) start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_T end_POSTSUBSCRIPT end_CELL start_CELL italic_E start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , over¯ start_ARG italic_T ∪ italic_S end_ARG end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( - 
italic_X ) start_POSTSUBSCRIPT over¯ start_ARG italic_T ∪ italic_S end_ARG , italic_T end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT over¯ start_ARG italic_T ∪ italic_S end_ARG , italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT over¯ start_ARG italic_T ∪ italic_S end_ARG , over¯ start_ARG italic_T ∪ italic_S end_ARG end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL italic_E start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , over¯ start_ARG italic_T ∪ italic_S end_ARG end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT over¯ start_ARG italic_T ∪ italic_S end_ARG , italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL bold_0 start_POSTSUBSCRIPT over¯ start_ARG italic_T ∪ italic_S end_ARG , over¯ start_ARG italic_T ∪ italic_S end_ARG end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] .
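The block decomposition $\hat{Y}=X+E+F$ can be made concrete with a small NumPy sketch. The helper name `decompose` and the toy index sets in the usage note are hypothetical; the sketch only assumes that the restriction zeroes every row and column indexed by $T$.

```python
import numpy as np


def decompose(X, Y, T):
    """Split Y_hat - X into E (surviving corruption) and F (deleted entries of X)."""
    n = X.shape[0]
    keep = np.ones(n, dtype=bool)
    keep[list(T)] = False  # keep[i] is True iff index i is not in T
    mask = np.outer(keep, keep)  # True exactly on the block (T-bar, T-bar)
    Y_hat = np.where(mask, Y, 0.0)  # restriction: zero rows and columns in T
    E = np.where(mask, Y - X, 0.0)  # corruption that survives the restriction
    F = Y_hat - X - E  # equals -X on rows/columns in T, 0 elsewhere
    return Y_hat, E, F
```

For example, with $S=\{0,1,2\}$ and $T=\{0,1,5\}$ one gets $S^{\prime}=\{2\}$: the matrix `E` is supported on the single entry $(2,2)$, while `F` equals $-X$ on rows and columns $0$, $1$, $5$ and vanishes elsewhere.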

As a warm-up, we show that each of these quantities is bounded in operator norm.

Proposition 3.9.

Suppose that $\|\hat{Y}\|_{\mathsf{op}}\leqslant 5\,\mathbf{E}[\|X\|_{\mathsf{op}}]=:5\alpha$. For the above definitions of $E,F_{1},F_{2}$, we have that

\[\|E\|_{\mathsf{op}}\leqslant 6\alpha\qquad\text{and}\qquad\|F_{1}\|_{\mathsf{op}},\|F_{2}\|_{\mathsf{op}}\leqslant 2\alpha\]

with high probability.

Proof.

Let us begin with $F_{1}$, each entry $F_{ij}$ of which is an independent subgaussian random variable. Applying standard matrix concentration arguments (e.g. Theorem 4.6.1 in [Ver18]), we have with high probability that $\|F_{1}\|_{\mathsf{op}}\leqslant\alpha(1+O(\sqrt{|T|/n}))\leqslant 4\alpha$. We can apply a similar argument to see that $\|F_{2}\|_{\mathsf{op}}\leqslant 4\alpha$ as well.

Now, consider $\hat{Y}_{\overline{T},\overline{T}}=X_{\overline{T},\overline{T}}+E$. We then have that

\[\left\|E\right\|_{\mathsf{op}}\leqslant\left\|\hat{Y}_{\overline{T},\overline{T}}\right\|_{\mathsf{op}}+\left\|X_{\overline{T},\overline{T}}\right\|_{\mathsf{op}}\leqslant\|\hat{Y}\|_{\mathsf{op}}+\|X\|_{\mathsf{op}}\leqslant 6\alpha\]

as the operator norm of a principal minor is at most that of the original matrix. ∎

With this in mind, we are ready to prove Lemma 3.6, which we reprint here for clarity.

Lemma (Restatement of Lemma 3.6).

Suppose $X$ is an $n\times n$ matrix with i.i.d. entries of mean zero, variance $\frac{1}{n}$, and subgaussian parameter $\frac{O(1)}{\sqrt{n}}$. Suppose that $Y$ is an $\varepsilon$-principal minor corruption of $X$ and $\hat{Y}$ is a $4\varepsilon$-restriction of $Y$ with $\|\hat{Y}\|_{\mathsf{op}}=O(1)$. Then the clipped AMP iteration from Algorithm 3.4 on $\hat{Y}$ produces a vector $\hat{v}$ such that $\|\hat{v}-v_{\mathrm{AMP}}(X)\|_{2}^{2}\leqslant O(\varepsilon\log^{d}\frac{1}{\varepsilon})\,\|v_{\mathrm{AMP}}(X)\|_{2}^{2}$ with probability $1-o_{n}(1)$ over the choice of $X$.

The proof follows from a few central claims. The first of these shows that clipping cannot substantially change how far we are from the true AMP iteration.

Proposition 3.10 (Clipping preserves error).

Define $\widetilde{y}^{(t)}$ to be the unclipped version of $y^{(t)}$ (that is, the inner expression passed to $\operatorname{\mathsf{clip}}^{\varepsilon}(\cdot)$). Then,

$$\|y^{(t)}-x^{(t)}\|_2\leqslant\|\widetilde{y}^{(t)}-x^{(t)}\|_2+\sqrt{\varepsilon n}$$

with probability $1-o_n(1)$.
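The inequality can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\operatorname{\mathsf{clip}}^{\varepsilon}$ to be coordinate-wise truncation to $[-\tau,\tau]$ with $\tau^2=C_T\log\frac{1}{\varepsilon}$ (the precise definition is the one in Algorithm 3.4), uses a standard Gaussian vector as a stand-in for $x^{(t)}$, and the names `clip_eps` and `c_t` are hypothetical.

```python
import numpy as np

def clip_eps(v, eps, c_t=4.0):
    """Truncate each coordinate of v to [-tau, tau], tau = sqrt(c_t * log(1/eps)).

    Assumed form of clip^eps; c_t is a hypothetical stand-in for the constant C_T.
    """
    tau = np.sqrt(c_t * np.log(1.0 / eps))
    return np.clip(v, -tau, tau)

rng = np.random.default_rng(0)
n, eps = 100_000, 0.01
x = rng.standard_normal(n)             # stand-in for the uncorrupted iterate x^(t)
y_tilde = x.copy()
y_tilde[: int(eps * n)] += 50.0        # large error on an eps-fraction of coordinates

y = clip_eps(y_tilde, eps)             # clipped iterate y^(t)
lhs = np.linalg.norm(y - x)
rhs = np.linalg.norm(y_tilde - x) + np.sqrt(eps * n)
clip_cost = np.linalg.norm(clip_eps(x, eps) - x)  # extra error from clipping x itself
```

In this toy instance clipping strictly reduces the distance to `x`, and `clip_cost` is the only term that clipping could add.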

The next proposition shows that even though $E$, $F_1$, and $F_2$ have constant operator norms, their row (or column) sparsity allows us to control their effect on the AMP iterates. Here we also introduce the shorthand $f_t(x)\triangleq f_t(x^{(t-1)},x^{(t-2)},\ldots,x^{(0)})$.

Proposition 3.11 (Block-sparse corruptions have small error).

Suppose that $f_t\in\operatorname{PL}(d_t)$ and define $\overline{d}_t=\max_{j\leqslant t}d_j$. There exists a constant $C>0$ (independent of $n$ and $\varepsilon$ but possibly dependent on $t$) such that each of $\|Ef_t(x)\|_2^2$, $\|F_1f_t(x)\|_2^2$, and $\|F_2f_t(x)\|_2^2$ are bounded by $C\varepsilon n\cdot\log^{\overline{d}_t}\frac{1}{\varepsilon}$ with probability $1-o_n(1)$.

The final proposition aims to show that applying polynomials to clipped AMP iterates cannot dramatically change closeness. Note that this is not true in general and requires both boundedness and state evolution to hold in our case.

Proposition 3.12 (Pseudo-Lipschitz functions preserve closeness of AMP iterates).

Suppose that $f_t\in\operatorname{PL}(d_t)$, and let $M=\max_{0\leqslant i<t}\|y^{(i)}-x^{(i)}\|_2^2$. Then, there exists a constant $C_T>0$ (independent of $n$ and $\varepsilon$ but dependent on $T$) such that

$$\left\|f_t(y)-f_t(x)\right\|_2^2\leqslant M(C_Tt)^{d_t}\cdot\log^{d_t-1}\left(\tfrac{1}{\varepsilon}\right)+t\cdot\varepsilon n$$

with probability $1-o_n(1)$.
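To see where a factor of $\log^{d_t-1}\frac{1}{\varepsilon}$ can come from, consider the following toy check. It is a sketch under simplifying assumptions that are not part of the proposition: the denoiser is $f(a)=a^2$ (degree-2 growth), and both inputs are bounded by the clipping threshold $\tau$ with $\tau^2=C_T\log\frac{1}{\varepsilon}$, whereas the proposition itself does not assume $x$ is bounded. The constant `c_t` is a hypothetical stand-in for $C_T$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, c_t = 50_000, 0.01, 4.0
tau = np.sqrt(c_t * np.log(1.0 / eps))   # threshold, so tau^2 = c_t * log(1/eps)

# Two nearby vectors, both bounded by tau (a simplifying assumption).
x = np.clip(rng.standard_normal(n), -tau, tau)
y = np.clip(x + 0.1 * rng.standard_normal(n), -tau, tau)

f = np.square                             # |f(a) - f(b)| = |a - b| * |a + b| <= 2 * tau * |a - b|
m = np.sum((y - x) ** 2)                  # plays the role of M
lhs = np.sum((f(y) - f(x)) ** 2)
rhs = 4.0 * tau**2 * m                    # = 4 * c_t * log(1/eps) * M, one power of log for d = 2
```

For bounded inputs the squared distance grows by at most a constant times $\log\frac{1}{\varepsilon}$, matching the exponent $d_t-1=1$ in this special case.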

Together, these three propositions allow us to prove the lemma.

Proof of Lemma 3.6.

We proceed by induction on the iteration $t$. Since $y^{(0)}=x^{(0)}=\vec{1}$, the base case is immediate. Now suppose the statement holds for all $k<t$; we prove it for $t$.

By Proposition 3.10, we have that $\|y^{(t)}-x^{(t)}\|_2\leqslant\|\widetilde{y}^{(t)}-x^{(t)}\|_2+\sqrt{\varepsilon n}$, so let us handle this first term. To decrease verbiage, let $\mathsf{MAX}=\max\limits_{1\leqslant j\leqslant t}\|f_j(y)-f_j(x)\|_2$. By the Triangle Inequality and the definition of the AMP iteration, we have that

$$\begin{aligned}
\|\widetilde{y}^{(t)}-x^{(t)}\|_2 &=\left\|\hat{Y}f_t(y)-Xf_t(x)-\sum_{j=1}^{t}B_{t,j}\left(f_{j-1}(y)-f_{j-1}(x)\right)\right\|_2\\
&\leqslant\left\|\hat{Y}(f_t(y)-f_t(x)+f_t(x))-Xf_t(x)\right\|_2+\sum_{j=1}^{t}|B_{t,j}|\left\|f_{j-1}(y)-f_{j-1}(x)\right\|_2\\
&\leqslant\left\|\hat{Y}(f_t(y)-f_t(x))\right\|_2+\left\|(\hat{Y}-X)f_t(x)\right\|_2+\mathsf{MAX}\cdot\sum_{j=1}^{t}|B_{t,j}|\\
&\leqslant\left\|\hat{Y}\right\|_{\operatorname{\mathsf{op}}}\left\|f_t(y)-f_t(x)\right\|_2+\left\|(\hat{Y}-X)f_t(x)\right\|_2+\mathsf{MAX}\cdot\sum_{j=1}^{t}|B_{t,j}|\\
&\leqslant\mathsf{MAX}\cdot\left(10+\sum_{j=1}^{t}|B_{t,j}|\right)+\|Ef_t(x)\|_2+\|F_1f_t(x)\|_2+\|F_2f_t(x)\|_2\\
&\leqslant C\cdot\sqrt{M(C_Tt)^{d_t}\cdot\log^{d_t-1}\left(\tfrac{1}{\varepsilon}\right)+t\cdot\varepsilon n}+3\sqrt{C\varepsilon n\cdot\log^{\overline{d}_t}\left(\tfrac{1}{\varepsilon}\right)}
\end{aligned}$$

where for the last inequality we applied Proposition 3.11 and Proposition 3.12. Now, the Almost-Triangle Inequality combined with Proposition 3.10 implies that

$$\begin{aligned}
\|y^{(t)}-x^{(t)}\|_2^2 &\leqslant 2C^2\left(M(C_Tt)^{d_t}\cdot\log^{d_t-1}\left(\tfrac{1}{\varepsilon}\right)+t\cdot\varepsilon n\right)+36C\varepsilon n\cdot\log^{\overline{d}_t}\left(\tfrac{1}{\varepsilon}\right)+2\varepsilon n\\
&=M(Ct)^{d_t}\cdot\log^{d_t-1}\left(\tfrac{1}{\varepsilon}\right)+C\cdot\varepsilon n\log^{\overline{d}_t}\left(\tfrac{1}{\varepsilon}\right).
\end{aligned}$$

If the AMP iteration consists of Lipschitz denoisers, it follows that $d_t=1$ for all $t$ and thus $\|y^{(t)}-x^{(t)}\|_2^2\leqslant(Ct)^{t}\cdot\varepsilon n\log\frac{1}{\varepsilon}$. Else, notice that the power of $\log\frac{1}{\varepsilon}$ can be at most $t\overline{d}_t$, which completes the proof.
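The Lipschitz case of this recursion is easy to unroll numerically. The sketch below iterates $e_t=C\,t\cdot\max_{i<t}e_i+C\varepsilon n\log\frac{1}{\varepsilon}$ with $e_0=0$, which is the final display with $d_t=1$, and compares it against a bound of the form $(C't)^t\cdot\varepsilon n\log\frac{1}{\varepsilon}$. The values of `c`, `eps`, and `n`, the choice $C'=C+1$, and the function name `unrolled_error` are illustrative assumptions, not constants from the proof.

```python
import math

def unrolled_error(t_max, c, eps, n):
    """Iterate e_t = c * t * max_{i<t} e_i + c * eps * n * log(1/eps), with e_0 = 0."""
    base = c * eps * n * math.log(1.0 / eps)
    errs = [0.0]
    for t in range(1, t_max + 1):
        errs.append(c * t * max(errs) + base)
    return errs

c, eps, n = 3.0, 0.01, 10_000            # hypothetical values for illustration only
errs = unrolled_error(8, c, eps, n)
unit = eps * n * math.log(1.0 / eps)
# Candidate closed-form bound ((c + 1) * t)^t * eps * n * log(1/eps) for t = 1..8.
bounds = [((c + 1.0) * t) ** t * unit for t in range(1, 9)]
```

The unrolled errors stay below the candidate bound for every $t$ in this range, consistent with the $(Ct)^t$ growth stated above.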

We finish this section by proving the three propositions.

Proof of Proposition 3.10.

Note that $\operatorname{\mathsf{clip}}^{\varepsilon}$ is a $1$-Lipschitz function. So, by the triangle inequality,

$$\begin{aligned}
\|y^{(t)}-x^{(t)}\|_2=\|\operatorname{\mathsf{clip}}^{\varepsilon}(\widetilde{y}^{(t)})-x^{(t)}\|_2 &\leqslant\|\operatorname{\mathsf{clip}}^{\varepsilon}(\widetilde{y}^{(t)})-\operatorname{\mathsf{clip}}^{\varepsilon}(x^{(t)})\|_2+\|\operatorname{\mathsf{clip}}^{\varepsilon}(x^{(t)})-x^{(t)}\|_2\\
&\leqslant\|\widetilde{y}^{(t)}-x^{(t)}\|_2+\|\operatorname{\mathsf{clip}}^{\varepsilon}(x^{(t)})-x^{(t)}\|_2\\
&=\|\widetilde{y}^{(t)}-x^{(t)}\|_2+\left(\sum_{i=1}^{n}(x^{(t)}_i)^2\cdot\mathbf{1}\left[(x^{(t)}_i)^2>C_T\log\frac{1}{\varepsilon}\right]\right)^{1/2}
\end{aligned}$$

so it remains to bound this second quantity. We may apply Corollary 2.3 with $f(\vec{x}_i)=g(\vec{x}_i)=x_i^{(t)}$ (which is Lipschitz) and $\theta=C_T\log\frac{1}{\varepsilon}$, which implies that

$$\frac{1}{n}\sum_{i=1}^{n}(x^{(t)}_i)^2\cdot\mathbf{1}\left[(x^{(t)}_i)^2>C_T\log\frac{1}{\varepsilon}\right]\leqslant\frac{1}{(C_T\log\frac{1}{\varepsilon})^{r}}\cdot(C')^{2r}\cdot(3r)^{r}=\left(\frac{3(C')^2\cdot r}{C_T\log\frac{1}{\varepsilon}}\right)^{r}.$$

By choosing $C_T\geqslant 3e(C')^2$ and taking $r=\log\frac{1}{\varepsilon}$, it follows that

$$\frac{1}{n}\sum_{i=1}^{n}(x^{(t)}_i)^2\cdot\mathbf{1}\left[(x^{(t)}_i)^2>C_T\log\frac{1}{\varepsilon}\right]\leqslant\varepsilon$$

from where the conclusion follows. ∎
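For concreteness, the arithmetic behind the choice of $r$ in the proof of Proposition 3.10 can be written out as follows. This worked step assumes $\log$ denotes the natural logarithm (which is consistent with the factor $e$ in the choice of $C_T$) and $0<\varepsilon<1$, so that $r=\log\frac{1}{\varepsilon}>0$.

```latex
\left(\frac{3(C')^2\cdot r}{C_T\log\frac{1}{\varepsilon}}\right)^{r}\Bigg|_{r=\log\frac{1}{\varepsilon}}
  =\left(\frac{3(C')^2}{C_T}\right)^{\log\frac{1}{\varepsilon}}
  \leqslant\left(\frac{1}{e}\right)^{\log\frac{1}{\varepsilon}}
  =\varepsilon,
\qquad\text{hence}\qquad
\left(\sum_{i=1}^{n}(x^{(t)}_i)^2\cdot\mathbf{1}\left[(x^{(t)}_i)^2>C_T\log\tfrac{1}{\varepsilon}\right]\right)^{1/2}
  \leqslant\sqrt{\varepsilon n}.
```

The middle inequality uses $C_T\geqslant 3e(C')^2$, and the final bound is the $\sqrt{\varepsilon n}$ term in the statement of the proposition.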

Proof of Proposition 3.11.

Let $S'$ be the indices in the support of $E$, and let $T$ be the set of indices in the row-support of $F_1$ (and column-support of $F_2$), as in the figure at the beginning of the section. For a given vector $v$, we will define $v_{S'}$ to be the restriction of $v$ to $S'$. Then, note that

$$\|Ef_t(x)\|_2^2=\left\|E(f_t(x))_{S'}\right\|_2^2\leqslant\|E\|_{\operatorname{\mathsf{op}}}\cdot\left\|f_t(x)_{S'}\right\|_2^2\leqslant 12\left\|f_t(x)_{S'}\right\|_2^2,$$

and similarly $\|F_2f_t(x)\|_2^2\leqslant\|F_2\|_{\operatorname{\mathsf{op}}}\left\|f_t(x)_T\right\|_2^2\leqslant 4\left\|f_t(x)_T\right\|_2^2$. To handle each of these, notice that $|S'|,|T|\leqslant 4\varepsilon n$. Therefore, we may apply Corollary 2.3 to deduce that

$$\frac{1}{n}\left\|f_t(x)_{S'}\right\|_2^2\leqslant C\varepsilon\log^{d_t}\frac{1}{\varepsilon}$$

and similarly for $f_t(x)_T$. This implies the boundedness of $\|Ef_t(x)\|_2^2$ and $\|F_2f_t(x)\|_2^2$.
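The role of the sparsity $|S'|\leqslant 4\varepsilon n$ can be illustrated numerically. The sketch below is an assumption-laden toy: it uses a standard Gaussian vector as a stand-in for $f_t(x)$ in the Lipschitz case $d_t=1$, and picks the adversarial index set consisting of the $4\varepsilon n$ largest squared entries. It then compares the normalized mass on that set to $\varepsilon\log\frac{1}{\varepsilon}$; the resulting ratio is an empirical quantity for this instance, not the constant $C$ from Corollary 2.3.

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps = 200_000, 0.01
k = int(4 * eps * n)                     # size of the index set, |S'| <= 4 * eps * n

v = rng.standard_normal(n)               # stand-in for f_t(x) with d_t = 1
worst = np.sort(v**2)[-k:]               # adversarial S': the k largest squared entries
mass = worst.sum() / n                   # (1/n) * ||v_{S'}||_2^2 for the worst-case S'
scale = eps * np.log(1.0 / eps)          # eps * log^{d_t}(1/eps) with d_t = 1
ratio = mass / scale                     # empirical stand-in for the constant C
```

Even for the worst-case subset, the mass is a modest constant multiple of $\varepsilon\log\frac{1}{\varepsilon}$, while a uniformly random subset of the same size would carry mass close to $4\varepsilon$.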

We cannot use the same argument for F1ft(x)22superscriptsubscriptnormsubscript𝐹1subscript𝑓𝑡𝑥22\|F_{1}f_{t}(x)\|_{2}^{2}∥ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT because F1subscript𝐹1F_{1}italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is supported on all columns Instead, let us recall that F1=Xsubscript𝐹1𝑋F_{1}=-Xitalic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = - italic_X on its supported rows, so F1ft(x)=(Xft(x))Tsubscript𝐹1subscript𝑓𝑡𝑥subscript𝑋subscript𝑓𝑡𝑥𝑇F_{1}f_{t}(x)=(-Xf_{t}(x))_{T}italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) = ( - italic_X italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) ) start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT and we are trying to bound F1ft(x)22=(Xft(x))T22superscriptsubscriptnormsubscript𝐹1subscript𝑓𝑡𝑥22superscriptsubscriptnormsubscript𝑋subscript𝑓𝑡𝑥𝑇22\|F_{1}f_{t}(x)\|_{2}^{2}=\left\|(-Xf_{t}(x))_{T}\right\|_{2}^{2}∥ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∥ ( - italic_X italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) ) start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Using the definition of the AMP iteration, we can rewrite

\[
x^{(t)}=Xf_t(x)-\sum_{j=1}^{t}B_{t,j}f_{j-1}(x)\implies-Xf_t(x)=-x^{(t)}-\sum_{j=1}^{t}B_{t,j}f_{j-1}(x).
\]

Therefore, $-Xf_t(x)\in\operatorname{PL}(\max_{j<t}d_j)$ and is a function of the iterates $x^{(0)},x^{(1)},\ldots,x^{(t)}$. Once more applying Corollary 2.3, it follows that

\[
\frac{1}{n}\|F_1f_t(x)\|_2^2=\frac{1}{n}\left\|(-Xf_t(x))_T\right\|_2^2\leqslant C\varepsilon\log^{\max_{j<t}d_j}\frac{1}{\varepsilon}
\]

and we are done. ∎

Proof of Proposition 3.12.

We begin by applying the definition of $\operatorname{PL}(d_t)$. In particular, combined with the Almost-Triangle Inequality we find that

\begin{align*}
\|f_t(y)-f_t(x)\|_2^2 &=\sum_{i=1}^{n}(f_t(y_i)-f_t(x_i))^2\\
&\leqslant L^2\sum_{i=1}^{n}(1+\|y_i\|^{d_t-1}+\|x_i\|^{d_t-1})^2\cdot\|y_i-x_i\|^2\\
&\leqslant 3L^2\sum_{i=1}^{n}\left(1+\|y_i\|^{2(d_t-1)}+\|x_i\|^{2(d_t-1)}\right)\cdot\sum_{j=0}^{t-1}(y^{(j)}_i-x^{(j)}_i)^2\\
&\leqslant 3L^2\left(1+\max_i\|y_i\|^{2(d_t-1)}\right)\sum_{j=0}^{t-1}\left\|y^{(j)}-x^{(j)}\right\|_2^2+\sum_{j=0}^{t-1}\sum_{i=1}^{n}\|x_i\|^{2(d_t-1)}\cdot(y^{(j)}_i-x^{(j)}_i)^2\\
&\leqslant M\cdot 6tL^2(C_T\cdot t\log\tfrac{1}{\varepsilon})^{d_t-1}+\sum_{j=0}^{t-1}\sum_{i=1}^{n}\|x_i\|^{2(d_t-1)}\cdot(y^{(j)}_i-x^{(j)}_i)^2. \tag{4}
\end{align*}

The last inequality holds because for each $i$, $\|y_i\|^{2(d_t-1)}=\left(\sum_{j=0}^{t-1}(y^{(j)}_i)^2\right)^{d_t-1}\leqslant(C_T\cdot t\log\frac{1}{\varepsilon})^{d_t-1}$. Therefore, it remains to handle the last sum.
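For reference, the passage from the second to the third line of the chain above can be unpacked as follows. This is a sketch under the assumption, consistent with the identity for $\|y_i\|^{2(d_t-1)}$ just stated, that $x_i$ and $y_i$ denote the vectors $(x_i^{(0)},\ldots,x_i^{(t-1)})$ and $(y_i^{(0)},\ldots,y_i^{(t-1)})$ of the $i$-th coordinates of the iterates; the symbols $a,b$ are introduced here only for illustration. By Cauchy–Schwarz, for any reals $a,b$,

\[
(1+a+b)^2\leqslant 3\,(1+a^2+b^2),
\]

which is applied with $a=\|y_i\|^{d_t-1}$ and $b=\|x_i\|^{d_t-1}$, and the remaining factor expands coordinate by coordinate as

\[
\|y_i-x_i\|^2=\sum_{j=0}^{t-1}\left(y_i^{(j)}-x_i^{(j)}\right)^2 .
\]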

We claim that

\[
\|x_i\|^{2(d_t-1)}\cdot(y^{(j)}_i-x^{(j)}_i)^2\leqslant\Biggl[(C_T\cdot t\log\tfrac{1}{\varepsilon})^{d_t-1}(y^{(j)}_i-x^{(j)}_i)^2\Biggr]+\Biggl[\|x_i\|^{2(d_t-1)}\cdot\left(|x_i^{(j)}|+2\sqrt{C_T\log\tfrac{1}{\varepsilon}}\right)^2\cdot\bm{1}\left[\|x_i\|^2>C_T\cdot t\log\tfrac{1}{\varepsilon}\right]\Biggr].
\]

Indeed,

  • If $\|x_i\|^2\leqslant C_T\cdot t\log\frac{1}{\varepsilon}$, then certainly the left side is bounded by the first term.

  • Else, note that $(y^{(j)}_i-x^{(j)}_i)^2\leqslant\left(|x_i^{(j)}|+2\sqrt{C_T\log\frac{1}{\varepsilon}}\right)^2$, where the absolute value protects against opposite signs. Therefore, in this latter case we have that the left side is bounded by the second term.

Summing over $i\in[n]$, it follows that

\[
\sum_{i=1}^{n}\|x_i\|^{2(d_t-1)}\cdot(y^{(j)}_i-x^{(j)}_i)^2\leqslant M\cdot(C_T\cdot t\log\tfrac{1}{\varepsilon})^{d_t-1}+\sum_{i=1}^{n}\|x_i\|^{2(d_t-1)}\cdot\left(|x_i^{(j)}|+2\sqrt{C_T\log\tfrac{1}{\varepsilon}}\right)^2\cdot\bm{1}\left[\|x_i\|^2>C_T\cdot t\log\tfrac{1}{\varepsilon}\right]
\]

Applying Corollary 2.3 to this second term with $f(\vec{x}_i)=\|x_i\|^{d_t-1}\cdot\left(|x_i^{(j)}|+2\sqrt{C_T\log\tfrac{1}{\varepsilon}}\right)$, $g(\vec{x}_i)=\|x_i\|$ (which is Lipschitz), and $\theta=C_T\cdot t\log\frac{1}{\varepsilon}$, it follows that

\[
\frac{1}{n}\sum_{i=1}^{n}\|x_i\|^{2(d_t-1)}\cdot\left(|x_i^{(j)}|+2\sqrt{C_T\log\tfrac{1}{\varepsilon}}\right)^2\cdot\bm{1}\left[\|x_i\|^2>C_T\cdot t\log\tfrac{1}{\varepsilon}\right]\leqslant\left(\frac{3(C')^2r}{C_T\cdot t\log\frac{1}{\varepsilon}}\right)^{r}\leqslant\varepsilon
\]

by taking $r=\log\frac{1}{\varepsilon}$ and having $C_T\cdot t>3e(C')^2$. Therefore, plugging this all back into (4), we have that

\begin{align*}
\|f_t(y)-f_t(x)\|_2^2 &\leqslant M\cdot 6tL^2(C_T\cdot t\log\tfrac{1}{\varepsilon})^{d_t-1}+\sum_{j=0}^{t-1}\sum_{i=1}^{n}\|x_i\|^{2(d_t-1)}\cdot(y^{(j)}_i-x^{(j)}_i)^2\\
&\leqslant M\cdot 6tL^2(C_T\cdot t\log\tfrac{1}{\varepsilon})^{d_t-1}+\sum_{j=0}^{t-1}M\cdot(C_T\cdot t\log\tfrac{1}{\varepsilon})^{d_t-1}+\varepsilon n\\
&\leqslant M\cdot(C_T\cdot t)^{d_t}\cdot\log^{d_t-1}\left(\tfrac{1}{\varepsilon}\right)+t\cdot\varepsilon n
\end{align*}

as desired, assuming that $C_T>6L^2$. ∎
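As a check on the parameter choice used in the proof above, the bound by $\varepsilon$ can be verified directly. The following is a short worked computation, assuming $0<\varepsilon<1$ so that $r=\log\frac{1}{\varepsilon}>0$. Substituting $r=\log\frac{1}{\varepsilon}$ cancels the logarithm in the denominator, and the condition $C_T\cdot t>3e(C')^2$ makes the base smaller than $1/e$:

\[
\left(\frac{3(C')^2\,r}{C_T\cdot t\log\frac{1}{\varepsilon}}\right)^{r}
=\left(\frac{3(C')^2}{C_T\cdot t}\right)^{\log\frac{1}{\varepsilon}}
<\left(\frac{1}{e}\right)^{\log\frac{1}{\varepsilon}}
=\varepsilon .
\]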

4 AMP is robust to small spectral perturbations

Here we argue that AMP is robust to spectral perturbations.

Lemma 4.1.

Suppose that $X$ has independent entries of mean $0$, variance $\frac{1}{n}$, and subgaussian parameter $\frac{O(1)}{\sqrt{n}}$. Let $\mathcal{F}$ be an AMP algorithm consisting of Lipschitz denoiser functions with Lipschitz constant at most $L$, and let $v_{\mathrm{AMP}}(X)$ denote the output of the $T$-step AMP algorithm on input $X$, and $v_{\mathrm{AMP}}(Y)$ denote the output of the same algorithm on input $Y$ for any $Y$ satisfying $\|Y-X\|_{\mathsf{op}}\leqslant\varepsilon$. Then there exists a universal constant $C$ such that with probability $1-o(1)$ over $X$,

\[
\frac{1}{n}\|v_{\mathrm{AMP}}(Y)-v_{\mathrm{AMP}}(X)\|_2^2\leqslant\varepsilon^2\cdot C^{2T+2}\cdot((T+1)!)^2.
\]

Since the starting iterate is $x^{(0)}=\vec{1}$ and $X$ has entries of variance $\frac{1}{n}$, the scaling $\frac{1}{n}\|v_{\mathrm{AMP}}(X)\|^2\sim 1$ is of the correct order for reasonable denoisers, in which case the above implies that $v_{\mathrm{AMP}}(X)$ and $v_{\mathrm{AMP}}(Y)$ are $(1-O(\varepsilon^2))$-correlated.
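The statement of Lemma 4.1 can be explored numerically. The sketch below is only an illustration under stated assumptions, not the algorithm analyzed in this paper: it uses a toy AMP iteration with the denoiser $\tanh$ and a single memory (Onsager) term, a Gaussian symmetric matrix in place of a general subgaussian one, and a random symmetric perturbation rescaled to operator norm exactly $\varepsilon$. The function names `wigner`, `perturbation`, `amp`, and `mean_squared_gap`, as well as the defaults for `n`, `T`, and `seed`, are hypothetical choices made for this example.

```python
import numpy as np


def wigner(n, rng):
    """Symmetric Gaussian matrix whose off-diagonal entries have variance 1/n."""
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    return (G + G.T) / np.sqrt(2)


def perturbation(n, eps, rng):
    """Symmetric matrix rescaled so that its operator norm equals eps."""
    E = wigner(n, rng)
    return eps * E / np.linalg.norm(E, 2)  # ord=2 gives the largest singular value


def amp(X, T):
    """Run T steps of a toy AMP iteration with f = tanh from the all-ones vector."""
    n = X.shape[0]
    x = np.ones(n)
    f_prev = np.zeros(n)  # assumed convention: no memory term at the first step
    for _ in range(T):
        f = np.tanh(x)
        b = np.mean(1.0 - f ** 2)  # Onsager coefficient: average of tanh'(x_i)
        x, f_prev = X @ f - b * f_prev, f
    return x


def mean_squared_gap(n=400, T=4, eps=1e-3, seed=0):
    """Return (1/n) * ||amp(Y) - amp(X)||_2^2 for Y = X + E with ||E||_op = eps."""
    rng = np.random.default_rng(seed)
    X = wigner(n, rng)
    Y = X + perturbation(n, eps, rng)
    gap = amp(Y, T) - amp(X, T)
    return float(gap @ gap) / n


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(f"eps={eps:g}  (1/n)||v(Y)-v(X)||^2 = {mean_squared_gap(eps=eps):.3e}")
```

Running the script prints the normalized squared gap for a few values of $\varepsilon$, which can be compared against the $\varepsilon^2$ scaling in the lemma. The constant in front depends on the denoiser and on $T$, so the sketch says nothing about the universal constant $C$.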

Proof.

Let us denote the iterates by $y^{(t)}$ and $x^{(t)}$ for $Y$ and $X$, respectively, and let $E=Y-X$. When $t=0$, $x^{(0)}=y^{(0)}=\vec{1}$, so the statement trivially holds. Assuming we have shown this for all $k<t$, we now show it for $t$. We will use the shorthand $f_k(x)=f_k(x^{(k)},\ldots,x^{(0)})$. We may expand the expression for $y^{(t)}-x^{(t)}$:

\begin{align*}
\left\|y^{(t)}-x^{(t)}\right\|_2 &=\left\|Yf_t(y)-Xf_t(x)+\sum_{j=1}^{t}B_{t,j}\left(f_j(y)-f_j(x)\right)\right\|_2\\
&=\left\|Y\big(f_t(y)-f_t(x)\big)+(Y-X)f_t(x)+\sum_{j=1}^{t}B_{t,j}\cdot\left(f_j(y)-f_j(x)\right)\right\|_2\\
&\leqslant\|Y\|_{\mathsf{op}}\|f_t(y)-f_t(x)\|+\|Y-X\|_{\mathsf{op}}\|f_t(x)\|+\sum_{j=1}^{t}|B_{t,j}|\cdot\left\|f_j(y)-f_j(x)\right\|\\
\intertext{And since $\|Y\|_{\mathsf{op}}\leqslant 2\|X\|_{\mathsf{op}}=O(1)$ with high probability and $|B_{t,j}|=O(1)$ from the subgaussianity of $X$, and $\|Y-X\|_{\mathsf{op}}\leqslant\varepsilon$ by assumption, for a constant $C$ sufficiently large,}
&\leqslant(Ct+C/2)\max_{k\leqslant t}\left\|f_k(y)-f_k(x)\right\|_2+\varepsilon\left\|f_t(x)\right\|_2
\end{align*}

To control the first term, we invoke the Lipschitzness of $f$,

\[
\|f_k(y) - f_k(x)\|_2^2 \leqslant L \sum_{j=1}^{k} \|y^{(j)} - x^{(j)}\|_2^2 \leqslant Lk \|y^{(k-1)} - x^{(k-1)}\|_2^2 \leqslant (C^k\, k!)^2 \cdot \varepsilon^2 n.
\]

For the second term, we have from Corollary 2.3 (applied with $\varepsilon = 1$) that $\|f_t(x)\|_2 \leqslant \frac{1}{2} C \sqrt{n}$. Combining these facts, we find that

\[
\|y^{(t)} - x^{(t)}\|_2 \leqslant (Ct + \tfrac{1}{2} C)(C^t t!) \cdot \varepsilon \sqrt{n} + \tfrac{1}{2} C \varepsilon \sqrt{n} \leqslant \varepsilon \sqrt{n} \cdot C^{t+1} (t+1)!,
\]

where the last inequality uses $C^t t! \geqslant 1$, and so

\[
\frac{1}{n} \|y^{(t)} - x^{(t)}\|_2^2 \leqslant \varepsilon^2 \cdot (C^{t+1} (t+1)!)^2. \qed
\]
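
The final step of the recursion, which absorbs $(Ct + \tfrac{1}{2}C)\,C^t t! + \tfrac{1}{2}C$ into $C^{t+1}(t+1)!$, can be sanity-checked with exact rational arithmetic. The sketch below is illustrative only: the function names and the tested ranges of $C$ and $t$ are our own choices, and all quantities are expressed in units of $\varepsilon\sqrt{n}$.

```python
from fractions import Fraction
from math import factorial


def recursion_rhs(C, t):
    """(C t + C/2) * C^t t! + C/2, in units of eps * sqrt(n)."""
    C = Fraction(C)
    return (C * t + C / 2) * (C ** t * factorial(t)) + C / 2


def claimed_bound(C, t):
    """C^(t+1) (t+1)!, in units of eps * sqrt(n)."""
    return Fraction(C) ** (t + 1) * factorial(t + 1)


def induction_step_holds(C, t):
    """True when the one-step bound is absorbed by the claimed bound."""
    return recursion_rhs(C, t) <= claimed_bound(C, t)


# Hypothetical test ranges: integer constants C >= 1 and a few iterations t.
checks = [induction_step_holds(C, t) for C in (1, 2, 5, 10) for t in range(12)]
print(all(checks))
```

The gap between the two sides equals $\tfrac{1}{2}C\,(C^t t! - 1)$, so the step holds whenever $C \geqslant 1$.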

Acknowledgments

We thank Spencer Compton, Sam Hopkins and Andrea Montanari for helpful discussions.

References

  • [BLM12] Mohsen Bayati, Marc Lelarge, and Andrea Montanari. Universality in polytope phase transitions and iterative algorithms. In Proceedings of the 2012 IEEE International Symposium on Information Theory, ISIT 2012, Cambridge, MA, USA, July 1-6, 2012, pages 1643–1647. IEEE, 2012.
  • [BM11] Mohsen Bayati and Andrea Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2):764–785, 2011.
  • [BMN20] Raphael Berthier, Andrea Montanari, and Phan-Minh Nguyen. State evolution for approximate message passing with non-separable functions. Information and Inference: A Journal of the IMA, 9(1):33–79, 2020.
  • [Bol14] Erwin Bolthausen. An iterative construction of solutions of the TAP equations for the Sherrington–Kirkpatrick model. Communications in Mathematical Physics, 325(1):333–366, 2014.
  • [CL20] Wei-Kuo Chen and Wai-Kit Lam. Universality of approximate message passing algorithms. CoRR, abs/2003.10431, 2020.
  • [CZK14] Francesco Caltagirone, Lenka Zdeborová, and Florent Krzakala. On convergence of approximate message passing. In 2014 IEEE International Symposium on Information Theory, pages 1812–1816. IEEE, 2014.
  • [DdHS23] Jingqiu Ding, Tommaso d’Orsi, Yiding Hua, and David Steurer. Reaching Kesten-Stigum threshold in the stochastic block model under node corruptions. In The Thirty Sixth Annual Conference on Learning Theory, pages 4044–4071. PMLR, 2023.
  • [DdNS22] Jingqiu Ding, Tommaso d’Orsi, Rajai Nasser, and David Steurer. Robust recovery for stochastic block models. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 387–394. IEEE, 2022.
  • [DKK+19] Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. Sever: A robust meta-algorithm for stochastic optimization. In International Conference on Machine Learning, pages 1596–1606. PMLR, 2019.
  • [DM14] Yash Deshpande and Andrea Montanari. Information-theoretically optimal sparse pca. In 2014 IEEE International Symposium on Information Theory, pages 2197–2201. IEEE, 2014.
  • [DMM09] David L Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.
  • [FVRS22] Oliver Y Feng, Ramji Venkataramanan, Cynthia Rush, and Richard J Samworth. A unifying tutorial on approximate message passing. Foundations and Trends in Machine Learning, 15(4):335–536, 2022.
  • [HSSS16] Samuel B Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 178–191, 2016.
  • [IS24] Misha Ivkov and Tselil Schramm. Semidefinite programs simulate approximate message passing robustly. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 348–357, 2024.
  • [JLST21] Arun Jambulapati, Jerry Li, Tselil Schramm, and Kevin Tian. Robust regression revisited: Acceleration and improved estimation rates. Advances in Neural Information Processing Systems, 34:4475–4488, 2021.
  • [JP24] Chris Jones and Lucas Pesenti. Diagram analysis of iterative algorithms. CoRR, abs/2404.07881, 2024.
  • [KMS+12] Florent Krzakala, Marc Mézard, Francois Sausset, Yifan Sun, and Lenka Zdeborová. Probabilistic reconstruction in compressed sensing: algorithms, phase diagrams, and threshold achieving matrices. Journal of Statistical Mechanics: Theory and Experiment, 2012(08):P08009, 2012.
  • [Mon12] Andrea Montanari. Graphical models concepts in compressed sensing. Compressed Sensing: Theory and Applications, page 394, 2012.
  • [Mon21] Andrea Montanari. Optimization of the Sherrington–Kirkpatrick Hamiltonian. SIAM Journal on Computing, (0):FOCS19–1, 2021.
  • [MR15] Andrea Montanari and Emile Richard. Non-negative principal component analysis: Message passing algorithms and sharp asymptotics. IEEE Transactions on Information Theory, 62(3):1458–1484, 2015.
  • [MRW24] Sidhanth Mohanty, Prasad Raghavendra, and David X Wu. Robust recovery for stochastic block models, simplified and generalized. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 367–374, 2024.
  • [RSFS19] Sundeep Rangan, Philip Schniter, Alyson K Fletcher, and Subrata Sarkar. On the convergence of approximate message passing with arbitrary matrices. IEEE Transactions on Information Theory, 65(9):5339–5351, 2019.
  • [Ver18] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.

Appendix A Statistics of AMP iterate entries

The most important theorem for us is known as state evolution, which intuitively states that the statistics of the entries of the iterates of an AMP iteration behave somewhat like statistics of Gaussians. There are two versions of state evolution that will be important to us, corresponding to polynomial and Lipschitz iterations.

Theorem A.1 (Polynomial State Evolution (e.g. [BLM12, Theorem 4] or [JP24, Theorem 4.21 and Theorem 5.2])).

Suppose that $x^0, x^1, \ldots, x^T$ is an AMP iteration corresponding to polynomial denoisers, with input $X$ a symmetric matrix having i.i.d. $\frac{O(1)}{\sqrt{n}}$-subgaussian entries with mean $0$ and variance $\frac{1}{n}$. Then, for any $\operatorname{PL}(k)$ function $\varphi : \mathbb{R}^{T+1} \rightarrow \mathbb{R}$,

\[
\operatorname*{p-lim}_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^{n} \varphi(x^T_i, x^{T-1}_i, \ldots, x^0_i) = \operatorname*{\mathbf{E}}[\varphi(U^T, U^{T-1}, \ldots, U^0)]
\]

where $U^0, U^1, \ldots, U^T$ form an appropriate Gaussian process with covariance independent of $n$.

To prove a similar theorem in the Lipschitz setting, we note from a remark in [CL20] that [BLM12, Proposition 6] can be extended to handle Lipschitz denoiser functions, which implies the same universality as for polynomial denoisers.

Theorem A.2 (Lipschitz State Evolution (e.g. [FVRS22, Theorem 2.3])).

Suppose that $x^0, x^1, \ldots, x^T$ is an AMP iteration corresponding to Lipschitz denoisers, with input $X$ a symmetric matrix having i.i.d. $\frac{O(1)}{\sqrt{n}}$-subgaussian entries with mean $0$ and variance $\frac{1}{n}$. Then, for any $\operatorname{PL}(k)$ function $\varphi : \mathbb{R}^{T+1} \rightarrow \mathbb{R}$,

\[
\operatorname*{p-lim}_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^{n} \varphi(x^T_i, x^{T-1}_i, \ldots, x^0_i) = \operatorname*{\mathbf{E}}[\varphi(U^T, U^{T-1}, \ldots, U^0)]
\]

where $U^0, U^1, \ldots, U^T$ form an appropriate Gaussian process with covariance independent of $n$.

As noted by [FVRS22, Remark 2.4], state evolution holds for Lipschitz denoisers when we consider $B_{t,j}$ instead of $b_{t,j}$ in the AMP iteration. By adapting the version of this proof presented in [BMN20, Corollary 2], we may also substitute $B_{t,j}$ for $b_{t,j}$ when considering polynomial denoisers.

Consequences of state evolution for Pseudo-Lipschitz functions

We collect some facts about Pseudo-Lipschitz functions, after which we can prove Corollary 2.3, which implies the concentration we need for order statistics of the AMP iterates.

Proposition A.3.

Suppose that $f : \mathbb{R}^t \rightarrow \mathbb{R}$ is $\operatorname{PL}(a)$ with Lipschitz constant $L_1$ and $g : \mathbb{R}^t \rightarrow \mathbb{R}$ is $\operatorname{PL}(b)$ with Lipschitz constant $L_2$. Furthermore, suppose that $f(\vec{0}) = g(\vec{0}) = 0$. Then, $f \cdot g \in \operatorname{PL}(a+b)$ with Lipschitz constant $12 L_1 L_2$.

Proof.

We may write

\[
|f(x)g(x) - f(y)g(y)| \leqslant \tfrac{1}{2} |f(x) - f(y)| \cdot |g(x) + g(y)| + \tfrac{1}{2} |g(x) - g(y)| \cdot |f(x) + f(y)|.
\]

We can see that $|f(x)| \leqslant L_1 (1 + \|x\|^{a-1}) \|x\|$ and $|g(x)| \leqslant L_2 (1 + \|x\|^{b-1}) \|x\|$ by applying centeredness. Therefore, for the first term we obtain that

\[
\begin{aligned}
|f(x) - f(y)| \cdot |g(x) + g(y)| &\leqslant L_1 (1 + \|x\|^{a-1} + \|y\|^{a-1}) \|x - y\| \cdot L_2 (\|x\| + \|y\| + \|x\|^{b} + \|y\|^{b}) \\
&= L_1 L_2 \|x - y\| \left[ (1 + \|x\|^{a-1} + \|y\|^{a-1}) (\|x\| + \|y\| + \|x\|^{b} + \|y\|^{b}) \right]
\end{aligned}
\]

It remains to show that the product of the latter two terms is at most $C(1 + \|x\|^{a+b-1} + \|y\|^{a+b-1})$. We consider two cases, depending on whether $\max(\|x\|, \|y\|) \leqslant 1$. If so, the first factor is at most $3$ and the second is at most $4$, so the product is bounded by $12$. Otherwise, suppose without loss of generality that $\|x\| \geqslant \|y\|$ and $\|x\| > 1$. Then, we can bound the product by $3\|x\|^{a-1} \cdot 4\|x\|^{b} = 12\|x\|^{a+b-1}$. This implies that

\[
|f(x) - f(y)| \cdot |g(x) + g(y)| \leqslant 12 L_1 L_2 (1 + \|x\|^{a+b-1} + \|y\|^{a+b-1}) \|x - y\|
\]

and symmetrically for $|g(x) - g(y)| \cdot |f(x) + f(y)|$. Thus, we have shown that $f \cdot g \in \operatorname{PL}(a+b)$ with Lipschitz constant $12 L_1 L_2$.
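
The scalar inequality at the heart of this proof, $(1 + u^{a-1} + v^{a-1})(u + v + u^{b} + v^{b}) \leqslant 12(1 + u^{a+b-1} + v^{a+b-1})$ for $u = \|x\|$ and $v = \|y\|$, can be spot-checked on a grid. The following is a minimal sketch and not part of the argument: the function names, the grid, and the restriction to integer degrees $a, b \geqslant 1$ are assumptions made for illustration.

```python
def product_of_factors(u, v, a, b):
    """(1 + u^(a-1) + v^(a-1)) * (u + v + u^b + v^b) for norms u, v >= 0."""
    return (1 + u ** (a - 1) + v ** (a - 1)) * (u + v + u ** b + v ** b)


def target(u, v, a, b):
    """12 * (1 + u^(a+b-1) + v^(a+b-1)), the bound claimed in the proof."""
    return 12 * (1 + u ** (a + b - 1) + v ** (a + b - 1))


# Hypothetical grid of norms in [0, 5] and small integer degrees a, b >= 1.
grid = [i / 10 for i in range(51)]
violations = [
    (u, v, a, b)
    for a in (1, 2, 3)
    for b in (1, 2, 3)
    for u in grid
    for v in grid
    if product_of_factors(u, v, a, b) > target(u, v, a, b)
]
print(len(violations))
```

A grid check of this kind does not replace the case analysis, but it is a quick way to catch a wrong constant.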

Finally, we prove Corollary 2.3.

Corollary (Restatement of Corollary 2.3).

Suppose that $f : \mathbb{R}^{t+1} \rightarrow \mathbb{R}$ is $\operatorname{PL}(k)$ for $k \geqslant 0$ and $g : \mathbb{R}^{t+1} \rightarrow \mathbb{R}$ is $\operatorname{PL}(\ell)$ with $g(\vec{0}) = 0$ and $\ell \geqslant 1$. Suppose $\vec{x} = x^{(t)}$ is an AMP iterate resulting from the application of Pseudo-Lipschitz denoisers on input $X$ a symmetric matrix with i.i.d. $\frac{O(1)}{\sqrt{n}}$-subgaussian entries having mean $0$ and variance $\frac{1}{n}$. Furthermore, let $C > 0$ be a constant (possibly depending on $t$). Then, the following hold:

  • For any $r \gg \max(t, k)$,

    \[
    \operatorname*{p-lim}_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^{n} f(\vec{x}_i)^2 \bm{1}[g(\vec{x}_i)^2 > \theta] \leqslant \frac{1}{\theta^{r}} \cdot C^{2r} \cdot (3 \ell r)^{\ell r}.
    \]
  • For every $I \subseteq [n]$ with $|I| \leqslant \varepsilon n$,

    \[
    \operatorname*{p-lim}_{n \rightarrow \infty} \frac{1}{n} \sum_{i \in I} f(\vec{x}_i)^2 \leqslant C \varepsilon \log^{k} \frac{1}{\varepsilon}.
    \]

Before the proof, note that a priori we have no control over $\bm{1}[g(\vec{x}_i)^2 > \theta]$ (this is not Pseudo-Lipschitz at any degree). However, we show that we can approximate it above and below by Pseudo-Lipschitz functions and thus still reason about it.

Claim A.4

There exist sequences of Lipschitz functions $f_1(x; L)$ and $f_2(x; L)$, each having Lipschitz constant $L$, such that $f_1(x; L) \leqslant \bm{1}[x > \theta] \leqslant f_2(x; L)$ and as $L \rightarrow \infty$ these bounding functions converge to $\bm{1}[x > \theta]$. Applying this with $g(\vec{x}_i)^2$ in place of $x$ handles the indicator $\bm{1}[g(\vec{x}_i)^2 > \theta]$.

Proof.

Define

\[
f_1(x; L) = \begin{cases} 0 & \qquad x \leqslant \theta \\ L(x - \theta) & \qquad \theta < x \leqslant \theta + \frac{1}{L} \\ 1 & \qquad \text{otherwise} \end{cases}
\qquad \text{and} \qquad
f_2(x; L) = \begin{cases} 0 & \qquad x \leqslant \theta - \frac{1}{L} \\ L\left(x - \theta\right) + 1 & \qquad \theta - \frac{1}{L} < x \leqslant \theta \\ 1 & \qquad \text{otherwise} \end{cases}
\]

which by definition satisfy the given constraint. ∎

This implies that state evolution holds for indicators (and we can treat them as if they are another Lipschitz function).
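
The two ramp functions from the proof of Claim A.4 are easy to write down and test. The sketch below, with hypothetical names `lower_ramp` and `upper_ramp`, expresses each ramp as a clipped linear function (which agrees with the piecewise definition) and checks the sandwich property on an illustrative grid; the values of $\theta$ and $L$ are arbitrary choices.

```python
def lower_ramp(x, theta, L):
    """f_1(x; L): 0 up to theta, then slope L until it reaches 1."""
    return min(1.0, max(0.0, L * (x - theta)))


def upper_ramp(x, theta, L):
    """f_2(x; L): starts rising at theta - 1/L and reaches 1 at theta."""
    return min(1.0, max(0.0, L * (x - theta) + 1.0))


def indicator(x, theta):
    """The step function 1[x > theta]."""
    return 1.0 if x > theta else 0.0


theta = 0.7  # illustrative threshold
xs = [i / 1000 for i in range(-2000, 2001)]  # grid on [-2, 2]

# Sandwich property f_1 <= 1[x > theta] <= f_2 for several slopes L.
sandwich_ok = all(
    lower_ramp(x, theta, L) <= indicator(x, theta) <= upper_ramp(x, theta, L)
    for L in (1.0, 10.0, 1000.0)
    for x in xs
)

# Away from the threshold, both ramps agree with the indicator once
# 1/L is smaller than the distance |x - theta|.
x0 = theta + 0.05
gap_at_x0 = upper_ramp(x0, theta, 100.0) - lower_ramp(x0, theta, 100.0)
print(sandwich_ok, gap_at_x0)
```

At the single point $x = \theta$ the two ramps differ by $1$ for every $L$, so the convergence is pointwise away from the threshold.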

Proof of Corollary 2.3.

Let’s begin with the first bullet point. By state evolution, we have that

\[
\operatorname*{p-lim}_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^{n} f(\vec{x}_i)^2 \bm{1}[g(\vec{x}_i)^2 > \theta] = \operatorname*{\mathbf{E}}_{U \sim N(0, \Sigma)} \left[ f(\vec{U})^2 \bm{1}[g(\vec{U})^2 > \theta] \right],
\]

where $\Sigma$ is the covariance matrix of $\vec{U}$. Note that for any $r \geqslant 1$, $\bm{1}[x^2 > \theta] \leqslant \frac{x^{2r}}{\theta^{r}}$ (by case analysis on whether $x^2 > \theta$).
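
The pointwise bound $\bm{1}[x^2 > \theta] \leqslant x^{2r}/\theta^{r}$ is a Markov-type inequality: when $x^2 > \theta$ the ratio $x^2/\theta$ exceeds $1$, and otherwise the right-hand side is still non-negative. A small numerical check is sketched below, assuming $\theta > 0$; the helper names and the grid are hypothetical.

```python
def indicator_sq(x, theta):
    """The indicator 1[x^2 > theta]."""
    return 1.0 if x * x > theta else 0.0


def moment_bound(x, theta, r):
    """The upper bound x^(2r) / theta^r, written as (x^2 / theta)^r."""
    return (x * x / theta) ** r


# Illustrative grid; theta is assumed strictly positive throughout.
xs = [i / 100 for i in range(-300, 301)]
bound_ok = all(
    indicator_sq(x, theta) <= moment_bound(x, theta, r)
    for theta in (0.5, 1.0, 2.0)
    for r in (1, 2, 5)
    for x in xs
)
print(bound_ok)
```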

Therefore, we find that

\[
\operatorname*{\mathbf{E}}_{U \sim N(0, \Sigma)} \left[ f(\vec{U})^2 \bm{1}[g(\vec{U})^2 > \theta] \right] \leqslant \frac{1}{\theta^{r}} \operatorname*{\mathbf{E}}_{U} \left[ (f(\vec{U}) g(\vec{U})^{r})^2 \right].
\]

By Proposition A.3, we have that $f(\vec{U}) g(\vec{U})^{r} \in \operatorname{PL}(a + br)$, where $a = k$ and $b = \ell$ denote the Pseudo-Lipschitz degrees of $f$ and $g$.

Define $h(\vec{x}) = f(\Sigma^{1/2} x) \cdot g(\Sigma^{1/2} x)^{r}$, which is also $\operatorname{PL}(a + br)$ and centered with Lipschitz constant at most $(12 L \|\Sigma\|_{\operatorname{\mathsf{op}}})^{r}$. In particular, we thus have that $h(\vec{x}) \leqslant (12 L \|\Sigma\|_{\operatorname{\mathsf{op}}})^{r} \cdot (\|x\| + \|x\|^{a+br})$ and

\[
h(\vec{x})^2 \leqslant 2 (12 L \|\Sigma\|_{\operatorname{\mathsf{op}}})^{2r} (\|x\|^2 + \|x\|^{2(a+br)}).
\]

Now, we may compute that

\begin{align*}
\operatorname*{\mathbf{E}}_{g\sim N(0,I)}[h(\vec{g})^{2r}]
&\leqslant 2(12L\|\Sigma\|_{\mathsf{op}})^{2r}\cdot\left(1+\sum_{k_{0}+k_{1}+\cdots+k_{t}=a+br}\,\,\prod_{i=0}^{t}\operatorname*{\mathbf{E}}\left[x_{i}^{2k_{i}}\right]\right)\\
&\leqslant 2(12L\|\Sigma\|_{\mathsf{op}})^{2r}\left(1+\binom{t+a+br}{t}(2(a+br))^{a+br}\right)\\
&\leqslant 2(12L\|\Sigma\|_{\mathsf{op}})^{2r}\cdot(2(a+br))^{a+br+t}\\
&\leqslant(24L\|\Sigma\|_{\mathsf{op}})^{2r}\cdot(3br)^{br}
\end{align*}

where in the last two steps we used that $r\gg\max(a,t)$.

Now, we may use this to prove the second bullet point. We can assume without loss of generality that $f(\vec{0})=0$: otherwise, note that $f(x)^{2}\leqslant 2(f(x)-f(0))^{2}+2f(0)^{2}$, so at the cost of a factor of $2$ we may replace $f$ by its centered version. Furthermore, note that we only need to consider the top $\varepsilon$ quantile of indices $i$ to prove the statement of the lemma. Now, for any $\theta>1$, consider writing

\begin{align*}
\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[i\text{ in }\varepsilon\text{ quantile}]
&=\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[i\text{ in }\varepsilon\text{ quantile}]\bm{1}[f(\vec{x}_{i})^{2}\leqslant\theta]+\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[i\text{ in }\varepsilon\text{ quantile}]\bm{1}[f(\vec{x}_{i})^{2}>\theta]\\
&\leqslant\varepsilon\theta\cdot n+\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[f(\vec{x}_{i})^{2}>\theta].
\end{align*}

Therefore, dividing both sides by $n$ and taking the $\operatorname{p-lim}$, we obtain

\[
\operatorname*{p-lim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[i\text{ in }\varepsilon\text{ quantile}]\leqslant\varepsilon\theta+\operatorname*{p-lim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[f(\vec{x}_{i})^{2}>\theta].
\]
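The finite-$n$ decomposition behind this bound can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `top_quantile_mass` and `truncation_bound`, the test function $f(x)=x^{3}$, and the use of i.i.d. Gaussian samples in place of AMP iterates are all illustrative assumptions, and the quantile is taken to contain $\lceil\varepsilon n\rceil$ indices.

```python
import numpy as np

def top_quantile_mass(values_sq, eps):
    """Sum of the largest ceil(eps * n) entries of values_sq (hypothetical helper)."""
    n = values_sq.size
    m = int(np.ceil(eps * n))
    return np.sort(values_sq)[n - m:].sum()

def truncation_bound(values_sq, eps, theta):
    """Right-hand side: (number of quantile indices) * theta plus the mass above theta."""
    n = values_sq.size
    m = int(np.ceil(eps * n))
    return m * theta + values_sq[values_sq > theta].sum()

# Assumption: Gaussian samples stand in for the coordinates of the iterates,
# and f(x) = x^3 is an arbitrary polynomially growing test function.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
f_sq = (x ** 3) ** 2
eps = 0.01

for theta in (1.5, 10.0, 100.0):
    lhs = top_quantile_mass(f_sq, eps)
    rhs = truncation_bound(f_sq, eps, theta)
    # Small relative tolerance to absorb floating-point summation order.
    assert lhs <= rhs * (1 + 1e-12)
```

The inequality holds for every threshold because each quantile index either has $f^{2}\leqslant\theta$, contributing at most $\theta$, or is already counted in the unrestricted sum over indices with $f^{2}>\theta$.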

Our goal is to show that by choosing $\theta=\Theta(\log^{k}\frac{1}{\varepsilon})$, the latter term is $O(\varepsilon\log^{k}\frac{1}{\varepsilon})$, which would complete the proof. By the above result, we have that

\[
\operatorname*{p-lim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[f(\vec{x}_{i})^{2}>\theta]\leqslant\frac{1}{\theta^{r}}\cdot C^{2r}\cdot(3kr)^{kr}
\]

(since $f\in\operatorname{PL}(k)$). Take $r=\log\frac{1}{\varepsilon}$ and $\theta=3ekC^{2}\cdot r^{k}=\Theta_{\varepsilon}(\log^{k}\frac{1}{\varepsilon})$. From here, we find that

\begin{align*}
\operatorname*{p-lim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[f(\vec{x}_{i})^{2}>\theta]
&\leqslant\frac{1}{\theta^{r}}\cdot C^{2r}\cdot(3kr)^{kr}\\
&=\left(\frac{3kC^{2}\cdot r^{k}}{3ekC^{2}\cdot r^{k}}\right)^{r}\\
&\leqslant\varepsilon.
\end{align*}
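The last step can be unpacked as follows, taking the displayed ratio as given and assuming that $\log$ denotes the natural logarithm (the text does not state the base, so this is an assumption):

\[
\left(\frac{3kC^{2}\cdot r^{k}}{3ekC^{2}\cdot r^{k}}\right)^{r}=e^{-r}=e^{-\log\frac{1}{\varepsilon}}=\varepsilon.
\]

Under this reading the factor of $e$ in the choice of $\theta$ is exactly what turns $r=\log\frac{1}{\varepsilon}$ into a bound of $\varepsilon$.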

Thus, we have that

\[
\operatorname*{p-lim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}f(\vec{x}_{i})^{2}\bm{1}[i\text{ in }\varepsilon\text{ quantile}]\leqslant\varepsilon\theta+\operatorname*{\mathbf{E}}\left[f(\vec{U})^{2}\bm{1}[f(\vec{U})^{2}>\theta]\right]\leqslant 3ekC^{2}\cdot\varepsilon\log^{k}\frac{1}{\varepsilon}
\]

which is exactly as desired. ∎