\startlocaldefs\endlocaldefs

, and

Repetition effects in a Sequential Monte Carlo sampler

Sarah Cannonlabel=e1][email protected]\orcid0000-0001-6510-4669 [    Daryl DeFordlabel=e2][email protected]\orcid0000-0003-2032-3168 [    Moon Duchinlabel=e3][email protected]\orcid0000-0003-4498-4067 [ Department of Mathematical Sciences, Claremont McKenna Collegepresep=, ]e1 Department of Mathematics and Statistics, Washington State Universitypresep=, ]e2 Department of Mathematics and Brooks School of Public Policy, Cornell Universitypresep=, ]e3
Abstract

We investigate the prevalence of sample repetition in a Sequential Monte Carlo (SMC) method recently introduced for political redistricting.

Sequential Monte Carlo,
sampling,
graph partitions,
redistricting,
keywords:

1 Introduction

In this note, we consider the structure of the SMC (Sequential Monte Carlo) method developed by McCartan–Imai for sampling partitions of a fixed node-weighted graph G𝐺Gitalic_G into a given number k𝑘kitalic_k of connected subgraphs with nearly equal total weight (McCartan and Imai, 2023). The main application is to redistricting, where the weight is by population, so we will refer to the partition as a plan and its parts as districts. SMC operates by fixing a sample size S𝑆Sitalic_S and a number of districts k𝑘kitalic_k, then creating a top generation of partial plans by marking off one connected subset of G𝐺Gitalic_G with roughly 1/k1𝑘1/k1 / italic_k of the total weight in each plan (Figure 1, left). In the next generation, S𝑆Sitalic_S agents sample with replacement from those partial plans according to a weighting function; continuing the progress in the partial plan, the agents then mark off a second subset of about the same size. This process continues for k𝑘kitalic_k generations until the entire graph is marked. We will summarize the salient combinatorial features of this scheme in a descendancy diagram D𝐷Ditalic_D (Figure 1, right), showing only the paths that connect from the bottom layer to the top. The nodes of D𝐷Ditalic_D involved in these paths are called active or activated. The final generation consists of S𝑆Sitalic_S complete partitions. This is the method developed by the SMC authors to generate ensembles of random districting plans, like the four complete plans at the bottom of Figure 1.

Refer to captionRefer to caption
Figure 1: Simple example of partitioning a 4×4444\times 44 × 4 grid into four districts. The adjacency pattern of the grid is the graph G𝐺Gitalic_G, the number of districts is k=4𝑘4k=4italic_k = 4, and the size of the sample is S=4𝑆4S=4italic_S = 4. At right, the process is abstracted into a descendancy diagram. The district marked last (green) does not have a row of the diagram, because it is made up of area left over after the third district (orange) is marked.

SMC samples face a certain amount of characteristic redundancy. As the authors note, ”because the SMC algorithm involves repeated resampling with replacement, for a finite number of samples it can suffer from particle system collapse (Liu, Chen and Logvinenko, 2001), where many of the sampled plans share a small number of common districts (which originate as common ancestor particles in the SMC scheme).” In practical terms, this means that SMC samples can tend to have certain districts or sets of districts repeated many times across plans, like the blue square at the top left of the grid in Figure 1, which appears in 3/4 of the final sample. One aim of this note is to study this concentration of ancestry in combinatorial terms.

The second aim is to understand the tradeoffs faced by users in the choice of sampler. Citing McCartan–Imai again: ”Like MCMC algorithms, the SMC algorithm generates samples which approximate the target distribution arbitrarily well as the sample size increases.” Our second set of questions explores the convergence guarantees and practical diagnostics available with SMC in relation to the production of various data artifacts used in redistricting analysis, like histograms and boxplots. (See also Cannon et al. (2022) for more comparison of methods.)

Diagnosing strengths and weaknesses of the method has important real-world value. This is because SMC for redistricting, as implemented by the authors in the open-source package Redist (Kenny et al., ), is already in widespread use in courts of law. The repetition of districts has been flagged as a limitation that undermines statistical claims about the sample. For instance, mathematician Kristopher Tapp produced an affidavit in New York state Senate litigation in which he attempted a replication of another expert’s SMC ensemble of 5000 maps, and found that a certain set of 31 districts (covering about half of the state) appeared identically in over 64% of the sample.111Tapp further notes that while an ensemble of 5000 63-district maps can have up to 315,000 distinct districts, his replication ensemble had only 12,319, so that each district was repeated an average of 1360 times. He calls this ”a head-turning level of redundancy.” (Tapp, 2022) In New Mexico state Senate litigation, a defense brief described the SMC method as being ”plagued with duplicate simulations”; as a prophylactic measure to protect against this criticism, a defense expert cosmetically altered his SMC sample by perturbing the boundaries of districts so that he could claim that no districts were duplicated.222”Dr. Chen’s implementation of the MCMC version of an SMC algorithm [sic] did not result in any duplicated maps. [Exh. D, Dep. ST 54:17–55:17 (falsely testifying that Dr. Chen’s simulations contain duplicates), 136:6–136:20 (correcting his mistaken testimony)].” The opposing expert Sean Trende was no more sophisticated, opining in deposition that ”Duplicates happen all the time… So it doesn’t bother me, unless it gets extreme to where you end up having, like, 20 maps.” (NM Legislative Defendants, 2022) It is clear that rigorous attention to the issues around duplication and the consequences for statistical interpretation are greatly needed in the field.

1.1 Motivating questions

To understand the SMC algorithm for k𝑘kitalic_k districts, we will begin by studying uniform descendancy diagrams with levels (also called layers or generations) labeled i=1𝑖1i=1italic_i = 1 to k1𝑘1k-1italic_k - 1 from bottom to top, each of width S𝑆Sitalic_S, in which each node chooses a parent uniformly at random from the generation above. Nodes in the top layer (indexed k1𝑘1k-1italic_k - 1) represent partial plans with a single initial district marked, and those at level i𝑖iitalic_i represent districting plans with ki𝑘𝑖k-iitalic_k - italic_i districts marked. At level 1, k1𝑘1k-1italic_k - 1 districts are marked, which determines the k𝑘kitalic_kth and final district and amounts to specifying a complete plan. A node in a descendancy diagram is active if it has a descendant in the bottom layer, and surviving ancestors are active nodes at the top level (so named because they have descendants in the final population). Calculations using descendancy diagrams will omit all non-active nodes, because they have no role in the final sample constructed by SMC.

Refer to captionS=10𝑆10S=10italic_S = 10, k=6𝑘6k=6italic_k = 6Refer to captionS=12𝑆12S=12italic_S = 12, k=11𝑘11k=11italic_k = 11
Figure 2: These two figures show structures we call descendancy diagrams. The bottom row is labeled as generation 1 in each case, increasing in index with each layer until generation k1𝑘1k-1italic_k - 1 at the top. Each of these two diagrams has A(D)=2𝐴𝐷2A(D)=2italic_A ( italic_D ) = 2, meaning that there are two top-level ancestors from which all members of the bottom generation are descended.

If we write D𝒟(S,k)𝐷𝒟𝑆𝑘D\in\mathcal{D}(S,k)italic_D ∈ caligraphic_D ( italic_S , italic_k ) for a specific diagram of this form, then let A(D)𝐴𝐷A(D)italic_A ( italic_D ) be the number of surviving ancestors in that diagram, and let A(S,k)𝐴𝑆𝑘A(S,k)italic_A ( italic_S , italic_k ) be the expected number of surviving ancestors 𝔼(A(D))𝔼𝐴𝐷\mathbb{E}(A(D))blackboard_E ( italic_A ( italic_D ) ) as D𝐷Ditalic_D ranges over the uniform distribution on 𝒟(S,k)𝒟𝑆𝑘\mathcal{D}(S,k)caligraphic_D ( italic_S , italic_k ). We consider the following questions.

Refer to caption
Figure 3: The S=12,k=11formulae-sequence𝑆12𝑘11S=12,k=11italic_S = 12 , italic_k = 11 example is repeated, now with the diagram nodes decorated by their number of final-generation descendants. High numbers appearing on low levels are markers of extreme redundancy.
Question 1 (Ancestor extinction).

As a function of S𝑆Sitalic_S and k𝑘kitalic_k, what is the distribution of the number of surviving ancestors A(D)𝐴𝐷A(D)italic_A ( italic_D ) in a uniform descendancy diagram? Give bounds or asymptotics for the expectation A(S,k)𝐴𝑆𝑘A(S,k)italic_A ( italic_S , italic_k ).

Question 2 (Extent of redundancy).

Define G(D,j)𝐺𝐷𝑗G(D,j)italic_G ( italic_D , italic_j ) to be the number of final plans with at least j𝑗jitalic_j districts in common. How is that G𝐺Gitalic_G distributed over 𝒟𝒟\mathcal{D}caligraphic_D for each j𝑗jitalic_j?

For instance, see Figure 3. Seeing high numbers at low levels is an indicator of repetition. In this case, there is one district that appears in 11 out of 12 final plans (value 11 appearing in top row). Worse than that, it is part of a group of 3 districts that appear identically in those 11 plans (value 11 in row 8=k38𝑘38=k-38 = italic_k - 3). Also, there is a group of 7 districts that appears in 8 out of 12 final plans (value 8 in row 4=k74𝑘74=k-74 = italic_k - 7). So for the sample of plans constructed according to this diagram D𝐷Ditalic_D, we have G(D,3)=11𝐺𝐷311G(D,3)=11italic_G ( italic_D , 3 ) = 11 and G(D,7)=8𝐺𝐷78G(D,7)=8italic_G ( italic_D , 7 ) = 8.

The questions so far concern the combinatorics of descendancy diagrams, but the combinatorics is only one source of redundancy. It turns out to be compounded by several other factors. One compounding factor is the non-uniformity of weights in the intermediate generations that are used when the each generation picks its parents. A second factor is the graph being partitioned, which can start to have bottlenecks obstructing further district selection if the first few have been marked in an unlucky way.

Question 3 (Weighting and graph topology).

How do ancestry concentration and redundancy get more severe as the weighting factors deviate from uniform, and when realistic graphs are used as the basis for the partition?

Finally, we broaden the scope and consider the overall quality of the ensemble of S𝑆Sitalic_S plans that consists of members of the final generation.

Question 4 (Convergence guarantees).

What are the convergence guarantees for the sampling distribution obtained from the weak SMC Central Limit Theorem (McCartan and Imai, 2023, Prop 4.2)? How do they limit the extent of redundancy?

Question 5 (Summary statistics).

How do the convergence guarantees and diagnostics relate to the production of histograms, boxplots, and other percentile summary statistics?

There are many other elements of the SMC code as implemented in Redist that pull results away from the simplicity of descendancy diagrams, besides those already mentioned (inter-generational weighting factors and bottlenecks in the graph topology). Another example is that SMC creates sequentially labeled districting plans but seeks to sample unlabeled plans; this is addressed with a corrective factor ψ𝜓\psiitalic_ψ for which a second, auxiliary round of Monte Carlo estimation was described in the paper when sampling plans with k>13𝑘13k>13italic_k > 13 districts.333Some district configurations have many more ways of being sequentially labeled than others; this can produce distortion factors of over a million in practice. A corrective factor ψ𝜓\psiitalic_ψ is described in §4.4.2 of McCartan and Imai (2023). Another example is a final reweighting step to align the sample more closely with the target distribution.

The validation datasets employed in the article introducing SMC are graphs with 36 and 50 nodes and k=3,4,6𝑘346k=3,4,6italic_k = 3 , 4 , 6 districts; it is not at all clear that some of the challenges faced by SMC will become visible at that scale. The New York Senate case, in which the judge gave some credence to SMC ensemble made with default settings, had over 16,000 nodes and k=63𝑘63k=63italic_k = 63 districts.

Finally, it is very important to highlight that most of the SMC samples appearing in expert work, as well as the published work of the SMC authors, use small samples. Typically, sample sizes of 5000 are presented based on a single run at S=5000𝑆5000S=5000italic_S = 5000 or by combining and subsampling multiple runs with S=2500𝑆2500S=2500italic_S = 2500.444The largest sample sizes we have found in the SMC 50-state data repository are at S=30,000𝑆30000S=30,000italic_S = 30 , 000 (McCartan et al., 2022). Our independent attempts to produce large runs can reach S=100,000𝑆100000S=100,000italic_S = 100 , 000 on certain states, but it is difficult to get past that size using basic professional-level computer resources. This is because the Redist SMC code has significant memory overhead associated with storing partial plans. Even on a medium-sized problem like redistricting Pennsylvania into k=18𝑘18k=18italic_k = 18 districts at the precinct level, a run with S=100,000𝑆100000S=100,000italic_S = 100 , 000 consumes over 25 GB of RAM. Thus for practical purposes, SMC sampling is currently engineered to be performed with multiple small runs, not with sample sizes in the millions.

Below, we will alternate between trying to isolate the effects of different features of the SMC sampler and simply reporting outcomes when the code is run. This produces both a theoretical and a practical analysis.

Acknowledgments

We are grateful to Peter Rock for his excellent work conducting SMC experiments to support this project, building on his development of Python wrappers for the core Redist functionality. Replication repo at github.com/mggg/SMC-repetition. We thank Cory McCartan for sharing his time to explain SMC for redistricting and the Redist code, as well as for feedback and a correction on an earlier draft of this note. We thank Peter Winkler and Chris Hoffman for helpful conversations and pointers to the literature. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1928930 and by the Alfred P. Sloan Foundation under grant G-2021-16778, while the authors were in residence at the Simons Laufer Mathematical Sciences Institute (formerly MSRI) during the Fall 2023 semester. SC is also supported in part by NSF CCF-2104795; MD by NSF DMS-2005512.

2 Structure of descendancy diagrams

This section presents theoretical results, derived from the combinatorics of the descendancy diagrams, that allow us to bound the A(S,k)𝐴𝑆𝑘A(S,k)italic_A ( italic_S , italic_k ) values and begin to explain the expected behavior of collisions. We start by providing a Markov chain formulation for calculating exact values of A(S,k)𝐴𝑆𝑘A(S,k)italic_A ( italic_S , italic_k ) before providing more computationally efficient bounds on asymptotic behavior in terms of two recursive sequences. We conclude by examining the effect of weighting factors at each layer, proving that uniform choices minimize collisions and evaluating how much the non-uniformity explains the repetitions found in empirical data.

2.1 Setup

We start with some simple observations about this model. When we take a uniform descendancy diagram with two layers (corresponding to an SMC process with three districts), Question 1 is a rephrasing of the classic birthday problem from probability. As k𝑘kitalic_k grows, the generalized birthday problem also has a combinatorial interpretation as a sequential balls-in-bins model (see Raab and Steger (1998)) or via random coagulations (see Bertoin (2006)). Even in the language of ancestry, this has been studied in the context of genetic drift as the Wright-Fisher Model (see Durrett (2008)). It is well known that the probability that two individuals’ lineages remain distinct for at least i𝑖iitalic_i levels is (11/S)isuperscript11𝑆𝑖(1-1/S)^{i}( 1 - 1 / italic_S ) start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT and the probability that \ellroman_ℓ lineages remain (pairwise) distinct for at least i𝑖iitalic_i levels is (11/S)i(12/S)i(1(1)/S)isuperscript11𝑆𝑖superscript12𝑆𝑖superscript11𝑆𝑖(1-1/S)^{i}(1-2/S)^{i}\ldots(1-(\ell-1)/S)^{i}( 1 - 1 / italic_S ) start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ( 1 - 2 / italic_S ) start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT … ( 1 - ( roman_ℓ - 1 ) / italic_S ) start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT. After renormalizing and sending S𝑆S\rightarrow\inftyitalic_S → ∞, the time of the first (pairwise) coalescence among \ellroman_ℓ distinct lineages approaches an exponential distribution with rate (1)/(2S)12𝑆\ell(\ell-1)/(2S)roman_ℓ ( roman_ℓ - 1 ) / ( 2 italic_S ). There is a rich literature considering variations of this model, using it to design Markov chains, and extending it to infinite S𝑆Sitalic_S.

Our setting differs slightly from the Wright-Fisher model by only considering lineages that extend to the bottom layer and discarding the others---in the language developed above, we only track active nodes.

Lemma 2.1 (One-step probabilities).

If a given generation i𝑖iitalic_i has 1tS1𝑡𝑆1\leq t\leq S1 ≤ italic_t ≤ italic_S active nodes, then the expected number of ancestors in the generation immediately above (generation i+1𝑖1i+1italic_i + 1) is SS(11S)t𝑆𝑆superscript11𝑆𝑡S-S(1-\frac{1}{S})^{t}italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. The probability that there are exactly v𝑣vitalic_v activated nodes in generation i+1𝑖1i+1italic_i + 1 when there are t𝑡titalic_t activated nodes in generation i𝑖iitalic_i is P(v,t,S)=(Sv)i=0v(1)vi(vi)(iS)t𝑃𝑣𝑡𝑆binomial𝑆𝑣superscriptsubscript𝑖0𝑣superscript1𝑣𝑖binomial𝑣𝑖superscript𝑖𝑆𝑡P(v,t,S)=\binom{S}{v}\sum_{i=0}^{v}(-1)^{v-i}\binom{v}{i}\left(\frac{i}{S}% \right)^{t}italic_P ( italic_v , italic_t , italic_S ) = ( FRACOP start_ARG italic_S end_ARG start_ARG italic_v end_ARG ) ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( - 1 ) start_POSTSUPERSCRIPT italic_v - italic_i end_POSTSUPERSCRIPT ( FRACOP start_ARG italic_v end_ARG start_ARG italic_i end_ARG ) ( divide start_ARG italic_i end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT.

Proof.

In generation i+1𝑖1i+1italic_i + 1, let Ijsubscript𝐼𝑗I_{j}italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT be an indicator variable representing node j𝑗jitalic_j being chosen at least once. For any individual 1jS1𝑗𝑆1\leq j\leq S1 ≤ italic_j ≤ italic_S we have [Ij=1]=1(11S)tdelimited-[]subscript𝐼𝑗11superscript11𝑆𝑡\mathbb{P}[I_{j}=1]=1-(1-\frac{1}{S})^{t}blackboard_P [ italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 1 ] = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. Then the number of activated nodes is j=1SIjsuperscriptsubscript𝑗1𝑆subscript𝐼𝑗\sum_{j=1}^{S}I_{j}∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and by linearity of expectation its expected value is SS(11S)t𝑆𝑆superscript11𝑆𝑡S-S(1-\frac{1}{S})^{t}italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, as desired.

The second statement is a birthday problem variant. For each set of (Sv)binomial𝑆𝑣\binom{S}{v}( FRACOP start_ARG italic_S end_ARG start_ARG italic_v end_ARG ) parents the probability that all edges end up in that set is (vS)tsuperscript𝑣𝑆𝑡\left(\frac{v}{S}\right)^{t}( divide start_ARG italic_v end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT and applying inclusion/exclusion to account for versions that don’t select every element in that set gives the desired result. ∎

With this lemma we can can compute the expected value across two or more generations exactly, but the formula does not give much insight, so we omit it.

We can reformulate the problem as a Markov chain on the states 1,2,3,,s123𝑠1,2,3,\ldots,s1 , 2 , 3 , … , italic_s representing the number of activated nodes at a given layer. This is an absorbing Markov chain with absorbing state 1, with transition probabilities given by Mi,j={P(i,j,s)ij0i<jsubscript𝑀𝑖𝑗cases𝑃𝑖𝑗𝑠𝑖𝑗0𝑖𝑗M_{i,j}=\begin{cases}P(i,j,s)&i\geq j\\ 0&i<j\end{cases}italic_M start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = { start_ROW start_CELL italic_P ( italic_i , italic_j , italic_s ) end_CELL start_CELL italic_i ≥ italic_j end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_i < italic_j end_CELL end_ROW, forming a lower-triangular transition matrix. The expectation we seek is A(S,k)=[001]Mk2[12S]𝐴𝑆𝑘matrix001superscript𝑀𝑘2matrix12𝑆A(S,k)=\begin{bmatrix}0&0&\cdots&1\end{bmatrix}M^{k-2}\begin{bmatrix}1\\ 2\\ \vdots\\ S\end{bmatrix}italic_A ( italic_S , italic_k ) = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL ⋯ end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] italic_M start_POSTSUPERSCRIPT italic_k - 2 end_POSTSUPERSCRIPT [ start_ARG start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 2 end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL italic_S end_CELL end_ROW end_ARG ]. For example, when S=3𝑆3S=3italic_S = 3, we have M=[1001/32/301/96/92/9]𝑀matrix10013230196929M=\begin{bmatrix}1&0&0\\ 1/3&2/3&0\\ 1/9&6/9&2/9\end{bmatrix}italic_M = [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 / 3 end_CELL start_CELL 2 / 3 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 / 9 end_CELL start_CELL 6 / 9 end_CELL start_CELL 2 / 9 end_CELL end_ROW end_ARG ] and for diagrams with two layers (k=3𝑘3k=3italic_k = 3) we have that the three nodes have an expected 19/9=2.1111992.11119/9=2.111...19 / 9 = 2.111 … parents.

2.2 Limiting behavior

A(S,k)𝐴𝑆𝑘A(S,k)italic_A ( italic_S , italic_k ) is defined as the expected number of top nodes surviving to the bottom over diagrams D𝒟(S,k)𝐷𝒟𝑆𝑘D\in\mathcal{D}(S,k)italic_D ∈ caligraphic_D ( italic_S , italic_k ). First, observe that for fixed S𝑆Sitalic_S, we have limkA(S,k)=1subscript𝑘𝐴𝑆𝑘1\lim\limits_{k\rightarrow\infty}A(S,k)=1roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT italic_A ( italic_S , italic_k ) = 1. This follows directly from the Markov chain interpretation of the problem, because it is a non-increasing sequence of positive integers with a positive probability of strict decrease at each step while the value is greater than 1.

Given S𝑆Sitalic_S, construct a sequence of coefficients as follows: aS,0=1subscript𝑎𝑆01a_{S,0}=1italic_a start_POSTSUBSCRIPT italic_S , 0 end_POSTSUBSCRIPT = 1; and aS,i+1=1(11S)(aS,i)Ssubscript𝑎𝑆𝑖11superscript11𝑆subscript𝑎𝑆𝑖𝑆a_{S,i+1}=1-\left(1-\frac{1}{S}\right)^{(a_{S,i})S}italic_a start_POSTSUBSCRIPT italic_S , italic_i + 1 end_POSTSUBSCRIPT = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT ( italic_a start_POSTSUBSCRIPT italic_S , italic_i end_POSTSUBSCRIPT ) italic_S end_POSTSUPERSCRIPT. Where S𝑆Sitalic_S is understood to be fixed we will write simply a0=1subscript𝑎01a_{0}=1italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 1, ai+1=1(11S)aiSsubscript𝑎𝑖11superscript11𝑆subscript𝑎𝑖𝑆a_{i+1}=1-\left(1-\frac{1}{S}\right)^{a_{i}S}italic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S end_POSTSUPERSCRIPT. By Lemma 2.1, these approximate the share of active nodes at a given level of the diagram: aiSA(S,i)subscript𝑎𝑖𝑆𝐴𝑆𝑖a_{i}S\approx A(S,i)italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S ≈ italic_A ( italic_S , italic_i ). We will get a rigorous upper bound below.

Note that as S𝑆Sitalic_S gets large, (11S)Ssuperscript11𝑆𝑆\left(1-\frac{1}{S}\right)^{S}( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT rapidly converges to 1/e1𝑒1/e1 / italic_e from below. With this in mind, we can offer an second approximation with a sequence given by b0=1subscript𝑏01b_{0}=1italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 1; and bi+1=11ebisubscript𝑏𝑖11superscript1𝑒subscript𝑏𝑖b_{i+1}=1-\frac{1}{e}^{b_{i}}italic_b start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = 1 - divide start_ARG 1 end_ARG start_ARG italic_e end_ARG start_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, which is more likely to have a useful generating function, if an analytic description is desired. Table 1 and Figure 4 show the bisubscript𝑏𝑖b_{i}italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to be close to aS,isubscript𝑎𝑆𝑖a_{S,i}italic_a start_POSTSUBSCRIPT italic_S , italic_i end_POSTSUBSCRIPT for large S𝑆Sitalic_S---and show that biA(S,i)aS,isubscript𝑏𝑖𝐴𝑆𝑖subscript𝑎𝑆𝑖b_{i}\leq A(S,i)\leq a_{S,i}italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≤ italic_A ( italic_S , italic_i ) ≤ italic_a start_POSTSUBSCRIPT italic_S , italic_i end_POSTSUBSCRIPT in all instances we investigated.

i𝑖iitalic_i 0 1 2 3 4 5 6 7 8 9 10
a10,isubscript𝑎10𝑖a_{10,i}italic_a start_POSTSUBSCRIPT 10 , italic_i end_POSTSUBSCRIPT 1 0.6513 0.4965 0.4073 0.3490 0.3077 0.2769 0.253 0.234 0.2185 0.2056
a100,isubscript𝑎100𝑖a_{100,i}italic_a start_POSTSUBSCRIPT 100 , italic_i end_POSTSUBSCRIPT 1 0.6340 0.4712 0.3772 0.3155 0.2718 0.2390 0.2135 0.1056 0.1931 0.1625
a1000,isubscript𝑎1000𝑖a_{1000,i}italic_a start_POSTSUBSCRIPT 1000 , italic_i end_POSTSUBSCRIPT 1 0.6323 0.4688 0.3744 0.3124 0.2684 0.2355 0.2099 0.1895 0.1727 0.1587
a5000,isubscript𝑎5000𝑖a_{5000,i}italic_a start_POSTSUBSCRIPT 5000 , italic_i end_POSTSUBSCRIPT 1 0.6322 0.4686 0.3741 0.3121 0.2682 0.2352 0.2096 0.1891 0.1723 0.1583
bisubscript𝑏𝑖b_{i}italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT 1 0.6321 0.4685 0.3741 0.3121 0.2681 0.2352 0.2095 0.1890 0.1723 0.1582
Table 1: Values of aS,isubscript𝑎𝑆𝑖a_{S,i}italic_a start_POSTSUBSCRIPT italic_S , italic_i end_POSTSUBSCRIPT for S{10,100,1000,5000}𝑆1010010005000S\in\{10,100,1000,5000\}italic_S ∈ { 10 , 100 , 1000 , 5000 } and 0i100𝑖100\leq i\leq 100 ≤ italic_i ≤ 10. As S𝑆Sitalic_S grows, the aS,isubscript𝑎𝑆𝑖a_{S,i}italic_a start_POSTSUBSCRIPT italic_S , italic_i end_POSTSUBSCRIPT and bisubscript𝑏𝑖b_{i}italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT get close.
Lemma 2.2.

For fixed S>1𝑆1S>1italic_S > 1 and ai=aS,isubscript𝑎𝑖subscript𝑎𝑆𝑖a_{i}=a_{S,i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_a start_POSTSUBSCRIPT italic_S , italic_i end_POSTSUBSCRIPT, we have limiai=1Ssubscript𝑖subscript𝑎𝑖1𝑆\lim\limits_{i\rightarrow\infty}a_{i}=\frac{1}{S}roman_lim start_POSTSUBSCRIPT italic_i → ∞ end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_S end_ARG.

Proof.

First, we show by induction that ai1/Ssubscript𝑎𝑖1𝑆a_{i}\geq 1/Sitalic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≥ 1 / italic_S for all i𝑖iitalic_i. This is clearly true for a0=1subscript𝑎01a_{0}=1italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 1. If ai1/Ssubscript𝑎𝑖1𝑆a_{i}\geq 1/Sitalic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≥ 1 / italic_S, then aiS1subscript𝑎𝑖𝑆1a_{i}S\geq 1italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S ≥ 1 and

ai+1=1(11S)aiS1(11S)1=1S.subscript𝑎𝑖11superscript11𝑆subscript𝑎𝑖𝑆1superscript11𝑆11𝑆a_{i+1}=1-\left(1-\frac{1}{S}\right)^{a_{i}S}\geq 1-\left(1-\frac{1}{S}\right)% ^{1}=\frac{1}{S}.italic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S end_POSTSUPERSCRIPT ≥ 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_S end_ARG .

We will also need the following fact, which follows from the inequality 1+c<ec1𝑐superscript𝑒𝑐1+c<e^{c}1 + italic_c < italic_e start_POSTSUPERSCRIPT italic_c end_POSTSUPERSCRIPT for c=1/(S1)0𝑐1𝑆10c=1/(S-1)\neq 0italic_c = 1 / ( italic_S - 1 ) ≠ 0:

(11S)ln(11S)S<1.-\left(1-\frac{1}{S}\right)\ln\left(1-\frac{1}{S}\right)^{S}<1.- ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) roman_ln ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT < 1 . (1)

As we know ai+1=1(11S)aiSsubscript𝑎𝑖11superscript11𝑆subscript𝑎𝑖𝑆a_{i+1}=1-\left(1-\frac{1}{S}\right)^{a_{i}S}italic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S end_POSTSUPERSCRIPT, we will focus on the function f(x)=1(11S)xS𝑓𝑥1superscript11𝑆𝑥𝑆f(x)=1-\left(1-\frac{1}{S}\right)^{xS}italic_f ( italic_x ) = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_x italic_S end_POSTSUPERSCRIPT. We begin by noting f(1S)=1S𝑓1𝑆1𝑆f(\frac{1}{S})=\frac{1}{S}italic_f ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) = divide start_ARG 1 end_ARG start_ARG italic_S end_ARG. We also see that

f(x)=(11S)Sxln(11S)Sf^{\prime}(x)=-\left(1-\frac{1}{S}\right)^{Sx}\ln\left(1-\frac{1}{S}\right)^{S}italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x ) = - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_S italic_x end_POSTSUPERSCRIPT roman_ln ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT

Equation (1) tells us that f(1S)<1superscript𝑓1𝑆1f^{\prime}(\frac{1}{S})<1italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) < 1. For x1S𝑥1𝑆x\geq\frac{1}{S}italic_x ≥ divide start_ARG 1 end_ARG start_ARG italic_S end_ARG, the values we are interested in, this slope is always positive (because ln(11S)S\ln\left(1-\frac{1}{S}\right)^{S}roman_ln ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT is negative) and is strictly decreasing in x𝑥xitalic_x. This implies for x>1S𝑥1𝑆x>\frac{1}{S}italic_x > divide start_ARG 1 end_ARG start_ARG italic_S end_ARG, f(x)𝑓𝑥f(x)italic_f ( italic_x ) is strictly bounded above by the line tangent to it at 1/S1𝑆1/S1 / italic_S:

f(x)<f(1S)+f(1S)(x1S)=1S+f(1S)(x1S)𝑓𝑥𝑓1𝑆superscript𝑓1𝑆𝑥1𝑆1𝑆superscript𝑓1𝑆𝑥1𝑆f(x)<f\left(\frac{1}{S}\right)+f^{\prime}\left(\frac{1}{S}\right)\left(x-\frac% {1}{S}\right)=\frac{1}{S}+f^{\prime}\left(\frac{1}{S}\right)\left(x-\frac{1}{S% }\right)italic_f ( italic_x ) < italic_f ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) + italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ( italic_x - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) = divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ( italic_x - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG )

First, we will show the sequence of aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s is decreasing. Using f(1S)<1superscript𝑓1𝑆1f^{\prime}(\frac{1}{S})<1italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) < 1, we see

ai+1=f(ai)<1S+f(1S)(ai1S)<1S+ai1S=aisubscript𝑎𝑖1𝑓subscript𝑎𝑖1𝑆superscript𝑓1𝑆subscript𝑎𝑖1𝑆1𝑆subscript𝑎𝑖1𝑆subscript𝑎𝑖a_{i+1}=f(a_{i})<\frac{1}{S}+f^{\prime}\left(\frac{1}{S}\right)\left(a_{i}-% \frac{1}{S}\right)<\frac{1}{S}+a_{i}-\frac{1}{S}=a_{i}italic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = italic_f ( italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) < divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ( italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) < divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG = italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT

As the sequence of aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s is bounded below by 1S1𝑆\frac{1}{S}divide start_ARG 1 end_ARG start_ARG italic_S end_ARG and strictly decreasing, its limit must exist.

Suppose, for the sake of contradiction, that the limit of the aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s is strictly greater than 1/S1𝑆1/S1 / italic_S, that is, it is 1/S+α1𝑆𝛼1/S+\alpha1 / italic_S + italic_α for some α>0𝛼0\alpha>0italic_α > 0. This means for all ε>0𝜀0\varepsilon>0italic_ε > 0, there is some sufficiently large i𝑖iitalic_i such that ai<1/S+α+εsubscript𝑎𝑖1𝑆𝛼𝜀a_{i}<1/S+\alpha+\varepsilonitalic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT < 1 / italic_S + italic_α + italic_ε. Choose ε𝜀\varepsilonitalic_ε such that 0<ε<α(1f(1S))/f(1S)0𝜀𝛼1superscript𝑓1𝑆superscript𝑓1𝑆0<\varepsilon<\alpha(1-f^{\prime}(\frac{1}{S}))/f^{\prime}(\frac{1}{S})0 < italic_ε < italic_α ( 1 - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ) / italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ). This is possible to do because α>0𝛼0\alpha>0italic_α > 0 and f(1S)<1superscript𝑓1𝑆1f^{\prime}(\frac{1}{S})<1italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) < 1. Note this choice of ε𝜀\varepsilonitalic_ε means f(1S)(α+ε)<αsuperscript𝑓1𝑆𝛼𝜀𝛼f^{\prime}(\frac{1}{S})(\alpha+\varepsilon)<\alphaitalic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ( italic_α + italic_ε ) < italic_α. It follows that:

ai+1=f(ai)<1S+f(1S)(ai1S)<1S+f(1S)(α+ε)<1S+α.subscript𝑎𝑖1𝑓subscript𝑎𝑖1𝑆superscript𝑓1𝑆subscript𝑎𝑖1𝑆1𝑆superscript𝑓1𝑆𝛼𝜀1𝑆𝛼a_{i+1}=f(a_{i})<\frac{1}{S}+f^{\prime}\left(\frac{1}{S}\right)\left(a_{i}-% \frac{1}{S}\right)<\frac{1}{S}+f^{\prime}\left(\frac{1}{S}\right)(\alpha+% \varepsilon)<\frac{1}{S}+\alpha.italic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = italic_f ( italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) < divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ( italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) < divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) ( italic_α + italic_ε ) < divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_α .

As this is a monotone decreasing sequence and we assumed its limit was 1S+α1𝑆𝛼\frac{1}{S}+\alphadivide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_α, it is impossible to have ai+1<1S+αsubscript𝑎𝑖11𝑆𝛼a_{i+1}<\frac{1}{S}+\alphaitalic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT < divide start_ARG 1 end_ARG start_ARG italic_S end_ARG + italic_α, giving a contradiction. Therefore it must be the case that the limit of this sequence is 1S1𝑆\frac{1}{S}divide start_ARG 1 end_ARG start_ARG italic_S end_ARG, as claimed. ∎

Proposition 2.3.

A(S,k)akS.𝐴𝑆𝑘subscript𝑎𝑘𝑆A(S,k)\leq a_{k}S.italic_A ( italic_S , italic_k ) ≤ italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S .

Proof.

Let Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be a random variable denoting the number of active nodes at level i𝑖iitalic_i (those that have descendants in level 1111). Thus X1Ssubscript𝑋1𝑆X_{1}\equiv Sitalic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≡ italic_S. We are trying to get bounds on 𝔼[Xk1]=A(S,k)𝔼delimited-[]subscript𝑋𝑘1𝐴𝑆𝑘\mathbb{E}[X_{k-1}]=A(S,k)blackboard_E [ italic_X start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ] = italic_A ( italic_S , italic_k ), the expected number of active nodes in the top level of a k𝑘kitalic_k-district descendancy diagram, which has k1𝑘1k-1italic_k - 1 levels.

We will prove by induction that 𝔼[Xi]aiS𝔼delimited-[]subscript𝑋𝑖subscript𝑎𝑖𝑆\mathbb{E}[X_{i}]\leq a_{i}Sblackboard_E [ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] ≤ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S, which suffices to prove the proposition. When i=0𝑖0i=0italic_i = 0, we have 𝔼[X0]=a0S=S𝔼delimited-[]subscript𝑋0subscript𝑎0𝑆𝑆\mathbb{E}[X_{0}]=a_{0}S=Sblackboard_E [ italic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] = italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_S = italic_S, and the statement is true. Fix i1𝑖1i\geq 1italic_i ≥ 1, and suppose 𝔼[Xi]aiS𝔼delimited-[]subscript𝑋𝑖subscript𝑎𝑖𝑆\mathbb{E}[X_{i}]\leq a_{i}Sblackboard_E [ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] ≤ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S. By Lemma 2.1 and linearity of expectation, we have

𝔼[Xi+1Xi]=SS(11S)Xi.𝔼delimited-[]conditionalsubscript𝑋𝑖1subscript𝑋𝑖𝑆𝑆superscript11𝑆subscript𝑋𝑖\mathbb{E}[X_{i+1}\mid X_{i}]=S-S\left(1-\frac{1}{S}\right)^{X_{i}}.blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ∣ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] = italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT .

We will use the Law of Total Expectation (𝔼[Xi+1]=𝔼[𝔼[Xi+1Xi]]𝔼delimited-[]subscript𝑋𝑖1𝔼delimited-[]𝔼delimited-[]conditionalsubscript𝑋𝑖1subscript𝑋𝑖\mathbb{E}[X_{i+1}]=\mathbb{E}[\mathbb{E}[X_{i+1}\mid X_{i}]]blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ] = blackboard_E [ blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ∣ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] ]) and Jensen’s Inequality (for c>0𝑐0c>0italic_c > 0, 𝔼[cX]c𝔼[X]𝔼delimited-[]superscript𝑐𝑋superscript𝑐𝔼delimited-[]𝑋\mathbb{E}[c^{X}]\geq c^{\mathbb{E}[X]}blackboard_E [ italic_c start_POSTSUPERSCRIPT italic_X end_POSTSUPERSCRIPT ] ≥ italic_c start_POSTSUPERSCRIPT blackboard_E [ italic_X ] end_POSTSUPERSCRIPT). We get

𝔼[Xi+1]=𝔼[𝔼[Xi+1Xi]]=𝔼[SS(11S)Xi]=SS𝔼[(11S)Xi]𝔼delimited-[]subscript𝑋𝑖1𝔼delimited-[]𝔼delimited-[]conditionalsubscript𝑋𝑖1subscript𝑋𝑖𝔼delimited-[]𝑆𝑆superscript11𝑆subscript𝑋𝑖𝑆𝑆𝔼delimited-[]superscript11𝑆subscript𝑋𝑖\displaystyle\mathbb{E}[X_{i+1}]=\mathbb{E}[\mathbb{E}[X_{i+1}\mid X_{i}]]=% \mathbb{E}\left[S-S\left(1-\frac{1}{S}\right)^{X_{i}}\right]=S-S\mathbb{E}% \left[\left(1-\frac{1}{S}\right)^{X_{i}}\right]blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ] = blackboard_E [ blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ∣ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] ] = blackboard_E [ italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ] = italic_S - italic_S blackboard_E [ ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ]
SS(11S)𝔼[Xi]SS(11S)aiS=ai+1S.absent𝑆𝑆superscript11𝑆𝔼delimited-[]subscript𝑋𝑖𝑆𝑆superscript11𝑆subscript𝑎𝑖𝑆subscript𝑎𝑖1𝑆\displaystyle\hskip 34.1433pt\leq S-S\left(1-\frac{1}{S}\right)^{\mathbb{E}[X_% {i}]}\leq S-S\left(1-\frac{1}{S}\right)^{a_{i}S}=a_{i+1}S.\quad\qed≤ italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] end_POSTSUPERSCRIPT ≤ italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_S end_POSTSUPERSCRIPT = italic_a start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT italic_S . italic_∎
Remark 2.4.

While we only show akSsubscript𝑎𝑘𝑆a_{k}Sitalic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S is an upper bound, for large k𝑘kitalic_k it is tight: in the limit as k𝑘k\rightarrow\inftyitalic_k → ∞, it matches the trivial lower bound 𝔼[Xk]1𝔼delimited-[]subscript𝑋𝑘1\mathbb{E}[X_{k}]\geq 1blackboard_E [ italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ≥ 1, which holds because there is at least one active node at each level. The empirical results, such as those presented in Figure 4, suggest the much stronger asymptotic A(S,k)akSsimilar-to𝐴𝑆𝑘subscript𝑎𝑘𝑆A(S,k)\sim a_{k}Sitalic_A ( italic_S , italic_k ) ∼ italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S as S,k𝑆𝑘S,k\to\inftyitalic_S , italic_k → ∞.

Refer to captionA(S,k)𝐴𝑆𝑘A(S,k)italic_A ( italic_S , italic_k )Refer to captionRefer to captionk𝑘kitalic_k
Figure 4: If the distribution of weights is uniform, these plots show the expected number of surviving ancestors (districts drawn in the initial generation that appear in the final sample of plans) as k𝑘kitalic_k grows, for S=5,20,50𝑆52050S=5,20,50italic_S = 5 , 20 , 50. The horizontal axis is k𝑘kitalic_k in each plot and the vertical axis is the expected number of surviving ancestors. Green stars are precise outputs from the Markov chain expression, compared to the akSsubscript𝑎𝑘𝑆a_{k}Sitalic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S values in orange and the bkSsubscript𝑏𝑘𝑆b_{k}Sitalic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S values in purple (each interpolated by a curve). In these small experiments, it is always true that bkSA(S,k)akSsubscript𝑏𝑘𝑆𝐴𝑆𝑘subscript𝑎𝑘𝑆b_{k}S\leq A(S,k)\leq a_{k}Sitalic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S ≤ italic_A ( italic_S , italic_k ) ≤ italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S, and that akSA(S,k)subscript𝑎𝑘𝑆𝐴𝑆𝑘a_{k}S\approx A(S,k)italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_S ≈ italic_A ( italic_S , italic_k ) is a very good approximation.

Table 2 shows that the predicted repetition (due to combinatorial collision expected under uniform weighting) is already pronounced, but empirical runs on real-world geography show far greater redundancy. We include k=18𝑘18k=18italic_k = 18 (PA Congress 2010), k=42𝑘42k=42italic_k = 42 (NM state House), k=52𝑘52k=52italic_k = 52 (CA Congress 2020), k=63𝑘63k=63italic_k = 63 (NY State House), and k=203𝑘203k=203italic_k = 203 (PA State House).555The Redist software has a built-in feature to warn users about ”low plan diversity.” This warning message was triggered in 19/1000 PA Congressional runs, 534/1000 NM House runs, 939/1000 CA Congressional runs, 984/1000 NY House runs, and every PA House run. The message counsels users to ”Consider weakening or removing constraints, or increasing the population tolerance.” It is unclear how this kind of challenge was handled in the expert work cited in this note. Furthermore, restricting to runs that do not trigger a warning message does not improve the repetition enormously. In the same trials from Table 2, the max district repetition out of 5000 plans got as high as 2866 (PA-Cong), 4817 (NM-House), 4983 (CA-Cong), and 4622 (NY-House), even among runs with no low-diversity warning.

PA NM CA NY PA
Congress House Congress House House
k=18𝑘18k=18italic_k = 18 k=42𝑘42k=42italic_k = 42 k=52𝑘52k=52italic_k = 52 k=63𝑘63k=63italic_k = 63 k=203𝑘203k=203italic_k = 203
a5000,ksubscript𝑎5000𝑘a_{5000,k}italic_a start_POSTSUBSCRIPT 5000 , italic_k end_POSTSUBSCRIPT 0.1066 0.0466 0.0377 0.0313 0.0099
bksubscript𝑏𝑘b_{k}italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT 0.1065 0.0464 0.0376 0.0312 0.0098
predicted average repetition 9.4 21.6 26.6 32.1 102.0
1/aS/A1𝑎𝑆𝐴\nicefrac{{1}}{{a}}\approx\nicefrac{{S}}{{A}}/ start_ARG 1 end_ARG start_ARG italic_a end_ARG ≈ / start_ARG italic_S end_ARG start_ARG italic_A end_ARG
average district repetition 18.2 121.6 518.6 785.1 4905.0
(averaged over trials)
max district repetition 498.9 1896.9 3104.8 3350.8 4977.1
(averaged over trials)
max district repetition 4515 4951 5000 5000 5000
(max over trials)
Table 2: For several realistic-sized problems, we consider the expected repetition of the initial districts that survive to the final sample. We report 1/a1𝑎1/a1 / italic_a as predicted average repetition because it is a known bound that is quite tight already for small S𝑆Sitalic_S. We conduct 1000 trials with S=5000𝑆5000S=5000italic_S = 5000 for each column; completing one trial with k=203𝑘203k=203italic_k = 203 can take up to five days. The fact that the observed average repetition is appreciably more severe than predicted points to the impact of other causes of redundancy, like non-uniform weights, graph bottlenecks, and final reweighting---these have a snowballing impact as the number of districts grows. These will be explored below.
Remark 2.5.

The case of square diagrams is a natural one to consider. Numerical results suggest that A(S,S)𝐴𝑆𝑆A(S,S)italic_A ( italic_S , italic_S ) limits to a constant slightly greater than 2. Subsequently, once there are two active nodes in a population of S𝑆Sitalic_S, it takes an expected S𝑆Sitalic_S more steps for those to collide, leaving a single common ancestor. This suggests that when k2S𝑘2𝑆k\approx 2Sitalic_k ≈ 2 italic_S, we expect the ancestry to collapse to a single node—one initial district will appear in every plan. This will be further discussed below in Table 3.

2.3 Non-uniform weights

Above we assumed that each active node chooses its parent uniformly at random. In practice, this is not how the SMC code works; the weights depend on graph properties of the partitions. The weight on node j𝑗jitalic_j at level i𝑖iitalic_i is given by wi(j)=(τ(Gi(j)))ρ1|Gi(j)|superscriptsubscript𝑤𝑖𝑗superscript𝜏superscriptsubscript𝐺𝑖𝑗𝜌1superscriptsubscript𝐺𝑖𝑗w_{i}^{(j)}=\frac{\left(\tau(G_{i}^{(j)})\right)^{\rho-1}}{\left|\partial G_{i% }^{(j)}\right|}italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT = divide start_ARG ( italic_τ ( italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT ) ) start_POSTSUPERSCRIPT italic_ρ - 1 end_POSTSUPERSCRIPT end_ARG start_ARG | ∂ italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT | end_ARG, where the numerator is τ𝜏\tauitalic_τ, the product of the number of spanning trees in the pieces of the partial plan Gi(j)superscriptsubscript𝐺𝑖𝑗G_{i}^{(j)}italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT, raised to a power ρ1𝜌1\rho-1italic_ρ - 1, and the denominator is the size of the edge cut.

Refer to caption
Figure 5: Truncation of a long-tailed histogram of weights in a descendancy diagram on state Senate districts in New Mexico (k=42,i=10,S=5000formulae-sequence𝑘42formulae-sequence𝑖10𝑆5000k=42,i=10,S=5000italic_k = 42 , italic_i = 10 , italic_S = 5000, default settings). If weights were uniform, the distribution of weights would be concentrated at the red line. Instead, when drawing the 33rd district in this SMC process, some 32-district partial plans are over 100 times likelier than others to be chosen.

When ρ1𝜌1\rho\neq 1italic_ρ ≠ 1 these weight factors will give wildly different probability of selection to plans based on their compactness, due to τ𝜏\tauitalic_τ values for districts that can easily differ by 10100superscript1010010^{100}10 start_POSTSUPERSCRIPT 100 end_POSTSUPERSCRIPT in realistic problems.666In particular, τ>eN𝜏superscript𝑒𝑁\tau>e^{N}italic_τ > italic_e start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT for many planar graphs on N𝑁Nitalic_N vertices (for instance τe1.6Nsimilar-to𝜏superscript𝑒1.6𝑁\tau\sim e^{1.6N}italic_τ ∼ italic_e start_POSTSUPERSCRIPT 1.6 italic_N end_POSTSUPERSCRIPT for triangular lattices), and Congressional districts might typically contain N=500𝑁500N=500italic_N = 500 precincts. This is one reason that validating on a 6×6666\times 66 × 6 grid with 6666 districts is inadequate to see salient effects of scale: in that setting, τ𝜏\tauitalic_τ values for individual districts can only differ by a factor of 15. See related discussion in DeFord, Duchin and Solomon (2021, §5.1). Even at ρ=1𝜌1\rho=1italic_ρ = 1, plans with longer boundaries will be weighted down. (See Figure 5 for an empirical example with ρ=1𝜌1\rho=1italic_ρ = 1.) These inter-generational weights thus present a compactness-related bias pulling away from uniformity for any choice of parameters.

Next we show that non-uniform weights exacerbate the sample repetition. Recall that Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT was a random variable denoting the number of active nodes at level i𝑖iitalic_i, where X1Ssubscript𝑋1𝑆X_{1}\equiv Sitalic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≡ italic_S and parents are chosen uniformly at random at each level. We now set up our second model by fixing some non-uniform distribution over S𝑆Sitalic_S nodes at each level 1,,k21𝑘21,\dots,k-21 , … , italic_k - 2 for the selection of parents. Fixing those distributions, we initialize Y1Ssubscript𝑌1𝑆Y_{1}\equiv Sitalic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≡ italic_S, and let Yisubscript𝑌𝑖Y_{i}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be a random variable denoting the number of active nodes at level i𝑖iitalic_i with the specified parent selection probabilities.

Lemma 2.6 (Uniform descendancy minimizes ancestor collapse).

For i2𝑖2i\geq 2italic_i ≥ 2, let Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT give the count of active nodes at level i𝑖iitalic_i randomized over uniform descendancy diagrams, while Yisubscript𝑌𝑖Y_{i}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is defined instead with the same non-uniform distribution on each level. Then

𝔼[YiYi1=a]𝔼[XiXi1=a].𝔼delimited-[]conditionalsubscript𝑌𝑖subscript𝑌𝑖1𝑎𝔼delimited-[]conditionalsubscript𝑋𝑖subscript𝑋𝑖1𝑎\mathbb{E}[Y_{i}\mid Y_{i-1}=a]\leq\mathbb{E}[X_{i}\mid X_{i-1}=a].blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∣ italic_Y start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT = italic_a ] ≤ blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∣ italic_X start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT = italic_a ] .
Proof.

Consider the random variable Yisubscript𝑌𝑖Y_{i}italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. For j{1,2,,S}𝑗12𝑆j\in\{1,2,\ldots,S\}italic_j ∈ { 1 , 2 , … , italic_S }, let pjsubscript𝑝𝑗p_{j}italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT denote the probability that an individual at level i1𝑖1i-1italic_i - 1 chooses j𝑗jitalic_j as their parent in level i𝑖iitalic_i (note the pjsubscript𝑝𝑗p_{j}italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT’s may also vary with i𝑖iitalic_i). We will need Hölder’s Inequality, which states that for p,q[1,)𝑝𝑞1p,q\in[1,\infty)italic_p , italic_q ∈ [ 1 , ∞ ) satisfying 1/p+1/q=11𝑝1𝑞11/p+1/q=11 / italic_p + 1 / italic_q = 1 and any two vectors 𝐮𝐮\mathbf{u}bold_u and 𝐯𝐯\mathbf{v}bold_v, 𝐮,𝐯1𝐮p𝐯qsubscriptnorm𝐮𝐯1subscriptnorm𝐮𝑝subscriptnorm𝐯𝑞\|\langle\mathbf{u},\mathbf{v}\rangle\|_{1}\leq\|\mathbf{u}\|_{p}\|\mathbf{v}% \|_{q}∥ ⟨ bold_u , bold_v ⟩ ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≤ ∥ bold_u ∥ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∥ bold_v ∥ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT. We apply this to the vectors 𝐮𝐮\mathbf{u}bold_u where uj=(1pj)/(S1)subscript𝑢𝑗1subscript𝑝𝑗𝑆1u_{j}=(1-p_{j})/(S-1)italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = ( 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) / ( italic_S - 1 ) and 𝐯=(1,1,1,,1)𝐯1111\mathbf{v}=(1,1,1,\dots,1)bold_v = ( 1 , 1 , 1 , … , 1 ). Note

𝐮,𝐯1=𝐮1=j=1S1pjS1=1.subscriptnorm𝐮𝐯1subscriptnorm𝐮1superscriptsubscript𝑗1𝑆1subscript𝑝𝑗𝑆11\|\langle\mathbf{u},\mathbf{v}\rangle\|_{1}=\|\mathbf{u}\|_{1}=\sum_{j=1}^{S}% \frac{1-p_{j}}{S-1}=1.∥ ⟨ bold_u , bold_v ⟩ ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = ∥ bold_u ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT divide start_ARG 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_S - 1 end_ARG = 1 .

Using p=a𝑝𝑎p=aitalic_p = italic_a and q=a/(a1)𝑞𝑎𝑎1q=a/(a-1)italic_q = italic_a / ( italic_a - 1 ), we see that

𝐮a=(j=1S(1pjS1)a)1/a=(1(S1)aj=1S(1pj)a)1/asubscriptnorm𝐮𝑎superscriptsuperscriptsubscript𝑗1𝑆superscript1subscript𝑝𝑗𝑆1𝑎1𝑎superscript1superscript𝑆1𝑎superscriptsubscript𝑗1𝑆superscript1subscript𝑝𝑗𝑎1𝑎\displaystyle\|\mathbf{u}\|_{a}=\left(\sum_{j=1}^{S}\left(\frac{1-p_{j}}{S-1}% \right)^{a}\right)^{1/a}=\left(\frac{1}{(S-1)^{a}}\sum_{j=1}^{S}(1-p_{j})^{a}% \right)^{1/a}∥ bold_u ∥ start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( divide start_ARG 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_S - 1 end_ARG ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / italic_a end_POSTSUPERSCRIPT = ( divide start_ARG 1 end_ARG start_ARG ( italic_S - 1 ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / italic_a end_POSTSUPERSCRIPT
𝐯a/(a1)=(j=1S1a/(a1))(a1)/a=S(a1)/asubscriptnorm𝐯𝑎𝑎1superscriptsuperscriptsubscript𝑗1𝑆superscript1𝑎𝑎1𝑎1𝑎superscript𝑆𝑎1𝑎\displaystyle\|\mathbf{v}\|_{a/(a-1)}=\left(\sum_{j=1}^{S}1^{a/(a-1)}\right)^{% (a-1)/a}=S^{(a-1)/a}∥ bold_v ∥ start_POSTSUBSCRIPT italic_a / ( italic_a - 1 ) end_POSTSUBSCRIPT = ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT 1 start_POSTSUPERSCRIPT italic_a / ( italic_a - 1 ) end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_a - 1 ) / italic_a end_POSTSUPERSCRIPT = italic_S start_POSTSUPERSCRIPT ( italic_a - 1 ) / italic_a end_POSTSUPERSCRIPT

Putting this together with Hölder’s inequality, we see that

(1(S1)aj=1S(1pj)a)1/aS(a1)/a1andj=1S(1pj)a(S1)aSa1=S(11S)a.formulae-sequencesuperscript1superscript𝑆1𝑎superscriptsubscript𝑗1𝑆superscript1subscript𝑝𝑗𝑎1𝑎superscript𝑆𝑎1𝑎1andsuperscriptsubscript𝑗1𝑆superscript1subscript𝑝𝑗𝑎superscript𝑆1𝑎superscript𝑆𝑎1𝑆superscript11𝑆𝑎\left(\frac{1}{(S-1)^{a}}\sum_{j=1}^{S}(1-p_{j})^{a}\right)^{1/a}S^{(a-1)/a}% \geq 1\qquad\hbox{\rm and}\qquad\sum_{j=1}^{S}(1-p_{j})^{a}\geq\frac{(S-1)^{a}% }{S^{a-1}}=S\left(1-\frac{1}{S}\right)^{a}.( divide start_ARG 1 end_ARG start_ARG ( italic_S - 1 ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / italic_a end_POSTSUPERSCRIPT italic_S start_POSTSUPERSCRIPT ( italic_a - 1 ) / italic_a end_POSTSUPERSCRIPT ≥ 1 and ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ≥ divide start_ARG ( italic_S - 1 ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT end_ARG start_ARG italic_S start_POSTSUPERSCRIPT italic_a - 1 end_POSTSUPERSCRIPT end_ARG = italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT .

We now see that

𝔼[YiYi1=a]𝔼delimited-[]conditionalsubscript𝑌𝑖subscript𝑌𝑖1𝑎\displaystyle\mathbb{E}[Y_{i}\mid Y_{i-1}=a]blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∣ italic_Y start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT = italic_a ] =j=1S(1(1pj)a)=Sj=1S(1pj)aabsentsuperscriptsubscript𝑗1𝑆1superscript1subscript𝑝𝑗𝑎𝑆superscriptsubscript𝑗1𝑆superscript1subscript𝑝𝑗𝑎\displaystyle=\sum_{j=1}^{S}(1-(1-p_{j})^{a})=S-\sum_{j=1}^{S}(1-p_{j})^{a}= ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( 1 - ( 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) = italic_S - ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ( 1 - italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT
SS(11S)a=S(1(11s)a)=𝔼[XiXi1=a].absent𝑆𝑆superscript11𝑆𝑎𝑆1superscript11𝑠𝑎𝔼delimited-[]conditionalsubscript𝑋𝑖subscript𝑋𝑖1𝑎\displaystyle\leq S-S\left(1-\frac{1}{S}\right)^{a}=S\left(1-\left(1-\frac{1}{% s}\right)^{a}\right)=\mathbb{E}[X_{i}\mid X_{i-1}=a].≤ italic_S - italic_S ( 1 - divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = italic_S ( 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_s end_ARG ) start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) = blackboard_E [ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∣ italic_X start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT = italic_a ] .

This completes the proof. ∎

We find empirically that the weights at each level frequently have a distribution shaped like the one shown in Figure 5, which was drawn from a run on New Mexico state Senate districts. In New Mexico, it was common to see max-to-median weight ratios of 10 within a generation, and max-to-min ratios of 30, even with ρ=1𝜌1\rho=1italic_ρ = 1. In New York’s Senate districts, these ratios were commonly 30 and 2000, respectively. As we will see, skews of this kind will tend to significantly increase the collision rate.

Lowest level with a ”mega-ancestor” accounting for φ𝜑\varphiitalic_φ share of the final outputs(Low table entries for high φ𝜑\varphiitalic_φ are signs of extreme redundancy)
F𝐹Fitalic_F S=10𝑆10S=10italic_S = 10 100 1000 5000
φ=.01𝜑.01\varphi=.01italic_φ = .01 3.8 15.4
.1 5.5 51.1 256.3
.25 2.5 18.7 188.3 957.8
.5 5.5 60.2 622.2 3187.3
.75 14.6 144.7 1433.5 7092.4
1 17.9 201.9 2065.2 9966.2
Uniform weights
F𝐹Fitalic_F S=10𝑆10S=10italic_S = 10 100 1000 5000
φ=.01𝜑.01\varphi=.01italic_φ = .01 2.0 2.0
.1 2.0 4.6 78.4
.25 2.0 2.0 19.7 320.1
.5 2.0 2.8 65.8 1032.7
.75 2.0 5.3 151.7 2401.9
1 2.7 11.1 232.3 3533.2
100:1:1:1:1001:1:1100:1:1\dots:1100 : 1 : 1 … : 1 weights
NM NY
F𝐹Fitalic_F k=42𝑘42k=42italic_k = 42 k=63𝑘63k=63italic_k = 63
S=5000𝑆5000S=5000italic_S = 5000 S=5000𝑆5000S=5000italic_S = 5000
φ=.01𝜑.01\varphi=.01italic_φ = .01 3.2 (100%) 2.9 (100%)
.1.1.1.1 8.8 (100%) 6.9 (100%)
.25.25.25.25 14.9 (71.4%) 11.9 (99%)
.5.5.5.5 17.0 (23.5%) 18.8 (74%)
Actual runs of SMC code
Table 3: F(D,φ)𝐹𝐷𝜑F(D,\varphi)italic_F ( italic_D , italic_φ ) reports the lowest level at which some node is an ancestor to φ𝜑\varphiitalic_φ share of the bottom generation. (This is vacuous if φS1𝜑𝑆1\varphi S\leq 1italic_φ italic_S ≤ 1.) The tables show estimated expectations for F(D,φ)𝐹𝐷𝜑F(D,\varphi)italic_F ( italic_D , italic_φ ) with uniform weights and with stylized non-uniform weights. Each cell value is obtained by averaging over 1000 trials. Reading across the bottom row in the case of uniform weights (top left) confirms that A(S,2S)1𝐴𝑆2𝑆1A(S,2S)\approx 1italic_A ( italic_S , 2 italic_S ) ≈ 1. Collapse is much worse with stylized non-uniform weights (top right). However, the actual runs on New Mexico and New York (bottom) show that the empirical repetition can be far greater than either prediction. For instance, the NM cell at φ=.25𝜑.25\varphi=.25italic_φ = .25 tells us that 714 out of 1000 trials in New Mexico had a node serving as ancestor to 25% of the final generation, and that node on average occurred at level 14.9 (meaning 25%percent2525\%25 % of the plans had 4215=2742152742-15=2742 - 15 = 27 districts in common).

Recall that one goal (Question 2) is to measure the distribution of statistics G(D,j)𝐺𝐷𝑗G(D,j)italic_G ( italic_D , italic_j ) defined as the number of final plans in D𝐷Ditalic_D with j𝑗jitalic_j districts exactly in common. A mathematically more natural expression that contains the needed information but is a bit harder to phrase in English is F(D,φ)𝐹𝐷𝜑F(D,\varphi)italic_F ( italic_D , italic_φ ), the lowest level at which there is a "mega-ancestor" accounting for φ𝜑\varphiitalic_φ share of the final generation:

F(D,φ):=min{i:jwithd(i,j)>φS}.assign𝐹𝐷𝜑:𝑖𝑗with𝑑𝑖𝑗𝜑𝑆F(D,\varphi):=\min\{i:\exists\ j~{}\hbox{\rm with}~{}d(i,j)>\varphi\cdot S\}.italic_F ( italic_D , italic_φ ) := roman_min { italic_i : ∃ italic_j with italic_d ( italic_i , italic_j ) > italic_φ ⋅ italic_S } .

Table 3 shows an simulated comparison of F(D,φ)𝐹𝐷𝜑F(D,\varphi)italic_F ( italic_D , italic_φ ) between the case of uniform weights and a simple non-uniform setup where one node at each layer is 100 times more likely to be selected than each of the others (that is, the weights are 100:1:1:1:1001:1:1100:1:1\dots:1100 : 1 : 1 … : 1). Each are run 1000 times. As we would expect given Lemma 2.6, the ancestor collapse is accelerated in the non-uniform case. But even this significantly understates the actual repetition in 1000 runs to make SMC samples from New Mexico and New York; we can compare this as well to Tapp’s expert affidavit finding roughly that F(D,.6)<31𝐹𝐷.631F(D,.6)<31italic_F ( italic_D , .6 ) < 31 for a S=5000𝑆5000S=5000italic_S = 5000 sample in New York. The excess degeneracy in real-world samples is partly because the graph partition step itself can boost repetition; if partial progress has created a hard-to-split remainder, this creates yet another scenario in which a generation may be filled out with repeats. (This is referred to as graph "bottlenecks" elsewhere in this note.)

Refer to captionObserved G(D,j)𝐺𝐷𝑗G(D,j)italic_G ( italic_D , italic_j ) for New Mexico runs
Figure 6: We now recast the information from the same 1000 SMC runs to spell out the redundancy. The histograms show how many plans have a set of j𝑗jitalic_j districts exactly in common. A few observations are visible in the marked areas. On several runs, more than 98% of outputs shared 20 districts out of 42 exactly in common. On several runs, more than 90% of outputs shared 30 districts exactly in common. And finally, more than 1/4 of the runs have 50 or more plans in their outputs that are nearly identical statewide, sharing 40 out of 42 districts. These plots are from New Mexico; the effects in New York are still more extreme.

Figure 6 turns this around and shows the distribution of G(D,j)𝐺𝐷𝑗G(D,j)italic_G ( italic_D , italic_j ), the number of plans with j𝑗jitalic_j districts in common, for the same 1000 SMC runs on the New Mexico Senate. Though the redundancy is striking in New Mexico, it is even worse in New York, the other real-world example we track throughout this note.

3 Convergence guarantees and diagnostics

In McCartan and Imai (2023), the main convergence result is a weak central limit theorem stated as follows.

Proposition 3.1 (McCartan–Imai Prop 4.2).

Let πS=j=1Sw(j)δ[ξ(j)]subscript𝜋𝑆superscriptsubscript𝑗1𝑆superscript𝑤𝑗subscript𝛿delimited-[]𝜉𝑗\pi_{S}=\sum_{j=1}^{S}w^{(j)}\delta_{[\xi(j)]}italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_w start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT [ italic_ξ ( italic_j ) ] end_POSTSUBSCRIPT be the weighted particle approximation generated by [their SMC Algorithm]. Then for all measurable hhitalic_h on unlabeled plans, as S𝑆S\to\inftyitalic_S → ∞,

S(𝔼πS[h([ξ])]𝔼π[h([ξ])])𝑑𝒩(0,VSMC(h))𝑑𝑆subscript𝔼subscript𝜋𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉𝒩0subscript𝑉SMC\sqrt{S}\left(\mathbb{E}_{\pi_{S}}\left[h\left([\xi]\right)\right]-\mathbb{E}_% {\pi}\left[h\left([\xi]\right)\right]\right)\xrightarrow{d}\mathcal{N}(0,V_{% \rm SMC}(h))square-root start_ARG italic_S end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ) start_ARROW overitalic_d → end_ARROW caligraphic_N ( 0 , italic_V start_POSTSUBSCRIPT roman_SMC end_POSTSUBSCRIPT ( italic_h ) )

for some asymptotic variance VSMC(h)subscript𝑉SMCV_{\rm SMC}(h)italic_V start_POSTSUBSCRIPT roman_SMC end_POSTSUBSCRIPT ( italic_h ).

As the authors note, this is convergence in probability rather than almost sure convergence. This style of convergence result does not rule out significant sample repetition.777Indeed, even stronger central limit theorems like those for Markov chains do not rule out this level of repetition, but observed repetition is far less severe for Markov chain methods than for SMC in the redistricting application—and, in any case, they admit far larger samples with current methods. To understand the guarantees better, consider the following construction.

Example (Controlled repetition sampler (CRS)).

Fix a parameter 0<α<1/20𝛼120<\alpha<1/20 < italic_α < 1 / 2. Given a distribution π𝜋\piitalic_π on a state space ΩΩ\Omegaroman_Ω, let the controlled repetition sampler with parameter α𝛼\alphaitalic_α be defined as follows: a sample of size S𝑆Sitalic_S is constructed by taking one draw xπsimilar-to𝑥𝜋x\sim\piitalic_x ∼ italic_π and adding x𝑥xitalic_x to the sample Sαsuperscript𝑆𝛼\lceil S^{\alpha}\rceil⌈ italic_S start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ⌉ times, each with weight 1/S1𝑆1/S1 / italic_S. Next, create a smaller SMC sample of S=SSαsuperscript𝑆𝑆superscript𝑆𝛼S^{\prime}=S-\lceil S^{\alpha}\rceilitalic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_S - ⌈ italic_S start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ⌉ plans. If a plan in this smaller sample is assigned weight w(j)superscript𝑤𝑗w^{(j)}italic_w start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT by SMC, we add it to our CRS sample with normalized weight w(j)S/Ssuperscript𝑤𝑗superscript𝑆𝑆w^{(j)}S^{\prime}/Sitalic_w start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT / italic_S.

Despite the amount of repetition present, where up to S𝑆\sqrt{S}square-root start_ARG italic_S end_ARG plans in the sample of S𝑆Sitalic_S are fully identical, this CRS method still satisfies the conclusions of Proposition 3.1. (This is proved in Appendix A.) Note that the power Sαsuperscript𝑆𝛼S^{\alpha}italic_S start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT could be replaced with any function of S𝑆Sitalic_S such that f(S)/Sf(S)0𝑓𝑆𝑆𝑓𝑆0f(S)/\sqrt{S-f(S)}\to 0italic_f ( italic_S ) / square-root start_ARG italic_S - italic_f ( italic_S ) end_ARG → 0.

Of course, since S/S0𝑆𝑆0\sqrt{S}/S\to 0square-root start_ARG italic_S end_ARG / italic_S → 0, the first and simplest way to minimize the effects of repetition is to employ very large samples. When large samples are computationally expensive, one may be tempted to combine multiple separate samples rather than enlarging a single sample. Indeed, in their small validation example (dividing the 6×6666\times 66 × 6 grid into k=6𝑘6k=6italic_k = 6 districts) the SMC authors have averaged 24 independent runs to obtain the estimates shown in McCartan and Imai (2023, Fig 4) rather than enlarging the size of a single sample past S=10,000𝑆10000S=10,000italic_S = 10 , 000. In other published work (McCartan et al., 2022), the same authors and collaborators present ensembles of 5000 maps for all 50 states, and do so in most cases by combining subsamples from multiple SMC runs. But we know of no theory for this: it is not clear what balance of the sample size S𝑆Sitalic_S and the number of separate runs would constitute best practices for users of SMC for redistricting.888For the illustrative example CRS, the ideal structure would be many samples of size 1. For SMC, this would fail, because a large S𝑆Sitalic_S is already needed for the reweighting of the final sample to take the J𝐽Jitalic_J term in the target distribution into account at all.

In SMC/Redist, the main convergence diagnostic is the Gelman--Rubin R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG statistic, which compares within-sample variance to between-sample variance, typically applied to batches of 2500 or 5000 plans.999Information on the R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG diagnostic can be found at Gelman and Rubin (1992); Vehtari et al. (June 2021). Discussing related issues for MCMC, Charles Geyer (Geyer, 2011) wrote of several popular techniques for using many short runs in place of one long run as "worse than useless": they raise your confidence and produce cosmetically better samples while giving no reason to believe the outputs are close to the target. For now, the computational costs of SMC/Redist lock users in to a many-short-runs framework.

4 Discussion

Since spanning tree methods were introduced in redistricting around 2018, many authors who need a single exemplar of a plan (such as to serve as the starting point for a Markov chain) obtain one by recursively partitioning a tree down to districts. One way to view SMC for redistricting is that it constructs the whole sample by running this seeding process many times. Employing the structure of a descendancy diagram allows the use of weights that help control the properties of the sample, but at the cost of introducing significant combinatorial redundancy.

Repetition undermines statistical conclusions and data visualization.

SMC is sometimes claimed to produce nearly independent samples from arbitrary distributions.101010For instance, this was at one point explicit in the Redist documentation at https://perma.cc/YV37-JZNR. However, district repetition can create massive dependencies, with many plans being identical on large regions, for the sample sizes currently possible with SMC/Redist. Numerous factors that are present in real use cases---non-uniform weights stemming from compactness terms and restricted choices in the decision tree caused by the connection topology of the state, among others---can contribute to extreme repetition, which is visible in plots. A highly redundant sample does not allow for reliable outlier analysis because its percentile statistics will often be far from those of the target distribution---for instance, if 20% of a sample had repeated or identical features, that would manifestly call into question its usefulness to identify 1% outliers. To put the same point in visual terms: significant repetition clearly undermines the use of an SMC ensemble to infer the shape of a distribution of summary statistics, such as in histograms and boxplots. If statistics such as R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG are used as diagnostics on a histogram, they should be applied to the height of each bar. A box-and-whiskers plot on Pennsylvania, as shown in McCartan and Imai (2023, Fig 7), would need 90 R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG calculations (top whisker, third quartile, median, first quartile, bottom whisker for each district).111111See Clelland et al. (2021) for an example where a diagnostic metric, in that case the Kolmogorov–Smirnov statistic, is used in this way to support a box-and-whiskers plot.

Final reweighting cannot compensate for small samples.

The SMC process leverages importance sampling by constructing an initial sample through a descendancy diagram and then reweighting according to the target distribution π𝜋\piitalic_π only when the sample is complete. Before the final reweighting, the sample is approximately distributed by the spanning tree distribution τ𝜏\tauitalic_τ on partitions---but somewhat distorted by repetition, labeling bias, and other artifacts of the construction. The authors intend to use this method to target arbitrary distributions π(ξ)eJ(ξ)τ(ξ)ρproportional-to𝜋𝜉superscript𝑒𝐽𝜉𝜏superscript𝜉𝜌\pi(\xi)\propto\displaystyle e^{-J(\xi)}\tau(\xi)^{\rho}italic_π ( italic_ξ ) ∝ italic_e start_POSTSUPERSCRIPT - italic_J ( italic_ξ ) end_POSTSUPERSCRIPT italic_τ ( italic_ξ ) start_POSTSUPERSCRIPT italic_ρ end_POSTSUPERSCRIPT, but, in particular, the energy functional J𝐽Jitalic_J is never used until the descendancy diagram is complete; partitions that are never encountered by the tree-based generation process cannot be rescued by reweighting. Thus attempts to target a general π𝜋\piitalic_π with SMC will fare no better than applying one-shot reweighting to previously known methods to sample from τ𝜏\tauitalic_τ, such as MCMC.121212If J𝐽Jitalic_J decomposes into a district-by-district score, then there is a way to take it into account in the intergenerational weights, but this is not done by default and it is not available for general J𝐽Jitalic_J. The accuracy depends on emitting a large and diverse sample from the descendancy process. With samples in the tens of thousands on practical problems, many legally relevant events will never be observed---by contrast, current Markov chain methods for sampling from τ𝜏\tauitalic_τ can be run to billions of (accepted) steps.

All-purpose SMC ensembles are unsuited for novel measurements.

Courts have often expressed an interest in the presence of individual districts with particular properties. For instance, in the litigation challenging the Pennsylvania Congressional plan, the court strongly discouraged the division of Pittsburgh across two districts (and the special master was said to treat it as a "disqualifying feature" of a plan).131313Carter v. Chapman (2022), see https://www.pacourts.us/assets/opinions/Supreme/out/J-20-2022mo.pdf. This preference emerged long after the initial expert work had been conducted. In the New Mexico legal challenge, the parties to litigation debated whether it was disqualifying if any district "contains more than 60% of the state’s active oil wells."141414Republican Party vs. Oliver (2023), see nmlegis.gov links, particularly Plaintiff’s Opposed Motion to Exclude Expert Report & Expert Testimony of Dr. Jowei Chen (NM Legislative Defendants, 2022). These examples illustrate that it will frequently be legally relevant to know if some district-level property is common in a neutrally constructed ensemble. If the most-repeated district in some Pennsylvania SMC run happens to divide Pittsburgh, say, the ensemble can give a highly misleading answer. Ensembles constructed with current methodology, including the ALARM ensembles published in Scientific Data (McCartan et al., 2022), are unsuited for estimating the frequency with which district-level properties occur, beyond those properties tested at the time of data generation.151515In McCartan et al. (2022), the R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG statistic is used on a preset list of metrics: ”Finally, we evaluate the convergence of the algorithm for the specific set of summary statistics described above that are of interest to practitioners.” It is unclear exactly how R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG is being applied.

All issues compound with large numbers of districts.

Large numbers of districts (large k𝑘kitalic_k) exacerbate all of the problems described here.161616Even with small numbers of districts, there can be issues: if few districts have many units, then the inter-generational weights can be highly non-uniform, which will likewise boost repetition. We suspect this is a real but less severe worry. The published validation efforts only go up to k6𝑘6k\leq 6italic_k ≤ 6 districts, and the authors themselves have used subdivision to make the problem more tractable---the ALARM 50-state project breaks up Texas (k=38𝑘38k=38italic_k = 38), Florida (k=27𝑘27k=27italic_k = 27), and California (k=52𝑘52k=52italic_k = 52) into three or more pieces, assembling statewide Congressional plans by combining smaller runs on separate regions.171717See Texas, Florida, and California readme files from the ALARM 50-state project. Ensembles that have been modularized in this way have an unknown relationship to the target distribution on the full state.


SMC for redistricting is a highly valuable addition to the literature on sampling methods for graph partitioning. In addition to the theoretical contribution, the Redist package that implements SMC is commendably well-documented and user-friendly, and its authors have designed default settings that allow beginners to complete runs on many full-sized problems. Also notably, the main code faced by users of Redist is in R, a programming language that is popular in social science graduate training. This has created the conditions for even first-time users of graph algorithms to generate materials for expert reports in court cases across the United States, but without a deep understanding of diagnostics and limitations.181818Experts indicating that SMC/Redist was their first exposure to graph algorithms, and sometimes even to algorithmic sampling methods more broadly, filed reports not only in New Mexico and New York, but also North Carolina, Pennsylvania, Louisiana, and Texas, and possibly more. As far as we encountered, they did not discuss convergence diagnostics or warning messages about sample diversity in their expert reports or filings.

In this note, we show that the combinatorics of descendancy diagrams combines with a host of other factors to create potentially severe district repetition in SMC samples. In most cases at the full problem scale of a U.S. state, minimizing the effects of repetition would call for far larger sample sizes than is currently possible. Issues compound when attempting to target distributions that differ from the spanning tree distribution τ𝜏\tauitalic_τ or when the plans have more than about a dozen districts. All of these observations counsel caution in using SMC on full-scale redistricting problems.

References

  • Bertoin (2006) {bbook}[author] \bauthor\bsnmBertoin, \bfnmJean\binitsJ. (\byear2006). \btitleRandom Fragmentation and Coagulation Processes. \bpublisherCambridge University Press. \endbibitem
  • Cannon et al. (2022) {bmisc}[author] \bauthor\bsnmCannon, \bfnmSarah\binitsS., \bauthor\bsnmDuchin, \bfnmMoon\binitsM., \bauthor\bsnmRandall, \bfnmDana\binitsD. and \bauthor\bsnmRule, \bfnmParker\binitsP. (\byear2022). \btitleSpanning tree methods for sampling graph partitions. \endbibitem
  • Clelland et al. (2021) {barticle}[author] \bauthor\bsnmClelland, \bfnmJeanne\binitsJ., \bauthor\bsnmColgate, \bfnmHaley\binitsH., \bauthor\bsnmDeFord, \bfnmDaryl\binitsD., \bauthor\bsnmMalmskog, \bfnmBeth\binitsB. and \bauthor\bsnmSancier-Barbosa, \bfnmFlavia\binitsF. (\byear2021). \btitleColorado in Context: Congressional Redistricting and Competing Fairness Criteria in Colorado. \bjournalJournal of Computational Social Science \bvolume5 \bpages180-226. \endbibitem
  • NM Legislative Defendants (2022) {bmisc}[author] \bauthor\bsnmNM Legislative Defendants (\byear2022). \btitleLegislative Defendants’ Response to Plaintiffs’ Proposed Motion to Exclude. \bnoteFifth Judicial District of New Mexico. Republican Party of New Mexico v. Maggie Toulouse Oliver (2022)., https://www.nmlegis.gov/Redistricting2021/Litigation%20Docs/292%20-%20September%2025,%202023%20Plaintiff’s%20Opposed%20Motion%20to%20Exclude%20Expert%20Report%20&%20Expert%20Testimony%20of%20Dr.%20Jowei%20Chen.pdf. \endbibitem
  • DeFord, Duchin and Solomon (2021) {barticle}[author] \bauthor\bsnmDeFord, \bfnmDaryl\binitsD., \bauthor\bsnmDuchin, \bfnmMoon\binitsM. and \bauthor\bsnmSolomon, \bfnmJustin\binitsJ. (\byear2021). \btitleRecombination: A Family of Markov Chains for Redistricting. \bjournalHarvard Data Science Review \bvolume3. \bnotehttps://hdsr.mitpress.mit.edu/pub/1ds8ptxu. \endbibitem
  • Durrett (2008) {bbook}[author] \bauthor\bsnmDurrett, \bfnmRick\binitsR. (\byear2008). \btitleProbability Models for DNA Sequence Evolution, \beditionSecond ed. \bpublisherSpringer. \endbibitem
  • Gelman and Rubin (1992) {barticle}[author] \bauthor\bsnmGelman, \bfnmAndrew\binitsA. and \bauthor\bsnmRubin, \bfnmDonald B.\binitsD. B. (\byear1992). \btitleInference from Iterative Simulation Using Multiple Sequences. \bjournalStatistical Science \bvolume7 \bpages457--472. \endbibitem
  • Geyer (2011) {bincollection}[author] \bauthor\bsnmGeyer, \bfnmCharles J.\binitsC. J. (\byear2011). \btitleIntroduction to Markov chain Monte Carlo. In \bbooktitleHandbook of Markov chain Monte Carlo. \bseriesChapman & Hall/CRC Handb. Mod. Stat. Methods \bpages3--48. \bpublisherCRC Press, Boca Raton, FL. \bmrnumber2858443 \endbibitem
  • (9) {bunpublished}[author] \bauthor\bsnmKenny, \bfnmC. T.\binitsC. T., \bauthor\bsnmMcCartan, \bfnmC.\binitsC., \bauthor\bsnmFifield, \bfnmB.\binitsB. and \bauthor\bsnmImai, \bfnmK.\binitsK. \btitleRedist: Computational algorithms for redistricting simulation. \bnoteR Package. https://CRAN.R-project.org/package=redist. \endbibitem
  • Liu, Chen and Logvinenko (2001) {bincollection}[author] \bauthor\bsnmLiu, \bfnmJun S.\binitsJ. S., \bauthor\bsnmChen, \bfnmRong\binitsR. and \bauthor\bsnmLogvinenko, \bfnmTanya\binitsT. (\byear2001). \btitleA theoretical framework for sequential importance sampling with resampling. In \bbooktitleSequential Monte Carlo methods in practice. \bseriesStat. Eng. Inf. Sci. \bpages225--246. \bpublisherSpringer, New York. \bmrnumber1847794 \endbibitem
  • McCartan and Imai (2023) {barticle}[author] \bauthor\bsnmMcCartan, \bfnmCory\binitsC. and \bauthor\bsnmImai, \bfnmKosuke\binitsK. (\byear2023). \btitleSequential Monte Carlo for sampling balanced and compact redistricting plans. \bjournalThe Annals of Applied Statistics \bvolume17 \bpages3300 -- 3323. \bdoi10.1214/23-AOAS1763 \endbibitem
  • McCartan et al. (2022) {barticle}[author] \bauthor\bsnmMcCartan, \bfnmCory\binitsC., \bauthor\bsnmKenny, \bfnmChristopher T.\binitsC. T., \bauthor\bsnmSimko, \bfnmTyler\binitsT., \bauthor\bsnmGarcia III, \bfnmGeorge\binitsG., \bauthor\bsnmWang, \bfnmKevin\binitsK., \bauthor\bsnmWu, \bfnmMelissa\binitsM., \bauthor\bsnmKuriwaki, \bfnmShiro\binitsS. and \bauthor\bsnmImai, \bfnmKosuke\binitsK. (\byear2022). \btitleSimulated redistricting plans for the analysis and evaluation of redistricting in the United States. \bjournalScientific Data \bvolume9 \bpages689. \bdoihttps://doi.org/10.1038/s41597-022-01808-2 \endbibitem
  • Raab and Steger (1998) {binproceedings}[author] \bauthor\bsnmRaab, \bfnmMartin\binitsM. and \bauthor\bsnmSteger, \bfnmAngelika\binitsA. (\byear1998). \btitle‘‘Balls into Bins’’ --- A Simple and Tight Analysis. In \bbooktitleRandomization and Approximation Techniques in Computer Science (\beditor\bfnmMichael\binitsM. \bsnmLuby, \beditor\bfnmJosé D. P.\binitsJ. D. P. \bsnmRolim and \beditor\bfnmMaria\binitsM. \bsnmSerna, eds.) \bpages159--170. \bpublisherSpringer Berlin Heidelberg, \baddressBerlin, Heidelberg. \endbibitem
  • Tapp (2022) {bmisc}[author] \bauthor\bsnmTapp, \bfnmKristopher\binitsK. (\byear2022). \btitleSecond Affidavit of Dr. Kristopher R. Tapp, PhD. \bnoteSupreme Court of the State of New York. Harkenrider v. Hochul (2022). https://perma.cc/29X3-59CX. \endbibitem
  • Vehtari et al. (June 2021) {barticle}[author] \bauthor\bsnmVehtari, \bfnmAki\binitsA., \bauthor\bsnmGelman, \bfnmAndrew\binitsA., \bauthor\bsnmSimpson, \bfnmDaniel\binitsD., \bauthor\bsnmCarpenter, \bfnmBob\binitsB. and \bauthor\bsnmBürkner, \bfnmPaul-Christian\binitsP.-C. (\byearJune 2021). \btitleRank-Normalization, Folding, and Localization: An Improved R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG for Assessing Convergence of MCMC (with Discussion). \bjournalBayesian Analysis \bvolume16. \endbibitem

Appendix A Weak CLT for controlled repetition sampler

Proposition A.1.

For any 0<α<1/20𝛼120<\alpha<1/20 < italic_α < 1 / 2, the controlled repetition sampler with parameter α𝛼\alphaitalic_α satisfies

S(𝔼πS[h([ξ])]𝔼π[h([ξ])])𝑑𝒩(0,Vα(h))𝑑𝑆subscript𝔼subscript𝜋𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉𝒩0subscript𝑉𝛼\sqrt{S}\left(\mathbb{E}_{\pi_{S}}\left[h\left([\xi]\right)\right]-\mathbb{E}_% {\pi}\left[h\left([\xi]\right)\right]\right)\xrightarrow{d}\mathcal{N}(0,V_{% \alpha}(h))square-root start_ARG italic_S end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ) start_ARROW overitalic_d → end_ARROW caligraphic_N ( 0 , italic_V start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ( italic_h ) )

for any measurable function hhitalic_h, where Vαsubscript𝑉𝛼V_{\alpha}italic_V start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT is an asymptotic variance.

Proof.

We will use πSsubscript𝜋superscript𝑆\pi_{S^{\prime}}italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT to denote the distribution for the SMC sample of size Ssuperscript𝑆S^{\prime}italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT used above (before it is reweighted by S/Ssuperscript𝑆𝑆S^{\prime}/Sitalic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT / italic_S), and πSsubscript𝜋𝑆\pi_{S}italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT to denote our complete sample of size S𝑆Sitalic_S, where

πS=S1/3Sδ[ξ0]+SSπS=S1/3Sδ[ξ0]+j=1SSSw(j)δ[ξ(j)].subscript𝜋𝑆superscript𝑆13𝑆subscript𝛿delimited-[]subscript𝜉0superscript𝑆𝑆subscript𝜋superscript𝑆superscript𝑆13𝑆subscript𝛿delimited-[]subscript𝜉0superscriptsubscript𝑗1superscript𝑆superscript𝑆𝑆superscript𝑤𝑗subscript𝛿delimited-[]𝜉𝑗\pi_{S}=\frac{\lceil S^{1/3}\rceil}{S}\delta_{[\xi_{0}]}+\frac{S^{\prime}}{S}% \pi_{S^{\prime}}=\frac{\lceil S^{1/3}\rceil}{S}\delta_{[\xi_{0}]}+\sum_{j=1}^{% S^{\prime}}\frac{S^{\prime}}{S}w^{(j)}\delta_{[\xi(j)]}.italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT = divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG italic_S end_ARG italic_δ start_POSTSUBSCRIPT [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] end_POSTSUBSCRIPT + divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG italic_S end_ARG italic_δ start_POSTSUBSCRIPT [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG italic_w start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT [ italic_ξ ( italic_j ) ] end_POSTSUBSCRIPT .

Let hhitalic_h be any measurable function on unlabelled plans. We know, from Proposition 3.1 applied to πSsubscript𝜋superscript𝑆\pi_{S^{\prime}}italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, that as S𝑆S\rightarrow\inftyitalic_S → ∞ (which also implies S)S^{\prime}\rightarrow\infty)italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT → ∞ ),

S(𝔼πS[h([ξ])]𝔼π[h([ξ])])𝑑𝒩(0,VSMC(h))𝑑superscript𝑆subscript𝔼subscript𝜋superscript𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉𝒩0subscript𝑉SMC\sqrt{S^{\prime}}\left(\mathbb{E}_{\pi_{S^{\prime}}}\left[h\left([\xi]\right)% \right]-\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)\xrightarrow{d}% \mathcal{N}(0,V_{\rm SMC}(h))square-root start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ) start_ARROW overitalic_d → end_ARROW caligraphic_N ( 0 , italic_V start_POSTSUBSCRIPT roman_SMC end_POSTSUBSCRIPT ( italic_h ) )

We define a random variables YS:=S(𝔼πS[h([ξ])]𝔼π[h([ξ])])assignsubscriptsuperscript𝑌𝑆superscript𝑆subscript𝔼subscript𝜋superscript𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉Y^{\prime}_{S}:=\sqrt{S^{\prime}}\left(\mathbb{E}_{\pi_{S^{\prime}}}\left[h% \left([\xi]\right)\right]-\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT := square-root start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ), and let Z𝒩(0,VSMC)similar-to𝑍𝒩0subscript𝑉𝑆𝑀𝐶Z\sim\mathcal{N}(0,V_{SMC})italic_Z ∼ caligraphic_N ( 0 , italic_V start_POSTSUBSCRIPT italic_S italic_M italic_C end_POSTSUBSCRIPT ). We know for any a𝑎aitalic_a,

limSPr(YSa)=Pr(Za).subscript𝑆Prsubscriptsuperscript𝑌𝑆𝑎Pr𝑍𝑎\lim_{S\rightarrow\infty}\Pr(Y^{\prime}_{S}\leq a)=\Pr(Z\leq a).roman_lim start_POSTSUBSCRIPT italic_S → ∞ end_POSTSUBSCRIPT roman_Pr ( italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ italic_a ) = roman_Pr ( italic_Z ≤ italic_a ) .

Because Z𝑍Zitalic_Z is a continuous random variable and thus Pr(Z=a)=0Pr𝑍𝑎0\Pr(Z=a)=0roman_Pr ( italic_Z = italic_a ) = 0, it’s also true that for any sequence bSsubscript𝑏𝑆b_{S}italic_b start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT with limSbS=asubscript𝑆subscript𝑏𝑆𝑎\lim_{S\rightarrow\infty}b_{S}=aroman_lim start_POSTSUBSCRIPT italic_S → ∞ end_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT = italic_a,

limSPr(YSbS)=Pr(Za).subscript𝑆Prsubscriptsuperscript𝑌𝑆subscript𝑏𝑆Pr𝑍𝑎\lim_{S\rightarrow\infty}\Pr(Y^{\prime}_{S}\leq b_{S})=\Pr(Z\leq a).roman_lim start_POSTSUBSCRIPT italic_S → ∞ end_POSTSUBSCRIPT roman_Pr ( italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ italic_b start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ) = roman_Pr ( italic_Z ≤ italic_a ) .

Let YS:=S(𝔼πS[h([ξ])]𝔼π[h([ξ])])assignsubscript𝑌𝑆𝑆subscript𝔼subscript𝜋𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉Y_{S}:=\sqrt{S}\left(\mathbb{E}_{\pi_{S}}\left[h\left([\xi]\right)\right]-% \mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT := square-root start_ARG italic_S end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ), and note that we are trying to show the sequence YSsubscript𝑌𝑆Y_{S}italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT converges in distribution to 𝒩(0,VSMC)𝒩0subscript𝑉𝑆𝑀𝐶\mathcal{N}(0,V_{SMC})caligraphic_N ( 0 , italic_V start_POSTSUBSCRIPT italic_S italic_M italic_C end_POSTSUBSCRIPT ). We see that

𝔼πS[h([ξ])]=S1/3Sh([ξ0])+SS𝔼πS[h([ξ])].subscript𝔼subscript𝜋𝑆delimited-[]delimited-[]𝜉superscript𝑆13𝑆delimited-[]subscript𝜉0superscript𝑆𝑆subscript𝔼subscript𝜋superscript𝑆delimited-[]delimited-[]𝜉\mathbb{E}_{\pi_{S}}\left[h\left([\xi]\right)\right]=\frac{\lceil S^{1/3}% \rceil}{S}h([\xi_{0}])+\frac{S^{\prime}}{S}\mathbb{E}_{\pi_{S^{\prime}}}\left[% h\left([\xi]\right)\right].blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] = divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG italic_S end_ARG italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) + divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] .

Using that we can rewrite 𝔼π[h([ξ])]=S1/3S𝔼π[h([ξ])]+SS𝔼π[h([ξ])]subscript𝔼𝜋delimited-[]delimited-[]𝜉superscript𝑆13𝑆subscript𝔼𝜋delimited-[]delimited-[]𝜉superscript𝑆𝑆subscript𝔼𝜋delimited-[]delimited-[]𝜉\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]=\frac{\lceil S^{1/3}\rceil}{S% }\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]+\frac{S^{\prime}}{S}\mathbb{% E}_{\pi}\left[h\left([\xi]\right)\right]blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] = divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG italic_S end_ARG blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] + divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ], it follows that

YSsubscript𝑌𝑆\displaystyle Y_{S}italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT =S(S1/3Sh([ξ0])+SS𝔼πS[h([ξ])]𝔼π[h([ξ])])absent𝑆superscript𝑆13𝑆delimited-[]subscript𝜉0superscript𝑆𝑆subscript𝔼subscript𝜋superscript𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉\displaystyle=\sqrt{S}\left(\frac{\lceil S^{1/3}\rceil}{S}h([\xi_{0}])+\frac{S% ^{\prime}}{S}\mathbb{E}_{\pi_{S^{\prime}}}\left[h\left([\xi]\right)\right]-% \mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)= square-root start_ARG italic_S end_ARG ( divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG italic_S end_ARG italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) + divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] )
=S1/3Sh([ξ0])S1/3S𝔼π[h([ξ])]+SS(𝔼πS[h([ξ])]𝔼π[h([ξ])])absentsuperscript𝑆13𝑆delimited-[]subscript𝜉0superscript𝑆13𝑆subscript𝔼𝜋delimited-[]delimited-[]𝜉superscript𝑆𝑆subscript𝔼subscript𝜋superscript𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉\displaystyle=\frac{\lceil S^{1/3}\rceil}{\sqrt{S}}h([\xi_{0}])-\frac{\lceil S% ^{1/3}\rceil}{\sqrt{S}}\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]+\frac{% S^{\prime}}{\sqrt{S}}\left(\mathbb{E}_{\pi_{S^{\prime}}}\left[h\left([\xi]% \right)\right]-\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)= divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) - divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] + divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] )
=S1/3S(h([ξ0])𝔼π[h([ξ])])+SSS(𝔼πS[h([ξ])]𝔼π[h([ξ])])absentsuperscript𝑆13𝑆delimited-[]subscript𝜉0subscript𝔼𝜋delimited-[]delimited-[]𝜉superscript𝑆𝑆superscript𝑆subscript𝔼subscript𝜋superscript𝑆delimited-[]delimited-[]𝜉subscript𝔼𝜋delimited-[]delimited-[]𝜉\displaystyle=\frac{\lceil S^{1/3}\rceil}{\sqrt{S}}\left(h([\xi_{0}])-\mathbb{% E}_{\pi}\left[h\left([\xi]\right)\right]\right)+\sqrt{\frac{S^{\prime}}{S}}% \cdot\sqrt{S^{\prime}}\left(\mathbb{E}_{\pi_{S^{\prime}}}\left[h\left([\xi]% \right)\right]-\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)= divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG ( italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ) + square-root start_ARG divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG end_ARG ⋅ square-root start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] )
=S1/3S(h([ξ0])𝔼π[h([ξ])])+SSYSabsentsuperscript𝑆13𝑆delimited-[]subscript𝜉0subscript𝔼𝜋delimited-[]delimited-[]𝜉superscript𝑆𝑆subscriptsuperscript𝑌𝑆\displaystyle=\frac{\lceil S^{1/3}\rceil}{\sqrt{S}}\left(h([\xi_{0}])-\mathbb{% E}_{\pi}\left[h\left([\xi]\right)\right]\right)+\sqrt{\frac{S^{\prime}}{S}}Y^{% \prime}_{S}= divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG ( italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ) + square-root start_ARG divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG end_ARG italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT

If we consider Pr(YSa)Prsubscript𝑌𝑆𝑎\Pr(Y_{S}\leq a)roman_Pr ( italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ italic_a ) for any a𝑎aitalic_a, we see that

Pr(YSa))\displaystyle\Pr\left(Y_{S}\leq a\right))roman_Pr ( italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ italic_a ) ) =Pr(S1/3S(h([ξ0])𝔼π[h([ξ])])+SSYSa)absentPrsuperscript𝑆13𝑆delimited-[]subscript𝜉0subscript𝔼𝜋delimited-[]delimited-[]𝜉superscript𝑆𝑆subscriptsuperscript𝑌superscript𝑆𝑎\displaystyle=\Pr\left(\frac{\lceil S^{1/3}\rceil}{\sqrt{S}}\left(h([\xi_{0}])% -\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]\right)+\sqrt{\frac{S^{\prime% }}{S}}Y^{\prime}_{S^{\prime}}\leq a\right)= roman_Pr ( divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG ( italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) - blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] ) + square-root start_ARG divide start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG start_ARG italic_S end_ARG end_ARG italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ≤ italic_a )
=Pr(YSSS(a+S1/3S(𝔼π[h([ξ])]h([ξ0]))))absentPrsubscriptsuperscript𝑌𝑆𝑆superscript𝑆𝑎superscript𝑆13𝑆subscript𝔼𝜋delimited-[]delimited-[]𝜉delimited-[]subscript𝜉0\displaystyle=\Pr\left(Y^{\prime}_{S}\leq\sqrt{\frac{S}{S^{\prime}}}\left(a+% \frac{\lceil S^{1/3}\rceil}{\sqrt{S}}\left(\mathbb{E}_{\pi}\left[h\left([\xi]% \right)\right]-h([\xi_{0}])\right)\right)\right)= roman_Pr ( italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ square-root start_ARG divide start_ARG italic_S end_ARG start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG end_ARG ( italic_a + divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) ) ) )

Note for bS:=SS(a+S1/3S(𝔼π[h([ξ])]h([ξ0])))assignsubscript𝑏𝑆𝑆superscript𝑆𝑎superscript𝑆13𝑆subscript𝔼𝜋delimited-[]delimited-[]𝜉delimited-[]subscript𝜉0b_{S}:=\frac{\sqrt{S}}{\sqrt{S^{\prime}}}\left(a+\frac{\lceil S^{1/3}\rceil}{% \sqrt{S}}\left(\mathbb{E}_{\pi}\left[h\left([\xi]\right)\right]-h([\xi_{0}])% \right)\right)italic_b start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT := divide start_ARG square-root start_ARG italic_S end_ARG end_ARG start_ARG square-root start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG end_ARG ( italic_a + divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S end_ARG end_ARG ( blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ italic_h ( [ italic_ξ ] ) ] - italic_h ( [ italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ] ) ) ), we see that limSbS=asubscript𝑆subscript𝑏𝑆𝑎\lim_{S\rightarrow\infty}b_{S}=aroman_lim start_POSTSUBSCRIPT italic_S → ∞ end_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT = italic_a, as S/S𝑆superscript𝑆\sqrt{S/S^{\prime}}square-root start_ARG italic_S / italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG goes to one and S1/3Ssuperscript𝑆13superscript𝑆\frac{\lceil S^{1/3}\rceil}{\sqrt{S^{\prime}}}divide start_ARG ⌈ italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT ⌉ end_ARG start_ARG square-root start_ARG italic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_ARG end_ARG goes to 0. We conclude that for all a𝑎aitalic_a,

limSPr(YSa)=limSPr(YSbS)=Pr(Za),subscript𝑆Prsubscript𝑌𝑆𝑎subscript𝑆Prsubscriptsuperscript𝑌𝑆subscript𝑏𝑆Pr𝑍𝑎\lim_{S\rightarrow\infty}\Pr(Y_{S}\leq a)=\lim_{S\rightarrow\infty}\Pr(Y^{% \prime}_{S}\leq b_{S})=\Pr(Z\leq a),roman_lim start_POSTSUBSCRIPT italic_S → ∞ end_POSTSUBSCRIPT roman_Pr ( italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ italic_a ) = roman_lim start_POSTSUBSCRIPT italic_S → ∞ end_POSTSUBSCRIPT roman_Pr ( italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ≤ italic_b start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ) = roman_Pr ( italic_Z ≤ italic_a ) ,

and so YSsubscript𝑌𝑆Y_{S}italic_Y start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT converges in distribution to 𝒩(0,Vα)𝒩0subscript𝑉𝛼\mathcal{N}(0,V_{\alpha})caligraphic_N ( 0 , italic_V start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ), as claimed. Note this result would hold with S1/3superscript𝑆13S^{1/3}italic_S start_POSTSUPERSCRIPT 1 / 3 end_POSTSUPERSCRIPT replaced by Sαsuperscript𝑆𝛼S^{\alpha}italic_S start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT for any α<1/2𝛼12\alpha<1/2italic_α < 1 / 2.