Approximating Klee’s Measure Problem and
a Lower Bound for Union Volume Estimation

Karl Bringmann. Saarland University and Max-Planck-Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany. This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 850979).
Kasper Green Larsen. Aarhus University. Supported by a DFF Sapere Aude Research Leader Grant No. 9064-00068B.
André Nusser. CNRS, Inria, I3S, Université Côte d’Azur, France. This work was supported by the French government through the France 2030 investment plan managed by the National Research Agency (ANR), as part of the Initiative of Excellence of Université Côte d’Azur under reference number ANR-15-IDEX-01. Part of this work was conducted while the author was at BARC, University of Copenhagen, supported by the VILLUM Foundation grant 16582.
Eva Rotenberg. Technical University of Denmark. Supported by DFF Grant 2020-2023 (9131-00044B) “Dynamic Network Analysis”, the VILLUM Foundation grant VIL37507 “Efficient Recomputations for Changeful Problems” and the Carlsberg Foundation Young Researcher Fellowship CF21-0302 “Graph Algorithms with Geometric Applications”.
Yanheng Wang. Saarland University and Max-Planck-Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany. This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 850979).
Abstract

Union volume estimation is a classical algorithmic problem. Given a family of objects $O_1,\ldots,O_n \subseteq \mathbb{R}^d$, we want to approximate the volume of their union. In the special case where all objects are boxes (also known as hyperrectangles) this is known as Klee’s measure problem. The state-of-the-art algorithm [Karp, Luby, Madras ’89] for union volume estimation as well as Klee’s measure problem in constant dimension $d$ computes a $(1+\varepsilon)$-approximation with constant success probability by using a total of $O(n/\varepsilon^2)$ queries of the form (i) ask for the volume of $O_i$, (ii) sample a point uniformly at random from $O_i$, and (iii) query whether a given point is contained in $O_i$.

First, we show that if one can only interact with the objects via the aforementioned three queries, the query complexity of [Karp, Luby, Madras ’89] is indeed optimal, i.e., $\Omega(n/\varepsilon^2)$ queries are necessary. Our lower bound already holds for estimating the union of equiponderous axis-aligned polygons in $\mathbb{R}^2$. It holds even if the algorithm is allowed to inspect the coordinates of the points sampled from the polygons, and even if a containment query may ask about an arbitrary (not necessarily sampled) point.

Second, guided by the insights of the lower bound, we provide a more efficient approximation algorithm for Klee’s measure problem improving the $O(n/\varepsilon^2)$ time to $O((n+\frac{1}{\varepsilon^2})\cdot\log^{O(d)} n)$. We achieve this improvement by exploiting the geometry of Klee’s measure problem in various ways: (1) Since we have access to the boxes’ coordinates, we can split the boxes into classes of boxes of similar shape. (2) Within each class, we show how to sample from the union of all boxes, by using orthogonal range searching. And (3) we exploit that boxes of different classes have small intersection, for most pairs of classes.

1 Introduction

We revisit the classical problem of union volume estimation: given objects $O_1,\ldots,O_n \subseteq \mathbb{R}^d$, we want to estimate the volume of $O_1 \cup \ldots \cup O_n$. [Footnote 1: Technically, the objects need to be measurable. In fact, a generalization of this problem allows $O_1,\ldots,O_n$ to be any measurable subsets of a measure space, and we want to estimate the measure of their union. However, throughout this paper the objects will always be boxes in $\mathbb{R}^d$ (in our algorithm) or polygons in the plane (in our lower bound construction), and thus these technicalities are irrelevant in our context.] This problem has several important applications such as DNF Counting and Network Reliability; see the discussion in Section 1.2.

The state-of-the-art solution [19] works in a model where one has access to each input object $O_i$ by three types of queries: (i) determine the volume of the object, (ii) sample a point uniformly at random from the object, and (iii) ask whether a point is contained in the object. Apart from these types of queries, the model allows arbitrary computations. The complexity of algorithms is thus measured by the number of queries to the input objects.
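To make the query model concrete, the following minimal Python sketch implements the three queries for an axis-aligned box. The class name `Box` and its method names are illustrative assumptions, not notation from the paper; the model itself treats objects as black boxes.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box [lo[0], hi[0]] x ... x [lo[d-1], hi[d-1]] (illustrative type)."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def volume(self) -> float:
        # Query (i): the volume of the object.
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= b - a
        return v

    def sample(self, rng: random.Random) -> Tuple[float, ...]:
        # Query (ii): a point drawn uniformly at random from the object.
        return tuple(rng.uniform(a, b) for a, b in zip(self.lo, self.hi))

    def contains(self, point) -> bool:
        # Query (iii): is the given point contained in the object?
        return all(a <= x <= b for a, x, b in zip(self.lo, point, self.hi))
```

Each method touches every coordinate once, so for a $d$-dimensional box each query costs $O(d)$ time.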

After Karp and Luby [19] introduced this model, Karp, Luby and Madras [20] showed that one can $(1+\varepsilon)$-approximate the volume of the union of $n$ objects in this model using $O(n/\varepsilon^2)$ queries with constant success probability, by an algorithm that uses $O(n/\varepsilon^2)$ additional time (and their solution only asks containment queries of previously sampled points). [Footnote 2: The success probability can be boosted to $1-\delta$ at the cost of a factor $\log(1/\delta)$ in the number of queries and running time.] This improved earlier related algorithms by Karp and Luby [19] and Luby [23]. In the last 35 years this problem has seen no improvement of the upper bound. Hence, it is natural to ask whether this classical upper bound is best possible and whether one can give a matching lower bound. We resolve this question in this work by providing a matching lower bound.

The union volume estimation problem was also studied very recently in the streaming setting [26, 24]. Here, the objects come in a stream $O_1,\ldots,O_n$, and when we are at position $i$ in the stream, we can only query object $O_i$. Assuming the objects are subsets of a universe $\Omega$, this line of work gives a streaming algorithm using $O(\mathrm{polylog}(|\Omega|)\log(1/\delta)/\varepsilon^2)$ queries per object (the same bound holds for the space usage and update time additional to the queries). Summed over $n$ boxes this yields the same total running time as the general tool, apart from the $\mathrm{polylog}(|\Omega|)$ factor. So, interestingly, even in the streaming setting the same running time can be achieved. [Footnote 3: See also [31] for earlier work studying Klee’s measure problem in the streaming setting.]

Perhaps the most famous application of the algorithm by Karp, Luby, and Madras [20] is Klee’s measure problem [22]: This is a fundamental problem in computational geometry in which we are given $n$ axis-aligned boxes in $\mathbb{R}^d$ and want to compute the volume of their union. Here an axis-aligned box is any set of the form $[a_1,b_1]\times\ldots\times[a_d,b_d]\subset\mathbb{R}^d$, and the input consists of the coordinates $a_1,b_1,\ldots,a_d,b_d$ of each box. A long line of research on this problem and various special cases (e.g., for fixed dimensions or for cubes) [32, 27, 11, 2, 1, 12, 33, 3] led to an exact algorithm running in time $O(n^{d/2}+n\log n)$ for constant $d$ [13]. A conditional lower bound suggests that any faster algorithm would require fast matrix multiplication techniques [12], but it is unclear how to apply fast matrix multiplication to this problem. On the approximation side, note that for a $d$-dimensional axis-aligned box, the three queries can be implemented in time $O(d)$. Thus, the union volume estimation algorithm can be applied, and it computes a $(1+\varepsilon)$-approximation of Klee’s measure problem in time $O(nd/\varepsilon^2)$, as has been observed in [4]. This direct application of union volume estimation was the state of the art for approximate solutions for Klee’s measure problem until our work. See Section 1.2 for interesting applications of Klee’s measure problem.

1.1 Our Contribution

Our contribution is twofold.

Lower bound for union volume estimation

Given the state of the art, a natural question is to ask whether the query complexity of the general union volume estimation algorithm of [20] can be further improved. Any such improvement would speed up several important applications, cf. Section 1.2. On the other hand, any lower bound showing that the algorithm of [20] is optimal also implies tightness of the known streaming algorithms (up to logarithmic factors), as the streaming algorithms match the static running time bound.

We answer this question negatively in the aforementioned query model. Note that the model allows unbounded computational power, examining the numerical coordinates of sampled points, and asking containment queries on arbitrary points. In contrast, these powers are not exploited by [20]. So our lower bound encompasses a much wider paradigm of algorithms. We show a query complexity lower bound of $\Omega(n/\varepsilon^2)$ for this model, which matches the upper bound of [20]:

Theorem 1.

Any algorithm for computing a $(1+\varepsilon)$-approximation to the cardinality of the union of $n$ objects via volume, sampling and containment queries with success probability at least $4/5$ must make $\Omega(\varepsilon^{-2} n)$ queries.

We want to particularly highlight that our lower bound even holds for subsets of $\mathbb{Z}^2$, and for equiponderous, axis-aligned polygons in the plane.

Upper bound for Klee’s measure problem

Our lower bound for union volume estimation implies that we can only improve the current upper bound for Klee’s measure problem if we exploit the geometric structure of boxes. Specifically, we exploit that we can split the input boxes into classes of similar boxes, since we have access to the boxes’ coordinates, and we make use of orthogonal range searching. This allows us to break the barrier that holds within the query model and provide an algorithm that improves the running time for approximating Klee’s measure problem from $O(n/\varepsilon^2)$ to $O((n+\frac{1}{\varepsilon^2})\cdot\mathrm{polylog}(n))$ in constant dimension.

Theorem 2.

There is an algorithm that runs in time $O\left(\log^{2d+1}(n)\cdot(n+\frac{1}{\varepsilon^2})\right)$ and with probability at least $0.9$ computes a $(1+\varepsilon)$-approximation for Klee’s measure problem.

The success probability can be boosted to any $1-\delta$ using standard techniques and incurring an additional $\log(1/\delta)$ factor in the running time. We also want to highlight that the core of our algorithm is an efficient method to sample uniformly and independently with a given density from the union of the input objects. While this allows us to $(1+\varepsilon)$-approximate the volume of the union, we believe that our efficient sampling method is also of independent interest.

Throughout this work, for simplicity and readability we assume the dimension $d$ to be constant. We remark that our running time bounds hide factors of the form $2^{O(d)}$.

1.2 Related Work

A major application of union volume estimation is DNF Counting, in which we are given a formula in disjunctive normal form and want to count its number of satisfying assignments. Computing the exact number of satisfying assignments is #P-complete, therefore it likely requires exponential time. Approximating the number of satisfying assignments can be achieved by an easy application of union volume estimation, as described in [20]. Their algorithm remains the state of the art for this problem to this day, see, e.g., [25]. In particular, a direct application of the union volume estimation algorithm of [20] gives the best known complexity for approximate DNF Counting. This has been extended to more general model counting [28, 9, 25], probabilistic databases [21, 15, 29], and probabilistic queries on databases [6].

We also want to mention Network Reliability as another application for union volume estimation, which was already discussed in [20]. Additionally, Karger’s famous paper on the problem [18] uses the algorithm of [20] as a subroutine. However, the current state-of-the-art algorithms no longer use union volume estimation as a tool [7].

Finally, we want to draw a connection to the following well-known query sampling bound. Canetti, Even, and Goldreich [5] showed that approximating the mean of a random variable whose codomain is the unit interval requires $\Omega(\log(1/\delta)/\varepsilon^2)$ queries, thus obtaining tight bounds for the sampling complexity of the mean estimation problem. Their bound generalises to $\Omega(\log(1/\delta)/(\mu\varepsilon^2))$ on the number of queries needed to estimate the mean $\mu$ of a random variable in general. Before our work it was thus natural to expect that the $1/\varepsilon^2$ dependence in the number of queries for union volume estimation is optimal. However, whether the factor $n$ is necessary, or the number of queries could be improved to, say, $O(n+1/\varepsilon^2)$, was open to the best of our knowledge.

Klee’s measure problem is an important problem in computational geometry. One reason for its importance is that techniques that have been developed for Klee’s measure problem can often be adapted to solve various related problems, such as the depth problem (given a set of boxes, what is the largest number of boxes that can be stabbed by a single point?) [13] or Hausdorff distance under translation in $L_\infty$ [14]. Moreover, various other problems can be reduced to Klee’s measure problem or to its related problems, e.g., deciding whether a set of boxes covers its bounding box can be reduced to Klee’s measure problem [13], the continuous $k$-Center problem on graphs (i.e., finding centers that can lie on the edges of a graph that cover the vertices of a graph) can also be reduced to Klee’s measure problem [30], and finding the smallest hypercube containing at least $k$ points among $n$ given points can be reduced to the depth problem [17, 10, 13]. In light of this, it would be interesting to see whether our approximation techniques generalize to any of these related problems.

1.3 Technical Overview

We now give an overview of our results, starting with our upper bound result for Klee’s measure problem. We keep the statements on an intuitive level and hide many technical details. For the formal statements and proofs, see Section 2 for the upper bound and Section 3 for the lower bound.

Upper bound for Klee’s measure problem

We first remark that due to our lower bound result, we know that we have to exploit the structure of the input to obtain a running time of the form $O((n+\frac{1}{\varepsilon^2})\cdot\mathrm{polylog}(n))$. Following a common algorithmic approach, we use sampling to approximate the volume of the union. Specifically, we want to draw a sample $S$ from the union of boxes with density $p$, such that in the end $|S|/p$ is a good estimate of the volume of the union of input boxes. We defer how to set $p$ to the end of this overview and first focus on the main difficulty, i.e., how to create a sample for a given $p$.

We start with a simple classification of the input boxes into classes of similar shape. Two boxes are in the same class if, in each dimension $i \in [d]$, the side lengths of both boxes lie in the same interval $[2^j, 2^{j+1})$ for some $j \in \mathbb{Z}$ (which may depend on $i$). We call two classes similar if their side lengths are polynomially related (e.g., within a factor of $n^4$) in each dimension.
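The classification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: boxes are represented as `(lo, hi)` coordinate tuples with positive side lengths, the function names are hypothetical, and the similarity test hard-codes the example factor $n^4$ from the text rather than the constants of the formal definition.

```python
import math
from collections import defaultdict


def box_class(box):
    """Class of a box: per dimension, the j with 2^j <= side length < 2^(j+1)."""
    lo, hi = box
    # frexp(s) returns (m, e) with s = m * 2^e and 0.5 <= m < 1, so j = e - 1.
    return tuple(math.frexp(b - a)[1] - 1 for a, b in zip(lo, hi))


def classify(boxes):
    """Group boxes by class; keys are the exponent vectors (j_1, ..., j_d)."""
    classes = defaultdict(list)
    for box in boxes:
        classes[box_class(box)].append(box)
    return dict(classes)


def similar(c1, c2, n):
    """Assumed test: side lengths within a factor n^4 in every dimension."""
    bound = 4 * math.log2(n)
    return all(abs(j1 - j2) <= bound for j1, j2 in zip(c1, c2))
```

Using `math.frexp` instead of `floor(log2(...))` avoids rounding errors for side lengths just below a power of two.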

We use the following three crucial insights to obtain an efficient algorithm:

  1. We can efficiently sample from the union of boxes of a single class, see Lemma 4 (and Figure 1).

  2. Each class has only few (i.e., a polylogarithmic number of) classes that are similar to it, see Observation 1.

  3. Classes that are not similar have a small intersection compared to their union, see Observation 2 (and Figure 2).

In the remainder we give some more details on these insights and how they lead us to an efficient algorithm. The rough idea of our algorithm is as follows. We go through the classes in arbitrary order. For each class we sample with density $p$ from the union of the boxes of this class, but we only keep a point if it is not contained in any class that comes later in the order. To efficiently check for containment in a later class, we use an orthogonal range searching data structure (with an additional dimension for the index of the class).
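The keep-or-discard rule can be illustrated as follows. The sketch assumes the per-class samples are already given and checks containment in later classes by brute force, whereas the paper uses an orthogonal range searching data structure for this step; the function names are hypothetical.

```python
def contains(box, x):
    """Containment test for a box given as (lo, hi) coordinate tuples."""
    lo, hi = box
    return all(a <= xi <= b for a, xi, b in zip(lo, x, hi))


def merge_class_samples(class_boxes, class_samples):
    """Combine per-class samples into one sample of the overall union.

    class_boxes[k]   : list of boxes in the k-th class (in the chosen order)
    class_samples[k] : points sampled from the union of the k-th class
    A point is kept only if no box of a later class contains it, so each
    region of the union is sampled by exactly one class (the last one
    covering it).
    """
    kept = []
    for k, points in enumerate(class_samples):
        # Brute-force stand-in for the range searching structure.
        later = [b for boxes in class_boxes[k + 1:] for b in boxes]
        for x in points:
            if not any(contains(b, x) for b in later):
                kept.append(x)
    return kept
```

If every per-class sample has density $p$, the merged set has density $p$ on the whole union, and $|S|/p$ estimates the union volume.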

To understand why our algorithm is efficient, we have to look at two different parts:

Sampling from a class:

One of our main technical ingredients is to sample from the union of boxes of similar shape. Note that efficient sampling implies efficient volume estimation, so to break our lower bound we must exploit input structure beyond what the query model offers. Our main approach here is simple but powerful: We can sample points from the union of similarly shaped boxes uniformly by (1) gridding the space into cells of side lengths comparable to these boxes, (2) sampling points from the relevant cells, and (3) discarding points not in the union by querying an orthogonal range searching data structure. As the grid size is similar to the shape of the boxes in the class, we ensure that a significant fraction of the points sampled in (2) are contained in the union, i.e., not discarded. The orthogonal range searching data structure allows us to quickly check for containment.
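Steps (1) to (3) can be sketched as below. This is a simplified illustration, not the procedure of Lemma 4: containment is checked by brute force instead of orthogonal range searching, the Poisson sampler is Knuth's textbook method, all boxes are assumed to belong to one class, and all function names are hypothetical.

```python
import math
import random
from itertools import product


def poisson(lam, rng):
    """Draw from Pois(lam) with Knuth's product-of-uniforms method."""
    if lam > 500.0:
        # Poisson variables are additive; splitting avoids exp(-lam) underflow.
        half = lam / 2.0
        return poisson(half, rng) + poisson(lam - half, rng)
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def contains(box, x):
    lo, hi = box
    return all(a <= xi <= b for a, xi, b in zip(lo, x, hi))


def sample_class_union(boxes, p, rng):
    """Sample with density p from the union of boxes of a single class."""
    lo0, hi0 = boxes[0]
    # (1) Grid with cell side 2^j, where 2^j <= side length < 2^(j+1).
    side = [math.ldexp(1.0, math.frexp(b - a)[1] - 1) for a, b in zip(lo0, hi0)]
    # Relevant cells: each box meets at most 3 cells per dimension.
    cells = set()
    for lo, hi in boxes:
        ranges = [range(math.floor(a / s), math.floor(b / s) + 1)
                  for a, b, s in zip(lo, hi, side)]
        cells.update(product(*ranges))
    cell_volume = math.prod(side)
    kept = []
    for cell in cells:
        # (2) Sample with density p inside the cell.
        for _ in range(poisson(p * cell_volume, rng)):
            x = tuple((c + rng.random()) * s for c, s in zip(cell, side))
            # (3) Discard points outside the union (brute-force check).
            if any(contains(box, x) for box in boxes):
                kept.append(x)
    return kept
```

Because every relevant cell is intersected by a box whose side lengths are at least the cell's, a constant fraction (depending on $d$) of the relevant cells' volume lies in the union, so only a bounded fraction of the drawn points is discarded in expectation.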

Bound the number of drawn samples:

As we discard samples that appear in later classes, this is a potential source of inefficiency. We therefore bound the number of discarded samples using the second and third insights from above. The second insight states that there are only few, i.e., polylogarithmically many, similar classes. Hence, a point might be discarded because it is contained in one of these similar classes, but as there are only few, this will only happen a polylogarithmic number of times. On the other hand, the third insight states that the intersection of dissimilar classes is small. Thus, the probability that we discard a sampled point because of a dissimilar class is small, and such events will not have a significant impact on the running time.

Figure 1: We sample points in the grid cells $\mathcal{G}$ that are intersected by a box $\mathcal{O}_i$ from a fixed class. We then use orthogonal range searching to determine whether a sampled point is in a box from the class and should be kept (●), or is not and should be discarded ($\times$).
Figure 2: When boxes differ a lot in side length for at least one of their dimensions (in this case, the $y$-axis), their intersection is small compared to their union.

Finally, to set the sampling probability $p$, we need a crude estimate of the volume. One can use the classical algorithms (by Karp and Luby [19] or Karp, Luby, Madras [20]) with a constant error parameter (say, $\frac{1}{2}$) to obtain a constant factor approximation in near-linear time. To keep our work self-contained, we provide a brief description and a simplified correctness proof of this case for union volume estimation, based on Karp and Luby [19], in Section 2.1.
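For intuition, here is a minimal sketch of a Karp–Luby-style estimator for boxes. It is the textbook coverage estimator (average of $1/c(x)$, where $c(x)$ is the number of objects containing the sampled point $x$), not the self-adjusting algorithm of [20] and not necessarily the variant described in Section 2.1; it spends up to $n$ containment queries per sample. The function names and the `(lo, hi)` box representation are assumptions.

```python
import random


def box_volume(box):
    lo, hi = box
    v = 1.0
    for a, b in zip(lo, hi):
        v *= b - a
    return v


def box_contains(box, x):
    lo, hi = box
    return all(a <= xi <= b for a, xi, b in zip(lo, x, hi))


def karp_luby_estimate(boxes, num_samples, rng):
    """Estimate the union volume using volume, sample and containment queries."""
    vols = [box_volume(b) for b in boxes]
    total = sum(vols)
    acc = 0.0
    for _ in range(num_samples):
        # Pick a box with probability proportional to its volume ...
        i = rng.choices(range(len(boxes)), weights=vols)[0]
        lo, hi = boxes[i]
        # ... and a uniform point inside it.
        x = tuple(rng.uniform(a, b) for a, b in zip(lo, hi))
        # c >= 1 because x lies in box i; E[1/c] = Volume(union) / total.
        c = sum(1 for b in boxes if box_contains(b, x))
        acc += 1.0 / c
    return total * acc / num_samples
```

Each sample value $1/c(x)$ lies in $[1/n, 1]$, which is why a number of samples polynomial in $n$ and $1/\varepsilon$ suffices for this estimator.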

Lower bound

We now give an overview of our lower bound result. The lower bound is proven by a reduction from a variant of the Gap-Hamming problem, defined as follows: Given two vectors $x,y\in\{-1,+1\}^T$, distinguish whether their inner product is greater than $\sqrt{T}$ or less than $-\sqrt{T}$. It is known that any algorithm distinguishing these two cases with success probability at least $2/3$ must perform $\Omega(T)$ queries into $x$ and $y$.

We first give the intuition why $\Omega(1/\varepsilon^2)$ samples are necessary to $(1+\varepsilon)$-approximate the union of two sets with constant probability in the query model. Given a Gap-Hamming instance $x,y$, we construct two sets $X=\{(i,x_i):i\in[T]\}$ and $Y=\{(i,y_i):i\in[T]\}$, see Figure 3 for an example. Note that for all $k\in\{0,\dots,T\}$, we have

$$|X\cup Y| = T+k \iff \langle x,y\rangle = T-2k.$$

Hence, if we have an algorithm $\mathcal{A}$ that computes a $(1+\varepsilon)$-approximation of $|X\cup Y|$ with probability $2/3$, then we can distinguish between $\langle x,y\rangle \geq \varepsilon T$ and $\langle x,y\rangle \leq -\varepsilon T$. Setting $T=1/\varepsilon^2$, we therefore distinguish $\langle x,y\rangle \geq 1/\varepsilon = \sqrt{T}$ and $\langle x,y\rangle \leq -1/\varepsilon = -\sqrt{T}$. Hence, our algorithm $\mathcal{A}$ solves the Gap-Hamming instance.

Note that the volumes of $X$ and $Y$ are fixed (depending only on the length of the vectors $x$ and $y$ but not their entries), and thus a volume query does not disclose any information about $x$ and $y$. Each sample or containment query concerns at most one entry of $x$ or $y$. Consequently, any union volume estimation algorithm has to use $\Omega(T)=\Omega(\varepsilon^{-2})$ queries to $X$ or $Y$.
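The identity between the union size and the inner product is easy to check numerically. The following small Python sketch builds the sets $X$ and $Y$ from sign vectors exactly as in the construction above; the function names are illustrative.

```python
import random


def union_size(x, y):
    """|X ∪ Y| for X = {(i, x_i)} and Y = {(i, y_i)}, with i = 1..T."""
    X = {(i, xi) for i, xi in enumerate(x, start=1)}
    Y = {(i, yi) for i, yi in enumerate(y, start=1)}
    return len(X | Y)


def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))


def check_identity(T, rng):
    """Return (k, |X ∪ Y|, <x, y>) for random sign vectors of length T."""
    x = [rng.choice((-1, 1)) for _ in range(T)]
    y = [rng.choice((-1, 1)) for _ in range(T)]
    k = sum(1 for a, b in zip(x, y) if a != b)  # number of disagreeing entries
    return k, union_size(x, y), inner_product(x, y)
```

Every coordinate where $x$ and $y$ agree contributes one point to the union and $+1$ to the inner product, while every disagreeing coordinate contributes two points and $-1$; with $k$ disagreements this gives $|X\cup Y| = T+k$ and $\langle x,y\rangle = T-2k$.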

Figure 3: The vector $x=(+1,-1,+1,+1,-1,-1)$ represented as the set $\{(i,x_i):i\in[6]\}$, where each point is drawn as a rectangle. (The horizontal axis is marked at $1, 2, \dots, i, \dots, T, T+1$.)

In order to generalize this lower bound for estimating the union of two sets to an $\Omega(n/\varepsilon^2)$ lower bound for estimating the union of $n$ sets, we need to ensure that the sampled points do not give away too much information about the entries of $x$ and $y$. We apply two obfuscations that jointly ensure a lower bound on the number of queries; see Figure 4. Firstly, we introduce sets $X_1,\ldots,X_n$ whose union is $X$ and sets $Y_1,\ldots,Y_n$ whose union is $Y$. Imagine cutting each rectangle in Figure 3 into $n$ side-by-side pieces and distributing them randomly among $X_1,\ldots,X_n$; similarly for $Y$. The idea is that one needs to make $\Omega(n)$ containment queries on a set in order to hit the correct piece. Hence, the effort for revealing one bit in $x$ or $y$ is $\Omega(n)$. Secondly, we introduce a large set shared by all $X_i$ and $Y_i$ for $i\in[n]$. In Figure 4, this is the long dark-blue rectangle that spans from left to right. This large set intuitively enforces $\Omega(n)$ samples to even obtain a single point that contains any information about $x$ and $y$.

Figure 4: The vector $y$ or $x=(+1,-1,+1,+1,-1,-1)$ gives rise to $n$ polygons; one of these polygons is illustrated in dark blue. The light blue area indicates the union of all these $n$ polygons. (The horizontal axis is marked at $n, 2n, \ldots, i\cdot n, \ldots, T\cdot n, (T+1)\cdot n$.)

2 Approximation Algorithm for Klee’s Measure Problem

In this section we give our upper bound for Klee’s measure problem.

See Theorem 2.

2.1 Preliminaries

In Klee’s measure problem we are given boxes $O_1,\dots,O_n$ in $\mathbb{R}^d$. Here, a box is an object of the form $O_i=[a_1,b_1]\times\ldots\times[a_d,b_d]$, and as input we are given the coordinates $a_1,b_1,\ldots,a_d,b_d$ of each input box. Throughout this section we assume $d$ to be constant. Note that given the coordinates of a box, it is easy to compute its side lengths and volume. Throughout, we write $V\coloneqq\operatorname{Volume}(\bigcup_{i=1}^{n}O_i)$ for the volume of the union of boxes. We want to approximate $V$ up to a factor of $1+\varepsilon$. Our approach is based on sampling, so now let us introduce the relevant notions.

Recall that $\mathrm{Pois}(\lambda)$ is the Poisson distribution with mean and variance $\lambda$. It captures the number of active points in a space, under the assumption that active points occur uniformly and independently at random across the space, and that $\lambda$ points are active on average.

The following definition is usually referred to as a homogeneous Poisson point process at rate $p$. Intuitively, we activate each point in space $U\subset\mathbb{R}^d$ independently with “probability density” $p$, thus the number of activated points follows the Poisson distribution with mean $p\cdot\operatorname{Volume}(U)$.

Definition 1 ($p$-sample).

Let $U\subset\mathbb{R}^d$ be a measurable set, and let $p\in[0,1]$. We say that a random subset $S\subseteq U$ is a $p$-sample of $U$ if for any measurable $U'\subseteq U$ we have that $|S\cap U'|\sim\mathrm{Pois}(p\cdot\operatorname{Volume}(U'))$.

In particular, if $S$ is a $p$-sample of $U$, then $|S|\sim\mathrm{Pois}(p\cdot\operatorname{Volume}(U))$. Two more useful properties follow from the definition:

  (i) For any measurable subset $U'\subseteq U$, the restriction $S\cap U'$ is a $p$-sample of $U'$.

  (ii) The union of $p$-samples of two disjoint sets $U,U'$ is a $p$-sample of $U\cup U'$.
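To make the definition concrete, the following is a minimal sketch of how a $p$-sample of a single box could be drawn: first draw the number of points from $\mathrm{Pois}(p\cdot\operatorname{Volume}(U))$, then place that many points uniformly at random. The function names `poisson`, `box_volume` and `p_sample_of_box` are hypothetical helpers introduced for illustration only, and the Poisson generator (counting unit-rate exponential arrivals) is an assumption chosen for simplicity rather than anything prescribed by the text.

```python
import random


def poisson(lam, rng):
    """Draw from Pois(lam) by counting unit-rate exponential arrivals in [0, lam).

    Illustrative helper (not from the paper); runs in O(lam) expected time.
    """
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def box_volume(box):
    """Volume of a box given as a list of (a_k, b_k) intervals, one per dimension."""
    vol = 1.0
    for a, b in box:
        vol *= b - a
    return vol


def p_sample_of_box(box, p, rng):
    """Return a p-sample of the box: Pois(p * volume) many independent uniform points."""
    k = poisson(p * box_volume(box), rng)
    return [tuple(rng.uniform(a, b) for a, b in box) for _ in range(k)]
```

For instance, `p_sample_of_box([(0, 2), (0, 3)], 0.5, random.Random(0))` returns a list whose length is Poisson distributed with mean $0.5\cdot 6=3$.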

We will make use of orthogonal range searching. Specifically, we need the query $\textsc{appears}(x,i)$, which upon receiving $x\in\mathbb{R}^d$ and $i\in\mathbb{N}$ returns true if $x\in O_i\cup\cdots\cup O_n$ and false otherwise.

Lemma 1.

We can build a data structure in $O(n\log^{d+1}n)$ time that answers $\textsc{appears}(x,i)$ queries in $O(\log^{d+1}n)$ time.

Proof.

For each $j\in[n]$, map the box $O_j\subset\mathbb{R}^d$ to a higher-dimensional box

\[ O_j^{+} := O_j\times(-\infty,j]\subset\mathbb{R}^{d+1}. \]

We then apply orthogonal range searching, specifically we build a multi-level segment tree over $\{O_1^{+},\dots,O_n^{+}\}$, which takes $O(n\log^{d+1}n)$ time; see [16, Section 10.4]. To answer the request $\textsc{appears}(x,i)$ where $x\in\mathbb{R}^d$ and $i\in\mathbb{N}$, we query the segment tree whether there exists a box $O_j^{+}$ that contains the point $(x,i)$; or phrased differently, whether $x\in O_j$ for some $j\geq i$. The query takes only $O(\log^{d+1}n)$ time. ∎
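The semantics of the query can be pinned down by a brute-force reference implementation. The sketch below is not the multi-level segment tree of the proof: it answers each query by a linear scan in $O(n)$ time and only illustrates what $\textsc{appears}(x,i)$ returns, including the equivalence with the lifted boxes $O_j\times(-\infty,j]$. The names `contains` and `appears` and the list-of-intervals box representation are assumptions made for illustration.

```python
def contains(box, x):
    """True iff point x lies in the closed box given as a list of (a_k, b_k) intervals."""
    return all(a <= xk <= b for (a, b), xk in zip(box, x))


def appears(boxes, x, i):
    """True iff x lies in O_i ∪ ... ∪ O_n, with boxes indexed from 1 as in the text.

    Brute-force stand-in for the segment tree: the lifted point (x, i) lies in the
    lifted box O_j x (-inf, j] exactly when x is in O_j and i <= j.
    """
    n = len(boxes)
    return any(contains(boxes[j - 1], x) for j in range(max(i, 1), n + 1))
```

Such a reference version can be handy for testing a faster data structure against, since both must agree on every query.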

For our main algorithm to work, we need a constant-factor approximation of the volume $V$. It is known that this can be computed in $O(n)$ time [20]. In order to stay simple and self-contained, we prove a weaker result by implementing an algorithm of Karp and Luby [19] with the use of $\textsc{appears}$ queries.

Lemma 2 (Adapted from Karp and Luby [19]).

Given the data structure from Lemma 1, there exists an algorithm that computes in time $O(n\log^{d+1}n)$ a 2-approximation to $V$ with probability at least $0.9$.

Algorithm 1 Crude volume estimator

  1. Compute prefix sums $S_j:=\sum_{i=1}^{j}\operatorname{Volume}(O_i)$ for all $j\in\{0,\dots,n\}$.

  2. Initialise counter $N:=0$.

  3. Repeat $40n$ times:

    • Sample $u\in[0,1]$ uniformly at random. Binary search for the smallest $i$ such that $u\leq\frac{S_i}{S_n}$.

    • Sample $x\in O_i$ uniformly at random.

    • Increment $N$ if not $\textsc{appears}(x,i+1)$.

  4. Output $\widetilde{V}:=\frac{N}{40n}\cdot S_n$.

Proof.

We claim that Algorithm 1 has the desired properties. The time bound is easy to see: The computation of the prefix sums takes $O(n)$ time. In each iteration, binary searching for $i$ costs $O(\log n)$ time, sampling of $x$ costs $O(1)$ time, and calling $\textsc{appears}$ takes $O(\log^{d+1}n)$ time. So in total we spend $O(n\log^{d+1}n)$ time.

For the correctness argument, we define two sets

\begin{align*}
P &:= \{(i,x)\;:\;i\in[n],\,x\in O_i\},\\
Q &:= \{(i,x)\;:\;i\in[n],\,x\in O_i\setminus(O_{i+1}\cup\ldots\cup O_n)\}.
\end{align*}

Consider an iteration in step 3. For any fixed value $j\in[n]$, we have

\[ \Pr(i=j)=\Pr\left(\frac{S_{j-1}}{S_n}<u\leq\frac{S_j}{S_n}\right)=\frac{S_j-S_{j-1}}{S_n}=\frac{\operatorname{Volume}(O_j)}{S_n}. \]

With this we can calculate the probability that the counter $N$ increments in this iteration:

\begin{align*}
\Pr((i,x)\in Q) &= \sum_{j=1}^{n}\Pr((i,x)\in Q\mid i=j)\cdot\Pr(i=j)\\
&= \sum_{j=1}^{n}\frac{\operatorname{Volume}(O_j\setminus(O_{j+1}\cup\ldots\cup O_n))}{\operatorname{Volume}(O_j)}\cdot\frac{\operatorname{Volume}(O_j)}{S_n}=\frac{V}{S_n}.
\end{align*}

Since all iterations are independent, at the end of the algorithm we have $N\sim\mathrm{Bin}(40n,V/S_n)$. Hence $\widetilde{V}$ is an unbiased estimator for $V$.

To analyse deviation, we observe that $V\geq\max_{i=1}^{n}\operatorname{Volume}(O_i)\geq S_n/n$. Therefore, $\mathbb{E}[N]=40nV/S_n\geq 40$. By Chebyshev and as $\mathbb{E}[N]\geq\mathrm{Var}[N]$, we have

\[ \Pr\left(|N-\mathbb{E}[N]|\geq\frac{\mathbb{E}[N]}{2}\right)\leq\frac{4\mathrm{Var}[N]}{(\mathbb{E}[N])^{2}}\leq\frac{4}{\mathbb{E}[N]}\leq 0.1. \]

That is, with probability at least $0.9$ the output $\widetilde{V}$ is a 2-approximation to $V$. ∎
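The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the implementation analysed above: the $\textsc{appears}$ query is replaced by a linear scan, so the sketch runs in $O(n^2)$ time rather than $O(n\log^{d+1}n)$, and the names `volume`, `contains` and `crude_volume_estimate` are hypothetical.

```python
import bisect
import random


def volume(box):
    """Volume of a box given as a list of (a_k, b_k) intervals."""
    v = 1.0
    for a, b in box:
        v *= b - a
    return v


def contains(box, x):
    """True iff point x lies in the closed box."""
    return all(a <= xk <= b for (a, b), xk in zip(box, x))


def crude_volume_estimate(boxes, rng):
    """Sketch of Algorithm 1 with a brute-force replacement for `appears`."""
    n = len(boxes)
    # Step 1: prefix sums S_0, ..., S_n of the box volumes.
    prefix = [0.0]
    for box in boxes:
        prefix.append(prefix[-1] + volume(box))
    total = prefix[n]
    # Step 2: counter N.
    hits = 0
    trials = 40 * n
    # Step 3: repeat 40n times.
    for _ in range(trials):
        u = rng.random()
        # Smallest i in {1, ..., n} with u * S_n <= S_i (binary search).
        i = min(bisect.bisect_left(prefix, u * total, lo=1), n)
        x = tuple(rng.uniform(a, b) for a, b in boxes[i - 1])
        # "not appears(x, i + 1)": x lies in none of O_{i+1}, ..., O_n.
        if not any(contains(boxes[j], x) for j in range(i, n)):
            hits += 1
    # Step 4: output N / (40 n) * S_n.
    return hits / trials * total
```

On pairwise disjoint boxes every trial increments the counter, so the output equals $S_n=V$ exactly; overlapping inputs give a randomised estimate.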

2.2 Classifying Boxes by Shapes

As our first step in the algorithm, we classify boxes by their shapes.

Definition 2.

Let $L_1,\ldots,L_d\in\mathbb{Z}$. We say that a box $O\subset\mathbb{R}^d$ is of type $(L_1,\dots,L_d)$ if its side length in dimension $i$ is contained in $[2^{L_i},2^{L_i+1})$, for each $i\in[d]$.

Using this definition, we partition the input boxes $O_1,\dots,O_n$ into classes $C_1,\dots,C_m$ such that each class corresponds to one type of boxes. We will fix this notation throughout. For each $t\in[m]$, let us also define $U_t:=\bigcup_{O\in C_t}O\subseteq\mathbb{R}^d$, namely the union of boxes in class $C_t$.
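As an illustration of the classification step, the type of a box can be read off from the binary exponents of its side lengths. The sketch below assumes boxes with strictly positive side lengths given as lists of `(a_k, b_k)` pairs; `box_type` and `classify` are hypothetical names. It uses `math.frexp`, which writes a positive float as $m\cdot 2^{e}$ with $m\in[0.5,1)$, so that $L=e-1$ satisfies $2^{L}\leq\text{side}<2^{L+1}$ without the rounding issues of a floating-point logarithm.

```python
import math
from collections import defaultdict


def box_type(box):
    """Type (L_1, ..., L_d): the side length in dimension k lies in [2^L_k, 2^(L_k + 1)).

    Assumes every side length b - a is strictly positive.
    """
    return tuple(math.frexp(b - a)[1] - 1 for a, b in box)


def classify(boxes):
    """Partition boxes into classes, one class per type; returns {type: [boxes]}."""
    classes = defaultdict(list)
    for box in boxes:
        classes[box_type(box)].append(box)
    return dict(classes)
```

For example, a box with side lengths $1$ and $3$ has type $(0,1)$, since $1\in[2^0,2^1)$ and $3\in[2^1,2^2)$.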

Similar to $\textsc{appears}$, we can answer queries of the form: Is a given point $x\in\mathbb{R}^d$ contained in $U_t$? We call this an $\textsc{inClass}(x,t)$ query.

Lemma 3.

We can build a data structure in $O(n\log^{d+1}n)$ time that answers $\textsc{inClass}(x,t)$ queries in $O(\log^{d+1}n)$ time.

Proof.

Similar to the proof of Lemma 1, we transform each $O_i\in C_t$ to a higher-dimensional box

\[ O_i\times\{t\}\subset\mathbb{R}^{d+1} \]

and build a multi-level segment tree on top. The query $\textsc{inClass}(x,t)$ is thus implemented by querying the point $(x,t)\in\mathbb{R}^{d+1}$ in the segment tree. ∎

Sampling from a class

The next lemma shows that we can obtain a $p$-sample of any $U_t$ efficiently by rejection sampling.

Lemma 4.

Given $t\in[m]$, $p\in[0,1]$ and the data structure from Lemma 3, one can generate a $p$-sample of $U_t$ in expected time $O(|C_t|\log|C_t|+p\cdot\operatorname{Volume}(U_t)\cdot\log^{d+1}n)$.

Proof.

Write $(L_1,\dots,L_d)$ for the type corresponding to class $C_t$. We subdivide $\mathbb{R}^d$ into the grid

\[ \mathcal{G}_{\infty}\coloneqq\{[i_1\,2^{L_1},(i_1+1)\,2^{L_1})\times\dots\times[i_d\,2^{L_d},(i_d+1)\,2^{L_d})\mid i_1,\dots,i_d\in\mathbb{Z}\}. \]

We call each element of $\mathcal{G}_{\infty}$ a cell. Let $\mathcal{G}\coloneqq\{G\in\mathcal{G}_{\infty}\mid G\cap U_t\neq\emptyset\}$ be the set of cells that have a non-empty intersection with $U_t$. Write $U:=\bigcup_{G\in\mathcal{G}}G$.

First we create a $p$-sample $S$ of $U$ as follows. Generate $K\sim\mathrm{Pois}(p\cdot\operatorname{Volume}(U))$, which determines the number of points we are going to sample. Then sample $K$ points uniformly at random from $U$ by repeating the following step $K$ times: Select a cell $G\in\mathcal{G}$ uniformly at random and then sample a point from $G$ uniformly at random. Since all cells are disjoint and have the same volume, each such point is uniformly distributed in $U$. The sampled points constitute our set $S$.

Next we compute $S\cap U_t$: For each $x\in S$, we query $\textsc{inClass}(x,t)$; if the answer is true then we keep $x$, otherwise we discard it. The resulting set $S\cap U_t$ is a $p$-sample of $U_t$, since restricting to a fixed subset preserves the $p$-sample property.

Before we analyze the running time, we show that $U_t$ makes up a decent proportion of $U$. Recall that every box in class $C_t$ is of type $(L_1,\dots,L_d)$. In any dimension $k\in[d]$, one projected box from $C_t$ can intersect at most three projected cells from $\mathcal{G}$. So each box from $C_t$ intersects at most $3^d$ cells from $\mathcal{G}$, implying that $|\mathcal{G}|\leq 3^d\,|C_t|$. Moreover, since the volume of any cell is at most the volume of a box in $C_t$, we have $\operatorname{Volume}(U)\leq 3^d\,\operatorname{Volume}(U_t)$.

Regarding the running time, recall that we assume $d$ to be constant and hence drop factors only depending on $d$. The computation of $\mathcal{G}$ takes $O(|\mathcal{G}|\log|\mathcal{G}|)\subseteq O(|C_t|\log|C_t|)$ time. The remaining time is dominated by the $\textsc{inClass}$ queries. The expected size of $S$ is $p\cdot\operatorname{Volume}(U)\leq 3^d\,p\,\operatorname{Volume}(U_t)$. As we query the data structure from Lemma 3 once for each point of $S$, the expected time of the $\textsc{inClass}$ queries is $O(p\cdot\operatorname{Volume}(U_t)\cdot\log^{d+1}n)$. ∎
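The rejection sampling in this proof can be sketched as follows. This is a simplified illustration, not the procedure with the stated running time: the $\textsc{inClass}$ query is replaced by a linear scan over the boxes of the class, and the Poisson generator is a simple stand-in. The names `poisson`, `contains` and `sample_class_union` are hypothetical, and boxes are assumed to be lists of `(a_k, b_k)` pairs that all have the given type.

```python
import itertools
import math
import random


def poisson(lam, rng):
    """Draw from Pois(lam) by counting unit-rate exponential arrivals in [0, lam)."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def contains(box, x):
    """True iff point x lies in the closed box."""
    return all(a <= xk <= b for (a, b), xk in zip(box, x))


def sample_class_union(class_boxes, box_type, p, rng):
    """p-sample of the union U_t of boxes that all have type (L_1, ..., L_d)."""
    side = [2.0 ** L for L in box_type]
    # Collect the grid cells (as integer index tuples) that meet some box of the
    # class; each box meets at most three cells per dimension.
    cells = set()
    for box in class_boxes:
        ranges = [range(math.floor(a / s), math.floor(b / s) + 1)
                  for (a, b), s in zip(box, side)]
        cells.update(itertools.product(*ranges))
    cells = sorted(cells)
    # p-sample of U (the union of those cells): Poisson count, then uniform points.
    k = poisson(p * math.prod(side) * len(cells), rng)
    sample = []
    for _ in range(k):
        idx = rng.choice(cells)  # uniform cell; all cells have equal volume
        x = tuple((i + rng.random()) * s for i, s in zip(idx, side))
        # Brute-force stand-in for inClass(x, t): keep x only if it lies in U_t.
        if any(contains(box, x) for box in class_boxes):
            sample.append(x)
    return sample
```

With a single box of side length $10$ in the plane (type $(3,3)$) and $p=1$, the grid contributes four cells of volume $64$ each, and on average $100$ of the roughly $256$ candidate points survive the rejection step.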

Classes do not overlap much

We show the following interesting property of classes, that the sum of their volumes is within a polylogarithmic factor of the total volume $V$.

Lemma 5.

We have that $\sum_{t=1}^{m}\operatorname{Volume}(U_t)\leq 2^{3d+1}\log^{d}(n)\cdot V$.

We later use this property to draw $p$-samples from $\bigcup_{i=1}^{n}O_i=\bigcup_{t=1}^{m}U_t$ efficiently. To show this property, we first need some simple definitions and observations.

Definition 3.

We call classes of type $(L_1,\dots,L_d)$ and $(L_1',\dots,L_d')$ similar if for all $k\in[d]$ we have $2^{|L_k-L'_k|}<n^4$. Otherwise we call them dissimilar.

Observation 1.

Every class is similar to at most $8^d\log^d n$ classes.

Proof.

Fix a type $(L_1,\dots,L_d)$. For each $k\in[d]$, there are at most $8\log n$ many integers $L_k'$ such that $2^{|L_k-L_k'|}<n^4$. ∎

Observation 2.

Let $O$ and $O'$ be boxes in dissimilar classes. Then $\operatorname{Volume}(O\cap O')\leq 2V/n^4$.

Proof.

Let $(L_1,\dots,L_d)$ be the type of $O$, and $(L'_1,\dots,L'_d)$ be the type of $O'$. Since the boxes belong to dissimilar classes, there is a dimension $k\in[d]$ such that $2^{|L_k-L_k'|}\geq n^4$. Without loss of generality, assume $2^{L_k-L_k'}\geq n^4$; the other case is symmetric. Let $[a_k,b_k]$ and $[a_k',b_k']$ be the intervals resulting from projecting the boxes $O$ and $O'$ onto dimension $k$, respectively. Note that $b_k-a_k\in[2^{L_k},2^{L_k+1})$ and $b_k'-a_k'\in[2^{L_k'},2^{L_k'+1})$. So we have $\frac{b_k-a_k}{b'_k-a'_k}\geq 2^{L_k-(L_k'+1)}\geq n^4/2$. In other words, at most a $2/n^4$ fraction of the interval $[a_k,b_k]$ intersects the interval $[a'_k,b'_k]$. Hence,

\[ \operatorname{Volume}(O\cap O')\leq\operatorname{Volume}(O)\cdot 2/n^4\leq 2V/n^4. \] ∎

We are now ready to prove Lemma 5.

Proof of Lemma 5.

Without loss of generality assume $\operatorname{Volume}(U_1) \geq \cdots \geq \operatorname{Volume}(U_m)$. We construct a set of indices $T \subseteq [m]$ by the following procedure:

  • Initially $T = \emptyset$.

  • For $t = 1, \dots, m$, if $C_t$ and $C_s$ are dissimilar for all $s \in T$, then add $t$ to $T$.

We have $t \notin T$ for some $t \in [m]$ only if there exists an $s \in T$ such that $C_s, C_t$ are similar and $\operatorname{Volume}(U_s) \geq \operatorname{Volume}(U_t)$; we thus call $s$ a witness of $t$. If multiple witnesses exist, then we pick an arbitrary one. Conversely, every $s \in T$ can be a witness at most $8^d \log^d n$ times by Observation 1. Hence

$$\sum_{t=1}^{m} \operatorname{Volume}(U_t) \leq 8^d \log^d(n) \cdot \sum_{t \in T} \operatorname{Volume}(U_t). \tag{1}$$

It remains to bound $\sum_{t \in T} \operatorname{Volume}(U_t)$. Consider any distinct $s, t \in T$. By construction, $C_s$ and $C_t$ are dissimilar; and each class contains at most $n$ boxes. So $\operatorname{Volume}(U_s \cap U_t) \leq n^2 \cdot (2V/n^4) = 2V/n^2$ by Observation 2. Using this and inclusion-exclusion, we bound

$$\begin{aligned}
\sum_{t \in T} \operatorname{Volume}(U_t) &\leq \operatorname{Volume}\left(\bigcup_{t \in T} U_t\right) + \sum_{\{s,t\} \subseteq T} \operatorname{Volume}(U_s \cap U_t) \\
&\leq V + \binom{m}{2} \, \frac{2V}{n^2} \\
&\leq 2V,
\end{aligned}$$

where the last step uses $m \leq n$.

Plugging this into the right-hand side of Expression (1), we obtain the lemma statement. ∎
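
The greedy selection of $T$ and the choice of witnesses can be sketched as follows. This is only an illustration of the argument (the set $T$ appears in the analysis, not in the algorithm itself). The function name `greedy_representatives` and the `similar` predicate are hypothetical stand-ins: `similar(s, t)` is assumed to decide whether $C_s$ and $C_t$ are similar in the sense defined earlier, and the toy relation in the usage lines is made up for the example.

```python
from typing import Callable, Dict, List, Tuple


def greedy_representatives(
    volumes: List[float],
    similar: Callable[[int, int], bool],
) -> Tuple[List[int], Dict[int, int]]:
    """Greedily pick indices whose classes are pairwise dissimilar.

    volumes[t] stands for Volume(U_t); similar(s, t) is a hypothetical
    predicate for "C_s and C_t are similar". Returns the selected indices
    (in processing order) and one witness for every rejected index.
    """
    # Process classes by non-increasing union volume (the WLOG ordering).
    order = sorted(range(len(volumes)), key=lambda t: -volumes[t])
    selected: List[int] = []
    witness: Dict[int, int] = {}
    for t in order:
        # First already-selected class that is similar to C_t, if any.
        s = next((s for s in selected if similar(s, t)), None)
        if s is None:
            selected.append(t)
        else:
            # s was processed before t, so volumes[s] >= volumes[t].
            witness[t] = s
    return selected, witness


# Toy usage: each class has an integer "level"; two classes count as similar
# when their levels differ by less than 3 (an assumed relation for the demo).
levels = [0, 1, 5, 6, 10]
vols = [9.0, 7.0, 6.0, 2.0, 1.0]
T, wit = greedy_representatives(vols, lambda s, t: abs(levels[s] - levels[t]) < 3)
```

Every rejected index is charged to a witness of at least the same volume, which is the property used to derive Expression (1).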

2.3 Joining the Classes

Recall that $C_1, \dots, C_m$ are the classes of the input boxes and $U_1, \dots, U_m$ their respective unions. Assume without loss of generality that the boxes are ordered in accordance with the class ordering, that is, $C_1 = \{O_1, \cdots, O_{i_1}\}$ form the first class, $C_2 = \{O_{i_1+1}, \cdots, O_{i_2}\}$ form the second class, and so on. More formally, we ensure that $C_t = \{O_{i_{t-1}+1}, \ldots, O_{i_t}\}$ for $0 = i_0 < i_1 < \ldots < i_m = n$.

Let $D_t := U_t \setminus (\bigcup_{s=t+1}^{m} U_s)$ be the points in $U_t$ that are not contained in later classes. Note that $D_1, \dots, D_m$ is a partition of $\bigcup_{t=1}^{m} U_t = \bigcup_{i=1}^{n} O_i$.
Hence, to generate a $p$-sample of $\bigcup_{i=1}^{n} O_i$, it suffices to draw $p$-samples from each $D_t$ and then take their union. (This idea has previously been used on objects, by considering the difference $D'_i := O_i \setminus (\bigcup_{j=i+1}^{n} O_j)$ [19, 26], while we use this idea on classes.) To this end, we draw a $p$-sample $S_t$ from $U_t$ via Lemma 4. Then we remove all $x \in S_t$ for which $\textsc{appears}(x, i_t+1) = \text{true}$; these are exactly the points that appear in a later class. What remains is a $p$-sample of $D_t$.
The union of these sets is thus a $p$-sample of $\bigcup_{i=1}^{n} O_i$, and we can use the size of this $p$-sample to estimate the volume $V$ of $\bigcup_{i=1}^{n} O_i$. The complete algorithm is summarized in Algorithm 2.

Algorithm 2 Volume estimator
  1. Partition the boxes into classes $C_1, \dots, C_m$. Relabel the boxes so that their indices are in accordance with the class ordering, i.e., $C_t = \{O_{i_{t-1}+1}, \ldots, O_{i_t}\}$ for all $t \in [m]$.

  2. Build the data structures from Lemmas 1 and 3.

  3. Call Algorithm 1 to obtain a crude estimate $\widetilde{V}$. Set $p := 8/(\varepsilon^2 \widetilde{V})$.

  4. For $t = 1, \dots, m$ do:

    • Draw a $p$-sample $S_t$ from the union $U_t := \bigcup_{O \in C_t} O$ via Lemma 4.

    • Compute $S_t' := \{x \in S_t : \textsc{appears}(x, i_t+1) = \text{false}\}$.

  5. Output $\sum_{t=1}^{m} |S_t'| / p$.
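
The control flow of Algorithm 2 can be illustrated by the following brute-force sketch. It is not the paper's implementation: the data structures of Lemmas 1, 3 and 4 are replaced by naive linear scans, the crude estimate is passed in as a parameter instead of being computed by Algorithm 1, and a $p$-sample is assumed to be a Poisson point process of intensity $p$. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Box = Sequence[Tuple[float, float]]  # one (lo, hi) interval per dimension


def volume(box: Box) -> float:
    v = 1.0
    for lo, hi in box:
        v *= hi - lo
    return v


def contains(box: Box, x: Sequence[float]) -> bool:
    return all(lo <= xi <= hi for (lo, hi), xi in zip(box, x))


def poisson(mean: float, rng: random.Random) -> int:
    """Poisson(mean), drawn as the number of rate-1 arrivals before time `mean`."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def p_sample_of_class(cls: List[Box], p: float, rng: random.Random) -> List[tuple]:
    """Naive stand-in for Lemma 4: a p-sample of the union of the boxes in `cls`."""
    points = []
    for k, box in enumerate(cls):
        for _ in range(poisson(p * volume(box), rng)):
            x = tuple(rng.uniform(lo, hi) for lo, hi in box)
            # Keep x only if no later box of the same class contains it.
            if not any(contains(other, x) for other in cls[k + 1:]):
                points.append(x)
    return points


def estimate_union_volume(
    classes: List[List[Box]], crude_estimate: float, eps: float, rng: random.Random
) -> float:
    p = 8.0 / (eps ** 2 * crude_estimate)  # step 3
    kept = 0
    for t, cls in enumerate(classes):  # step 4
        later = [box for c in classes[t + 1:] for box in c]
        for x in p_sample_of_class(cls, p, rng):
            # Naive stand-in for appears(x, i_t + 1): scan all later classes.
            if not any(contains(box, x) for box in later):
                kept += 1
    return kept / p  # step 5


# Toy usage: three unit squares, grouped into two arbitrary "classes",
# whose union is the rectangle [0, 2] x [0, 1].
classes = [
    [[(0.0, 1.0), (0.0, 1.0)], [(0.5, 1.5), (0.0, 1.0)]],
    [[(1.0, 2.0), (0.0, 1.0)]],
]
estimate = estimate_union_volume(classes, crude_estimate=2.0, eps=0.02, rng=random.Random(7))
```

The sketch takes time proportional to the number of sampled points times the number of boxes; the point of the paper's data structures is to avoid exactly these linear scans.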

Lemma 6.

Conditioned on $\widetilde{V} \leq 2V$, Algorithm 2 outputs a $(1+\varepsilon)$-approximation to $V$ with probability at least $3/4$.

Proof.

Note that for all $t \in [m]$, the set $S_t'$ is a $p$-sample of $D_t$. Since $D_1, \dots, D_m$ partition $\bigcup_{t=1}^{m} U_t = \bigcup_{i=1}^{n} O_i$, their union $\bigcup_{t=1}^{m} S_t'$ is a $p$-sample of $\bigcup_{i=1}^{n} O_i$. It follows that $N := \sum_{t=1}^{m} |S_t'| \sim \mathrm{Pois}(pV)$.

The expectation and variance of $N$ are both equal to $pV = 8V/(\varepsilon^2 \widetilde{V}) \geq 4/\varepsilon^2$. So by Chebyshev,

$$\Pr(|N - pV| > \varepsilon pV) \leq \frac{\mathrm{Var}[N]}{(\varepsilon pV)^2} \leq \frac{1}{4}.$$

In other words, with probability at least $3/4$, the output $N/p$ is a $(1+\varepsilon)$-approximation to $V$. ∎
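
For reference, the second inequality in the Chebyshev step can be unpacked as follows, using only that a Poisson variable satisfies $\mathrm{Var}[N] = \mathbb{E}[N] = pV$ and that $pV \geq 4/\varepsilon^2$ under the conditioning $\widetilde{V} \leq 2V$:

$$\frac{\mathrm{Var}[N]}{(\varepsilon pV)^2} = \frac{pV}{\varepsilon^2 (pV)^2} = \frac{1}{\varepsilon^2 \, pV} \leq \frac{1}{\varepsilon^2 \cdot 4/\varepsilon^2} = \frac{1}{4}.$$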

Lemma 7.

Conditioned on $\widetilde{V} \geq \frac{V}{2}$, Algorithm 2 runs in expected time $O\left(\log^{2d+1}(n) \cdot (n + \frac{1}{\varepsilon^2})\right)$.

Proof.

Step 1 takes $O(n \log n)$ time: we first compute the side lengths of each box and determine its class, then we sort the boxes according to class. Step 2 takes $O(n \log^{d+1} n)$ time by Lemmas 1 and 3. Step 3 takes $O(n \log^{d+1} n)$ time by Lemma 2.

In step 4, iteration $t$, sampling $S_t$ costs expected time $O((i_t - i_{t-1}) \log(i_t - i_{t-1}) + p \operatorname{Volume}(U_t) \cdot \log^{d+1} n)$ by Lemma 4, and computing $S_t'$ takes expected time $O((1 + p \operatorname{Volume}(U_t)) \cdot \log^{d+1} n)$ by Lemma 1. Therefore, the expected running time over all iterations is

$$O\left(\log^{d+1}(n) \cdot \left(n + p \sum_{t=1}^{m} \operatorname{Volume}(U_t)\right)\right).$$

Substituting $p = 8/(\varepsilon^2 \widetilde{V}) \leq 16/(\varepsilon^2 V)$ and applying Lemma 5, we can bound

$$p \sum_{t=1}^{m} \operatorname{Volume}(U_t) \leq \frac{16}{\varepsilon^2 V} \sum_{t=1}^{m} \operatorname{Volume}(U_t) \leq \frac{2^{3d+5} \log^d n}{\varepsilon^2}.$$

Hence, the expected running time of step 4 is $O\left(\log^{2d+1}(n) \cdot (n + \frac{1}{\varepsilon^2})\right)$, which dominates the cost of steps 1 to 3. ∎

Proof of Theorem 2.

We run Algorithm 2 with a time budget tenfold the bound in Lemma 7; if step 4 exceeds this budget then we immediately abort the algorithm. So the stated time bound is clearly satisfied.

Now consider three bad events:

  • $\widetilde{V} \notin [\frac{V}{2}, 2V]$.

  • $\widetilde{V} \in [\frac{V}{2}, 2V]$, but the algorithm is aborted.

  • $\widetilde{V} \in [\frac{V}{2}, 2V]$ and the algorithm is not aborted, but it does not output a $(1+\varepsilon)$-approximation to $V$.

By Lemma 2, the first event happens with probability at most $0.1$. By Markov's inequality, the second event happens with probability at most $0.1$. Lastly, by Lemma 6, the third event happens with probability at most $1/4$. So the total error probability is at most $0.1 + 0.1 + \frac{1}{4} = \frac{9}{20}$. If none of the bad events happens, then the algorithm correctly outputs a $(1+\varepsilon)$-approximation to $V$. The success probability of $1 - \frac{9}{20} = \frac{11}{20} > \frac{1}{2}$ can be boosted to, say, $0.9$ by returning the median of a sufficiently large constant number of repetitions of the algorithm. ∎
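
The median trick at the end of the proof can be sketched as follows. The helper `boosted_estimate` and the toy estimator `toy_run` are hypothetical: `toy_run` merely imitates an estimator that is accurate with probability $11/20$ and returns an arbitrary wrong value otherwise, and the number of repetitions is left as a parameter since the text only asks for a sufficiently large constant.

```python
import random
import statistics
from typing import Callable


def boosted_estimate(run_once: Callable[[], float], repetitions: int) -> float:
    """Return the median of `repetitions` independent runs of `run_once`."""
    return statistics.median(run_once() for _ in range(repetitions))


def toy_run(rng: random.Random, true_value: float = 100.0, eps: float = 0.1) -> float:
    """Stand-in estimator: within a (1 +/- eps) factor with probability 0.55."""
    if rng.random() < 0.55:
        return true_value * (1 + rng.uniform(-eps, eps))
    # Otherwise fail badly, either far too low or far too high.
    return rng.choice([0.0, 10 * true_value])


rng = random.Random(0)
boosted = boosted_estimate(lambda: toy_run(rng), repetitions=301)
```

The median is inaccurate only if at least half of the runs err on the same side, which becomes exponentially unlikely in the number of repetitions once a single run succeeds with probability above $1/2$.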

2.4 Handling Discrete Boxes

We now argue that our algorithm for boxes in $\mathbb{R}^d$ also solves the following discrete variant of Klee's measure problem: Given boxes $O_1, \dots, O_n$ in $\mathbb{Z}^d$, count the number of points in the union $\bigcup_{i=1}^{n} O_i$. To solve this problem, we employ the following embedding of $\mathbb{Z}^d$ into $\mathbb{R}^d$:

$$\varphi : \; (x_1, \dots, x_d) \in \mathbb{Z}^d \; \mapsto \; [x_1, x_1+1] \times \cdots \times [x_d, x_d+1] \subset \mathbb{R}^d.$$

Note that $\varphi$ transforms discrete boxes into continuous boxes, and that the cardinality of any $U \subset \mathbb{Z}^d$ is equal to the volume of its image $\varphi(U) \subset \mathbb{R}^d$. Hence the discrete variant of Klee's measure problem reduces to the continuous counterpart.
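
As a small illustration of the embedding, the sketch below maps a discrete box to its continuous image and compares cardinality with volume. The representation of a discrete box as a list of inclusive integer ranges, and the helper names, are assumptions made for this example.

```python
from itertools import product
from typing import List, Tuple

Ranges = List[Tuple[int, int]]  # one (lo, hi) pair per dimension


def embed(discrete_box: Ranges) -> Ranges:
    """Image under phi: the inclusive range [lo, hi] becomes the interval [lo, hi + 1]."""
    return [(lo, hi + 1) for lo, hi in discrete_box]


def cardinality(discrete_box: Ranges) -> int:
    """Number of lattice points, counted by explicit enumeration."""
    return sum(1 for _ in product(*(range(lo, hi + 1) for lo, hi in discrete_box)))


def volume(continuous_box: Ranges) -> int:
    v = 1
    for lo, hi in continuous_box:
        v *= hi - lo
    return v


box = [(2, 4), (-1, 0)]         # the lattice points {2, 3, 4} x {-1, 0}
image = embed(box)              # the rectangle [2, 5] x [-1, 1]
same = cardinality(box) == volume(image)
```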

3 Lower Bound for Union Volume Estimation

We consider estimating the volume of the union of $n$ (measurable) objects $O_1, \dots, O_n \subset \mathbb{R}^2$. These objects are only accessible through the following three queries:

  • $\operatorname{Volume}(i)$: Return the volume of object $O_i$.

  • $\operatorname{Sample}(i)$: Draw a uniform random point from $O_i$.

  • $\operatorname{Contains}((a,b), i)$: Given a point $(a,b) \in \mathbb{R}^2$, return whether $(a,b) \in O_i$ or not.

It is known that $O(n \varepsilon^{-2})$ queries suffice to return with constant probability a $(1+\varepsilon)$-approximation to the volume of the union $O_1 \cup \ldots \cup O_n$. Here we prove a matching lower bound.

For convenience, we also consider a discrete version of the problem in which each object $O_i$ is instead a finite subset of the integer lattice $\mathbb{Z}^2$. The queries are then

  • $\operatorname{Volume}(i)$: Return the cardinality $|O_i|$.

  • $\operatorname{Sample}(i)$: Draw a uniform random point from $O_i$.

  • $\operatorname{Contains}((a,b), i)$: Given a point $(a,b) \in \mathbb{Z}^2$, return whether $(a,b) \in O_i$ or not.

The goal is to give a $(1+\varepsilon)$-approximation to the cardinality $|O_1 \cup \ldots \cup O_n|$ of the union.
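
The discrete query model can be made concrete with a small oracle that answers the three queries and counts how often it is asked. The class name `DiscreteOracle` and its layout are hypothetical; the sketch only fixes an interface against which query complexity could be measured, with objects indexed from 1 as in the text.

```python
import random
from typing import List, Set, Tuple

Point = Tuple[int, int]


class DiscreteOracle:
    """Query access to finite objects O_1, ..., O_n in Z^2, with a query counter."""

    def __init__(self, objects: List[Set[Point]], seed: int = 0) -> None:
        self._lists = [sorted(obj) for obj in objects]  # for uniform sampling
        self._sets = [set(obj) for obj in objects]      # for membership tests
        self._rng = random.Random(seed)
        self.queries = 0

    def volume(self, i: int) -> int:
        """Volume(i): the cardinality |O_i|."""
        self.queries += 1
        return len(self._lists[i - 1])

    def sample(self, i: int) -> Point:
        """Sample(i): a uniformly random point of O_i."""
        self.queries += 1
        return self._rng.choice(self._lists[i - 1])

    def contains(self, point: Point, i: int) -> bool:
        """Contains((a, b), i): whether the point lies in O_i."""
        self.queries += 1
        return point in self._sets[i - 1]


oracle = DiscreteOracle([{(0, 0), (1, 0)}, {(1, 0)}])
size_first = oracle.volume(1)
point_second = oracle.sample(2)
inside = oracle.contains((0, 0), 2)
```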

In Section 3.1 we show a lower bound for the discrete version, and then in Section 3.2 we show that a lower bound for the discrete version implies a similar lower bound for the continuous version.

3.1 Lower Bound for Discrete Union

In the remainder, we write $[n] \coloneqq \{1, 2, \dots, n\}$. The starting point is what we call the Query-Gap-Hamming problem: The input consists of two (hidden) vectors $x, y \in \{-1, 1\}^T$, and we can access one arbitrary bit of $x$ or $y$ at a time. The goal is to distinguish the cases $\langle x, y \rangle > \sqrt{T}$ and $\langle x, y \rangle < -\sqrt{T}$ using as few accesses as possible. Query-Gap-Hamming has linear query complexity:

Lemma 8.

Any randomized algorithm solving Query-Gap-Hamming with probability at least $2/3$ requires $\Omega(T)$ accesses to $x$ and $y$, regardless of the computational resources it uses.

Proof.

This follows by a folklore argument from the fact that the Gap-Hamming problem has linear randomized communication complexity [8]. We next describe the details.

We reduce from the communication complexity of the Gap-Hamming problem, where Alice holds a vector $x \in \{-1, 1\}^T$, Bob holds a vector $y \in \{-1, 1\}^T$, and their goal is to distinguish $\langle x, y \rangle > \sqrt{T}$ from $\langle x, y \rangle < -\sqrt{T}$ while communicating as few bits as possible. It is known that the two-way, public-coin randomized communication complexity of Gap-Hamming is $\Omega(T)$ [8]. Now suppose that a randomized algorithm can solve Query-Gap-Hamming with probability at least $2/3$, while making only $o(T)$ accesses to $x$ and $y$. We construct a protocol between Alice and Bob: They simulate the algorithm synchronously, using a shared random tape. Whenever the algorithm tries to access $x_j$, Alice sends the bit $x_j$ to Bob. Whenever it tries to access $y_j$, Bob sends the bit $y_j$ to Alice. Clearly both parties can simulate the algorithm till the end, and output the answer of the algorithm. The communication cost is $o(T)$ bits, which contradicts the aforementioned communication complexity. ∎

Next we give a reduction from Query-Gap-Hamming to estimating the cardinality of a union of objects. In more detail, from the hidden input vectors $x, y \in \{-1, 1\}^T$ we (implicitly) define $2n$ objects $X_1, \dots, X_n, Y_1, \dots, Y_n \subset \mathbb{Z}^2$. Write $R := \{(n+1, 0), \dots, (nT+n, 0)\}$. Given permutations $\pi_1, \dots, \pi_T$ of $[n]$, we define

$$X_i = X_i(x, \pi_1, \dots, \pi_T) := R \cup \{(jn + \pi_j(i), x_j) : j \in [T]\}$$

for every $i \in [n]$. Analogously, given a different set of permutations $\tau_1, \dots, \tau_T$, we define

$$Y_i = Y_i(y, \tau_1, \dots, \tau_T) := R \cup \{(jn + \tau_j(i), y_j) : j \in [T]\}$$

for every $i \in [n]$. Note that $R$ is a subset of all $X_i$ and $Y_i$.

Consider an arbitrary index $j \in [T]$. If $x_j = y_j$ then the point sets $\{(jn+\pi_j(i), x_j) : i \in [n]\}$ and $\{(jn+\tau_j(i), y_j) : i \in [n]\}$ are equal, so together they contribute $n$ to the cardinality of the union. On the other hand, if $x_j \neq y_j$ then they are disjoint and thus contribute $2n$. Furthermore, the point set $R$ is contained in all objects and contributes $nT$. Hence, the cardinality of the union equals

\[ nT + \sum_{j \colon x_j = y_j} n + \sum_{j \colon x_j \neq y_j} 2n = \frac{5}{2}nT - \frac{1}{2}n \cdot \Bigl(\sum_{j \colon x_j = y_j} 1 + \sum_{j \colon x_j \neq y_j} (-1)\Bigr) = \frac{5}{2}nT - \frac{1}{2}n\langle x,y\rangle. \]
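
The cardinality formula can be checked on small instances with the following Python sketch. The function names `build_objects` and `union_cardinality` are hypothetical, and the objects are built explicitly here (rather than implicitly, as in the reduction) purely for verification.

```python
import random

def build_objects(x, y, n, rng):
    """Build X_1..X_n and Y_1..Y_n explicitly as sets of integer points."""
    T = len(x)
    R = {(a, 0) for a in range(n + 1, n * T + n + 1)}
    # pi[j - 1][i] stores pi_j(i + 1); each row is a random permutation of 1..n.
    pi = [rng.sample(range(1, n + 1), n) for _ in range(T)]
    tau = [rng.sample(range(1, n + 1), n) for _ in range(T)]
    X = [R | {(j * n + pi[j - 1][i], x[j - 1]) for j in range(1, T + 1)}
         for i in range(n)]
    Y = [R | {(j * n + tau[j - 1][i], y[j - 1]) for j in range(1, T + 1)}
         for i in range(n)]
    return X, Y

def union_cardinality(X, Y):
    """Number of distinct points in the union of all 2n objects."""
    return len(set().union(*X, *Y))

rng = random.Random(1)
n, T = 4, 6
x = [rng.choice((-1, 1)) for _ in range(T)]
y = [rng.choice((-1, 1)) for _ in range(T)]
X, Y = build_objects(x, y, n, rng)
inner = sum(a * b for a, b in zip(x, y))
# Compare doubled quantities to stay in integer arithmetic.
assert 2 * union_cardinality(X, Y) == 5 * n * T - n * inner
```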

Let $\rho$ be a $(1+\varepsilon)$-approximation to the cardinality of the union, i.e., to $\frac{5}{2}nT - \frac{1}{2}n\langle x,y\rangle$. Since

\[ \rho \in \left[(1-\varepsilon)\bigl(\tfrac{5}{2}nT - \tfrac{1}{2}n\langle x,y\rangle\bigr),\ (1+\varepsilon)\bigl(\tfrac{5}{2}nT - \tfrac{1}{2}n\langle x,y\rangle\bigr)\right] \]

and $|\langle x,y\rangle| \leq T$, by computing $(\frac{5}{2}nT - \rho) \cdot \frac{2}{n}$ we obtain a value in $[\langle x,y\rangle - 6\varepsilon T, \langle x,y\rangle + 6\varepsilon T]$, namely an additive $6\varepsilon T$ approximation to $\langle x,y\rangle$ with probability at least $4/5$. For $\varepsilon \leq 1/(6\sqrt{T})$ this allows us to decide whether $\langle x,y\rangle > \sqrt{T}$ or $\langle x,y\rangle < -\sqrt{T}$.
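
To see where the constant $6$ comes from, write $U := \frac{5}{2}nT - \frac{1}{2}n\langle x,y\rangle$ for the true cardinality (the symbol $U$ is introduced here only for this calculation). Since $|\langle x,y\rangle| \leq T$ gives $U \leq 3nT$, and since $(\frac{5}{2}nT - U)\cdot\frac{2}{n} = \langle x,y\rangle$, we get

\[ |\rho - U| \leq \varepsilon U \leq 3\varepsilon nT, \qquad \Bigl|\bigl(\tfrac{5}{2}nT - \rho\bigr)\cdot\tfrac{2}{n} - \langle x,y\rangle\Bigr| = \tfrac{2}{n}\,|U - \rho| \leq 6\varepsilon T. \]

In particular, for $\varepsilon \leq 1/(6\sqrt{T})$ the error is at most $\sqrt{T}$, so the sign of the computed value separates the two cases.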

Let $\mathcal{A}$ be a (possibly randomized) algorithm that $(1+\varepsilon)$-approximates the volume of the union of any $2n$ objects $O_1,\dots,O_{2n} \subset \mathbb{Z}^2$ with probability at least $4/5$, using $q$ queries. We assume that $q \geq 10n$; otherwise we modify $\mathcal{A}$ to ask $10n - q$ dummy queries.

We now simulate $\mathcal{A}$ as if the input were the $2n$ objects $X_1,\ldots,X_n,Y_1,\ldots,Y_n$. It remains to argue that we can answer all queries by $\mathcal{A}$ while accessing few bits of $x$ and $y$. Specifically, the number of accesses would be only $O(q/n)$. The details of the simulation algorithm are as follows:

Algorithm $\mathcal{S}$:
  1. Sample random permutations $\pi_1,\dots,\pi_T$ and $\tau_1,\dots,\tau_T$ of $[n]$ uniformly and independently.

  2. Simulate algorithm $\mathcal{A}$ and answer its queries as follows.

    • $\operatorname{Volume}(i)$: Answer $nT+T$.

    • $\operatorname{Sample}(i)$: In the case $i \leq n$:

      (S1) With probability $nT/(nT+T) = 1 - 1/(n+1)$, answer with a uniform random point $p \in R$.

      (S2) With the remaining probability, pick a uniform random $j \in [T]$. If we have not accessed $x_j$ yet, access it and keep it in memory. Then answer with the point $(jn+\pi_j(i), x_j)$.

      In the case $i > n$, do the same with $x_j$ replaced by $y_j$ and $\pi_j(i)$ replaced by $\tau_j(i-n)$.

    • $\operatorname{Contains}((a,b),i)$: Let $j = \lfloor (a-1)/n \rfloor$. In the case $i \leq n$:

      (C1) If $(a,b) \in R$ then answer true.

      (C2) Else, if $j \notin [T]$ or $jn+\pi_j(i) \neq a$ then answer false.

      (C3) Else, we have $jn+\pi_j(i) = a$. If we have not accessed $x_j$ yet, access it and keep it in memory. If $b = x_j$ then answer true, otherwise answer false.

      In the case $i > n$, do the same with $x_j$ replaced by $y_j$ and $\pi_j(i)$ replaced by $\tau_j(i-n)$.

  3. Let $\rho$ be the output of $\mathcal{A}$ and return $(\frac{5}{2}nT - \rho) \cdot \frac{2}{n}$.

This finishes the description of algorithm $\mathcal{S}$. It is immediate from the algorithm that the execution of $\mathcal{A}$ is the same as if it were actually run on the objects $X_i(x,\pi_1,\dots,\pi_T)$ and $Y_i(y,\tau_1,\dots,\tau_T)$ for $i \in [n]$. What remains is to bound the number of accesses to $x$ and $y$ by $\mathcal{S}$ during the simulation.

To this end, observe that an access to $x$ (respectively $y$) occurs only when the query enters (S2) or (C3). In both (S2) and (C3), a permutation entry $\pi_j(i)$ (respectively $\tau_j(i-n)$) is involved, and we say that the entry is hit by the query.

By definition, the number of accesses to $x$ and $y$ is at most the number of entries $\pi_j(i)$ and $\tau_j(i-n)$ hit by some query. In light of this, we can move on to upper bound the latter.

We consider two bad events. Let $E_1$ be the event that more than $20q/n$ entries are hit by (S2). Let $E_2$ be the event that at most $20q/n$ entries are hit by (S2), but more than $k := 40q/n$ entries are freshly hit by (C3). Here “freshly” means that the entry was not hit by any query before it is hit by (C3).

Entries hit by (S2).

We first consider the number of entries hit by (S2). For $t \in [q]$, define an indicator random variable $Z_t$ taking the value $1$ iff the $t$-th query of $\mathcal{S}$ enters case (S2). Since every query may hit at most one entry, the total number of entries hit by (S2) is at most $\sum_{t=1}^{q} Z_t$. Note that $\Pr[Z_t = 1] \leq 1/(n+1)$ for all $t$ and hence $\mathbb{E}[\sum_{t=1}^{q} Z_t] < q/n$. So by Markov’s inequality, $\Pr[E_1] \leq \Pr[\sum_{t=1}^{q} Z_t > 20q/n] < 1/20$.
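
Spelled out, the two steps of this bound are linearity of expectation followed by Markov's inequality:

\[ \mathbb{E}\Bigl[\sum_{t=1}^{q} Z_t\Bigr] = \sum_{t=1}^{q} \Pr[Z_t = 1] \leq \frac{q}{n+1} < \frac{q}{n}, \qquad \Pr\Bigl[\sum_{t=1}^{q} Z_t > \frac{20q}{n}\Bigr] \leq \frac{\mathbb{E}\bigl[\sum_{t=1}^{q} Z_t\bigr]}{20q/n} < \frac{q/n}{20q/n} = \frac{1}{20}. \]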

Entries freshly hit by (C3).

The tricky query to analyze is the $\operatorname{Contains}((a,b),i)$ query. We will show that $\Pr[E_2] < 1/20$. Roughly, we need to argue that if $\pi_j(i)$ was not hit previously then $\mathcal{A}$ is unlikely to ask a query with $a = jn+\pi_j(i)$. The intuition is that $\mathcal{A}$ is unaware of the permutations $\pi_1,\dots,\pi_T$ and $\tau_1,\dots,\tau_T$, and thus to get a fresh hit it has to “guess” an entry of a permutation.

For the proof, assume for the sake of contradiction that $\Pr[E_2] \geq 1/20$. Under this assumption, we give an algorithm for encoding the random permutations $\pi_1,\dots,\pi_T,\tau_1,\dots,\tau_T$ in fewer than $2T\lg(n!)$ bits in expectation. This is an information-theoretic contradiction. More formally, our proof considers a game between an encoder and a decoder. The encoder receives $\pi_1,\dots,\pi_T,\tau_1,\dots,\tau_T,x,y$ as well as the random tape $r$ used by $\mathcal{S}$ and $\mathcal{A}$ in simulation step 2. The decoder receives $x,y,r$. The encoder must send a message to the decoder which allows the decoder to reconstruct $\pi_1,\dots,\pi_T,\tau_1,\dots,\tau_T$. Since the Shannon entropy is $H(\pi_1,\dots,\pi_T,\tau_1,\dots,\tau_T \mid x,y,r) = 2T\lg(n!)$, it follows by Shannon’s source coding theorem that the expected length of the message must be at least $2T\lg(n!)$ bits.

We use the assumption $\Pr[E_2] \geq 1/20$ as follows: the encoder sends the indices of the queries among $1,\dots,q$ which freshly hit an entry in (C3). The encoder further sends information that allows the decoder to simulate $\mathcal{S}$ for the remaining queries. Whenever the decoder reaches one of the specified queries, she knows that the point $(a,b)$ given by the $\operatorname{Contains}((a,b),i)$ query satisfies $jn+\pi_j(i) = a$. This allows her to recover $\pi_j(i)$, i.e., roughly $\lg n$ bits of information. But sending $k$ such indices costs $\lg\binom{q}{k} \approx k\lg(q/k)$ bits, or $\lg(q/k)$ bits per index. Since $q/k \ll n$, we use fewer bits than the information-theoretic lower bound, which is a contradiction. We now proceed to give the formal details.
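
The gap between the two costs can be made concrete with a small numeric sketch in Python. The helper name `bits_per_index` and the parameter values are chosen here only for illustration; they use $k = 40q/n$ and the smallest allowed $q = 10n$.

```python
from math import comb, e, log2

def bits_per_index(q, k):
    """Average number of bits per index when naming a k-subset of [q]."""
    return log2(comb(q, k)) / k

n = 1000
q = 10 * n       # smallest value allowed by the assumption q >= 10n
k = 40 * q // n  # k = 40q/n

cost = bits_per_index(q, k)  # what the encoder pays per fresh hit
gain = log2(n)               # roughly what the decoder learns per fresh hit
assert cost <= log2(e * q / k) < gain
```

Because $q/k = n/40$ regardless of $q$, the cost per index stays a constant number of bits below $\lg n$, which is the slack the formal argument exploits.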

Encoding procedure.

The encoder receives random permutations $\pi_1,\dots,\pi_T,\tau_1,\dots,\tau_T$ and also $x,y,r$, and proceeds as follows:

  1. Initialize algorithm $\mathcal{S}$ with the given permutations. Run it from step 2 onward, using the given tape $r$ to make random choices for $\mathcal{S}$ and $\mathcal{A}$.

  2. If the event $E_2$ does not happen, send a 0-bit followed by a naive encoding of all permutations.

  3. Otherwise $E_2$ happens. Signal this by sending a 1-bit. Then send the indices $I \subseteq [q]$ of the first $k$ queries that freshly hit some entry in (C3). Next, denote $\ell := \max I$. For $t = 1,\dots,\ell$ in that order, if the $t$-th query hits an entry in (S2) then send the value of that entry. Finally, for each permutation $\pi_j$ and $\tau_j$, send the induced permutation on its entries not hit by queries $1,\dots,\ell$.
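
One standard way to send a $k$-subset $I \subseteq [q]$ using $\lceil \lg\binom{q}{k} \rceil$ bits is to transmit its rank among all $k$-subsets. The sketch below uses the combinatorial number system for this; the source does not specify which encoding is used, so this choice, the function names, and the 0-based indexing are assumptions made for illustration.

```python
from math import comb

def rank_subset(indices):
    """Rank of a k-subset of {0, ..., q-1} in the combinatorial number system."""
    return sum(comb(c, r + 1) for r, c in enumerate(sorted(indices)))

def unrank_subset(rank, k):
    """Inverse of rank_subset for subsets of size k."""
    out = []
    for r in range(k, 0, -1):
        c = r - 1
        # Find the largest c with comb(c, r) <= rank.
        while comb(c + 1, r) <= rank:
            c += 1
        out.append(c)
        rank -= comb(c, r)
    return sorted(out)

I = [0, 2, 5]
assert unrank_subset(rank_subset(I), len(I)) == I
```

Since ranks of $k$-subsets of a $q$-element set range over $0,\dots,\binom{q}{k}-1$, the rank fits in $\lceil \lg\binom{q}{k} \rceil$ bits.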

Decoding procedure.

We next argue that we can recover the permutations $\pi_1,\dots,\pi_T$ and $\tau_1,\dots,\tau_T$ after receiving $x,y,r$ and the above encoding.

  1. If the leading bit of the encoding is a 0, then we immediately recover all permutations from the rest of the encoding.

  2. If the leading bit is a 1, we start by recovering $I$ and $\ell := \max I$. Then we simulate algorithm $\mathcal{S}$ up to the $\ell$-th query, as if we knew the permutations. In the meantime we gradually recover all entries $\pi_j(i)$ and $\tau_j(i-n)$ that are hit. More precisely, for $t = 1,\dots,\ell$ we answer the $t$-th query by $\mathcal{A}$ as follows.

    • $\operatorname{Volume}(i)$: Answer $nT+T$.

    • $\operatorname{Sample}(i)$: In the case $i \leq n$:

      • If the tape $r$ decides to give a point $p \in R$, then answer with this point $p$.

      • Else, the tape decides to give a $j \in [T]$. Since $\pi_j(i)$ is hit by this query in (S2), its value is readily available in the encoding. We answer $(jn+\pi_j(i), x_j)$.

      In the case $i > n$, do the same with $x_j$ replaced by $y_j$ and $\pi_j(i)$ replaced by $\tau_j(i-n)$.

    • $\operatorname{Contains}((a,b),i)$: Let $j = \lfloor (a-1)/n \rfloor$. In the case $i \leq n$:

      • If $(a,b) \in R$ then answer true.

      • Else, if $j \notin [T]$ then answer false.

      • Else, if $t \in I$ then the current query freshly hits $\pi_j(i)$, so it must be the case that $a = jn+\pi_j(i)$. We have thus recovered $\pi_j(i) = a - jn$. Then we answer true if $b = x_j$; otherwise we answer false.

      • Finally, if $t \notin I$ then $\pi_j(i)$ was hit before, or it is not hit by the current query. In the former case we know its value, so we answer true if $(a,b) = (jn+\pi_j(i), x_j)$, and false otherwise. In the latter case we know that $a \neq jn+\pi_j(i)$, so we simply answer false.

      In the case $i > n$, do the same with $x_j$ replaced by $y_j$ and $\pi_j(i)$ replaced by $\tau_j(i-n)$.

  3. Having recovered all entries of $\pi_1,\dots,\pi_T$ and $\tau_1,\dots,\tau_T$ that are hit by queries $1,\dots,\ell$, we finally recover the remaining entries from the rest of the encoding.
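
The last encoding step and its inverse in the final decoding step can be sketched as follows. This is an illustrative reading of "induced permutation" as the relative order of the values at the entries that were not hit; the function names and the 0-based positions are assumptions made here.

```python
def induced_permutation(perm, hit):
    """Relative order of the values of `perm` at positions not in `hit`.

    `perm` is a permutation of 1..n given as a list, `hit` a set of 0-based
    positions. The result is a permutation of 0..r-1, with r unhit positions.
    """
    rest = [perm[p] for p in range(len(perm)) if p not in hit]
    rank = {v: r for r, v in enumerate(sorted(rest))}
    return [rank[v] for v in rest]

def restore_permutation(n, known, induced):
    """Rebuild a permutation of 1..n from its hit entries and the induced part.

    `known` maps 0-based hit positions to their already recovered values.
    """
    unused = sorted(set(range(1, n + 1)) - set(known.values()))
    fill = iter(induced)
    return [known[p] if p in known else unused[next(fill)] for p in range(n)]

perm = [3, 1, 4, 2, 5]
hit = {0, 3}
known = {p: perm[p] for p in hit}
assert restore_permutation(5, known, induced_permutation(perm, hit)) == perm
```

With $n_j$ entries left unhit, the induced permutation is one of $n_j!$ possibilities, which matches the $\lg(n_j!)$ term in the encoding length.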

Encoding length.

We finally analyze the expected encoding length to derive a contradiction to the assumption that $\Pr[E_2] \geq 1/20$.

If $E_2$ does not happen then the encoding length is $1 + \lceil 2T\lg(n!) \rceil \leq 2 + 2T\lg(n!)$ bits. If $E_2$ happens then we can save a significant number of bits. To this end, let us focus on the queries $1,\dots,\ell$. Let $m$ be the number of entries hit by (S2); note that $m \leq 20q/n$ under the event $E_2$. For $j = 1,\dots,T$ let $n_j$ be the number of entries in $\pi_j$ not hit by any query. Similarly, for $j = T+1,\dots,2T$ let $n_j$ be the number of entries in $\tau_{j-T}$ not hit by any query. Then the encoding length is

\[
\begin{aligned}
& 1 + \left\lceil \lg\binom{q}{k} \right\rceil + m\lceil \lg n \rceil + \left\lceil \sum_{j=1}^{2T} \lg(n_j!) \right\rceil \\
\leq{} & 3 + m + \lg\binom{q}{k} + m\lg n + \sum_{j=1}^{2T} \lg(n_j!) \\
\leq{} & 3 + m + k\lg(eq/k) + m\lg n + \sum_{j=1}^{2T} \lg(n!) - \sum_{j=1}^{2T} \lg(n!/n_j!) \\
={} & 3 + m + k\lg(eq/k) + m\lg n + 2T\lg(n!) - \sum_{j=1}^{2T} \lg(n!/n_j!).
\end{aligned}
\]

By Stirling’s approximation, we have $n! \geq (n/e)^n$. Hence, the product of the $n - n_j$ largest terms in the factorial (namely $n!/n_j!$) is at least $(n/e)^{n-n_j}$. Thus

\[ \sum_{j=1}^{2T} \lg(n!/n_j!) ~\geq~ \sum_{j=1}^{2T} (n - n_j)\lg(n/e) ~\geq~ (\lg(n) - 2) \cdot \sum_{j=1}^{2T} (n - n_j). \]
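
The inequality $n!/n_j! \geq (n/e)^{n-n_j}$ used above can be sanity-checked numerically. The sketch below works with natural logarithms via `math.lgamma` to avoid huge integers; the helper name `log_falling_product` and the choice $n = 50$ are illustrative only.

```python
from math import e, lgamma, log

def log_falling_product(n, nj):
    """Natural log of n!/nj!, the product of the n - nj largest factors of n!."""
    return lgamma(n + 1) - lgamma(nj + 1)

n = 50
assert all(log_falling_product(n, nj) >= (n - nj) * log(n / e)
           for nj in range(n + 1))
```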

Since $\sum_{j=1}^{2T}(n-n_j)=m+k$ is exactly the number of entries hit by queries $1,\dots,\ell$, the encoding length is at most

$$\begin{aligned}
&3+m+k\lg(eq/k)+m\lg n+2T\lg(n!)-(\lg(n)-2)\cdot(k+m)\\
={}&3+3m+k\lg(4eq/(kn))+2T\lg(n!)\\
\leq{}&3+60q/n+k\lg(4eq/(kn))+2T\lg(n!)
\end{aligned}$$

where we used $m\leq 20q/n$ by event $E_2$.

Recalling our choice of $k=40q/n$ and the assumption that $q\geq 10n$, the above is at most

$$\begin{aligned}
&3+60q/n+(40q/n)\lg(e/10)+2T\lg(n!)\\
<{}&3+60q/n-75q/n+2T\lg(n!)\\
\leq{}&2T\lg(n!)-147.
\end{aligned}$$
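The constants in this chain can be checked numerically. The sketch below, with hypothetical function and variable names, evaluates the bound on the encoding length minus $2T\lg(n!)$ as a function of the ratio $q/n$, using $k=40q/n$ so that $4eq/(kn)=e/10$.

```python
import math


def encoding_bound_offset(q_over_n: float) -> float:
    """Bound on (encoding length - 2T lg(n!)) for k = 40q/n and m <= 20q/n."""
    k = 40 * q_over_n
    # With k = 40q/n, the argument 4eq/(kn) simplifies to e/10.
    return 3 + 60 * q_over_n + k * math.log2(math.e / 10)


# 40 * lg(e/10) is roughly -75.17, hence strictly below -75.
coefficient = 40 * math.log2(math.e / 10)

# The assumption q >= 10n means the smallest admissible ratio is 10.
offset_at_min_ratio = encoding_bound_offset(10)
```

Since the coefficient of $q/n$ in the bound is negative, the offset only decreases as $q/n$ grows, so the value at $q/n=10$ is the worst case.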

Therefore, the expected encoding length is no more than

$$\begin{aligned}
&(1-\Pr[E_2])\cdot(2+2T\lg(n!))+\Pr[E_2]\cdot(2T\lg(n!)-147)\\
\leq{}&2T\lg(n!)+2-147\Pr[E_2]\\
<{}&2T\lg(n!)-5,
\end{aligned}$$

where the last line follows from the assumption that $\Pr[E_2]\geq 1/20$. This contradicts the information-theoretic lower bound.
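The averaging step can be illustrated with a short computation. This is a minimal sketch with hypothetical names; it tracks only the offset relative to $2T\lg(n!)$ and takes the two per-case bounds ($+2$ and $-147$) from the derivation above.

```python
def expected_length_offset(p_e2: float) -> float:
    """Expected encoding length minus 2T lg(n!), given Pr[E_2] = p_e2."""
    without_e2 = 2    # at most 2 + 2T lg(n!) bits when E_2 does not occur
    with_e2 = -147    # at most 2T lg(n!) - 147 bits when E_2 occurs
    return (1 - p_e2) * without_e2 + p_e2 * with_e2


def relaxed_offset(p_e2: float) -> float:
    """The relaxed bound 2 - 147 * Pr[E_2] from the middle line."""
    return 2 - 147 * p_e2


# At the threshold Pr[E_2] = 1/20 the relaxed bound equals 2 - 7.35.
offset_at_threshold = relaxed_offset(1 / 20)
```

Any larger value of $\Pr[E_2]$ only makes the offset more negative, so the threshold $1/20$ is the tightest case.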

Conclusion.

We have now shown that $\Pr[E_1]\leq 1/20$ and $\Pr[E_2]\leq 1/20$. By a union bound, with probability at least $4/5-1/10\geq 2/3$, neither event happens and $\mathcal{S}$ computes a $6\varepsilon T$ additive approximation to $\langle x,y\rangle$. In this case, the number of hit entries is at most $20q/n+40q/n=60q/n$, and so is the number of accesses to $x,y$. If $\mathcal{S}$ performs more than $60q/n$ accesses, we may simply abort and return an arbitrary answer; this does not affect the probability bound.

Recall that we made the simplifying assumption $q\geq 10n$. If the algorithm $\mathcal{A}$ that we began with asks fewer than $10n$ queries, then we add dummy queries to ensure $q=10n$, and the number of accesses to $x,y$ becomes $60q/n=600$. In any case, the number of accesses is $O(q/n)$. We thus have an algorithm $\mathcal{S}$ that makes only $O(q/n)$ accesses and returns a $6\varepsilon T$ additive approximation with probability at least $2/3$. We may set $T=\varepsilon^{-2}/144$ to obtain an additive $6\varepsilon T=\varepsilon^{-1}/24=\sqrt{T}/2$ approximation. This is enough to solve the Query-Gap-Hamming problem and hence the number of accesses must be $\Omega(T)=\Omega(\varepsilon^{-2})$ by Lemma 8. We thus have $q/n=\Omega(\varepsilon^{-2})$, or $q=\Omega(\varepsilon^{-2}n)$. This proves Theorem 1 in the discrete setting with objects in $\mathbb{Z}^{2}$.
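The parameter choice $T=\varepsilon^{-2}/144$ can be verified with a few lines of arithmetic. The sketch below uses hypothetical names and treats $T$ as a real number, ignoring any integrality requirement on $T$.

```python
import math
from typing import Tuple


def gap_hamming_parameters(eps: float) -> Tuple[float, float]:
    """Return (T, additive error 6*eps*T) for the choice T = eps^-2 / 144."""
    T = eps ** -2 / 144
    additive_error = 6 * eps * T
    return T, additive_error


# Example value of eps, chosen arbitrarily for illustration.
eps = 0.01
T, additive_error = gap_hamming_parameters(eps)

# Both identities from the text: 6*eps*T = eps^-1 / 24 = sqrt(T) / 2.
matches_inverse_eps = math.isclose(additive_error, 1 / (24 * eps))
matches_sqrt_T = math.isclose(additive_error, math.sqrt(T) / 2)
```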

3.2 Continuous to Discrete

To prove a lower bound for estimating the volume of the union of $n$ objects in $\mathbb{R}^{2}$, we give a simple reduction from estimating the cardinality of the union of $n$ objects in $\mathbb{Z}^{2}$. Let $\mathcal{A}$ be an algorithm for estimating the volume of the union of $n$ objects in $\mathbb{R}^{2}$ using $\operatorname{Volume}$, $\operatorname{Sample}$ and $\operatorname{Contains}$ queries.

We use $\mathcal{A}$ to estimate the cardinality of the union of $n$ sets in $\mathbb{Z}^{2}$ as follows. Let $O_1,\dots,O_n\subset\mathbb{Z}^{2}$ be the objects. We think of them as objects in $\mathbb{R}^{2}$ by replacing each point $(x,y)$ in an object $O_i$ by the unit square that has $(x,y)$ as its lower-left corner, i.e., $[x,x+1)\times[y,y+1)$. Denote the resulting objects in $\mathbb{R}^{2}$ by $O'_i$. (Note that applying this transformation to our objects from the previous reduction gives connected axis-aligned polygons.)

The volume of the union $O'_1\cup\ldots\cup O'_n$ is the same as the cardinality of the union $O_1\cup\ldots\cup O_n$. We thus merely need to simulate $\mathcal{A}$ as if the input were $O'_1,\dots,O'_n$. For this, on every $\operatorname{Volume}(i)$ query made by $\mathcal{A}$ to the object $O'_i$, we ask the same query to $O_i$. For a $\operatorname{Sample}(i)$ query made by $\mathcal{A}$, we run $\operatorname{Sample}(i)$ on $O_i$ and receive an integer point $(x,y)\in\mathbb{Z}^{2}$. We then draw $r_x\in[0,1)$ and $r_y\in[0,1)$ independently and uniformly at random and feed $\mathcal{A}$ the point $(x+r_x,y+r_y)$ as the result of the $\operatorname{Sample}(i)$ query. Finally, when $\mathcal{A}$ asks a $\operatorname{Contains}((a,b),i)$ query, we simply round the coordinates down to the nearest integers to obtain a point $(a',b')=(\lfloor a\rfloor,\lfloor b\rfloor)\in\mathbb{Z}^{2}$. We then query $\operatorname{Contains}((a',b'),i)$ on $O_i$. Correctness follows immediately and we conclude:

Theorem 3.

Any algorithm for computing a $(1+\varepsilon)$-approximation to the volume of the union of $n$ objects in $\mathbb{R}^{2}$ with probability at least $4/5$ via $\operatorname{Volume}$, $\operatorname{Sample}$ and $\operatorname{Contains}$ queries must use $\Omega(\varepsilon^{-2}n)$ queries.
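The query simulation used in the reduction above can be sketched in code. This is a minimal illustration, not part of the proof: the class names `DiscreteObject` and `ContinuousView` are hypothetical, the discrete oracle is modeled as an explicit finite point set, and floating-point addition may in rare cases round a sampled coordinate up to the excluded boundary of the half-open square.

```python
import math
import random
from typing import Iterable, Tuple

IntPoint = Tuple[int, int]
RealPoint = Tuple[float, float]


class DiscreteObject:
    """A finite subset of Z^2 answering the three queries (hypothetical oracle)."""

    def __init__(self, points: Iterable[IntPoint]):
        self._points = sorted(set(points))
        self._lookup = frozenset(self._points)

    def volume(self) -> int:
        return len(self._points)

    def sample(self, rng: random.Random) -> IntPoint:
        return rng.choice(self._points)

    def contains(self, p: IntPoint) -> bool:
        return p in self._lookup


class ContinuousView:
    """Answers queries about O' (a union of unit squares) via queries to O."""

    def __init__(self, discrete: DiscreteObject):
        self._discrete = discrete

    def volume(self) -> float:
        # Every lattice point contributes one unit square of area 1.
        return float(self._discrete.volume())

    def sample(self, rng: random.Random) -> RealPoint:
        x, y = self._discrete.sample(rng)
        # rng.random() is uniform on [0, 1); offsets are drawn independently.
        return (x + rng.random(), y + rng.random())

    def contains(self, p: RealPoint) -> bool:
        a, b = p
        # Round both coordinates down to recover the lower-left corner.
        return self._discrete.contains((math.floor(a), math.floor(b)))


rng = random.Random(0)
view = ContinuousView(DiscreteObject([(0, 0), (1, 0), (-1, 2)]))
samples = [view.sample(rng) for _ in range(1000)]
```

Each query to the continuous object triggers exactly one query to the discrete object, so the simulation preserves the query count.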

References

  • [1] Pankaj K. Agarwal. An improved algorithm for computing the volume of the union of cubes. In David G. Kirkpatrick and Joseph S. B. Mitchell, editors, Proceedings of the 26th ACM Symposium on Computational Geometry, Snowbird, Utah, USA, June 13-16, 2010, pages 230–239. ACM, 2010.
  • [2] Pankaj K. Agarwal, Haim Kaplan, and Micha Sharir. Computing the volume of the union of cubes. In Jeff Erickson, editor, Proceedings of the 23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, June 6-8, 2007, pages 294–301. ACM, 2007.
  • [3] Karl Bringmann. An improved algorithm for Klee’s measure problem on fat boxes. Comput. Geom., 45(5-6):225–233, 2012.
  • [4] Karl Bringmann and Tobias Friedrich. Approximating the volume of unions and intersections of high-dimensional geometric objects. Comput. Geom., 43(6-7):601–610, 2010.
  • [5] Ran Canetti, Guy Even, and Oded Goldreich. Lower bounds for sampling algorithms for estimating the average. Inf. Process. Lett., 53(1):17–25, 1995.
  • [6] Nofar Carmeli, Shai Zeevi, Christoph Berkholz, Alessio Conte, Benny Kimelfeld, and Nicole Schweikardt. Answering (unions of) conjunctive queries using random access and random-order enumeration. ACM Trans. Database Syst., 47(3):9:1–9:49, 2022.
  • [7] Ruoxu Cen, William He, Jason Li, and Debmalya Panigrahi. Beyond the quadratic time barrier for network unreliability. CoRR, abs/2304.06552, 2023.
  • [8] Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 51–60, 2011.
  • [9] Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi. A scalable approximate model counter. In Christian Schulte, editor, Principles and Practice of Constraint Programming - 19th International Conference, CP 2013, Uppsala, Sweden, September 16-20, 2013. Proceedings, volume 8124 of Lecture Notes in Computer Science, pages 200–216. Springer, 2013.
  • [10] Timothy M. Chan. Geometric applications of a randomized optimization technique. Discret. Comput. Geom., 22(4):547–567, 1999.
  • [11] Timothy M. Chan. Semi-online maintenance of geometric optima and measures. SIAM J. Comput., 32(3):700–716, 2003.
  • [12] Timothy M. Chan. A (slightly) faster algorithm for Klee’s measure problem. Comput. Geom., 43(3):243–250, 2010.
  • [13] Timothy M. Chan. Klee’s measure problem made easy. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 410–419. IEEE Computer Society, 2013.
  • [14] Timothy M. Chan. Minimum l_\infty hausdorff distance of point sets under translation: Generalizing klee’s measure problem. In Erin W. Chambers and Joachim Gudmundsson, editors, 39th International Symposium on Computational Geometry, SoCG 2023, June 12-15, 2023, Dallas, Texas, USA, volume 258 of LIPIcs, pages 24:1–24:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
  • [15] Nilesh N. Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. VLDB J., 16(4):523–544, 2007.
  • [16] Mark de Berg, Otfried Cheong, Marc J. van Kreveld, and Mark H. Overmars. Computational geometry: algorithms and applications, 3rd Edition. Springer, 2008.
  • [17] David Eppstein and Jeff Erickson. Iterated nearest neighbors and finding minimal polytopes. Discret. Comput. Geom., 11:321–350, 1994.
  • [18] David R. Karger. A randomized fully polynomial time approximation scheme for the all-terminal network reliability problem. SIAM J. Comput., 29(2):492–514, 1999.
  • [19] Richard M. Karp and Michael Luby. Monte-Carlo algorithms for the planar multiterminal network reliability problem. J. Complex., 1(1):45–64, 1985.
  • [20] Richard M. Karp, Michael Luby, and Neal Madras. Monte-Carlo approximation algorithms for enumeration problems. J. Algorithms, 10(3):429–448, 1989.
  • [21] Benny Kimelfeld, Yuri Kosharovsky, and Yehoshua Sagiv. Query efficiency in probabilistic XML models. In Jason Tsong-Li Wang, editor, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, pages 701–714. ACM, 2008.
  • [22] Victor Klee. Can the measure of 1n[ai,bi]superscriptsubscript1𝑛subscript𝑎𝑖subscript𝑏𝑖\bigcup_{1}^{n}[a_{i},b_{i}]⋃ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT [ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] be computed in less than O(nlogn)𝑂𝑛𝑛O(n\log n)italic_O ( italic_n roman_log italic_n ) steps? The American Mathematical Monthly, 84(4):284–285, 1977.
  • [23] Michael G. Luby. Monte-Carlo methods for estimating system reliability. Technical report, Report UCB/CSD 84/168, Computer Science Division, University of California, Berkeley, 1983.
  • [24] Kuldeep S. Meel, Sourav Chakraborty, and N. V. Vinodchandran. Estimation of the size of union of delphic sets: Achieving independence from stream size. In Leonid Libkin and Pablo Barceló, editors, PODS ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, pages 41–52. ACM, 2022.
  • [25] Kuldeep S. Meel, Aditya A. Shrotri, and Moshe Y. Vardi. Not all FPRASs are equal: demystifying FPRASs for DNF-counting. Constraints An Int. J., 24(3-4):211–233, 2019.
  • [26] Kuldeep S. Meel, N. V. Vinodchandran, and Sourav Chakraborty. Estimating the size of union of sets in streaming models. In Leonid Libkin, Reinhard Pichler, and Paolo Guagliardo, editors, PODS’21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Virtual Event, China, June 20-25, 2021, pages 126–137. ACM, 2021.
  • [27] Mark H. Overmars and Chee-Keng Yap. New upper bounds in Klee’s measure problem. SIAM J. Comput., 20(6):1034–1045, 1991.
  • [28] Aduri Pavan, N. V. Vinodchandran, Arnab Bhattacharyya, and Kuldeep S. Meel. Model counting meets F0subscript𝐹0F_{0}italic_F start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT estimation. In Leonid Libkin, Reinhard Pichler, and Paolo Guagliardo, editors, PODS’21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Virtual Event, China, June 20-25, 2021, pages 299–311. ACM, 2021.
  • [29] Christopher Ré, Nilesh N. Dalvi, and Dan Suciu. Efficient top-k query evaluation on probabilistic data. In Rada Chirkova, Asuman Dogac, M. Tamer Özsu, and Timos K. Sellis, editors, Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, pages 886–895. IEEE Computer Society, 2007.
  • [30] Qiaosheng Shi and Binay K. Bhattacharya. Application of computational geometry to network p-center location problems. In Proceedings of the 20th Annual Canadian Conference on Computational Geometry, Montréal, Canada, August 13-15, 2008, 2008.
  • [31] Srikanta Tirthapura and David P. Woodruff. Rectangle-efficient aggregation in spatial data streams. In Michael Benedikt, Markus Krötzsch, and Maurizio Lenzerini, editors, Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, pages 283–294. ACM, 2012.
  • [32] Jan van Leeuwen and Derick Wood. The measure problem for rectangular ranges in d-space. J. Algorithms, 2(3):282–300, 1981.
  • [33] Hakan Yildiz and Subhash Suri. On Klee’s measure problem for grounded boxes. In Tamal K. Dey and Sue Whitesides, editors, Proceedings of the 28th ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 17-20, 2012, pages 111–120. ACM, 2012.