If you could add another (potentially better) explanation, that could be beneficial for future learners (not necessarily limited to the HN community).
The piece builds a strong case that the coronavirus spreading pattern is highly skewed. If true, this would explain several strange observations, e.g. why in some families the other members are not infected even though one family member is.
And importantly, this article contains some great science education with real-world examples for key probability theory concepts.
Jim asked about our "20 queries," his incisive way of learning about an application, as a deceptively simple way to jump-start a dialogue between him (a database expert) and me (an astronomer or any scientist). Jim said, "Give me your 20 most important questions you would like to ask of your data system and I will design the system for you." It was amazing to watch how well this simple heuristic approach, combined with Jim's imagination, worked to produce quick results.
Jim then came to Baltimore to look over our computer room and within 30 seconds declared, with a grin, we had the wrong database layout. My colleagues and I were stunned. Jim explained later that he listened to the sounds the machines were making as they operated; the disks rattled too much, telling him there was too much random disk access. We began mapping SDSS database hardware requirements, projecting that in order to achieve acceptable performance with a 1TB data set we would need a GB/sec sequential read speed from the disks, translating to about 20 servers at the time. Jim was a firm believer in using "bricks," or the cheapest, simplest building blocks money could buy. We started experimenting with low-level disk IO on our inexpensive Dell servers, and our disks were soon much quieter and performing more efficiently.
The timing is convenient though.
People were amazed.
The trick he used was based on the "burstiness" rule you describe: a long enough random sequence will likely contain a long homogeneous block, whereas humans tend to avoid long streaks of the same digit because they do not feel random enough.
So all he did was glance at the two sequences, check which one contained the longest homogeneous block, and pick that one as the sequence generated by the coin flips.
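For anyone who wants to try the trick, here is a minimal Python sketch. The function names (`longest_run`, `guess_coin_sequence`, `average_longest_run`) are made up for illustration, and the "pick the sequence with the longer streak" rule is just my reading of the anecdote, not a rigorous statistical test.

```python
import random


def longest_run(bits):
    """Length of the longest block of identical consecutive symbols."""
    best = current = 0
    previous = None
    for b in bits:
        # Extend the current streak, or start a new one when the symbol changes.
        current = current + 1 if b == previous else 1
        best = max(best, current)
        previous = b
    return best


def guess_coin_sequence(seq_a, seq_b):
    """Guess which sequence came from real coin flips: the one with the longer streak."""
    return "a" if longest_run(seq_a) >= longest_run(seq_b) else "b"


def average_longest_run(n, trials=2000, seed=0):
    """Estimate the typical longest streak in n fair coin flips by simulation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_run([rng.randint(0, 1) for _ in range(n)])
    return total / trials
```

Calling `average_longest_run(100)` should give a value around 7, i.e. roughly log2(100), which is much longer than the streaks most people dare to write down when faking a sequence of 100 flips.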
I think there is also a quite strong connection in this anecdote to the information-theoretic notion of entropy, which takes us all the way back to the idea of entropy as in the article :-) Information-theoretically, the entropy of a long random sequence concentrates as well (it concentrates around the entropy of the underlying random variable). The implication is that with high probability, a sampled long random sequence will have an entropy close to a specific value.
Human intuition actually is somewhat correct in the anecdote, though! The longer the homogeneous substring, the less entropy the sequence has, and the less likely it is to appear (as a limiting example, the sequence of all 0s or all 1s is extremely ordered, but extremely unlikely to appear). I think where it breaks down is that there are sequences with relatively long homogeneous substrings whose entropy is still close to that specific value (in the sense that the length is e.g. log(n) - log log(n) as in the calculation before), whereas the human intuition of the entropy of the sequence is based on local factors (have I generated 'too many' 0s in a row?) and leads us astray.
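To make this a bit more concrete, here is a rough Python sketch. It uses the entropy of the observed 0/1 frequencies as a crude stand-in for "the entropy of the sequence"; that proxy is my own simplifying assumption (it ignores the order of the symbols entirely), and the names `empirical_entropy`, `longest_run` and `demo` are hypothetical.

```python
import math
import random


def empirical_entropy(bits):
    """Entropy, in bits per symbol, of the observed frequencies of 0s and 1s."""
    n = len(bits)
    if n == 0:
        return 0.0
    p = sum(bits) / n
    # Skip zero-probability terms, since q * log2(q) tends to 0 as q tends to 0.
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)


def longest_run(bits):
    """Length of the longest block of identical consecutive symbols."""
    best = current = 0
    previous = None
    for b in bits:
        current = current + 1 if b == previous else 1
        best = max(best, current)
        previous = b
    return best


def demo(n=10_000, seed=1):
    """Return (entropy, longest streak) for n simulated fair coin flips."""
    rng = random.Random(seed)
    sample = [rng.randint(0, 1) for _ in range(n)]
    return empirical_entropy(sample), longest_run(sample)
```

With n = 10,000 the entropy returned by `demo()` should sit very close to 1 bit per symbol even though the sample contains a streak of length on the order of log2(n), while `empirical_entropy([0] * 100)` is exactly 0 for the all-zeros limiting case.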
"Plato envisioned Earth’s building blocks as cubes, a shape rarely found in nature. The solar system is littered, however, with distorted polyhedra—shards of rock and ice produced by ubiquitous fragmentation. We apply the theory of convex mosaics to show that the average geometry of natural two-dimensional (2D) fragments, from mud cracks to Earth’s tectonic plates, has two attractors: “Platonic” quadrangles and “Voronoi” hexagons. In three dimensions (3D), the Platonic attractor is dominant: Remarkably, the average shape of natural rock fragments is cuboid. When viewed through the lens of convex mosaics, natural fragments are indeed geometric shadows of Plato’s forms. Simulations show that generic binary breakup drives all mosaics toward the Platonic attractor, explaining the ubiquity of cuboid averages."