Efficient re-implementation and extension of the `dga` package of James Johndrow, Kristian Lum and Patrick Ball (2015): "Performs capture-recapture estimation by averaging over decomposable graphical models. This approach builds on Madigan and York (1997)."
- Higher performance is needed to account for linkage errors through linkage-averaging and for simulation studies.
- Further plotting and posterior summarization functions have been added (`bayesEstimator`, `posteriorMode`, `posteriorQuantiles`, `posteriorSummaryTable`, `adjMatrix`, `plotGraph`, `htmlSummary`, `latexSummary`).
Note: the stratification functions and Venn diagram plotting functions from the `dga` package have not been reproduced in `dgaFast`. They can be accessed through `install.packages("dga"); library(dga)`.
Five-list example from Madigan and York (1997), as implemented in the `dga` package:
library(dgaFast) # Re-implements library(dga)
# Number of lists and prior hyperparameter
p <- 5
data(graphs5) # Decomposable graphical models on 5 lists.
delta <- 0.5
Nmissing <- 1:300 # Reasonable range for the number of unobserved individuals.
# Counts corresponding to list inclusion patterns.
Y <- c(0,27,37,19,4,4,1,1,97,22,37,25,2,1,3,5,83,36,34,18,3,5,0,2,30,5,23,8,0,3,0,2)
Y <- array(Y, dim=c(2,2,2,2,2))
N <- sum(Y) + Nmissing
# Model-wise posterior probabilities on the total population size.
# weights[i,j] is the posterior probability of Nmissing[j] missing individuals under model graphs5[[i]].
weights <- bma.cr(Y, Nmissing, delta, graphs5)
# Plot of the posterior distribution.
plotPosteriorN(weights, N)
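The layout of `Y` can be hard to read from the flat vector of counts. The sketch below reuses the counts from the example and inspects a few cells. It assumes that index 1 along a dimension means "not on that list" and index 2 means "on that list", which is consistent with the first cell (on no list, hence unobserved) being 0. The names `n_observed` and `list_totals` are introduced here for illustration only.

```r
# Same counts as in the example above.
Y <- c(0,27,37,19,4,4,1,1,97,22,37,25,2,1,3,5,83,36,34,18,3,5,0,2,30,5,23,8,0,3,0,2)
Y <- array(Y, dim = c(2, 2, 2, 2, 2))

# R fills arrays in column-major order, so the first index varies fastest.
# Assumption: index 1 = absent from the list, index 2 = present on the list.
Y[1, 1, 1, 1, 1]  # on none of the lists: unobserved by design, stored as 0
Y[2, 1, 1, 1, 1]  # on list 1 only
Y[1, 2, 1, 1, 1]  # on list 2 only
Y[2, 2, 1, 1, 1]  # on lists 1 and 2 only

# Number of individuals observed on at least one list.
n_observed <- sum(Y)

# Number of individuals recorded on each list (marginal totals).
list_totals <- sapply(1:5, function(k) apply(Y, k, sum)[2])
```

With this layout, `N <- sum(Y) + Nmissing` in the example is the grid of candidate total population sizes.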
Table of top model estimates (see also `dgaFast::latexSummary`).
htmlSummary("./figures/posteriorSummary/summaryTable", weights, N, nrows=5, graphs=graphs5)
Model | Posterior prob. | Bayes est. | Mode | 0.025 | 0.975 |
---|---|---|---|---|---|
(graph image not shown) | 0.217 | 627 | 624 | 598 | 663 |
(graph image not shown) | 0.160 | 615 | 614 | 591 | 645 |
(graph image not shown) | 0.082 | 613 | 610 | 586 | 648 |
(graph image not shown) | 0.065 | 610 | 608 | 585 | 640 |
(graph image not shown) | 0.052 | 616 | 614 | 591 | 647 |
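The kind of summaries reported in the table can be illustrated with base R on a toy weights matrix. This is a minimal sketch, not the package's implementation. It assumes that the weights are joint posterior probabilities over models (rows) and population sizes (columns) summing to 1, and it takes the posterior mean as the Bayes estimate, which corresponds to squared error loss. The toy values and the helper `post_quantile` are hypothetical.

```r
# Toy posterior weights: 2 hypothetical models (rows) x 5 population sizes (columns).
N <- 100:104
weights <- rbind(
  c(0.05, 0.15, 0.20, 0.10, 0.00),
  c(0.00, 0.05, 0.15, 0.20, 0.10)
)

# Posterior probability of each model: sum over population sizes.
model_prob <- rowSums(weights)

# Model-averaged posterior distribution of N: sum over models.
post_N <- colSums(weights)

# Posterior mean (assumed Bayes estimate under squared error loss) and mode.
post_mean <- sum(N * post_N)
post_mode <- N[which.max(post_N)]

# Posterior quantile: smallest N whose cumulative probability reaches q.
post_quantile <- function(q) N[which(cumsum(post_N) >= q)[1]]
interval <- c(post_quantile(0.025), post_quantile(0.975))

# Per-model summaries condition on the model by renormalizing its row.
model1_mean <- sum(N * weights[1, ] / model_prob[1])
```

Averaging over models in this way is what lets the estimate reflect uncertainty about the dependence structure between lists rather than conditioning on a single graph.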
On a 2013 MacBook Pro (2.6 GHz Intel Core i5), the main routine of `dgaFast` is about 75 times faster than that of `dga`.
if (!require(pacman)) install.packages("pacman")
pacman::p_load(bench, dga)
# check=FALSE skips the comparison of the two results.
bench::mark(
  dga::bma.cr(Y, Nmissing, delta, graphs5),
  dgaFast::bma.cr(Y, Nmissing, delta, graphs5),
  min_iterations = 10, check = FALSE)
expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
dga | 866.8ms | 919.6ms | 0.994153 | 55.32MB | 9.444453 |
dgaFast | 11.3ms | 12.5ms | 76.286860 | 2.17MB | 1.956073 |
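The "about 75 times" figure can be checked against the benchmark table. The sketch below copies the reported values by hand (the variable names are illustrative) and computes the ratios in base R.

```r
# Values copied from the benchmark table above.
median_ms   <- c(dga = 919.6, dgaFast = 12.5)
itr_per_sec <- c(dga = 0.994153, dgaFast = 76.286860)
mem_mb      <- c(dga = 55.32, dgaFast = 2.17)

# Speedup measured on median run time and on iterations per second.
speedup_median <- median_ms[["dga"]] / median_ms[["dgaFast"]]
speedup_itr    <- itr_per_sec[["dgaFast"]] / itr_per_sec[["dga"]]

# Reduction in allocated memory.
mem_ratio <- mem_mb[["dga"]] / mem_mb[["dgaFast"]]

round(c(median = speedup_median, itr = speedup_itr, memory = mem_ratio), 1)
```

Both timing ratios fall between 73 and 77, and memory allocation drops by roughly a factor of 25.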
Install from GitHub:
if (!require(devtools)) install.packages("devtools")
devtools::install_github("OlivierBinette/dgaFast")
- James Johndrow, Kristian Lum and Patrick Ball (2015). dga: Capture-Recapture Estimation using Bayesian Model Averaging. R package version 1.2. https://CRAN.R-project.org/package=dga
- David Madigan and Jeremy C. York (1997). Bayesian methods for estimation of the size of a closed population. Biometrika, Vol. 84, No. 1, pp. 19-31.
- Mauricio Sadinle (2018). Bayesian propagation of record linkage uncertainty into population size estimation of human rights violations. Annals of Applied Statistics, Vol. 12, No. 2, pp. 1013-1038.