forked from david-cortes/approxcdf
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/approxcdf.R
\name{pmvn}
\alias{pmvn}
\title{Cumulative Distribution Function for Multivariate Normal Distribution}
\usage{
pmvn(q, Cov, mean = NULL, is_standardized = FALSE, log.p = FALSE)
}
\arguments{
\item{q}{Thresholds for the calculation (upper bounds on the variables for which the CDF is to be calculated).}
\item{Cov}{Covariance matrix. If passing `is_standardized=TRUE`, will assume that it is a correlation matrix.
As a covariance or correlation matrix, it should have a non-negative determinant.}
\item{mean}{Means (expected values) of each variable. If passing `NULL`, will assume that all means are zero.
Cannot be passed when using `is_standardized=TRUE`.}
\item{is_standardized}{Whether the parameters correspond to a standardized MVN distribution - that is, the
means for all variables are zero, and `Cov` has only ones on its diagonal.}
\item{log.p}{Whether to return the logarithm of the probability instead of the probability itself.
This is helpful when dealing with very small probabilities, since they
might get rounded down to exactly zero in their raw form or not be accurately
representable as float64, while the logarithm is likely still well defined.
Note that slightly different methods are used in this case (all calculations, even
for 2D and 3D, use uni-/bi-variate screening methods), and thus exponentiating
this result might not match exactly the result obtained with `log.p=FALSE`.}
}
\value{
The (approximate) probability that each variable drawn from the MVN distribution parameterized by
`Cov` and `mean` will be lower than its corresponding threshold in `q`. If for some reason the calculation
fails, will return NaN.
}
\description{
Computes an approximation to the CDF (cumulative distribution function)
of the multivariate normal (MVN) distribution, defined as the probability that each
variable will take a value lower than a desired threshold - e.g. if there are three
variables, then \eqn{\text{CDF}(q_1, q_2, q_3, \mu, \Sigma) = P(x_1 < q_1, x_2 < q_2, x_3 < q_3)} given
that \eqn{x_1, x_2, x_3} are random variables following a MVN distribution parameterized by
\eqn{\mu} (variable means) and \eqn{\Sigma} (covariance).
This can be seen as a much faster, but less precise, version of the function `pmvnorm` in the `mvtnorm`
package, supporting only the `upper` bounds argument.
}
\details{
The method used for the calculation will depend on the dimensionality (number of variables)
of the problem:
\itemize{
\item For \eqn{n \geq 5}, it will use the TVBS method (two-variate bivariate screening), which makes
iterative approximations by calculating the CDF for two variables at a time and then conditioning the
remaining variables after assuming that the earlier variables already took values below their thresholds.
Note that the implementation here differs from the author's in a few ways: (a) instead of sorting the
variables by the bounds/thresholds, it sorts them iteratively by applying univariate conditioning (this
is referred to as the "GGE order" in the references), (b) it does not truncate the thresholds (the author
truncates thresholds to \eqn{-6} if they are lower than that), (c) given that it does not apply
truncation, it instead applies regularization to matrices with low determinants that are problematic to
invert, (d) while the author applies the same approximation technique until reducing the problem down to
a 1D CDF, the implementation here instead uses more exact methods for 2D and 3D CDF subproblems, adapted
from Genz's TVPACK library.
\item For \eqn{n = 4}, it will use Bhat's method, but given that there are fewer than 5 variables, it
differs a bit from TVBS since it uses fewer variables at a time. If the determinant of the correlation
matrix is too low, it will instead switch to Plackett's original method. Bhat's method as implemented
here again differs from the original in the same ways as TVBS, and Plackett's method as implemented
here also differs from the original in that (a) it applies regularization to correlation matrices with
low determinant, and (b) it uses more Gauss-Legendre points for evaluation (the author's number was meant
for a pen-and-paper calculation).
\item For \eqn{n = 3}, it will use Plackett-Drezner's method. The implementation here differs from the
author's original in that it uses more Gauss-Legendre points.
\item For \eqn{n = 2}, it will use Drezner's method, again with more Gauss-Legendre points.
}
Depending on the BLAS and LAPACK libraries used by your R setup, for better speed it is recommended
to limit the number of BLAS threads to 1 if the library does not automatically determine when to
use multi-threading (MKL does, for example, while not all OpenBLAS versions do), which can be done
through the package `RhpcBLASctl`.
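For instance, assuming the `RhpcBLASctl` package is installed, the thread limit can be set as follows:
\preformatted{
## Limit BLAS to a single thread before calling `pmvn`
RhpcBLASctl::blas_set_num_threads(1)
}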
}
\examples{
library(approxcdf)
### Example from Plackett's paper
b <- c(0, 0, 0, 0)
S <- matrix(
c( 1, -0.60, 0.85, 0.75,
-0.60, 1, -0.70, -0.80,
0.85, -0.70, 1, 0.65,
0.75, -0.80, 0.65, 1),
nrow=4, ncol=4, byrow=TRUE)
pmvn(b, S, is_standardized=TRUE)
### (Plackett's estimate was 0.042323)
### Compare against Genz's more exact method:
if (require(mvtnorm)) {
set.seed(123)
p_Genz <- pmvnorm(upper=b, corr=S)
abs(p_Genz - pmvn(b, S, is_standardized=TRUE))
}
### (Result should not be too far from Genz's or Plackett's,
### but should be around 50-100x faster to calculate, and this
### speed-up should grow to around 1,000x for larger 'n')
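### Non-standardized usage (illustrative values, not from the references):
### pass a covariance matrix and means directly, without `is_standardized`.
mu <- c(1, 2)
V <- matrix(c(2.0, 0.5,
              0.5, 1.0),
            nrow=2, ncol=2, byrow=TRUE)
pmvn(c(0, 1), V, mean=mu)
### For very small probabilities, the logarithm is better behaved:
pmvn(c(-10, -10), V, mean=mu, log.p=TRUE)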
}
\references{
\itemize{
\item Bhat, Chandra R.
"New matrix-based methods for the analytic evaluation of the multivariate cumulative normal distribution function."
Transportation Research Part B: Methodological 109 (2018): 238-256.
\item Plackett, Robin L. "A reduction formula for normal multivariate integrals." Biometrika 41.3/4 (1954): 351-360.
\item Gassmann, H. I.
"Multivariate normal probabilities: implementing an old idea of Plackett's."
Journal of Computational and Graphical Statistics 12.3 (2003): 731-752.
\item Drezner, Zvi, and George O. Wesolowsky.
"On the computation of the bivariate normal integral."
Journal of Statistical Computation and Simulation 35.1-2 (1990): 101-107.
\item Drezner, Zvi. "Computation of the trivariate normal integral." Mathematics of Computation 62.205 (1994): 289-294.
\item Genz, Alan.
"Numerical computation of rectangular bivariate and trivariate normal and t probabilities."
Statistics and Computing 14.3 (2004): 251-260.
\item Gibson, Gavin Jarvis, C. A. Glasbey, and D. A. Elston.
"Monte Carlo evaluation of multivariate normal integrals and sensitivity to variate ordering."
Advances in Numerical Methods and Applications (1994): 120-126.
}
}