Targeted Learning with Moderated Statistics for Biomarker Discovery
Authors: Nima Hejazi, Mark van der Laan, and Alan Hubbard
The biotmle R package facilitates biomarker discovery through a
generalization of the moderated t-statistic (Smyth 2004) that extends
the procedure to locally efficient estimators of asymptotically linear
target parameters (Tsiatis 2007). The methods implemented modify
targeted maximum likelihood (TML) estimators of statistical (or causal)
target parameters (e.g., average treatment effect) to apply variance
moderation to the standard variance estimator based on the efficient
influence function (EIF) of the target parameter (van der Laan and Rose
2011, 2018). By performing a moderated hypothesis test that pools the
individual probe-specific EIF-based variance estimates, a robust
variance estimator is constructed, which stabilizes the standard error
estimates and improves the performance of such estimators both in
smaller samples and in settings where the EIF is poorly estimated. The
resultant procedure allows for the construction of conservative
hypothesis tests that reduce the false discovery rate and/or the
family-wise error rate (Hejazi, van der Laan, and Hubbard 2021).
Improvements upon prior TML-based approaches to biomarker discovery
(e.g., Bembom et al. (2009)) include both the moderated variance
estimator and the use of conservative reference distributions for
the corresponding moderated test statistics (e.g., logistic
distribution), inspired by tail bounds based on concentration
inequalities (Rosenblum and van der Laan 2009); the latter prove
critical for obtaining robust inference when the finite-sample
distribution of the estimator deviates from normality.
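To make the variance moderation step concrete, here is a minimal base-R sketch of the shrinkage idea from Smyth (2004) applied to EIF-based variance estimates. It is an illustration under stated assumptions, not biotmle's API or implementation: the helper `moderate_variance()`, the simulated `eif` matrix, the stand-in estimates `psi_hat`, and the fixed prior values `d0` and `s02` are all hypothetical. In practice the prior hyperparameters would be estimated from the data by empirical Bayes rather than fixed by hand.

```r
# Hypothetical illustration only: simulated inputs, not biotmle's API.
set.seed(1)
n <- 20    # number of observations
B <- 100   # number of probes (biomarkers)

# Stand-ins for estimated EIF values: one row per probe, one column per unit.
eif <- matrix(rnorm(B * n), nrow = B, ncol = n)
# Stand-ins for the probe-specific point estimates of the target parameter.
psi_hat <- rnorm(B, sd = 0.3)

# Probe-specific variance estimates based on the EIF, each with n - 1 df.
s2 <- apply(eif, 1, var)
d <- n - 1

# Shrink each probe's variance toward a common prior value s02, weighting
# the prior by d0 degrees of freedom (moderated variance of Smyth 2004).
moderate_variance <- function(s2, d, s02, d0) {
  (d0 * s02 + d * s2) / (d0 + d)
}

# Assumed prior hyperparameters, fixed here purely for illustration.
d0 <- 4
s02 <- mean(s2)
s2_mod <- moderate_variance(s2, d, s02, d0)

# Moderated t-statistics for the null value 0 and two-sided p-values from a
# t reference distribution with d0 + d degrees of freedom.
t_mod <- psi_hat / sqrt(s2_mod / n)
p_mod <- 2 * pt(-abs(t_mod), df = d0 + d)
```

Each moderated variance lies between the probe's own estimate and the pooled value, so unusually small probe-specific variances are pulled upward, which is what stabilizes the test statistics in small samples.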
For standard use, install from Bioconductor using BiocManager:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("biotmle")
To contribute, install the bleeding-edge development ver