
Showing 1–25 of 25 results for author: Gao, M

Searching in archive stat.
  1. arXiv:2408.11003  [pdf, other]

    stat.ME

    DEEPEAST technique to enhance power in two-sample tests via the same-attraction function

    Authors: Yiting Chen, Min Gao, Wei Lin, Andrew Jirasek, Kirsty Milligan, Xiaoping Shi

    Abstract: Data depth has emerged as an invaluable nonparametric measure for the ranking of multivariate samples. The main contribution of depth-based two-sample comparisons is the introduction of the Q statistic (Liu and Singh, 1993), a quality index. Unlike traditional methods, data depth does not require the assumption of normal distributions and adheres to four fundamental properties. Many existing two-s…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2406.00322  [pdf, other]

    stat.ME stat.AP

    Adaptive Penalized Likelihood method for Markov Chains

    Authors: Yining Zhou, Ming Gao, Yiting Chen, Xiaoping Shi

    Abstract: Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov…

    Submitted 1 June, 2024; originally announced June 2024.

  3. arXiv:2403.17670  [pdf, other]

    stat.ME

    A family of Chatterjee's correlation coefficients and their properties

    Authors: Muhong Gao, Qizhai Li

    Abstract: Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 27 pages, 4 figures

    MSC Class: 62H20; 62G05

  4. arXiv:2402.06380  [pdf, other]

    cs.LG stat.ML

    Optimal estimation of Gaussian (poly)trees

    Authors: Yuhao Wang, Ming Gao, Wai Ming Tai, Bryon Aragam, Arnab Bhattacharyya

    Abstract: We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC al…

    Submitted 9 February, 2024; originally announced February 2024.

  5. arXiv:2306.02244  [pdf, other]

    math.ST stat.ME

    Optimal neighbourhood selection in structural equation models

    Authors: Ming Gao, Wai Ming Tai, Bryon Aragam

    Abstract: We study the optimal sample complexity of neighbourhood selection in linear structural equation models, and compare this to best subset selection (BSS) for linear models under general design. We show by example that -- even when the structure is \emph{unknown} -- the existence of underlying structure can reduce the sample complexity of neighbourhood selection. This result is complicated by the pos…

    Submitted 28 November, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  6. arXiv:2304.04615  [pdf, ps, other]

    cs.LG stat.ML

    A Survey on Recent Teacher-student Learning Studies

    Authors: Minghong Gao

    Abstract: Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN, while preserving its accuracy. Recent variants of knowledge distillation include teaching assistant distillation, curriculum distillation, mask distillation, and decoupling distillation, which aim to improve the performance of knowledge distillation by introducing…

    Submitted 10 April, 2023; originally announced April 2023.

  7. arXiv:2303.00058  [pdf, other]

    cs.LG stat.ML

    Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling

    Authors: Tyler Will, Runyu Zhang, Eli Sadovnik, Mengdi Gao, Joshua Vendrow, Jamie Haddock, Denali Molitor, Deanna Needell

    Abstract: We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We deriv…

    Submitted 28 February, 2023; originally announced March 2023.

  8. arXiv:2301.12649  [pdf, other]

    cs.LG math.DS stat.ME

    Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery

    Authors: L. Mars Gao, Urban Fasel, Steven L. Brunton, J. Nathan Kutz

    Abstract: Sparse model identification enables nonlinear dynamical system discovery from data. However, the control of false discoveries for sparse model identification is challenging, especially in the low-data and high-noise limit. In this paper, we perform a theoretical study on ensemble sparse model discovery, which shows empirical success in terms of accuracy and robustness to noise. In particular, we a…

    Submitted 26 April, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: 32 pages, 7 figures

  9. arXiv:2211.10575  [pdf, other]

    cs.LG cs.AI stat.ML

    Bayesian autoencoders for data-driven discovery of coordinates, governing equations and fundamental constants

    Authors: L. Mars Gao, J. Nathan Kutz

    Abstract: Recent progress in autoencoder-based sparse identification of nonlinear dynamics (SINDy) under $\ell_1$ constraints allows joint discoveries of governing equations and latent coordinate systems from spatio-temporal data, including simulated video frames. However, it is challenging for $\ell_1$-based sparse inference to perform correct identification for real data due to the noisy measurements and…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: 28 pages, 11 figures

  10. arXiv:2208.02670  [pdf]

    stat.ML cs.LG

    Development and Validation of ML-DQA -- a Machine Learning Data Quality Assurance Framework for Healthcare

    Authors: Mark Sendak, Gaurav Sirdeshmukh, Timothy Ochoa, Hayley Premo, Linda Tang, Kira Niederhoffer, Sarah Reed, Kaivalya Deshpande, Emily Sterrett, Melissa Bauer, Laurie Snyder, Afreen Shariff, David Whellan, Jeffrey Riggio, David Gaieski, Kristin Corey, Megan Richards, Michael Gao, Marshall Nichols, Bradley Heintze, William Knechtle, William Ratliff, Suresh Balu

    Abstract: The approaches by which the machine learning and clinical research communities utilize real world data (RWD), including data captured in the electronic health record (EHR), vary dramatically. While clinical researchers cautiously use RWD for clinical investigations, ML for healthcare teams consume public datasets with minimal scrutiny to develop new algorithms. This study bridges this gap by devel…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: Presented at 2022 Machine Learning in Health Care Conference

  11. arXiv:2201.10548  [pdf, other]

    math.ST cs.AI cs.LG stat.ML

    Optimal estimation of Gaussian DAG models

    Authors: Ming Gao, Wai Ming Tai, Bryon Aragam

    Abstract: We study the optimal sample complexity of learning a Gaussian directed acyclic graph (DAG) from observational data. Our main results establish the minimax optimal sample complexity for learning the structure of a linear Gaussian DAG model in two settings of interest: 1) Under equal variances without knowledge of the true ordering, and 2) For general linear models given knowledge of the ordering. I…

    Submitted 20 March, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: 21 pages, 2 figures, to appear in AISTATS 2022

  12. arXiv:2111.15636  [pdf]

    eess.SP cs.AI stat.AP

    Generating gapless land surface temperature with a high spatio-temporal resolution by fusing multi-source satellite-observed and model-simulated data

    Authors: Jun Ma, Huanfeng Shen, Penghai Wu, Jingan Wu, Meiling Gao, Chunlei Meng

    Abstract: Land surface temperature (LST) is a key parameter when monitoring land surface processes. However, cloud contamination and the tradeoff between the spatial and temporal resolutions greatly impede the access to high-quality thermal infrared (TIR) remote sensing data. Despite the massive efforts made to solve these dilemmas, it is still difficult to generate LST estimates with concurrent spatial com…

    Submitted 28 November, 2021; originally announced November 2021.

  13. arXiv:2110.06082  [pdf, other]

    math.ST cs.AI cs.LG stat.ML

    Efficient Bayesian network structure learning via local Markov boundary search

    Authors: Ming Gao, Bryon Aragam

    Abstract: We analyze the complexity of learning directed acyclic graphical models from observational data in general settings without specific distributional assumptions. Our approach is information-theoretic and uses a local Markov boundary search procedure in order to recursively construct ancestral sets in the underlying graphical model. Perhaps surprisingly, we show that for certain graph ensembles, a s…

    Submitted 21 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 31 pages, 3 figures, to appear in NeurIPS 2021

  14. arXiv:2110.04719  [pdf, other]

    cs.LG cs.AI stat.ML

    Structure learning in polynomial time: Greedy algorithms, Bregman information, and exponential families

    Authors: Goutham Rajendran, Bohdan Kivva, Ming Gao, Bryon Aragam

    Abstract: Greedy algorithms have long been a workhorse for learning graphical models, and more broadly for learning statistical models with sparse structure. In the context of learning directed acyclic graphs, greedy algorithms are popular despite their worst-case exponential runtime. In practice, however, they are very efficient. We provide new insight into this phenomenon by studying a general greedy scor…

    Submitted 28 October, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021; 27 pages, 9 figures

  15. arXiv:2102.01432  [pdf, other]

    stat.ML cs.LG

    Bayesian data-driven discovery of partial differential equations with variable coefficients

    Authors: Aoxue Chen, Yifan Du, Liyao Mars Gao, Guang Lin

    Abstract: The discovery of Partial Differential Equations (PDEs) is an essential task for applied science and engineering. However, data-driven discovery of PDEs is generally challenging, primarily stemming from the sensitivity of the discovered equation to noise and the complexities of model selection. In this work, we propose an advanced Bayesian sparse learning algorithm for PDE discovery with variable c…

    Submitted 26 March, 2024; v1 submitted 2 February, 2021; originally announced February 2021.

  16. arXiv:2101.00494  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

    Authors: Minbo Gao, Tianle Xie, Simon S. Du, Lin F. Yang

    Abstract: Many real-world applications, such as those in medical domains, recommendation systems, etc, can be formulated as large state space reinforcement learning problems with only a small budget of the number of policy changes, i.e., low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, Jin et al 2020] where the linear function approxima…

    Submitted 2 January, 2021; originally announced January 2021.

  17. arXiv:2009.08541  [pdf, other]

    stat.ML cs.LG

    Variational Disentanglement for Rare Event Modeling

    Authors: Zidi Xiu, Chenyang Tao, Michael Gao, Connor Davis, Benjamin A. Goldstein, Ricardo Henao

    Abstract: Combining the increasing availability and abundance of healthcare data and the current advances in machine learning methods have created renewed opportunities to improve clinical decision support systems. However, in healthcare risk prediction applications, the proportion of cases with the condition (label) of interest is often very low relative to the available sample size. Though very prevalent…

    Submitted 16 June, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted to AAAI2021

  18. arXiv:2006.11970  [pdf, other]

    stat.ML cs.LG math.ST

    A polynomial-time algorithm for learning nonparametric causal graphs

    Authors: Ming Gao, Yi Ding, Bryon Aragam

    Abstract: We establish finite-sample guarantees for a polynomial-time algorithm for learning a nonlinear, nonparametric directed acyclic graphical (DAG) model from data. The analysis is model-free and does not assume linearity, additivity, independent noise, or faithfulness. Instead, we impose a condition on the residual variances that is closely related to previous work on linear models with equal variance…

    Submitted 10 November, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: To appear at NeurIPS 2020

  19. arXiv:2004.08957  [pdf, other]

    eess.IV cs.LG stat.ML

    Reconstruction of high-resolution 6x6-mm OCT angiograms using deep learning

    Authors: Min Gao, Yukun Guo, Tristan T. Hormel, Jiande Sun, Thomas Hwang, Yali Jia

    Abstract: Typical optical coherence tomographic angiography (OCTA) acquisition areas on commercial devices are 3x3- or 6x6-mm. Compared to 3x3-mm angiograms with proper sampling density, 6x6-mm angiograms have significantly lower scan quality, with reduced signal-to-noise ratio and worse shadow artifacts due to undersampling. Here, we propose a deep-learning-based high-resolution angiogram reconstruction ne…

    Submitted 9 June, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

  20. arXiv:2004.07348  [pdf, other]

    stat.ML cs.LG

    Learning 1-Dimensional Submanifolds for Subsequent Inference on Random Dot Product Graphs

    Authors: Michael W. Trosset, Mingyue Gao, Minh Tang, Carey E. Priebe

    Abstract: A random dot product graph (RDPG) is a generative model for networks in which vertices correspond to positions in a latent Euclidean space and edge probabilities are determined by the dot products of the latent positions. We consider RDPGs for which the latent positions are randomly sampled from an unknown $1$-dimensional submanifold of the latent space. In principle, restricted inference, i.e., p…

    Submitted 24 December, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

    Comments: 29 pages

    MSC Class: 62H99

  21. arXiv:2002.09168  [pdf, other]

    cs.LG cs.CV stat.ML

    Residual Knowledge Distillation

    Authors: Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

    Abstract: Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance degradation due to the substantial gap between the learning capacities of S and T. To remedy this problem, this work proposes Residual Knowledge Distillation (RKD), whi…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: 9 pages, 3 figures, 3 tables

  22. arXiv:1911.02014  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Scribble-based Hierarchical Weakly Supervised Learning for Brain Tumor Segmentation

    Authors: Zhanghexuan Ji, Yan Shen, Chunwei Ma, Mingchen Gao

    Abstract: The recent state-of-the-art deep learning methods have significantly improved brain tumor segmentation. However, fully supervised training requires a large amount of manually labeled masks, which is highly time-consuming and needs domain expertise. Weakly supervised learning with scribbles provides a good trade-off between model accuracy and the effort of manual labeling. However, for segmenting t…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019) Accept

  23. arXiv:1901.09676  [pdf, other]

    cs.SI cs.LG stat.ML

    Learning Vertex Representations for Bipartite Networks

    Authors: Ming Gao, Xiangnan He, Leihui Chen, Tingting Liu, Jinglin Zhang, Aoying Zhou

    Abstract: Recent years have witnessed a widespread increase of interest in network representation learning (NRL). By far most research efforts have focused on NRL for homogeneous networks like social networks where vertices are of the same type, or heterogeneous networks like knowledge graphs where vertices (and/or edges) are of different types. There has been relatively little research dedicated to NRL for…

    Submitted 10 March, 2020; v1 submitted 15 January, 2019; originally announced January 2019.

  24. arXiv:1809.08908  [pdf, ps, other]

    physics.med-ph stat.ML

    Fast, Precise Myelin Water Quantification using DESS MRI and Kernel Learning

    Authors: Gopal Nataraj, Jon-Fredrik Nielsen, Mingjie Gao, Jeffrey A. Fessler

    Abstract: Purpose: To investigate the feasibility of myelin water content quantification using fast dual-echo steady-state (DESS) scans and machine learning with kernels. Methods: We optimized combinations of steady-state (SS) scans for precisely estimating the fast-relaxing signal fraction ff of a two-compartment signal model, subject to a scan time constraint. We estimated ff from the optimized DESS acq…

    Submitted 24 September, 2018; originally announced September 2018.

  25. arXiv:1304.5563  [pdf]

    stat.AP cs.CY

    A quantitative evaluation of health care system in US, China, and Sweden

    Authors: Qixin Wang, Menghui Li, Hualong Zu, Mingyi Gao, Chenghua Cao, Li Charlie Xia

    Abstract: This study is mainly aimed at evaluating the effectiveness of current health care systems of several representative countries and improving that of the US. To achieve these goals, a people-oriented non-linear evaluation model is designed. It comprises one major evaluation metric and four minor metrics. The major metric is constituted by combining possible factors that most significantly determine…

    Submitted 19 April, 2013; originally announced April 2013.

    Comments: 6 figures, 2 tables

    Journal ref: HealthMED 4 (2013) 1064-1074