Showing 1–50 of 266 results for author: Zhang, T

Searching in archive stat.
  1. arXiv:2408.02060  [pdf, other]

    math.ST stat.ME stat.ML

    Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection

    Authors: Tianyu Zhang, Hao Lee, Jing Lei

    Abstract: We study the problem of finding the index of the minimum value of a vector from noisy observations. This problem is relevant in population/policy comparison, discrete maximum likelihood, and model selection. We develop a test statistic that is asymptotically normal, even in high-dimensional settings and with potentially many ties in the population mean vector, by integrating concepts and tools fro…

    Submitted 4 August, 2024; originally announced August 2024.

  2. arXiv:2407.19078  [pdf, other]

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber. Understanding the impact of lever budget changes and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value. We introduce an end-to-end machine learning and optimization…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  3. arXiv:2407.17466  [pdf, other]

    cs.LG math.OC stat.ML

    Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

    Authors: Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang

    Abstract: This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions. Despite MORL's significant empirical success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms. Our work offers a systematic analysis of several optimization t…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Initially submitted in May 2024

  4. arXiv:2407.07631  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

    Authors: Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

    Abstract: We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work focuses on applying the entropic risk measure to RL problems. While existing literature primarily investigates the online setting, there remains a large gap in unde…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  5. arXiv:2407.03558  [pdf, ps, other]

    stat.ME

    Aggregated Sure Independence Screening for Variable Selection with Interaction Structures

    Authors: Tonglin Zhang

    Abstract: A new method called the aggregated sure independence screening is proposed for the computational challenges in variable selection of interactions when the number of explanatory variables is much higher than the number of observations (i.e., $p\gg n$). In this problem, the two main challenges are the strong hierarchical restriction and the number of candidates for the main effects and interactions.…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint

    MSC Class: 62J07; 62J05

  6. arXiv:2406.01380  [pdf, other]

    cs.CV stat.AP

    Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers

    Authors: Shiqi Liu, Wenhan Cao, Chang Liu, Tianyi Zhang, Shengbo Eben Li

    Abstract: Multi-object tracking (MOT) is an essential technique for navigation in autonomous driving. In tracking-by-detection systems, biases, false positives, and misses, which are referred to as outliers, are inevitable due to complex traffic scenarios. Recent tracking methods are based on filtering algorithms that overlook these outliers, leading to reduced tracking accuracy or even loss of the objects…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures

  7. arXiv:2405.16780  [pdf, other]

    stat.ME

    Analysis of Broken Randomized Experiments by Principal Stratification

    Authors: Qinqing Liu, Xiang Peng, Tao Zhang, Yuhao Deng

    Abstract: Although randomized controlled trials have long been regarded as the "gold standard" for evaluating treatment effects, there is no natural prevention from post-treatment events. For example, non-compliance makes the actual treatment different from the assigned treatment, truncation-by-death renders the outcome undefined or ill-defined, and missingness prevents the outcomes from being measured. I…

    Submitted 26 May, 2024; originally announced May 2024.

  8. arXiv:2405.16734  [pdf, other]

    stat.ML cs.LG

    Faster Sampling via Stochastic Gradient Proximal Sampler

    Authors: Xunpeng Huang, Difan Zou, Yi-An Ma, Hanze Dong, Tong Zhang

    Abstract: Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting (Lee et al., 2021), has yet to be explored in its stochastic variants. In this paper, we study the Stochasti…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 48 pages, 2 figures, 5 tables

  9. arXiv:2405.16387  [pdf, other]

    stat.ML cs.LG

    Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

    Authors: Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang

    Abstract: To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Ga…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 68 pages, 2 figures

  10. arXiv:2405.07863  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    RLHF Workflow: From Reward Modeling to Online RLHF

    Authors: Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

    Abstract: We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill i…

    Submitted 12 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  11. arXiv:2405.01010  [pdf, other]

    cs.LG stat.ML

    Efficient and Adaptive Posterior Sampling Algorithms for Bandits

    Authors: Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

    Abstract: We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-wo…

    Submitted 2 May, 2024; originally announced May 2024.

  12. arXiv:2404.03578  [pdf, ps, other]

    cs.LG stat.ML

    Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

    Authors: Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet

    Abstract: The sim-to-real gap, which represents the disparity between training and testing environments, poses a significant challenge in reinforcement learning (RL). A promising approach to addressing this challenge is distributionally robust RL, often framed as a robust Markov decision process (RMDP). In this framework, the objective is to find a robust policy that achieves good performance under the wors…

    Submitted 4 April, 2024; originally announced April 2024.

  13. arXiv:2403.18658  [pdf, ps, other]

    math.ST stat.ML

    Theoretical Guarantees for the Subspace-Constrained Tyler's Estimator

    Authors: Gilad Lerman, Feng Yu, Teng Zhang

    Abstract: This work analyzes the subspace-constrained Tyler's estimator (STE) designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. It assumes a weak inlier-outlier model and allows the fraction of inliers to be smaller than a fraction that leads to computational hardness of the robust subspace recovery problem. It shows that in this setting, if the…

    Submitted 12 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  14. arXiv:2403.17592  [pdf, other]

    cs.LG stat.ML

    On the Benefits of Over-parameterization for Out-of-Distribution Generalization

    Authors: Yifan Hao, Yong Lin, Difan Zou, Tong Zhang

    Abstract: In recent years, machine learning models have achieved success based on the independently and identically distributed assumption. However, this assumption can be easily violated in real-world applications, leading to the Out-of-Distribution (OOD) problem. Understanding how modern over-parameterized DNNs behave under non-trivial natural distributional shifts is essential, as current theoretical und…

    Submitted 26 March, 2024; originally announced March 2024.

  15. arXiv:2403.11497  [pdf, other]

    cs.CV cs.LG stat.ML

    Do CLIPs Always Generalize Better than ImageNet Models?

    Authors: Qizhou Wang, Yong Lin, Yongqiang Chen, Ludwig Schmidt, Bo Han, Tong Zhang

    Abstract: Large vision language models, such as CLIPs, have revolutionized modern machine learning. CLIPs have demonstrated great generalizability under distribution shifts, supported by an increasing body of literature. However, the evaluation datasets for CLIPs are variations primarily designed for ImageNet benchmarks, which may not fully reflect the extent to which CLIPs, e.g., pre-trained on LAION, robu…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Qizhou Wang, Yong Lin, and Yongqiang Chen contributed equally. Project page: https://counteranimal.github.io

  16. arXiv:2403.06183  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling

    Authors: Xunpeng Huang, Hanze Dong, Difan Zou, Tong Zhang

    Abstract: Understanding the dimension dependency of computational complexity in high-dimensional sampling problems is a fundamental problem, from both a practical and a theoretical perspective. Compared with samplers with unbiased stationary distribution, e.g., Metropolis-adjusted Langevin algorithm (MALA), biased samplers, e.g., Underdamped Langevin Dynamics (ULD), perform better in low-accuracy cases just be…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 32 pages

  17. arXiv:2403.05679  [pdf, other]

    stat.ME math.ST stat.AP

    Debiased Projected Two-Sample Comparisons for Single-Cell Expression Data

    Authors: Tianyu Zhang, Jing Lei, Kathryn Roeder

    Abstract: We study several variants of the high-dimensional mean inference problem motivated by modern single-cell genomics data. By taking advantage of low-dimensional and localized signal structures commonly seen in such data, our proposed methods not only have the usual frequentist validity but also provide useful information on the potential locations of the signal if the null hypothesis is rejected. Ou…

    Submitted 8 March, 2024; originally announced March 2024.

  18. arXiv:2402.18571  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

    Authors: Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang

    Abstract: Fine-grained control over large language models (LLMs) remains a significant challenge, hindering their adaptability to diverse user needs. While Reinforcement Learning from Human Feedback (RLHF) shows promise in aligning LLMs, its reliance on scalar rewards often limits its ability to capture diverse user preferences in real-world applications. To address this limitation, we introduce the Directi…

    Submitted 6 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: The code and model are released at https://github.com/Haoxiang-Wang/directional-preference-alignment

  19. arXiv:2402.18149  [pdf, ps, other]

    cs.LG stat.ML

    Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

    Authors: Tonghe Zhang, Yu Chen, Longbo Huang

    Abstract: This work pioneers regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in theoretical exploration. We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework, where the goal is to optimize accumulated reward under the entropic ris…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 38 pages

  20. arXiv:2402.08991  [pdf, ps, other]

    stat.ML cs.LG

    Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

    Authors: Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang

    Abstract: This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-square regression is often employed for value function estimation. However, these techniques cannot be directly applied to mod…

    Submitted 20 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  21. arXiv:2402.07314  [pdf, other]

    cs.LG stat.ML

    Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

    Authors: Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang

    Abstract: We study Reinforcement Learning from Human Feedback (RLHF) under a general preference oracle. In particular, we do not assume that there exists a reward function and the preference signal is drawn from the Bradley-Terry model as most of the prior works do. We consider a standard mathematical formulation, the reverse-KL regularized minimax game between two LLMs for RLHF under general preference ora…

    Submitted 25 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: RLHF, Preference Learning, Alignment for LLMs

  22. Robust Sufficient Dimension Reduction via $α$-Distance Covariance

    Authors: Hsin-Hsiung Huang, Feng Yu, Teng Zhang

    Abstract: We introduce a novel sufficient dimension-reduction (SDR) method which is robust against outliers using $α$-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is effectively estimated, with the model-free advantage of not estimating the link function, based on the projection on the Stiefel manifold. We establish the convergence prop…

    Submitted 4 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  23. arXiv:2401.12236  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

    Authors: Yifan Hao, Tong Zhang

    Abstract: Recent empirical and theoretical studies have established the generalization capabilities of large machine learning models that are trained to (approximately or exactly) fit noisy data. In this work, we prove a surprising result that even if the ground truth itself is robust to adversarial examples, and the benignly overfitted model is benign in terms of the "standard" out-of-sample risk objecti…

    Submitted 25 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  24. arXiv:2401.06325  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo

    Authors: Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang

    Abstract: To sample from a general target distribution $p_*\propto e^{-f_*}$ beyond the isoperimetric condition, Huang et al. (2023) proposed to perform sampling through reverse diffusion, giving rise to Diffusion-based Monte Carlo (DMC). Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimat…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 54 pages

  25. arXiv:2312.11456  [pdf, other]

    cs.LG cs.AI stat.ML

    Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

    Authors: Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang

    Abstract: This paper studies the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF). We first identify the primary challenges of existing popular methods like offline PPO and offline DPO as lacking in strategic exploration of the environment. Then, to understand the mathematical principle of RLHF, we consider a standard mathematical formulation, the reverse-KL re…

    Submitted 1 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 53 pages; theoretical study and algorithmic design of iterative RLHF and DPO

  26. arXiv:2311.02490  [pdf, other]

    math.NA math.OC stat.ML

    Improved Convergence Rates of Windowed Anderson Acceleration for Symmetric Fixed-Point Iterations

    Authors: Casey Garner, Gilad Lerman, Teng Zhang

    Abstract: This paper studies the commonly utilized windowed Anderson acceleration (AA) algorithm for fixed-point methods, $x^{(k+1)}=q(x^{(k)})$. It provides the first proof that when the operator $q$ is linear and symmetric the windowed AA, which uses a sliding window of prior iterates, improves the root-linear convergence factor over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric J…

    Submitted 8 March, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: 32 pages, 14 figures

    MSC Class: 65F10; 65H10; 68W40

  27. arXiv:2310.19861  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

    Authors: Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang

    Abstract: This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capt…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  28. arXiv:2310.12140  [pdf, other]

    math.ST stat.ME stat.ML

    Online Estimation with Rolling Validation: Adaptive Nonparametric Estimation with Streaming Data

    Authors: Tianyu Zhang, Jing Lei

    Abstract: Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. An important example includes variants of stochastic gradient descent. These algorithms often take one sample point at a time and instantly update the parameter estimate of interest. In this work we consider model selection and hyperparameter tuning for such online al…

    Submitted 4 April, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

  29. arXiv:2309.02476  [pdf, other]

    stat.ML cs.LG

    Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

    Authors: Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang

    Abstract: Modern deep learning heavily relies on large labeled datasets, which often come with high costs in terms of both manual labeling and computational resources. To mitigate these challenges, researchers have explored the use of informative subset selection techniques, including coreset selection and active learning. Specifically, coreset selection involves sampling data with both input ($\mathbf{x}$) and o…

    Submitted 5 September, 2023; originally announced September 2023.

  30. arXiv:2308.04428  [pdf, other]

    stat.ML cs.LG eess.SY

    Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data

    Authors: Thomas T. C. K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni

    Abstract: A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically gr…

    Submitted 27 July, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Appeared at ICLR 2024 (spotlight presentation)

  31. arXiv:2307.05732  [pdf, ps, other]

    stat.ME math.ST

    From isotonic to Lipschitz regression: a new interpolative perspective on shape-restricted estimation

    Authors: Kenta Takatsu, Tianyu Zhang, Arun Kumar Kuchibhotla

    Abstract: This manuscript seeks to bridge two seemingly disjoint paradigms of nonparametric regression estimation based on smoothness assumptions and shape constraints. The proposed approach is motivated by a conceptually simple observation: Every Lipschitz function is a sum of monotonic and linear functions. This principle is further generalized to the higher-order monotonicity and multivariate covariates.…

    Submitted 20 June, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  32. arXiv:2307.02037  [pdf, other]

    stat.ML cs.LG math.OC

    Reverse Diffusion Monte Carlo

    Authors: Xunpeng Huang, Hanze Dong, Yifan Hao, Yi-An Ma, Tong Zhang

    Abstract: We propose a Monte Carlo sampler from the reverse diffusion process. Unlike the practice of diffusion models, where the intermediary updates -- the score functions -- are learned with a neural network, we transform the score matching problem into a mean estimation one. By estimating the means of the regularized posterior distributions, we derive a novel Monte Carlo sampling algorithm called revers…

    Submitted 13 March, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: 44 pages, 16 figures, ICLR 2024

  33. arXiv:2306.10369  [pdf, other]

    math.OC eess.SY stat.ML

    Non-asymptotic System Identification for Linear Systems with Nonlinear Policies

    Authors: Yingying Li, Tianpeng Zhang, Subhro Das, Jeff Shamma, Na Li

    Abstract: This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input const…

    Submitted 17 June, 2023; originally announced June 2023.

  34. arXiv:2305.09659  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

    Authors: Jose Blanchet, Miao Lu, Tong Zhang, Han Zhong

    Abstract: In this paper, we study distributionally robust offline reinforcement learning (robust offline RL), which seeks to find an optimal policy purely from an offline dataset that can perform well in perturbed environments. Specifically, we propose a generic algorithm framework called Doubly Pessimistic Model-based Policy Optimization ($P^2MPO$), which features a novel combination of a flexible model est…

    Submitted 22 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: V2 adds results on robust offline Markov games

  35. arXiv:2305.08841  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

    Authors: Han Zhong, Tong Zhang

    Abstract: The proximal policy optimization (PPO) algorithm stands as one of the most prosperous methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is unclear whether PPO or its optimistic variants can effectively solve linear Markov decision processes (MDPs), which are arguably the simplest models in RL with func…

    Submitted 8 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

  36. arXiv:2304.06767  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

    Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

    Abstract: Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-worl…

    Submitted 1 December, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 29 pages, 12 figures, Published in Transactions on Machine Learning Research (TMLR)

  37. arXiv:2304.06326  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Understanding Overfitting in Adversarial Training via Kernel Regression

    Authors: Teng Zhang, Kang Li

    Abstract: Adversarial training and data augmentation with noise are widely adopted techniques to enhance the performance of neural networks. This paper investigates adversarial training and data augmentation with noise in the context of regularized regression in a reproducing kernel Hilbert space (RKHS). We establish the limiting formula for these techniques as the attack and noise size, as well as the regu…

    Submitted 19 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  38. arXiv:2303.17482  [pdf]

    cs.AI cs.LG stat.ME

    Three-way causal attribute partial order structure analysis

    Authors: Xue Zaifa, Lu Huibin, Zhang Tao, Li Tao, Lu Xin

    Abstract: As an emerging concept cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose the method named three-way causal attribute partial order structure (3WCAPOS) to evolve the POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the mod…

    Submitted 28 March, 2023; originally announced March 2023.

  39. arXiv:2303.03092  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Environment Invariant Linear Least Squares

    Authors: Jianqing Fan, Cong Fang, Yihong Gu, Tong Zhang

    Abstract: This paper considers a multi-environment linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across different environments, yet the conditional expectations of $y$ given the unknown set of important variables are invariant. Such a statistical model is related to the problem of endogeneity,…

    Submitted 25 November, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 62 pages, 6 figures. Reorganized the main text; improved the theoretical analysis with fewer technical conditions; added numerical comparisons

    MSC Class: 62J05; 62D20

  40. arXiv:2303.00970  [pdf, other]

    math.OC cs.GT cs.LG stat.ML

    PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium

    Authors: Shihong Ding, Hanze Dong, Cong Fang, Zhouchen Lin, Tong Zhang

    Abstract: We consider the non-convex non-concave objective function in two-player zero-sum continuous games. The existence of pure Nash equilibrium requires stringent conditions, posing a major challenge for this problem. To circumvent this difficulty, we examine the problem of identifying a mixed Nash equilibrium, where strategies are randomized and characterized by probability distributions over continuou…

    Submitted 20 November, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

  41. arXiv:2302.10371  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

    Authors: Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

    Abstract: Recently, several studies (Zhou et al., 2021a; Zhang et al., 2021b; Kim et al., 2021; Zhou and Gu, 2022) have provided variance-dependent regret bounds for linear contextual bandits, which interpolate the regret for the worst-case regime and the deterministic reward regime. However, these algorithms are either computationally intractable or unable to handle unknown variance of the noise. In this…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 43 pages, 2 tables

  42. arXiv:2301.13857  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning in POMDPs is Sample-Efficient with Hindsight Observability

    Authors: Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

    Abstract: POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed during some point of the learning process. Motivated by diverse applications ranging from robotics to data center scheduling,…

    Submitted 3 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  43. arXiv:2212.14194  [pdf, ps, other]

    math.ST stat.CO stat.ME stat.ML

    Theoretical Guarantees for Sparse Principal Component Analysis based on the Elastic Net

    Authors: Teng Zhang, Haoyi Yang, Lingzhou Xue

    Abstract: Sparse principal component analysis (SPCA) is widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006) are still unknown. This paper aims to address this critical gap. We firs…

    Submitted 27 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 60 pages

  44. arXiv:2212.08765  [pdf, other]

    cs.LG stat.ML

    Latent Variable Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

    Abstract: Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a…

    Submitted 7 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: ICLR 2023. The first two authors contribute equally. Project Website: https://rlrep.github.io/lvrep/

  45. arXiv:2212.06069  [pdf, other]

    cs.LG stat.ML

    VO$Q$L: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

    Authors: Alekh Agarwal, Yujia Jin, Tong Zhang

    Abstract: We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic $Q$-Learning (VO$Q$L), based on $Q$-learning and bound its regret assuming completeness and bounded Eluder dimension for the regression function class. As a special case, VO$Q$L achieves $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ re…

    Submitted 12 December, 2022; originally announced December 2022.

  46. arXiv:2212.05949  [pdf, ps, other]

    stat.ML cs.LG

    Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

    Authors: Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

    Abstract: Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}ζ)$ regret bound, where $T$ is the number of rounds and $ζ$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and pr…

    Submitted 10 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: We study the corruption-robust MDPs and contextual bandits with general function approximation

    Journal ref: ICML 2023

  47. Statistics for Spatially Stratified Heterogeneous Data

    Authors: Jinfeng Wang, Robert Haining, Tonglin Zhang, Chengdong Xu, Maogui Hu

    Abstract: Spatial statistics is dominated by spatial autocorrelation (SAC) based Kriging and BHM, and spatial local heterogeneity based hotspots and geographical regression methods, appraised as the first and second laws of Geography (Tobler 1970; Goodchild 2004), respectively. Spatial stratified heterogeneity (SSH), the phenomena of a partition that within strata is more similar than between strata, exampl…

    Submitted 30 November, 2022; originally announced November 2022.

    Journal ref: Annals of the American Association of Geographers 2024

  48. arXiv:2211.13954  [pdf, other]

    stat.ML cs.LG

    Particle-based Variational Inference with Preconditioned Functional Gradient Flow

    Authors: Hanze Dong, Xi Wang, Yong Lin, Tong Zhang

    Abstract: Particle-based variational inference (VI) minimizes the KL divergence between model samples and the target posterior with gradient flow estimates. With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow. However, the requirement of…

    Submitted 18 April, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 26 pages, 8 figures, ICLR 2023

  49. arXiv:2211.11638  [pdf, other]

    cs.LG stat.ML

    Normalizing Flow with Variational Latent Representation

    Authors: Hanze Dong, Shizhe Diao, Weizhong Zhang, Tong Zhang

    Abstract: Normalizing flow (NF) has gained popularity over traditional maximum likelihood based methods due to its strong capability to model complex data distributions. However, the standard approach, which maps the observed data to a normal distribution, has difficulty in handling data distributions with multiple relatively isolated modes. To overcome this issue, we propose a new framework based on variat…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 24 pages, 7 figures

  50. arXiv:2211.10015  [pdf, ps, other]

    stat.ML cs.LG

    Asymptotics for The $k$-means

    Authors: Tonglin Zhang

    Abstract: The $k$-means is one of the most important unsupervised learning techniques in statistics and computer science. The goal is to partition a data set into many clusters, such that observations within clusters are the most homogeneous and observations between clusters are the most heterogeneous. Although it is well known, the investigation of the asymptotic properties is far behind, leading to diffic…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Manuscript

    MSC Class: 62H30; 62J12