
Showing 1–16 of 16 results for author: Bai, S

Searching in archive stat.
  1. arXiv:2406.14535

    stat.ME math.ST

    On estimation and order selection for multivariate extremes via clustering

    Authors: Shiyuan Deng, He Tang, Shuyang Bai

    Abstract: We investigate the estimation of multivariate extreme models with a discrete spectral measure using spherical clustering techniques. The primary contribution involves devising a method for selecting the order, that is, the number of clusters. The method consistently identifies the true order, i.e., the number of spectral atoms, and enjoys intuitive implementation in practice. Specifically, we intr…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 31 pages, 12 figures

    MSC Class: 62G32 (Primary); 60G70 (Secondary)

  2. arXiv:2405.08484

    quant-ph cs.LG nlin.CD stat.ML

    Universal replication of chaotic characteristics by classical and quantum machine learning

    Authors: Sheng-Chen Bai, Shi-Ju Ran

    Abstract: Replicating chaotic characteristics of non-linear dynamics by machine learning (ML) has recently drawn wide attention. In this work, we propose that an ML model, trained to predict the state one-step-ahead from several latest historic states, can accurately replicate the bifurcation diagram and the Lyapunov exponents of discrete dynamic systems. The characteristics for different values of the hype…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 8 pages, 4 figures

  3. arXiv:2305.15643

    cs.LG math.OC stat.ML

    Federated Composite Saddle Point Optimization

    Authors: Site Bai, Brian Bullins

    Abstract: Federated learning (FL) approaches for saddle point problems (SPP) have recently gained in popularity due to the critical role they play in machine learning (ML). Existing works mostly target smooth unconstrained objectives in Euclidean space, whereas ML problems often involve constraints or non-smooth regularization, which results in a need for composite optimization. Addressing these issues, we…

    Submitted 24 May, 2023; originally announced May 2023.

  4. arXiv:2303.08242

    stat.ML cs.LG stat.AP

    Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data

    Authors: Rui Xie, Shuyang Bai, Ping Ma

    Abstract: The Internet of Things (IoT) system generates massive high-speed temporally correlated streaming data and is often connected with online inference tasks under computational or energy constraints. Online analysis of these streaming time series data often faces a trade-off between statistical efficiency and computational cost. One important approach to balance this trade-off is sampling, where only…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by The Annals of Applied Statistics

  5. arXiv:2211.09961

    cs.LG stat.ML

    Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

    Authors: Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

    Abstract: Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models display strong upwards generalization, and find that str…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  6. arXiv:2205.14056

    cs.LG stat.ML

    Dual Convexified Convolutional Neural Networks

    Authors: Site Bai, Chuyang Ke, Jean Honorio

    Abstract: We propose the framework of dual convexified convolutional neural networks (DCCNNs). In this framework, we first introduce a primal learning problem motivated by convexified convolutional neural networks (CCNNs), and then construct the dual convex training program through careful analysis of the Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates. Our approach reduces the computational over…

    Submitted 7 December, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

  7. arXiv:2106.14342

    cs.LG stat.ML

    Stabilizing Equilibrium Models by Jacobian Regularization

    Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter

    Abstract: Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer. These models have been shown to achieve performance competitive with the state-of-the-art deep networks while using significantly less memory. Yet they are also slower, brittle to architectural choices, and introduce potential instability to the…

    Submitted 27 June, 2021; originally announced June 2021.

    Comments: ICML 2021 Short Oral

  8. arXiv:2106.00553

    cs.LG stat.ML

    SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

    Authors: Zaccharie Ramzi, Florian Mannel, Shaojie Bai, Jean-Luc Starck, Philippe Ciuciu, Thomas Moreau

    Abstract: In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While their training is memory-efficient, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), the training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inve…

    Submitted 10 March, 2023; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted as a spotlight to ICLR 2022

  9. arXiv:2006.08656

    cs.LG cs.CV stat.ML

    Multiscale Deep Equilibrium Models

    Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter

    Abstract: We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ), suited to large-scale and highly hierarchical pattern recognition domains. An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously, using implicit differentiation to avoid storing intermediate states (and thus requiring only $O(1)$ memory c…

    Submitted 24 November, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 Oral

  10. arXiv:1909.01377

    cs.LG stat.ML

    Deep Equilibrium Models

    Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun

    Abstract: We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but…

    Submitted 28 October, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019 Spotlight Oral

  11. arXiv:1908.11775

    cs.LG stat.ML

    Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

    Authors: Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the attention mechanism, which concurrently processes all inputs in the streams. In this paper, we present a new formulation of attention via the lens of the kernel. To…

    Submitted 11 November, 2019; v1 submitted 30 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  12. arXiv:1907.12439

    cs.LG cs.AI stat.ML

    Hindsight Trust Region Policy Optimization

    Authors: Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

    Abstract: Reinforcement learning (RL) with sparse rewards is a major challenge. We propose Hindsight Trust Region Policy Optimization (HTRPO), a new RL algorithm that extends the highly successful TRPO algorithm with hindsight to tackle the challenge of sparse rewards. Hindsight refers to the algorithm's ability to learn from information across goals, including ones not intended for the current…

    Submitted 17 May, 2021; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: Accepted by IJCAI 2021

  13. arXiv:1901.08150

    cs.LG cs.CV stat.ML

    Hypergraph Convolution and Hypergraph Attention

    Authors: Song Bai, Feihu Zhang, Philip H. S. Torr

    Abstract: Recently, graph neural networks have attracted great attention and achieved prominent performance in various research fields. Most of those algorithms have assumed pairwise relationships of objects of interest. However, in many real applications, the relationships between objects are higher-order, beyond a pairwise formulation. To efficiently learn deep embeddings on the high-order graph-struct…

    Submitted 10 October, 2020; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: Accepted by Pattern Recognition

  14. arXiv:1812.11276

    cs.LG stat.ML

    Learn to Interpret Atari Agents

    Authors: Zhao Yang, Song Bai, Li Zhang, Philip H. S. Torr

    Abstract: Deep reinforcement learning (DeepRL) agents surpass human-level performance in many tasks. However, the direct mapping from states to actions makes it hard to interpret the rationale behind the decision-making of the agents. In contrast to previous a-posteriori methods for visualizing DeepRL policies, in this work, we propose to equip the DeepRL model with an innate visualization ability. Our prop…

    Submitted 5 April, 2023; v1 submitted 28 December, 2018; originally announced December 2018.

    Comments: An old report. Uploaded for archival purposes only

  15. arXiv:1810.06682

    cs.LG cs.AI cs.CL stat.ML

    Trellis Networks for Sequence Modeling

    Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun

    Abstract: We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their we…

    Submitted 11 March, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Published at ICLR 2019

  16. arXiv:1803.06978

    cs.CV cs.LG stat.ML

    Improving Transferability of Adversarial Examples with Input Diversity

    Authors: Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, Alan Yuille

    Abstract: Though CNNs have achieved state-of-the-art performance on various vision tasks, they are vulnerable to adversarial examples, crafted by adding human-imperceptible perturbations to clean images. However, most of the existing adversarial attacks only achieve relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and p…

    Submitted 1 June, 2019; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: CVPR 2019, code is available at: https://github.com/cihangxie/DI-2-FGSM