
Showing 1–50 of 88 results for author: De Sa, C

Searching in archive cs.
  1. arXiv:2408.00421  [pdf, other]

    cs.LG cs.AI

    Towards Evolutionary-based Automated Machine Learning for Small Molecule Pharmacokinetic Prediction

    Authors: Alex G. C. de Sá, David B. Ascher

    Abstract: Machine learning (ML) is revolutionising drug discovery by expediting the prediction of small molecule properties essential for developing new drugs. These properties -- including absorption, distribution, metabolism and excretion (ADME) -- are crucial in the early stages of drug development since they provide an understanding of the course of the drug in the organism, i.e., the drug's pharmacokine…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Paper accepted and presented at the 14th Workshop on Evolutionary Computation for the Automated Design of Algorithms (ECADA), which happened during the Genetic and Evolutionary Computation Conference (GECCO)

  2. arXiv:2407.16624  [pdf, other]

    cs.CL

    Semantic Change Characterization with LLMs using Rhetorics

    Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski

    Abstract: Languages continually evolve in response to societal events, resulting in new terms and shifts in meanings. These changes have significant implications for computer applications, including automatic translation and chatbots, making it essential to characterize them accurately. The recent development of LLMs has notably advanced natural language understanding, particularly in sense inference and re…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2406.11235  [pdf, other]

    cs.LG

    QTIP: Quantization with Trellises and Incoherence Processing

    Authors: Albert Tseng, Qingyao Sun, David Hou, Christopher De Sa

    Abstract: Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput. Recent state-of-the-art PTQ approaches have converged on using vector quantization (VQ) to quantize multiple weights at once, which improves information utilization through better shaping.…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.05033  [pdf, other]

    cs.LG math.OC

    Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

    Authors: Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa

    Abstract: We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical…

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2406.02913  [pdf, other]

    cs.LG cs.AI

    Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

    Authors: Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu

    Abstract: Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO f…

    Submitted 5 June, 2024; originally announced June 2024.

  6. arXiv:2406.00061  [pdf, other]

    cs.LG cs.AI cs.CL

    STAT: Shrinking Transformers After Training

    Authors: Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle

    Abstract: We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next layer. Each layer block in the network is compressed using a series of principled matrix factorizations that preserve the network structure. Our entire algorithm t…

    Submitted 29 May, 2024; originally announced June 2024.

  7. arXiv:2402.19088  [pdf, other]

    cs.CL cs.AI

    Survey in Characterization of Semantic Change

    Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski

    Abstract: Live languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or semantic changes of words (new meaning to existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer scienc…

    Submitted 18 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  8. arXiv:2402.04396  [pdf, other]

    cs.LG cs.AI cs.CL

    QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

    Authors: Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

    Abstract: Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes (≤ 4 bits per weight) using three novel techniques. First, QuIP# improves QuIP's (Chee et al., 2023) incoherence processing by using the randomized Had…

    Submitted 4 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  9. arXiv:2312.13236  [pdf, other]

    cs.LG cs.CV

    Diffusion Models With Learned Adaptive Noise

    Authors: Subham Sekhar Sahoo, Aaron Gokaslan, Chris De Sa, Volodymyr Kuleshov

    Abstract: Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-l…

    Submitted 4 June, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  10. arXiv:2311.06477  [pdf, other]

    cs.CY

    Report of the 1st Workshop on Generative AI and Law

    Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, et al. (10 additional authors not shown)

    Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report…

    Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

  11. arXiv:2311.05580  [pdf, other]

    cs.DS cs.AI cs.CC math.PR

    Inference for Probabilistic Dependency Graphs

    Authors: Oliver E. Richardson, Joseph Y. Halpern, Christopher De Sa

    Abstract: Probabilistic dependency graphs (PDGs) are a flexible class of probabilistic graphical models, subsuming Bayesian Networks and Factor Graphs. They can also capture inconsistent beliefs, and provide a way of measuring the degree of this inconsistency. We present the first tractable inference algorithm for PDGs with discrete variables, making the asymptotic complexity of PDG inference similar to that o…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: extended version of the paper with corrected reduction proof

    Journal ref: PMLR 216:1741-1751, 2023

  12. arXiv:2310.10013  [pdf, other]

    stat.ML cs.LG

    Riemannian Residual Neural Networks

    Authors: Isay Katsman, Eric Ming Chen, Sidhanth Holalkere, Anna Asch, Aaron Lou, Ser-Nam Lim, Christopher De Sa

    Abstract: Recent methods in geometric deep learning have introduced various neural networks to operate over data that lie on Riemannian manifolds. Such networks are often necessary to learn well over graphs with a hierarchical structure or to learn over manifold-valued data encountered in the natural sciences. These networks are often inspired by and directly generalize standard Euclidean neural networks. H…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  13. arXiv:2309.16119  [pdf, ps, other]

    cs.LG cs.AI

    ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

    Authors: Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov

    Abstract: We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 2/3/4-bit precision on as little as one 24GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass…

    Submitted 9 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Update since being accepted to TMLR. Updated 2Bit results

  14. arXiv:2309.13781  [pdf, other]

    cs.LG cs.AI

    Explainable Machine Learning for ICU Readmission Prediction

    Authors: Alex G. C. de Sá, Daniel Gould, Anna Fedyukova, Mitchell Nicholas, Lucy Dockrell, Calvin Fletcher, David Pilcher, Daniel Capurro, David B. Ascher, Khaled El-Khawas, Douglas E. V. Pires

    Abstract: The intensive care unit (ICU) comprises a complex hospital environment, where decisions made by clinicians have a high level of risk for the patients' lives. A comprehensive care pathway must then be followed to reduce p complications. Uncertain, competing and unplanned aspects within this environment increase the difficulty in uniformly implementing the care pathway. Readmission contributes to th…

    Submitted 26 September, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

  15. arXiv:2308.01849  [pdf, other]

    cs.CL cs.LG

    Curricular Transfer Learning for Sentence Encoded Tasks

    Authors: Jader Martins Camboim de Sá, Matheus Ferraroni Sanches, Rafael Roque de Souza, Júlio Cesar dos Reis, Leandro Aparecido Villas

    Abstract: Fine-tuning language models in a downstream task is the standard approach for many state-of-the-art methodologies in the field of NLP. However, when the distribution between the source task and target task drifts, e.g., conversational environments, these gains tend to be diminished. This article proposes a sequence of pre-training steps (a curriculum) guided by "data hacking" and grammar…

    Submitted 3 August, 2023; originally announced August 2023.

  16. arXiv:2307.13304  [pdf, other]

    cs.LG cs.CL

    QuIP: 2-Bit Quantization of Large Language Models With Guarantees

    Authors: Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa

    Abstract: This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned w…

    Submitted 15 January, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

  17. arXiv:2306.08757  [pdf, other]

    cs.LG cs.CV

    InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models

    Authors: Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov

    Abstract: While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning. Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data. InfoDiffusion relies on a learning objective regularized w…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  18. arXiv:2306.07536  [pdf, other]

    cs.LG cs.AI cs.CL

    TART: A plug-and-play Transformer module for task-agnostic reasoning

    Authors: Kush Bhatia, Avanika Narayan, Christopher De Sa, Christopher Ré

    Abstract: Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same…

    Submitted 13 June, 2023; originally announced June 2023.

  19. arXiv:2306.00392  [pdf, other]

    cs.LG

    Coneheads: Hierarchy Aware Attention

    Authors: Albert Tseng, Tao Yu, Toni J. B. Liu, Christopher De Sa

    Abstract: Attention networks such as transformers have achieved state-of-the-art performance in many domains. These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product. However, the inner product does not explicitly model the complex structural properties of real world datasets, such as hierarchies between data points. T…

    Submitted 3 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  20. arXiv:2305.15215  [pdf, other]

    cs.LG

    Shadow Cones: A Generalized Framework for Partial Order Embeddings

    Authors: Tao Yu, Toni J. B. Liu, Albert Tseng, Christopher De Sa

    Abstract: Hyperbolic space has proven to be well-suited for capturing hierarchical relations in data, such as trees and directed acyclic graphs. Prior work introduced the concept of entailment cones, which uses partial orders defined by nested cones in the Poincaré ball to model hierarchies. Here, we introduce the "shadow cones" framework, a physics-inspired entailment cone construction. Specifically, we m…

    Submitted 8 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2024

  21. arXiv:2302.01172  [pdf, other]

    cs.LG

    STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

    Authors: Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

    Abstract: Recent innovations on hardware (e.g. Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference. However, state-of-the-art learning recipes in this regime (e.g. SR-STE) are proposed for non-adaptive optimizers like momentum SGD, while incurring non-trivial accuracy drop for Adam-trained models like attention-based LLMs. In this paper, we first demonstr…

    Submitted 2 February, 2023; originally announced February 2023.

  22. arXiv:2302.00845  [pdf, other]

    cs.LG cs.DC math.OC

    Coordinating Distributed Example Orders for Provably Accelerated Training

    Authors: A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

    Abstract: Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: whil…

    Submitted 21 December, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  23. arXiv:2301.11562  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Arbitrariness and Social Prediction: The Confounding Role of Variance in Fair Classification

    Authors: A. Feder Cooper, Katherine Lee, Madiha Zahrah Choksi, Solon Barocas, Christopher De Sa, James Grimmelmann, Jon Kleinberg, Siddhartha Sen, Baobao Zhang

    Abstract: Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We: 1) Define a metric called self-consistency, derived…

    Submitted 6 March, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: AAAI '24 (received a Best Paper Honorable Mention designation)

  24. arXiv:2210.06705  [pdf, ps, other]

    cs.LG cs.AI math.OC

    From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

    Authors: Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

    Abstract: Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity that a continuous-time analysis buys us. An overarching theme of our paper is provi…

    Submitted 12 October, 2022; originally announced October 2022.

  25. arXiv:2207.08867  [pdf, other]

    cs.LG cs.MS

    MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point

    Authors: Tao Yu, Wentao Guo, Jianan Canal Li, Tiancheng Yuan, Christopher De Sa

    Abstract: In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training. MCTensor is used in the same way as PyTorch Tensor: we implement multiple basic, matrix-level computation operators and NN modules for MCTensor with identical PyTorch interface. Our algorithms achieve high precision computation and also benefit from heavily…

    Submitted 29 August, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: HATE2022 in ICML2022

  26. Non-Determinism and the Lawlessness of Machine Learning Code

    Authors: A. Feder Cooper, Jonathan Frankle, Christopher De Sa

    Abstract: Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to reason about the relationship between stochasticity and arbitrariness in lega…

    Submitted 13 August, 2024; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 2022 Symposium on Computer Science and Law (CSLAW '22)

  27. arXiv:2206.09909  [pdf, other]

    cs.LG stat.ML

    Low-Precision Stochastic Gradient Langevin Dynamics

    Authors: Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

    Abstract: While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Lang…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: Published at ICML 2022

  28. arXiv:2205.10733  [pdf, other]

    cs.LG

    GraB: Finding Provably Better Data Permutations than Random Reshuffling

    Authors: Yucheng Lu, Wentao Guo, Christopher De Sa

    Abstract: Random reshuffling, which randomly permutes the dataset each epoch, is widely adopted in model training because it yields faster convergence than with-replacement sampling. Recent studies indicate greedily chosen data orderings can further speed up convergence empirically, at the cost of using more computation and memory. However, greedy ordering lacks theoretical justification and has limited uti…

    Submitted 4 January, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

  29. arXiv:2203.02549  [pdf, other]

    cs.CV

    Structured Pruning is All You Need for Pruning CNNs at Initialization

    Authors: Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang

    Abstract: Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs). However, a slow retraining or fine-tuning procedure is often required to recover the accuracy loss caused by pruning. Recently, a new research direction on weight pruning, pruning-at-initialization (PAI), has been proposed to directly prune CNNs before training so that fine-tuning o…

    Submitted 31 May, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

  30. arXiv:2202.06854  [pdf, other]

    cs.LG

    Random Laplacian Features for Learning with Hyperbolic Space

    Authors: Tao Yu, Christopher De Sa

    Abstract: Due to its geometric properties, hyperbolic space can support high-fidelity embeddings of tree- and graph-structured data, upon which various hyperbolic networks have been developed. Existing hyperbolic networks encode geometric priors not only for the input, but also at every layer of the network. This approach involves repeatedly mapping to and from hyperbolic space, which makes these networks c…

    Submitted 13 March, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICLR 2023

  31. arXiv:2202.06009  [pdf, other]

    cs.LG

    Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

    Authors: Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He

    Abstract: 1-bit gradient compression and local steps are two representative techniques that enable drastic communication reduction in distributed SGD. Their benefits, however, remain an open question on Adam-based large model pre-training (e.g. BERT and GPT). In this paper, we demonstrate that the non-linearity in Adam causes slow convergence even when 1-bit compression or local steps are individually applied. T…

    Submitted 22 May, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

  32. arXiv:2202.04805  [pdf, other]

    cs.LG cs.DC cs.NE

    Understanding Hyperdimensional Computing for Parallel Single-Pass Learning

    Authors: Tao Yu, Yichi Zhang, Zhiru Zhang, Christopher De Sa

    Abstract: Hyperdimensional computing (HDC) is an emerging learning paradigm that computes with high dimensional binary vectors. It is attractive because of its energy efficiency and low latency, especially on emerging hardware -- but HDC suffers from low model accuracy, with little theoretical understanding of what limits its performance. We propose a new theoretical analysis of the limits of HDC via a cons…

    Submitted 4 January, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

  33. Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron

    Authors: A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

    Abstract: We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian. We focus our analysis on the question: Do the different storytellers in the text exhibit distinct personalities? To answer this question, we curate and release a dataset based on the authoritative edition of the text. We us…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: The 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (co-located with EMNLP 2021)

  34. arXiv:2108.00065  [pdf, other]

    cs.LG

    Model Preserving Compression for Neural Networks

    Authors: Jerry Chee, Megan Renz, Anil Damle, Christopher De Sa

    Abstract: After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands. When compressing, it is desirable to preserve the original model's per-example decisions (e.g., to go beyond top-1 accuracy or preserve robustness), maintain the network's structure, automatically determine per-layer compression levels, and eliminate the need for fine tuning.…

    Submitted 14 October, 2022; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: 26 pages, 15 figures. To be published in Advances in Neural Information Processing Systems 35

    MSC Class: 68W99; 65F55

  35. arXiv:2107.08596  [pdf, other]

    stat.ML cs.LG math.DG

    Equivariant Manifold Flows

    Authors: Isay Katsman, Aaron Lou, Derek Lim, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

    Abstract: Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries -- a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning…

    Submitted 27 January, 2022; v1 submitted 18 July, 2021; originally announced July 2021.

    Comments: Published at NeurIPS 2021

  36. arXiv:2106.09686  [pdf, other]

    cs.LG cs.AI

    How Low Can We Go: Trading Memory for Error in Low-Precision Training

    Authors: Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell

    Abstract: Low-precision arithmetic trains deep learning models using less energy, less memory and less time. However, we pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error. As applications proliferate, users must choose which precision to use to train a new model, and chip manufacturers must decide which precisions to manufacture. We view these pr…

    Submitted 17 March, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: ICLR 2022

  37. arXiv:2104.00606  [pdf, other]

    cs.LG cs.AI cs.CY

    Model Selection's Disparate Impact in Real-World Deep Learning Applications

    Authors: Jessica Zosa Forde, A. Feder Cooper, Kweku Kwegyir-Aggrey, Chris De Sa, Michael Littman

    Abstract: Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness in other stages in the ML pipeline. We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep…

    Submitted 7 September, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Science and Engineering of Deep Learning Workshop, ICLR 2021

  38. arXiv:2103.02062  [pdf, other]

    cs.LG stat.ML

    Variance Reduced Training with Stratified Sampling for Forecasting Models

    Authors: Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster

    Abstract: In large-scale time series forecasting, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another in the same dataset. In this paper, we provably show that under such heterogeneity, training a forecasting model with commonly used stochastic optimizers (e.g. SGD) potentially suffers large variance on gradient estimation, and thus inc…

    Submitted 11 June, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  39. arXiv:2102.13565  [pdf, other]

    cs.LG

    Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

    Authors: Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Q. Weinberger

    Abstract: Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider conti…

    Submitted 3 June, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

  40. arXiv:2102.03034  [pdf, other]

    cs.LG cs.LO

    Hyperparameter Optimization Is Deceiving Us, and How to Stop It

    Authors: A. Feder Cooper, Yucheng Lu, Jessica Zosa Forde, Christopher De Sa

    Abstract: Recent empirical work shows that inconsistent results based on choice of hyperparameter optimization (HPO) configuration are a widespread problem in ML research. When comparing two algorithms J and K, searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. We provide a theore…

    Submitted 25 October, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: To appear, NeurIPS 2021

    Journal ref: Advances in Neural Information Processing Systems 34 pre-proceedings (NeurIPS 2021)

  41. arXiv:2010.06192  [pdf, other]

    cs.LG stat.ML

    Revisiting BFloat16 Training

    Authors: Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

    Abstract: State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy. As a result, deep learning accelerators are forced to support both 16-bit and 32-bit floating-point units (FPUs), which is more costly than only using 16-bit FPUs for hardware design. We ask: c…

    Submitted 7 March, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

  42. arXiv:2009.07430  [pdf, other]

    cs.LG cs.NE stat.ML

    An Extensive Experimental Evaluation of Automated Machine Learning Methods for Recommending Classification Algorithms (Extended Version)

    Authors: Márcio P. Basgalupp, Rodrigo C. Barros, Alex G. C. de Sá, Gisele L. Pappa, Rafael G. Mantovani, André C. P. L. F. de Carvalho, Alex A. Freitas

    Abstract: This paper presents an experimental comparison among four Automated Machine Learning (AutoML) methods for recommending the best classification algorithm for a given input dataset. Three of these methods are based on Evolutionary Algorithms (EAs), and the other is Auto-WEKA, a well-known AutoML method based on the Combined Algorithm Selection and Hyper-parameter optimisation (CASH) approach. The EA…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: Accepted at Evolutionary Intelligence

  43. arXiv:2007.02912  [pdf, other]

    cs.LG stat.ML

    Meta-Learning Divergences of Variational Inference

    Authors: Ruqi Zhang, Yingzhen Li, Christopher De Sa, Sam Devlin, Cheng Zhang

    Abstract: Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability. Crucial to the performance of VI is the selection of the associated divergence measure, as VI approximates the intractable distribution by minimizing this divergence. In this paper we propose a meta-learning algorithm to learn the divergence metric suite…

    Submitted 22 June, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Published at AISTATS 2021

  44. Accuracy-Efficiency Trade-Offs and Accountability in Distributed ML Systems

    Authors: A. Feder Cooper, Karen Levy, Christopher De Sa

    Abstract: Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty. While computer science also commonly studies accuracy-efficiency trade-offs, their policy implications remain poorly examined. Drawing on risk assessment practices in the US, we argue that, since examinin…

    Submitted 2 October, 2021; v1 submitted 4 July, 2020; originally announced July 2020.

    Journal ref: Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO 2021)

  45. arXiv:2006.11677  [pdf, other]

    cs.LG stat.ML

    Asymptotically Optimal Exact Minibatch Metropolis-Hastings

    Authors: Ruqi Zhang, A. Feder Cooper, Christopher De Sa

    Abstract: Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset. In this paper, we study minibatch MH methods, which instead use subsamples to enable scaling. We observe that most existing minibatch MH methods are inexact (i.e. they may change the target distribution), and show that this inexactness can ca…

    Submitted 22 October, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

    Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  46. arXiv:2006.10254  [pdf, other]

    stat.ML cs.LG math.DG

    Neural Manifold Ordinary Differential Equations

    Authors: Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser-Nam Lim, Christopher De Sa

    Abstract: To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces. In this paper, we study normalizing flows on manifolds. Previous work has developed flow models for specific cases; however, these advancements hand craft layers on a manifold-by-manifold basis, restricting generality and inducing cumbersome design constraints. We…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Submitted to NeurIPS 2020

  47. arXiv:2006.08085  [pdf, other]

    cs.LG stat.ML

    Optimal Complexity in Decentralized Training

    Authors: Yucheng Lu, Christopher De Sa

    Abstract: Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction this lower bound is tig…

    Submitted 27 January, 2022; v1 submitted 14 June, 2020; originally announced June 2020.

  48. arXiv:2006.01169  [pdf, other]

    eess.SP cs.HC cs.LG

    RNNs on Monitoring Physical Activity Energy Expenditure in Older People

    Authors: Stylianos Paraschiakos, Cláudio Rebelo de Sá, Jeremiah Okai, Eline P. Slagboom, Marian Beekman, Arno Knobbe

    Abstract: Through the quantification of physical activity energy expenditure (PAEE), health care monitoring has the potential to stimulate vital and healthy ageing, inducing behavioural changes in older people and linking these to personal health gains. To be able to measure PAEE in a monitoring environment, methods from wearable accelerometers have been developed, however, mainly targeted towards younger p…

    Submitted 11 January, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: For a revised, updated and published version (Jan 2022, open access) refer to the Journal of Data Mining and Knowledge Discovery, DOI https://doi.org/10.1007/s10618-021-00817-w. To make our experiments, scripts and results accessible to other researchers (open-source) we shared our scripts with the published version (Jan 2022)

    Journal ref: Data Mining Knowledge Discovery (2022)

  49. arXiv:2005.08083  [pdf, other]

    cs.LG cs.AI cs.NE

    A Robust Experimental Evaluation of Automated Multi-Label Classification Methods

    Authors: Alex G. C. de Sá, Cristiano G. Pimenta, Gisele L. Pappa, Alex A. Freitas

    Abstract: Automated Machine Learning (AutoML) has emerged to deal with the selection and configuration of algorithms for a given learning task. With the progression of AutoML, several effective methods were introduced, especially for traditional classification and regression problems. Apart from the AutoML success, several issues remain open. One issue, in particular, is the lack of ability of AutoML method…

    Submitted 31 July, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: GECCO'2020 paper: Submitted and accepted

  50. arXiv:2005.06706  [pdf, other]

    cs.LG stat.ML

    MixML: A Unified Analysis of Weakly Consistent Parallel Learning

    Authors: Yucheng Lu, Jack Nash, Christopher De Sa

    Abstract: Parallelism is a ubiquitous method for accelerating machine learning algorithms. However, theoretical analysis of parallel learning is usually done in an algorithm- and protocol-specific setting, giving little insight about how changes in the structure of communication could affect convergence. In this paper we propose MixML, a general framework for analyzing convergence of weakly consistent paral…

    Submitted 6 June, 2020; v1 submitted 13 May, 2020; originally announced May 2020.