Showing 1–9 of 9 results for author: Eschenhagen, R

  1. arXiv:2312.05705

    cs.LG stat.ML

    Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC

    Authors: Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani

    Abstract: Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-fre…

    Submitted 23 July, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: A long version of the ICML 2024 paper, updated the text about a related work

  2. arXiv:2311.00636

    cs.LG stat.ML

    Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures

    Authors: Runa Eschenhagen, Alexander Immer, Richard E. Turner, Frank Schneider, Philipp Hennig

    Abstract: The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currentl…

    Submitted 11 January, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  3. arXiv:2306.07179

    cs.LG stat.ML

    Benchmarking Neural Network Training Algorithms

    Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

    Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a communi…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 102 pages, 8 figures, 41 tables

  4. arXiv:2304.08309

    cs.LG stat.ML

    Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization

    Authors: Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Vincent Fortuin

    Abstract: The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while i…

    Submitted 10 July, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: AABI 2023

  5. arXiv:2205.10041

    cs.LG stat.ML

    Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks

    Authors: Agustinus Kristiadi, Runa Eschenhagen, Philipp Hennig

    Abstract: Monte Carlo (MC) integration is the de facto method for approximating the predictive distribution of Bayesian neural networks (BNNs). But, even with many MC samples, Gaussian-based BNNs could still yield bad predictive performance due to the posterior approximation's error. Meanwhile, alternatives to MC integration tend to be more expensive or biased. In this work, we experimentally show that the…

    Submitted 15 October, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  6. arXiv:2111.03577

    cs.LG stat.ML

    Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning

    Authors: Runa Eschenhagen, Erik Daxberger, Philipp Hennig, Agustinus Kristiadi

    Abstract: Deep neural networks are prone to overconfident predictions on outliers. Bayesian neural networks and deep ensembles have both been shown to mitigate this problem to some extent. In this work, we aim to combine the benefits of the two approaches by proposing to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep ne…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: Bayesian Deep Learning Workshop, NeurIPS 2021

  7. arXiv:2106.14806

    cs.LG stat.ML

    Laplace Redux -- Effortless Bayesian Deep Learning

    Authors: Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, Philipp Hennig

    Abstract: Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection. The Laplace approximation (LA) is a classic, and arguably the simplest family of approximations for the intractable posteriors of deep neural networks. Yet, despite its simplicity, the L…

    Submitted 14 March, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready version; source code: https://github.com/AlexImmer/Laplace

  8. arXiv:2004.14070

    stat.ML cs.LG

    Continual Deep Learning by Functional Regularisation of Memorable Past

    Authors: Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard E. Turner, Mohammad Emtiyaz Khan

    Abstract: Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past. Recent works address this with weight regularisation. Functional regularisation, although computationally expensive, is expected to perform better, but rarely does so in practice. In this paper, we fix this issue by using a new functional-regular…

    Submitted 8 January, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

  9. arXiv:1906.02506

    stat.ML cs.LG

    Practical Deep Learning with Bayesian Principles

    Authors: Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan

    Abstract: Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar pe…

    Submitted 29 October, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019