reparameterized pytorch

Learning with reparameterized gradients for native pytorch modules.

Introduction

Mathematical formulations of learning with samples from reparameterized distributions separate the posterior $q$ from the structure of the likelihood (network) $f$. For example, in the ELBO $\mathcal{L} = E_q \left[ \log p(y|f(x|w)) + \log p(w) - \log q(w|\lambda) \right] \approx \frac{1}{S} \sum_{w \sim q(w|\lambda)} \left( \log p(y|f(x|w)) + \log p(w) - \log q(w|\lambda) \right)$, $f$ takes the parameters (weights) $w$ as an argument but is not tied in any way to the sampling distribution $q$. At the same time, the available pytorch libraries (for example, Bayesian-Torch) work by replacing native pytorch layers with custom layers. As a consequence, it is impossible to sample jointly for multiple layers or to pass additional information to the sampling code.
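A minimal sketch of the Monte Carlo estimate above, with hypothetical helpers (sample_q, log_prior, and the log-likelihood evaluation are placeholders, not this library's API):

```python
import torch

def elbo_estimate(x, y, log_likelihood, sample_q, log_prior, n_samples=8):
    """Average log p(y|f(x|w)) + log p(w) - log q(w|lambda) over S samples w ~ q(w|lambda)."""
    terms = []
    for _ in range(n_samples):
        w, log_q = sample_q()                      # sample weights and their log-density (hypothetical helper)
        terms.append(log_likelihood(x, y, w) + log_prior(w) - log_q)
    return torch.stack(terms).mean()
```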

We achieve a full separation of sampling procedures from network structures by implementing a custom procedure for loading a state dictionary into an arbitrary network's parameters (pytorch's default load_state_dict loses the gradients of the samples).
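The underlying issue, sketched with plain PyTorch (this is an illustration of the idea, not this library's exact API): nn.Module.load_state_dict copies tensors in place, which detaches them from the sampler's computation graph, whereas substituting the sampled tensors functionally (here via torch.func.functional_call from recent PyTorch versions) keeps gradients flowing back to the variational parameters.

```python
import torch
from torch import nn

net = nn.Linear(3, 1)
x = torch.randn(5, 3)

# Trainable variational parameters lambda_ (here: per-parameter means, for illustration).
lambda_ = {name: p.detach().clone().requires_grad_(True) for name, p in net.named_parameters()}
# A reparameterized sample of the weights, differentiable w.r.t. lambda_.
sampled = {name: mu + 0.1 * torch.randn_like(mu) for name, mu in lambda_.items()}

out = torch.func.functional_call(net, sampled, (x,))  # forward pass with the sampled weights
out.sum().backward()                                  # gradients reach lambda_ through the sample
print(lambda_["weight"].grad is not None)             # True
```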

Installation

The library can be installed using: pip install git+https://github.com/tkusmierczyk/reparameterized_pytorch.git#egg=reparameterized

Limitations

For native pytorch modules it is impossible to pass multiple sampled parameter sets to a network at once. Hence, when we sample more than one set, we need to loop over them using take_parameters_sample (see the sketch below). The forward operation is then repeated in each iteration, which makes execution slower.
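A sketch of that looping pattern. The names take_parameters_sample and the gradient-preserving loader come from this library, but their exact signatures are assumptions here; only the overall structure is illustrated.

```python
import torch

predictions = []
for parameters_sample in take_parameters_sample(sampled_parameters):  # one weight set per iteration (assumed signature)
    load_state_dict(net, parameters_sample)  # gradient-preserving loading of this sample (assumed signature)
    predictions.append(net(x))               # forward pass repeated for each sampled set
predictions = torch.stack(predictions)       # (n_samples, ...) -> average or plug into the ELBO terms
```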

Demos

  1. Learn Normalizing Flows for BNN with a single wide hidden layer and Matern52-like activation (compared against MCMC baseline in Pyro)
  2. Learn Normalizing Flow for Bayesian linear regression (13 dimensions; using BNN wrapper class)
  3. Learn full-rank Normal for Bayesian linear regression
  4. Learn factorized Normal for Bayesian linear regression
  5. Minimize KL(q|p) for q modeled as Bayesian Hypernetwork
  6. Minimize KL(q|p) for q modeled as RealNVP flow
  7. Minimize KL(q|p) for q and p being factorized Normal

Credits

The RealNVP implementation is based on code by Jakub Tomczak. The code for flows includes contributions by Bartosz Wójcik [bartwojc(AT)gmail.com] and Marcin Sendera [marcin.sendera(AT)gmail.com].
