This repository contains the code for running the experiments and reproducing all results reported in our paper *A Deep Learning Method for Comparing Bayesian Hierarchical Models*. We propose a deep learning method for performing Bayesian model comparison on any set of hierarchical models that can be instantiated as probabilistic programs. The method casts the problem as data compression (embedding hierarchical data sets into informative summary vectors) and probabilistic classification (assigning posterior probabilities to the candidate models based on those summary vectors).
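As a rough, self-contained illustration of the classification step (not the code used in the paper), the sketch below trains a softmax classifier that maps fixed-size summary vectors to approximate posterior model probabilities. The summary vectors and model indices here are random placeholders; in the actual method, the summaries are produced by a hierarchical summary network trained jointly with the classifier.

```python
# Illustrative sketch only: random placeholder summaries and model indices.
import numpy as np
import tensorflow as tf

num_models = 2      # number of candidate hierarchical models (placeholder)
summary_dim = 64    # dimensionality of the learned summary vectors (placeholder)

summaries = np.random.normal(size=(1000, summary_dim)).astype("float32")
model_indices = np.random.randint(0, num_models, size=1000)

classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_models, activation="softmax"),  # posterior model probabilities
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
classifier.fit(summaries, model_indices, epochs=5, batch_size=64, verbose=0)

# Approximate posterior model probabilities for one (placeholder) data set summary
print(classifier(summaries[:1]).numpy())
```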
The details of the method are described in our paper:
Elsemüller, L., Schnuerch, M., Bürkner, P. C., & Radev, S. T. (2023). A Deep Learning Method for Comparing Bayesian Hierarchical Models. arXiv preprint arXiv:2301.11873, available for free at: https://arxiv.org/abs/2301.11873.
The code depends on the BayesFlow library (installable via `pip install bayesflow`), which provides the neural network architectures and training utilities. If you use this code, please cite the paper, for example via the following BibTeX entry:
@article{elsemuller2023deep,
title={A deep learning method for comparing {B}ayesian hierarchical models},
author={Elsem{\"u}ller, Lasse and Schnuerch, Martin and B{\"u}rkner, Paul-Christian and Radev, Stefan T},
journal={arXiv preprint arXiv:2301.11873},
year={2023}
}
The experiments are structured as self-contained Jupyter notebooks, which are detailed below.
Code for reproducing the calibration experiments of validation study 1, which consist of three sub-parts (see the data-simulation sketch after the list below):
- 01_calibration_fixed_sizes: Training and calibration assessment with data sets that all have the same number of groups and nested observations.
- 02_calibration_variable_observations: Training and calibration assessment with data sets that all have the same number of groups but vary in their number of nested observations.
- 03_calibration_variable_sizes: Training and calibration assessment with data sets that vary in both their number of groups and their number of nested observations.
- 02_bridge_sampling_comparison/nested_models: Code for reproducing the bridge sampling benchmarking of validation study 1, in which the approximation performance of the neural network is tested against bridge sampling on a toy example.
- 02_bridge_sampling_comparison/non_nested_models: Code for reproducing the calibration experiment and bridge sampling benchmarking of validation study 2, based on the comparison of hierarchical SDT and MPT models.
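To make the variable-size setting concrete, here is a small, purely illustrative simulator (hypothetical, not taken from the notebooks) that draws hierarchical data sets whose numbers of groups and nested observations differ from data set to data set:

```python
# Hypothetical toy simulator: a simple Gaussian hierarchical model with
# variable numbers of groups and nested observations per data set.
import numpy as np

rng = np.random.default_rng(42)

def simulate_hierarchical_dataset(min_groups=10, max_groups=50, min_obs=5, max_obs=30):
    """Draw one data set; all bounds here are illustrative placeholders."""
    num_groups = rng.integers(min_groups, max_groups + 1)
    num_obs = rng.integers(min_obs, max_obs + 1)   # shared by all groups in this sketch
    group_means = rng.normal(loc=0.0, scale=1.0, size=num_groups)
    # Nested observations: one row per group, one column per observation
    return rng.normal(loc=group_means[:, None], scale=1.0, size=(num_groups, num_obs))

# Data sets differ in shape, so batching for the networks requires padding
# or per-size batches during training.
print([simulate_hierarchical_dataset().shape for _ in range(3)])
```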
Code for reproducing the application study in which the drift diffusion model and the Lévy flight model are compared with and without inter-trial variability parameters. It consists of five steps (a schematic of the pretraining and fine-tuning stages follows the list):
- 01_simulate_data: Simulate training and validation data.
- 02_pretrain_networks: Pretrain the networks on simulated data with a reduced number of trials per participant.
- 03_finetune_networks: Fine-tune the networks on simulated data with the same number of trials per participant as the empirical data.
- 04_validate_networks: Validate the trained networks on new simulated data sets.
- 05_apply_networks: Apply the trained networks to the empirical data set.
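The two training stages differ in the number of simulated trials per participant. A minimal conceptual sketch of this schedule (hypothetical helper and simulator names, illustrative values only, not the notebooks' code):

```python
# Conceptual two-stage training schedule; `simulate_batch` is a hypothetical
# simulator returning data and true model indices for the given trial count.
import tensorflow as tf

def train_stage(network, simulate_batch, num_trials, epochs, learning_rate):
    """Run one training stage on freshly simulated data with `num_trials` per participant."""
    data, model_indices = simulate_batch(num_trials=num_trials)
    network.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                    loss="sparse_categorical_crossentropy")
    network.fit(data, model_indices, epochs=epochs, verbose=0)
    return network

# Illustrative usage (placeholder values, not those from the paper):
# network = train_stage(network, simulate_batch, num_trials=REDUCED_TRIALS,
#                       epochs=PRETRAIN_EPOCHS, learning_rate=5e-4)
# network = train_stage(network, simulate_batch, num_trials=EMPIRICAL_TRIALS,
#                       epochs=FINETUNE_EPOCHS, learning_rate=1e-4)
```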
Here, we re-analyzed data from *Jumping to Conclusion? A Lévy Flight Model of Decision Making* by Eva Marie Wieschen, Andreas Voss, and Stefan T. Radev. The data set can be requested from the authors of the original study.
Contains custom Julia and Python functions that enable the analyses, including the original implementation of our proposed hierarchical neural network architecture (all experiments now use our implementation in the BayesFlow library).
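At its core, a hierarchical summary network of this kind composes permutation-invariant modules at the observation and group levels: observations are pooled within groups, and the resulting group summaries are pooled into a single data set summary. The Keras sketch below is a minimal illustration of that idea, not the implementation used in the experiments, and it assumes data sets shaped as (batch, groups, observations, features):

```python
# Minimal two-level permutation-invariant summary network (illustrative sketch).
import tensorflow as tf

class HierarchicalSummaryNet(tf.keras.Model):
    """Pool over observations within groups, then over groups."""
    def __init__(self, summary_dim=64):
        super().__init__()
        self.obs_embed = tf.keras.layers.Dense(64, activation="relu")
        self.group_embed = tf.keras.layers.Dense(64, activation="relu")
        self.out = tf.keras.layers.Dense(summary_dim)

    def call(self, x):                               # x: (batch, groups, observations, features)
        h = self.obs_embed(x)
        group_summaries = tf.reduce_mean(h, axis=2)  # invariant to observation order
        g = self.group_embed(group_summaries)
        dataset_summary = tf.reduce_mean(g, axis=1)  # invariant to group order
        return self.out(dataset_summary)

net = HierarchicalSummaryNet()
fake_batch = tf.random.normal((8, 12, 20, 2))        # 8 data sets, 12 groups, 20 obs, 2 features
print(net(fake_batch).shape)                         # (8, 64)
```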
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy, EXC-2181 - 390900948 (the Heidelberg Cluster of Excellence STRUCTURES) and EXC-2075 - 390740016 (the Stuttgart Cluster of Excellence SimTech), and by the DFG research training group "Statistical Modeling in Psychology" (SMiP; GRK 2277).
MIT