Table of Contents
1 Introduction
1.1 Analytical Solution
1.2 Stability of Solutions
2 Generative Model Structure
3 BayesFlow Architecture
3.1 Summary Network
3.2 Inference Network
4 Preprocessing
5 Training
6 Validating the Results
[1]:
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from functools import partial
import tensorflow as tf
from tensorflow.keras.optimizers.schedules import PiecewiseConstantDecay
[2]:
sys.path.append(os.path.abspath(os.path.join('../../..')))
from bayesflow.simulation import Prior, Simulator, GenerativeModel
from bayesflow.networks import InvertibleNetwork
from bayesflow.amortized_inference import AmortizedPosterior
from bayesflow.trainers import Trainer
import bayesflow.diagnostics as diag
Introduction
In this tutorial, we will look at a simple linear ODE system:
\[\frac{du}{dt} = a\,u + b\,v, \qquad \frac{dv}{dt} = c\,u + d\,v\]
with the boundary conditions:
\[u(0) = u_0, \qquad v(0) = v_0\]
Given the solutions \(u(t)\) and \(v(t)\), we want to use BayesFlow to estimate the parameters \(a\), \(b\), \(c\) and \(d\) of the ODE equations as well as the boundary conditions \(u_0\) and \(v_0\) over all possible solutions of the system.
Analytical Solution
The advantage of such a simple ODE system is that the analytical solutions for \(u\) and \(v\) are known:
\[\begin{bmatrix} u(t) \\ v(t) \end{bmatrix} = C_1 e^{\lambda_1 t}\,\vec{v_1} + C_2 e^{\lambda_2 t}\,\vec{v_2}\]
where \(\lambda_1\) and \(\lambda_2\) are the eigenvalues and \(\vec{v_1}\) and \(\vec{v_2}\) are the corresponding eigenvectors of the matrix:
\[A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\]
Note that \(\begin{bmatrix} \vec{v_1} & \vec{v_2} \end{bmatrix}\) is the matrix with the eigenvectors \(\vec{v_1}\) and \(\vec{v_2}\) as its column vectors. The constants \(C_1\) and \(C_2\) can be computed from the boundary conditions:
\[\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} = \begin{bmatrix} \vec{v_1} & \vec{v_2} \end{bmatrix}^{-1} \begin{bmatrix} u_0 \\ v_0 \end{bmatrix}\]
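To make this concrete, here is a minimal sketch (the coefficient values and the use of scipy.integrate.solve_ivp are illustrative assumptions, not part of this tutorial) that compares the eigen-decomposition solution against a numerical ODE solver for one stable parameter set:

import numpy as np
from scipy.integrate import solve_ivp

a, b, c, d = -1.0, 2.0, -3.0, -0.5      # hypothetical coefficients (stable: real parts of eigenvalues <= 0)
u0, v0 = 1.0, -2.0                      # hypothetical boundary conditions
A = np.array([[a, b], [c, d]])
t = np.linspace(0, 1, 32)

# analytical solution via eigen-decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
C = np.linalg.inv(eigenvectors) @ np.array([u0, v0])
analytic = (eigenvectors @ np.array([C[0] * np.exp(eigenvalues[0] * t),
                                     C[1] * np.exp(eigenvalues[1] * t)])).real

# numerical reference solution of d/dt [u, v] = A [u, v]
numeric = solve_ivp(lambda _, y: A @ y, (0, 1), [u0, v0], t_eval=t).y

print(np.max(np.abs(analytic - numeric)))  # should be small (up to solver tolerance)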
Stability of Solutions
The solutions \(u\) and \(v\) will be of the form:
\[u(t),\, v(t) \propto e^{\lambda_j t} = e^{\gamma_j t}\left(\cos(\omega_j t) + i \sin(\omega_j t)\right)\]
By separating the eigenvalues into their real and imaginary parts, \(\lambda_j = \gamma_j + i\,\omega_j\), we see that the real part \(\gamma_j\) determines whether a solution grows or decays exponentially or stays constant, whereas the imaginary part \(\omega_j\) determines the oscillation behaviour.
We will apply rejection sampling to only consider stable solutions. In other words, we only keep solutions that fulfill the condition \(\gamma_1 \leq 0\) and \(\gamma_2 \leq 0\).
Generative Model Structure
We have to generate some simulated data to train our BayesFlow architecture. The first step is to randomly draw combinations of \(a\), \(b\), \(c\), \(d\), \(u_0\) and \(v_0\) from a uniform prior distribution. By computing the eigenvalues, we can preemptively reject prior samples that will lead to unstable solutions. Additionally, we estimate the prior means and standard deviations for standardization later. The standardization step is just a convenience, since neural networks like well-scaled data.
[3]:
def model_prior():
    """Generates random draws from uniform prior with rejection sampling."""
    while True:
        # generate draw from prior
        sample = np.random.uniform(low=-10, high=10, size=6)
        # reject, if solution not stable
        A = sample[:4].reshape((2, 2))
        eigenvalues, _ = np.linalg.eig(A)
        if eigenvalues[0].real <= 0 and eigenvalues[1].real <= 0:
            break
    return sample
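As a quick usage check (a minimal sketch; the printed values are illustrative, not notebook output), we can draw a single prior sample and confirm that it satisfies the stability condition:

sample = model_prior()
eigvals = np.linalg.eigvals(sample[:4].reshape((2, 2)))
print(sample)             # six values: a, b, c, d, u_0, v_0
print(eigvals.real <= 0)  # both real parts should be non-positive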
We wrap the prior function into the Prior wrapper:
[4]:
prior = Prior(prior_fun=model_prior, param_names=[r'$a$', r'$b$', r'$c$', r'$d$', r'$u_0$', r'$v_0$'])
prior_means, prior_stds = prior.estimate_means_and_stds()
The next step is to generate simulated data from the prior samples by defining a simulator with the Simulator wrapper:
[5]:
def linear_ode_solver(params, t):
    """Solves the linear ODE system analytically for given time points t and prior parameter samples."""
    # unpack params
    A = params[:4].reshape((2, 2))
    boundaries = params[-2:]
    # solve for u and v
    eigenvalues, eigenvectors = np.linalg.eig(A)
    C = np.linalg.inv(eigenvectors) @ boundaries
    solution = eigenvectors @ np.array([C[0] * np.exp(eigenvalues[0] * t), C[1] * np.exp(eigenvalues[1] * t)])
    return solution.T.real
[6]:
time_points = np.linspace(0, 1, num=32)
simulator = Simulator(simulator_fun=partial(linear_ode_solver, t=time_points))
Finally, we will wrap the prior and simulator into a GenerativeModel, which will connect the prior with the simulator:
[7]:
model = GenerativeModel(prior, simulator, name='linear_ODE_generator')
INFO:root:Performing 2 pilot runs with the linear_ODE_generator model...
INFO:root:Shape of parameter batch after 2 pilot simulations: (batch_size = 2, 6)
INFO:root:Shape of simulation batch after 2 pilot simulations: (batch_size = 2, 32, 2)
INFO:root:No optional prior non-batchable context provided.
INFO:root:No optional prior batchable context provided.
INFO:root:No optional simulation non-batchable context provided.
INFO:root:No optional simulation batchable context provided.
As a sanity check, we can sample 1000 prior combinations and visualize the joint priors in bivariate plots. Note that, because of rejection sampling, certain combinations of \(a\), \(b\), \(c\), \(d\) are always rejected since they would produce unstable solutions, so the resulting priors are no longer uniform.
[8]:
fig = prior.plot_prior2d()

Additionally, we can visualize the simulation data:
[9]:
sim_data = model(8)['sim_data']
fig, ax = plt.subplots(2, 4, figsize=(15, 10))
ax = ax.flat
for i, data in enumerate(sim_data):
    ax[i].plot(time_points, data[:, 0], label='u')
    ax[i].plot(time_points, data[:, 1], label='v')
    ax[i].set_xlabel("Time t [s]")
    ax[i].set_ylabel("Solution u(t) / v(t)")
    ax[i].grid(True)
    ax[i].legend()
plt.tight_layout()
plt.show()

BayesFlow Architecture
The BayesFlow architecture consists of a summary network and an invertible neural network. The summary network learns summary statistics for each input, such that the summary network output has dimensions (batch_size, summary_dim). The invertible inference network is conditioned on the summary statistics produced by the summary network.
Summary Network
For this tutorial, we will use a small deep LSTM summary network:
[10]:
class LSTM(tf.keras.Model):
    def __init__(self, hidden_size=32, summary_dim=32):
        super(LSTM, self).__init__()
        self.LSTM = tf.keras.Sequential([
            tf.keras.layers.LSTM(hidden_size, return_sequences=True),
            tf.keras.layers.LSTM(hidden_size, return_sequences=True),
            tf.keras.layers.LSTM(hidden_size),
            tf.keras.layers.Dense(summary_dim, activation="sigmoid")
        ])

    def call(self, x, **kwargs):
        out = self.LSTM(x)
        return out
[11]:
summary_net = LSTM()
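Before wiring the summary network into the amortizer, we can check its output shape on a dummy batch (a minimal sketch; the random batch is purely illustrative):

dummy_batch = tf.random.normal((8, 32, 2))   # (batch_size, time points, observables u and v)
print(summary_net(dummy_batch).shape)        # expected: (8, 32) == (batch_size, summary_dim)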
Inference Network
For the inference network, we will use an 8-layer conditional invertible neural network (cINN):
[12]:
inference_net = InvertibleNetwork(num_params=6, num_coupling_layers=8)
Finally, we wrap the summary network and inference network into an AmortizedPosterior instance:
[13]:
amortizer = AmortizedPosterior(inference_net, summary_net, name='linear_ODE_amortizer')
Preprocessing
Before we feed the simulation data to our BayesFlow amortizer, we want to perform some preprocessing for stability: we standardize both the simulated data and the parameters and remove non-finite (NaN/inf) samples.
[14]:
data = model(batch_size=1000)
sim_mean = np.mean(data['sim_data'])
sim_std = np.std(data['sim_data'])
def configure_input(forward_dict):
    """Configures dictionary of prior draws and simulated data into BayesFlow format."""
    out_dict = {}
    # standardize simulated data
    sim_data = forward_dict['sim_data'].astype(np.float32)
    norm_data = (sim_data - sim_mean) / sim_std
    # standardize prior draws
    params = forward_dict['prior_draws'].astype(np.float32)
    norm_params = (params - prior_means) / prior_stds
    # remove nan, inf and -inf
    keep_idx = np.all(np.isfinite(norm_data), axis=(1, 2))
    if not np.all(keep_idx):
        print('Invalid value encountered...removing from batch')
    # add to dict
    out_dict['summary_conditions'] = norm_data[keep_idx]
    out_dict['parameters'] = norm_params[keep_idx]
    return out_dict
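As a sketch of how the configurator is used (the small batch size here is illustrative), we can pass a raw batch from the generative model through it and inspect the resulting shapes:

raw_batch = model(batch_size=4)
configured = configure_input(raw_batch)
print(configured['summary_conditions'].shape)  # (<=4, 32, 2): standardized simulations
print(configured['parameters'].shape)          # (<=4, 6): standardized parameters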
Training
To train our BayesFlow amortizer, we first have to define a Trainer instance, which will take care of simulation-based training. Note that we also use a piecewise constant decay learning rate schedule to reduce the learning rate after a pre-selected number of training steps.
[15]:
trainer = Trainer(
    amortizer=amortizer,
    generative_model=model,
    configurator=configure_input
)
INFO:root:Performing a consistency check with provided components...
INFO:root:Done.
Once we have initialized the Trainer instance and it has passed a consistency check on a test batch, we can print the number of trainable and non-trainable parameters in our BayesFlow model:
[16]:
amortizer.summary()
Model: "linear_ODE_amortizer"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
invertible_network (Inverti multiple 213280
bleNetwork)
lstm (LSTM) multiple 22176
=================================================================
Total params: 235,456
Trainable params: 235,360
Non-trainable params: 96
_________________________________________________________________
Now we can train our BayesFlow architecture using online learning, showcasing a custom optimizer with the learning rate schedule described above:
[17]:
learning_rate = PiecewiseConstantDecay([100000, 150000], [0.001, 0.0001, 0.00001])
optimizer = tf.keras.optimizers.Adam(learning_rate)
losses = trainer.train_online(epochs=200, iterations_per_epoch=1000, batch_size=32, optimizer=optimizer)
We can visualize the training loss to check whether training has converged.
[18]:
fig = diag.plot_losses(losses)

Validating the Results
After training our BayesFlow architecture, we want to validate the results. Our first step is to inspect the latent space \(z\), which we enforce to be Gaussian using the default Kullback-Leibler (KL) loss during training.
[19]:
fig = trainer.diagnose_latent2d()

Next we can perform simulation-based calibration (SBC) as proposed by Talts et al. (2020):
[20]:
fig = trainer.diagnose_sbc_histograms()

Additionally, we can inspect the simulation-based calibration through empirical cumulative distribution functions (ECDFs) instead of histograms, as proposed by Säilynoja et al. (2021):
[21]:
valid_sim_data_raw = model(batch_size=300)
valid_sim_data = trainer.configurator(valid_sim_data_raw)
posterior_samples = amortizer.sample(valid_sim_data, n_samples=100)
fig = diag.plot_sbc_ecdf(posterior_samples, valid_sim_data['parameters'])

During inference, BayesFlow will provide us with a posterior distribution for every hidden parameter and every possible solution. To get a point estimate from BayesFlow, we simply take the mean of the posterior. For validation, we can plot the posterior means against the ground truth and compute the \(R^2\) for our model. In the ideal case, all predicted values lie on the diagonal line (posterior standard deviations are also plotted as an uncertainty measure):
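For intuition, the recovery plot shown next essentially compares posterior means to the ground-truth draws; a minimal numpy sketch of such a comparison (not BayesFlow's own implementation) could look like this:

post_means = posterior_samples.mean(axis=1)    # (num_datasets, 6) point estimates
truths = valid_sim_data['parameters']          # (num_datasets, 6) ground truths (standardized)
ss_res = np.sum((truths - post_means) ** 2, axis=0)
ss_tot = np.sum((truths - truths.mean(axis=0)) ** 2, axis=0)
print(1 - ss_res / ss_tot)                     # one R^2 value per parameter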
[22]:
fig = diag.plot_recovery(posterior_samples, valid_sim_data['parameters'], param_names=prior.param_names)

For single solutions, we can also visualize the bivariate posteriors between different hidden parameters and compare them to the priors:
[23]:
posterior_samples_unnorm = prior_means + posterior_samples * prior_stds
fig = diag.plot_posterior_2d(posterior_samples_unnorm[0], param_names=prior.param_names)

[24]:
fig = diag.plot_posterior_2d(posterior_samples_unnorm[0], prior=prior)

The last check we want to perform is re-simulation (also known as posterior predictive checks). Given a set of linear ODE solutions \(u(t)\) and \(v(t)\), we predict the posterior of \(a\), \(b\), \(c\), \(d\), \(u_0\) and \(v_0\). All posterior parameter samples are then passed to the simulator to re-simulate \(\hat{u}_i(t)\) and \(\hat{v}_i(t)\). Finally, we compute the median and quantiles across all \(\hat{u}_i(t)\) and \(\hat{v}_i(t)\) and compare them to the ground-truth solutions \(u(t)\) and \(v(t)\):
[25]:
# re-simulation
resim_u = np.empty((posterior_samples_unnorm[0].shape[0], len(time_points)), dtype=np.float32)
resim_v = np.empty((posterior_samples_unnorm[0].shape[0], len(time_points)), dtype=np.float32)
for i in range(posterior_samples_unnorm[0].shape[0]):
    re_sim = linear_ode_solver(posterior_samples_unnorm[0, i], time_points)
    resim_u[i, :] = re_sim[:, 0]
    resim_v[i, :] = re_sim[:, 1]
# compute quantiles
u_qt_50 = np.quantile(resim_u, q=[0.25, 0.75], axis=0)
u_qt_90 = np.quantile(resim_u, q=[0.05, 0.95], axis=0)
u_qt_95 = np.quantile(resim_u, q=[0.025, 0.975], axis=0)
v_qt_50 = np.quantile(resim_v, q=[0.25, 0.75], axis=0)
v_qt_90 = np.quantile(resim_v, q=[0.05, 0.95], axis=0)
v_qt_95 = np.quantile(resim_v, q=[0.025, 0.975], axis=0)
# plot u
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
ax.plot(time_points, np.median(resim_u, axis=0), label='Median u(t)', color='b')
ax.plot(time_points, valid_sim_data_raw['sim_data'][0, :, 0], marker='o', label='Ground truth u(t)', color='k', linestyle='--', alpha=0.8)
ax.fill_between(time_points, u_qt_50[0], u_qt_50[1], color='b', alpha=0.3, label='50% CI')
ax.fill_between(time_points, u_qt_90[0], u_qt_90[1], color='b', alpha=0.2, label='90% CI')
ax.fill_between(time_points, u_qt_95[0], u_qt_95[1], color='b', alpha=0.1, label='95% CI')
ax.grid(True)
ax.set_title("Re-simulation for u(t)")
ax.set_xlabel("Time t [s]")
ax.set_ylabel("Function u(t)")
ax.legend()
plt.show()
# plot v
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
ax.plot(time_points, np.median(resim_v, axis=0), label='Median v(t)', color='b')
ax.plot(time_points, valid_sim_data_raw['sim_data'][0, :, 1], marker='o', label='Ground truth v(t)', color='k', linestyle='--', alpha=0.8)
ax.fill_between(time_points, v_qt_50[0], v_qt_50[1], color='b', alpha=0.3, label='50% CI')
ax.fill_between(time_points, v_qt_90[0], v_qt_90[1], color='b', alpha=0.2, label='90% CI')
ax.fill_between(time_points, v_qt_95[0], v_qt_95[1], color='b', alpha=0.1, label='95% CI')
ax.grid(True)
ax.set_title("Re-simulation for v(t)")
ax.set_xlabel("Time t [s]")
ax.set_ylabel("Function v(t)")
ax.legend()
plt.show()

