
Vectorization with emcee solver #83

Open
JanKoune opened this issue Jun 7, 2022 · 2 comments

JanKoune (Collaborator) commented Jun 7, 2022

Emcee allows for vectorized calls to the log-likelihood function by setting vectorize=True in the EnsembleSampler. With this option enabled, the log-likelihood function receives a 2D array of theta values of shape (n_walkers, n_theta) instead of a single 1D parameter vector. Doing this in probeye currently raises an error in check_parameter_domains (see the minimal example below).

Having this functionality could be useful for:

  • Parallelizing MCMC with a computationally expensive forward model, e.g. using offloading.
  • Faster inference with cheap surrogate models that accept vectorized inputs.
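For reference, vectorize=True changes emcee's calling convention: the log-probability function is handed the whole (n_walkers, n_theta) array at once and must return one value per walker. A minimal NumPy sketch of such a vectorized log-likelihood for the linear model in this example (the function name and argument layout are illustrative, not probeye's API):

```python
import numpy as np

def log_like_vectorized(theta, x, y):
    """Vectorized Gaussian log-likelihood for y = a*x + b.

    theta: array of shape (n_walkers, 3) with columns (a, b, sigma),
    as passed by emcee's EnsembleSampler when vectorize=True.
    Returns one log-likelihood value per walker.
    """
    theta = np.atleast_2d(theta)                  # (n_walkers, 3)
    a, b, sigma = theta[:, 0], theta[:, 1], theta[:, 2]
    # broadcast (n_walkers, 1) against (n_obs,) -> (n_walkers, n_obs)
    resid = y - (a[:, None] * x + b[:, None])
    sigma = sigma[:, None]
    return np.sum(
        -0.5 * (resid / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi),
        axis=1,
    )
```

With a function of this shape, the sampler would be constructed as `emcee.EnsembleSampler(n_walkers, 3, log_like_vectorized, args=(x, y), vectorize=True)`, and the body can be replaced by a batched surrogate-model call.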

JanKoune commented Jun 7, 2022

A minimal example based on the linear regression integration test:

import numpy as np
from probeye.definition.inverse_problem import InverseProblem
from probeye.definition.forward_model import ForwardModelBase
from probeye.definition.sensor import Sensor
from probeye.definition.likelihood_model import GaussianLikelihoodModel
from probeye.inference.emcee.solver import EmceeSolver

# ============================================================================ #
#                              Set numeric values                              #
# ============================================================================ #
# Options
n_steps = 200
n_initial_steps = 100
n_walkers = 20

# 'true' value of a, and its normal prior parameters
a_true = 2.5
mean_a = 2.0
std_a = 1.0

# 'true' value of b, and its normal prior parameters
b_true = 1.7
mean_b = 1.0
std_b = 1.0

# 'true' value of additive error sd, and its uniform prior parameters
sigma = 0.5
low_sigma = 0.0
high_sigma = 0.8

# the number of generated experiment_names and seed for random numbers
n_tests = 50
seed = 1

# ============================================================================ #
#                           Define the Forward Model                           #
# ============================================================================ #

class LinearModel(ForwardModelBase):
    def interface(self):
        self.parameters = [{"a": "m"}, "b"]
        self.input_sensors = Sensor("x")
        self.output_sensors = Sensor("y", std_model="sigma")

    def response(self, inp: dict) -> dict:
        # this method *must* be provided by the user
        x = inp["x"]
        m = inp["m"]
        b = inp["b"]
        return {"y": m * x + b}

    def jacobian(self, inp: dict) -> dict:
        x = inp["x"]  # vector
        one = np.ones((len(x), 1))
        return {"y": {"x": None, "m": x.reshape(-1, 1), "b": one}}

# ============================================================================ #
#                         Define the Inference Problem                         #
# ============================================================================ #
problem = InverseProblem("Linear regression (AME)")

problem.add_parameter(
    "a",
    "model",
    tex="$a$",
    info="Slope of the graph",
    prior=("normal", {"mean": mean_a, "std": std_a}),
)
problem.add_parameter(
    "b",
    "model",
    info="Intersection of graph with y-axis",
    tex="$b$",
    prior=("normal", {"mean": mean_b, "std": std_b}),
)
problem.add_parameter(
    "sigma",
    "likelihood",
    domain="(0, +oo)",
    tex=r"$\sigma$",
    info="Standard deviation of the zero-mean additive model error",
    prior=("uniform", {"low": low_sigma, "high": high_sigma}),
)

# add the forward model to the problem
linear_model = LinearModel("LinearModel")
problem.add_forward_model(linear_model)

# ============================================================================ #
#                    Add test data to the Inference Problem                    #
# ============================================================================ #
# data-generation; normal likelihood with constant variance around each point
np.random.seed(seed)
x_test = np.linspace(0.0, 1.0, n_tests)
y_true = linear_model.response(
    {linear_model.input_sensor.name: x_test, "m": a_true, "b": b_true}
)[linear_model.output_sensor.name]
y_test = np.random.normal(loc=y_true, scale=sigma)

# add the experimental data
problem.add_experiment(
    "TestSeries_1",
    fwd_model_name="LinearModel",
    sensor_values={
        linear_model.input_sensor.name: x_test,
        linear_model.output_sensor.name: y_test,
    },
)

# ============================================================================ #
#                           Add likelihood model(s)                            #
# ============================================================================ #
# add the likelihood model to the problem
problem.add_likelihood_model(
    GaussianLikelihoodModel(
        prms_def="sigma",
        experiment_name="TestSeries_1",
        model_error="additive",
        name="SimpleLikelihoodModel",
    )
)

# ============================================================================ #
#                    Solve problem with inference engine(s)                    #
# ============================================================================ #
emcee_solver = EmceeSolver(
    problem,
    show_progress=True,
)

inference_data = emcee_solver.run_mcmc(
    n_walkers=n_walkers,
    n_steps=n_steps,
    n_initial_steps=n_initial_steps,
    vectorize=True
)

aklawonn (Collaborator) commented Jul 7, 2022

I looked into this, and I saw that quite a few changes are necessary to make this feature possible. I also think the code would get more complicated, because there would be many more ifs and for-loops in a lot of methods and functions. So it is possible, but at the cost of additional code complexity. If there is no urgent need, I would not implement this at the moment.
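Since the reported error originates in the domain check receiving a 2D theta array, one way to reduce the branching would be to make the shape handling uniform via broadcasting. A hypothetical sketch, assuming simple box-shaped domains (the function name, bound layout, and return convention are illustrative, not probeye's actual internals):

```python
import numpy as np

def check_parameter_domains(theta, lower, upper):
    """Shape-agnostic domain check (illustrative sketch).

    Accepts theta as a 1D vector (n_theta,) or as the 2D
    (n_walkers, n_theta) array emcee passes with vectorize=True.
    lower/upper are per-parameter bounds; open bounds use +/- inf.
    Returns a boolean flag per walker.
    """
    theta = np.atleast_2d(theta)                  # (n_walkers, n_theta)
    inside = (theta > lower) & (theta < upper)    # broadcast over walkers
    return inside.all(axis=1)                     # one flag per walker
```

The same `np.atleast_2d` pattern could be applied at each entry point that currently assumes a 1D vector, though this does not remove the cost of making the forward-model and likelihood evaluations batch-aware.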
