Soss


Soss is a library for probabilistic programming.

Getting started

Soss is an officially registered package, so to add it to your project you can type

]add Soss

within the Julia REPL, and you are ready to use Soss. If it fails to precompile, it could be due to one of the following:

  • You have gotten an old version due to compatibility restrictions with your current environment. Should that happen, create a new folder for your Soss project, launch a Julia session inside it, type
]activate .

and start again. More information on Julia projects can be found in the Pkg documentation. (A script-friendly version of these steps using the Pkg API is sketched just after this list.)

  • You have set up PyCall to use a Python distribution of your own. In that case, make sure to install the missing Python dependencies listed in the precompilation error. More information on choosing PyCall's Python version can be found in the PyCall documentation.
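
For reference, the same environment setup can also be done from a script or an open REPL using the Pkg API rather than the pkg> prompt (a minimal sketch):

using Pkg
Pkg.activate(".")   # turn the current folder into its own project environment
Pkg.add("Soss")     # add Soss to that environment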

Let's jump right in with a simple linear model:

using Soss

m = @model X begin
    β ~ Normal() |> iid(size(X,2))
    y ~ For(eachrow(X)) do x
        Normal(x' * β, 1)
    end
end

In Soss, models are first-class and function-like, and applying a model to its arguments gives a joint distribution.
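
For example, binding X to a matrix produces a joint distribution over β and y that we can sample from (a small sketch of the same pattern used with real data below):

jointdist = m(X=randn(6,2))   # a JointDistribution over (β, y) with X bound
rand(jointdist)               # a NamedTuple with fields β and y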

Just a few of the things we can do in Soss:

  • Sample from the (forward) model
  • Condition a joint distribution on a subset of parameters
  • Have arbitrary Julia values (yes, even other models) as inputs or outputs of a model
  • Build a new model for the predictive distribution, for assigning parameters to particular values

Let's use our model to build some fake data:

julia> import Random; Random.seed!(3)
Random.MersenneTwister(UInt32[0x00000003], Random.DSFMT.DSFMT_state(Int32[-1359582567, 1073454075, 1934390716, 1073583786, -114685834, 1073112842, -1913218479, 1073122729, -73577195, 1073266439  …  1226759590, 1072980451, -1366384707, 1073012992, 1661148031, 2121090155, 141576524, -658637225, 382, 0]), [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], UInt128[0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000  …  0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000], 1002, 0)

julia> X = randn(6,2)
6×2 Array{Float64,2}:
  1.19156    0.100793
 -2.51973   -0.00197414
  2.07481    1.00879
 -0.97325    0.844223
 -0.101607   1.15807
 -1.54251   -0.475159
julia> truth = rand(m(X=X))
(β = [0.07187269298745927, -0.5128103336795292], y = [0.10079289135480324, -2.5197330871745263, 2.0748097755419757, 0.8442227439533416, 1.158074626662026, -0.47515878362112707])

julia> pairs(truth)
pairs(::NamedTuple) with 2 entries:
  :β => [0.0718727, -0.51281]
  :y => [0.100793, -2.51973, 2.07481, 0.844223, 1.15807, -0.475159]
julia> truth.β
2-element Array{Float64,1}:
  0.07187269298745927
 -0.5128103336795292
julia> truth.y
6-element Array{Float64,1}:
  0.10079289135480324
 -2.5197330871745263
  2.0748097755419757
  0.8442227439533416
  1.158074626662026
 -0.47515878362112707

Now let's pretend we don't know β, and have the model figure it out from the observed y. Posterior samples are often easier to work with as particles (built using MonteCarloMeasurements.jl):

julia> post = dynamicHMC(m(X=X), (y=truth.y,));
1000-element Array{NamedTuple{(:β,),Tuple{Array{Float64,1}}},1}:
 (β = [0.39734466526796725, 0.9980730158207407],)
 (β = [0.45764947899215946, 0.7167597584561338],)
 (β = [0.44310994230948264, 1.1912646425584188],)
 (β = [0.5804128915563641, 0.9052672328462603],)
 (β = [0.4911904823400375, 1.3336636471494507],)
 (β = [0.49901795646365543, 1.3937610114041836],)
 (β = [0.45240830115259073, 1.0845756727727063],)
 (β = [0.5850340074895557, 0.9928101856844366],)
 (β = [0.3914145275648779, 1.0526332372994567],)
 (β = [0.6832156018930491, 0.5252436259666028],)
 ⋮
 (β = [0.09003031741978823, 1.4673228840593469],)
 (β = [0.7119637329216995, 1.3169083690223835],)
 (β = [0.7745492796085934, 1.5705570770395088],)
 (β = [0.4793282827750216, 1.2308875491009093],)
 (β = [0.5780307659230056, 0.6573161353873793],)
 (β = [0.4333874984633136, 0.5621670522823905],)
 (β = [0.671748699538022, 1.2435049706009875],)
 (β = [0.4123391655698562, 0.6816928239792956],)
 (β = [0.4337503594945402, 0.9399038270436071],)

julia> particles(post)
(β = Particles{Float64,1000}[0.538 ± 0.26, 0.775 ± 0.51],)
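
Each entry of the result is a Particles object wrapping the 1000 posterior draws, so ordinary summary statistics apply to it directly (a quick sketch, assuming Statistics is loaded alongside MonteCarloMeasurements.jl):

using Statistics

b = particles(post).β   # two Particles, one per coefficient
mean.(b)                # posterior means, roughly [0.54, 0.78] above
std.(b)                 # posterior standard deviations, roughly [0.26, 0.51] above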

For model diagnostics and prediction, we need the predictive distribution:

julia> pred = predictive(m, :β)
@model (X, β) begin
        y ~ For(eachrow(X)) do x
                Normal(x' * β, 1)
            end
    end

This requires X and β as inputs, so we can use it like this for a posterior predictive check:

ppc = [rand(pred(;X=X, p...)).y for p in post];

truth.y - particles(ppc)
6-element Array{Particles{Float64,1000},1}:
 -0.534 ± 0.55
 -1.28 ± 1.3
  0.551 ± 0.53
  0.918 ± 0.91
  0.624 ± 0.63
  0.534 ± 0.53

These play a role similar to that of residuals in a non-Bayesian approach (there's plenty more detail to go into, but that's for another time).
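
As a quick numerical check (a sketch, not part of the original example), we can summarize those residual-like quantities the same way:

resid = truth.y - particles(ppc)
mean.(resid)   # posterior-mean residual for each observation
std.(resid)    # matches the ± spread printed above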

With some minor modifications, we can put this into a form that allows symbolic simplification:

m2 = @model X begin
    N = size(X,1)
    k = size(X,2)
    β ~ Normal() |> iid(k)
    yhat = X * β
    y ~ For(N) do j
            Normal(yhat[j], 1)
        end
end;

symlogpdf(m2).evalf(3)
                            N                                       k           
                           ___                                     ___          
                           ╲                                       ╲            
                            ╲                            2          ╲          2
-0.919⋅N - 0.919⋅k - 0.5⋅   ╱    (y[_j1] - 1.0⋅yhat[_j1])  - 0.5⋅   ╱    β[_j1] 
                           ╱                                       ╱            
                           ‾‾‾                                     ‾‾‾          
                         _j1 = 1                                 _j1 = 1        

[evalf(3) just limits the number of decimal places displayed]
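
Written in ordinary notation, and noting that 0.919 ≈ log(2π)/2, the expression above is just the familiar log-density of this model:

\log p(\beta, y \mid X) \;=\; -\frac{N + k}{2}\,\log(2\pi)
  \;-\; \frac{1}{2}\sum_{j=1}^{N}\left(y_j - \hat y_j\right)^{2}
  \;-\; \frac{1}{2}\sum_{j=1}^{k}\beta_j^{2}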

We can use the symbolic simplification to speed up computations:

julia> using BenchmarkTools

julia> jointdist = m2(X=X)
Joint Distribution
    Bound arguments: [X]
    Variables: [k, β, yhat, N, y]

@model X begin
        k = size(X, 2)
        β ~ Normal() |> iid(k)
        yhat = X * β
        N = size(X, 1)
        y ~ For(N) do j
                Normal(yhat[j], 1)
            end
    end


julia> @btime logpdf($jointdist, $truth)
  1.665 μs (51 allocations: 1.20 KiB)
-15.84854642585797

julia> @btime logpdf($jointdist, $truth, $codegen)
  77.007 ns (1 allocation: 128 bytes)
-15.848546425857968

What's Really Happening Here?

Under the hood, rand and logpdf specify different ways of "running" the model.

rand turns each v ~ dist into v = rand(dist), finally outputting the NamedTuple of all values it has seen.
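
For the linear model above, the code that rand effectively runs looks roughly like this hand-written sketch (an illustration of the idea, not the actual generated code):

function m_rand(X)
    β = rand(Normal() |> iid(size(X, 2)))                  # from β ~ Normal() |> iid(size(X,2))
    y = rand(For(x -> Normal(x' * β, 1), eachrow(X)))      # from y ~ For(eachrow(X)) do x ... end
    return (β = β, y = y)
end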

logpdf steps through the same program, but instead accumulates a log-density. It begins by initializing _ℓ = 0.0. Then at each step, it turns v ~ dist into _ℓ += logpdf(dist, v), before finally returning _ℓ.
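
Correspondingly, logpdf for the same model behaves roughly like this sketch (again, not the actual generated code):

function m_logpdf(X, β, y)
    _ℓ = 0.0
    _ℓ += logpdf(Normal() |> iid(size(X, 2)), β)               # β ~ ...
    _ℓ += logpdf(For(x -> Normal(x' * β, 1), eachrow(X)), y)   # y ~ ...
    return _ℓ
end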

Note that I said "turns into" instead of "interprets". Soss uses GG.jl to generate specialized code for a given model, inference primitive (like rand and logpdf), and type of data.

This idea can be used in much more complex ways. weightedSample is a sort of hybrid between rand and logpdf. For data that are provided, it increments a _ℓ using logpdf. Unknown values are sampled using rand.

julia> ℓ, proposal = weightedSample(m(X=X), (y=truth.y,));
(-33.647614702926504, (X = [1.1915557734285787 0.10079289135480324; -2.5197330871745263 -0.0019741367391015213; … ; -0.1016067940589428 1.158074626662026; -1.5425131978228126 -0.47515878362112707], β = [-1.216679880035586, 0.42410088891060693], y = [0.10079289135480324, -2.5197330871745263, 2.0748097755419757, 0.8442227439533416, 1.158074626662026, -0.47515878362112707]))

julia> ℓ
-33.647614702926504

julia> proposal.β
2-element Array{Float64,1}:
 -1.216679880035586
  0.42410088891060693

Again, there's no runtime check needed for this. Each of these is compiled the first time it is called, so future calls are very fast. Functions like this are great to use in tight loops.
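
In terms of the same hand-written sketches as above, weightedSample for m(X=X) with y observed behaves roughly like this (an illustration, not the generated code):

function m_weightedSample(X, y)
    _ℓ = 0.0
    β = rand(Normal() |> iid(size(X, 2)))                       # β is not provided, so sample it
    _ℓ += logpdf(For(x -> Normal(x' * β, 1), eachrow(X)), y)    # y is observed, so accumulate its density
    return (_ℓ, (X = X, β = β, y = y))
end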

To Do

We need a way to "lift" a "Distribution" (without parameters, so really a family) to a Model, or one with parameters to a JointDistribution

Models are "function-like", so a JointDistribution should be sometimes usable as a value. m1(m2(args)) should work.

This also means m1 ∘ m2 should be fine

Since inference primitives are specialized for the type of data, we can include methods for Union{Missing, T} data. PyMC3 has something like this, but for us it will be better since we know at compile time whether any data are missing.
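
For reference, this is what partially observed data look like at the type level in plain Julia (nothing Soss-specific here); the possibility of missing values shows up in the element type, which is exactly what makes compile-time specialization possible:

y = Union{Missing, Float64}[0.1, missing, 2.07, missing, 1.16, -0.48]
eltype(y)   # Union{Missing, Float64} — known statically, before looking at the data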

There's a return available in case you want a result other than a NamedTuple, but it's a little fiddly still. I think whether the return is respected or ignored should depend on the inference primitive. And some will also modify it, similar to how a state monad works. Likelihood weighting is an example of this.

Rather than having lots of functions for inference, anything that's not a primitive should (I think for now at least) be a method of... let's call it sample. This should always return an iterator, so we can combine results after the fact using tools like IterTools, ResumableFunctions, and Transducers.

The situation just described is for generating a sequence of samples from a single distribution. But we may also have models with a sequence of distributions, either observed or sampled, or a mix. This can be something like Haskell's iterateM, though we need to think carefully about the specifics.

We already have a way to merge models, we should look into intersection as well.

We need ways to interact with Turing and Gen. Some ideas:

  • Turn a Soss model into an "outside" (Turing or Gen) model
  • Embed outside models as a black box in a Soss model, using their methods for inference

Stargazers over time

