Merge pull request #132 from dwijenchawra/master
fixed issue #67 and #51
mykelk committed Sep 3, 2021
2 parents c5e1681 + 6bd6e88 commit 6ae0c4f
Showing 23 changed files with 99 additions and 127 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/TagBot.yml
Original file line number Diff line number Diff line change
@@ -1,11 +1,15 @@
name: TagBot
on:
schedule:
- cron: 0 * * * *
issue_comment:
types:
- created
workflow_dispatch:
jobs:
TagBot:
if: github.event_name == 'workflow_dispatch' || github.actor == 'JuliaTagBot'
runs-on: ubuntu-latest
steps:
- uses: JuliaRegistries/TagBot@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}
ssh: ${{ secrets.DOCUMENTER_KEY }}
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -16,7 +16,7 @@ jobs:
with:
version: '1.6'
- name: Install LuaLatex
run: sudo apt-get install texlive-full && sudo apt-get install texlive-latex-extra && sudo mktexlsr && sudo updmap-sys
run: sudo apt-get update && sudo apt-get install texlive-full --fix-missing && sudo apt-get install texlive-latex-extra && sudo mktexlsr && sudo updmap-sys
- name: Install dependencies
run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
- name: Build and deploy
34 changes: 0 additions & 34 deletions .travis.yml

This file was deleted.

8 changes: 0 additions & 8 deletions docs/Project.toml

This file was deleted.

5 changes: 3 additions & 2 deletions docs/make.jl
@@ -25,5 +25,6 @@ makedocs(
)

deploydocs(
repo = "github.com/sisl/BayesNets.jl.git",
)
repo = "github.com/dwijenchawra/BayesNets.jl.git",
)
return true
32 changes: 18 additions & 14 deletions docs/src/usage.md
@@ -187,7 +187,7 @@ rand(bn_gibbs, gsampler, 5)

BayesNets.jl supports parameter learning for an entire graph.

```julia
```julia
fit(BayesNet, data, (:a=>:b), [StaticCPD{Normal}, LinearGaussianCPD])
```

@@ -223,7 +223,7 @@ Inference methods for discrete Bayesian networks can be used via the `infer` met
bn = DiscreteBayesNet()
push!(bn, DiscreteCPD(:a, [0.3,0.7]))
push!(bn, DiscreteCPD(:b, [0.2,0.8]))
push!(bn, DiscreteCPD(:c, [:a, :b], [2,2],
push!(bn, DiscreteCPD(:c, [:a, :b], [2,2],
[Categorical([0.1,0.9]),
Categorical([0.2,0.8]),
Categorical([1.0,0.0]),
@@ -283,7 +283,7 @@ data[1:3,:] # only display a subset...
Here we use the K2 structure learning algorithm which runs in polynomial time but requires that we specify a topological node ordering.

```@example bayesnet
parameters = K2GraphSearch([:Species, :SepalLength, :SepalWidth, :PetalLength, :PetalWidth],
parameters = K2GraphSearch([:Species, :SepalLength, :SepalWidth, :PetalLength, :PetalWidth],
ConditionalLinearGaussianCPD,
max_n_parents=2)
bn = fit(BayesNet, data, parameters)
@@ -300,7 +300,7 @@ Changing the ordering will change the structure.

```julia
CLG = ConditionalLinearGaussianCPD
parameters = K2GraphSearch([:Species, :PetalLength, :PetalWidth, :SepalLength, :SepalWidth],
parameters = K2GraphSearch([:Species, :PetalLength, :PetalWidth, :SepalLength, :SepalWidth],
[StaticCPD{Categorical}, CLG, CLG, CLG, CLG],
max_n_parents=2)
fit(BayesNet, data, parameters)
@@ -311,7 +311,7 @@ A `ScoringFunction` allows for extracting a scoring metric for a CPD given data.
A `GraphSearchStrategy` defines a structure learning algorithm. The K2 algorithm is defined through `K2GraphSearch` and `GreedyHillClimbing` is implemented for discrete Bayesian networks and the Bayesian score:

```@example bayesnet
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
b=[1,1,1,2,2,2,2,1,1,2,1,1],
a=[1,1,1,2,1,1,2,1,1,2,1,1])
parameters = GreedyHillClimbing(ScoreComponentCache(data), max_n_parents=3, prior=UniformPrior())
@@ -325,7 +325,7 @@ TikzPictures.save(SVG("plot9"), plot) # hide

We can specify the number of categories for each variable in case it cannot be correctly inferred:

```julia
```@example bayesnet
bn = fit(DiscreteBayesNet, data, parameters, ncategories=[3,3,2])
```

@@ -338,11 +338,17 @@ A whole suite of features are supported for DiscreteBayesNets. Here, we illustra

We also detail obtaining a bayesian score for a network structure in the next section.

```julia
count(bn, :a, data) # 1
statistics(bn.dag, data) # 2
table(bn, :b) # 3
table(bn, :c, :a=>1) # 4
```@example bayesnet
count(bn, :a, data)
```
```@example bayesnet
statistics(bn.dag, data)
```
```@example bayesnet
table(bn, :b)
```
```@example bayesnet
table(bn, :c, :a=>1)
```

## Reading from XDSL
@@ -363,12 +369,10 @@ TikzPictures.save(SVG("plot10"), plot) # hide
The bayesian score for a discrete-valued BayesNet can be calculated based only on the structure and data (the CPDs do not need to be defined beforehand). This is implemented with a method of ```bayesian_score``` that takes in a directed graph, the names of the nodes and data.

```@example bayesnet
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
b=[1,1,1,2,2,2,2,1,1,2,1,1],
a=[1,1,1,2,1,1,2,1,1,2,1,1])
g = DAG(3)
add_edge!(g,1,2); add_edge!(g,2,3); add_edge!(g,1,3)
bayesian_score(g, [:a,:b,:c], data)
```


4 changes: 3 additions & 1 deletion src/BayesNets.jl
@@ -107,7 +107,9 @@ export
adding_edge_preserves_acyclicity,
bayesian_score_component,
bayesian_score_components,
bayesian_score
bayesian_score,

nodenames


include("bayes_nets.jl")
6 changes: 3 additions & 3 deletions src/DiscreteBayesNet/discrete_bayes_net.jl
@@ -71,15 +71,15 @@ function table(bn::DiscreteBayesNet, name::NodeName)
d[!,name] = 1:ncategories(cpd(assignment))
end

p = ones(size(d,1)) # the probability column
potential = ones(size(d,1)) # the probability column
for i in 1:size(d,1)
assignment = Assignment()
for j in 1:length(varnames)
assignment[varnames[j]] = d[i,j]
end
p[i] = pdf(cpd, assignment)
potential[i] = pdf(cpd, assignment)
end
d[!,:p] = p
d[!,:potential] = potential

return Table(d)
end
4 changes: 2 additions & 2 deletions src/DiscreteBayesNet/io.jl
@@ -157,8 +157,8 @@ function Base.write(io::IO, mime::MIME"text/plain", bn::DiscreteBayesNet)
for name in arr_names
cpd = get(bn, name)
for D in cpd.distributions
for p in probs(D)[1:end-1]
str = @sprintf("%.16g", p)
for potential in probs(D)[1:end-1]
str = @sprintf("%.16g", potential)
print(io, space ? " " : "" , str)
space = true
end
34 changes: 17 additions & 17 deletions src/DiscreteBayesNet/tables.jl
@@ -2,7 +2,7 @@
DataFrames are used to represent factors
https://en.wikipedia.org/wiki/Factor_graph
:p is the column containing the probabilities, ::Float64
:potential is the column containing the probabilities, ::Float64
Each variable has its own column corresponding to its assignments and named with its name
These can be obtained using the table() function
@@ -34,16 +34,16 @@ function Base.:*(t1::Table, t2::Table)
f1 =t1.potential
f2 =t2.potential

onnames = setdiff(intersect(propertynames(f1), propertynames(f2)), [:p])
finalnames = vcat(setdiff(union(propertynames(f1), propertynames(f2)), [:p]), :p)
onnames = setdiff(intersect(propertynames(f1), propertynames(f2)), [:potential])
finalnames = vcat(setdiff(union(propertynames(f1), propertynames(f2)), [:potential]), :potential)

if isempty(onnames)
j = join(f1, f2, kind=:cross, makeunique=true)
else
j = outerjoin(f1, f2, on=onnames, makeunique=true)
end

j[!,:p] = broadcast(*, j[!,:p], j[!,:p_1])
j[!,:potential] = broadcast(*, j[!,:potential], j[!,:potential_1])

return Table(j[!,finalnames])
end
@@ -57,25 +57,25 @@ function sumout(t::Table, v::NodeNameUnion)
f = t.potential

# vcat works for single values and vectors alike (magic?)
remainingvars = setdiff(propertynames(f), vcat(v, :p))
remainingvars = setdiff(propertynames(f), vcat(v, :potential))

if isempty(remainingvars)
# they want to remove all variables except for prob column
# uh ... 'singleton' table?
return Table(DataFrame(p = sum(f[!,:p])))
return Table(DataFrame(potential = sum(f[!,:potential])))
else
# note that this will fail miserably if f is too large (~1E4 maybe?)
# nothing I can do; there is a github issue
return Table(combine(df -> DataFrame(p = sum(df[!,:p])), DataFrames.groupby(f, remainingvars)))
return Table(combine(df -> DataFrame(potential = sum(df[!,:potential])), DataFrames.groupby(f, remainingvars)))
end
end

"""
Table normalization
Ensures that the `:p` column sums to one
Ensures that the `:potential` column sums to one
"""
function LinearAlgebra.normalize!(t::Table)
t.potential[!,:p] ./= sum(t.potential[!,:p])
t.potential[!,:potential] ./= sum(t.potential[!,:potential])

return t
end
@@ -103,23 +103,23 @@ end

"""
takes a list of observations of assignments represented as a DataFrame
or a set of data samples (without :p),
or a set of data samples (without :potential),
takes the unique assignments,
and estimates the associated probability of each assignment
based on its frequency of occurrence.
"""
function Distributions.fit(::Type{Table}, f::DataFrame)
w = ones(size(f, 1))
t = f
if hasproperty(f, :p)
t = f[:, propertynames(t) .!= :p]
w = f[!,:p]
if hasproperty(f, :potential)
t = f[:, propertynames(t) .!= :potential]
w = f[!,:potential]
end
# unique samples
tu = unique(t)
# add column with probabilities of unique samples
tu[!,:p] = Float64[sum(w[Bool[tu[j,:] == t[i,:] for i = 1:size(t,1)]]) for j = 1:size(tu,1)]
tu[!,:p] /= sum(tu[!,:p])
tu[!,:potential] = Float64[sum(w[Bool[tu[j,:] == t[i,:] for i = 1:size(t,1)]]) for j = 1:size(tu,1)]
tu[!,:potential] /= sum(tu[!,:potential])

return Table(tu)
end
@@ -133,8 +133,8 @@ end
# n = size(f, 1)
# p = zeros(n)
# w = ones(n)
# if hasproperty(f, :p)
# w = f[!,:p]
# if hasproperty(f, :potential)
# w = f[!,:potential]
# end
#
# dfindex = find([hasproperty(a, n) for n in names(f)])
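
The `sumout` and `normalize!` hunks above operate on a `:potential` column. As a rough illustration of what those two steps compute, here is a minimal sketch in plain Julia that does not use DataFrames or the package's `Table` type. The row layout (a vector of NamedTuples) and the names `sumout_rows` and `normalize_rows` are hypothetical stand-ins, not part of BayesNets.jl.

```julia
# Hypothetical stand-in for a factor table: one NamedTuple per row,
# holding the variable assignments plus an unnormalized :potential value.
rows = [
    (a = 1, b = 1, potential = 1.0),
    (a = 1, b = 2, potential = 3.0),
    (a = 2, b = 1, potential = 2.0),
    (a = 2, b = 2, potential = 4.0),
]

# Sum out variable `v`: group rows by the remaining variables
# and add up the potentials within each group.
function sumout_rows(rows, v::Symbol)
    totals = Dict{Any,Float64}()
    for r in rows
        # Keep every field except `v` and :potential as the group key.
        kept = [k => r[k] for k in keys(r) if k != v && k != :potential]
        key = (; kept...)
        totals[key] = get(totals, key, 0.0) + r.potential
    end
    # Rebuild rows: group key fields followed by the summed potential.
    return [merge(k, (potential = p,)) for (k, p) in totals]
end

# Rescale the potentials so that they sum to one.
function normalize_rows(rows)
    z = sum(r.potential for r in rows)
    return [merge(r, (potential = r.potential / z,)) for r in rows]
end

marginal_a = sumout_rows(rows, :b)       # one row per value of :a
normalized = normalize_rows(marginal_a)  # potentials now sum to one
```

The row order of `marginal_a` follows `Dict` iteration order, so look rows up by their `a` value rather than by position.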
7 changes: 3 additions & 4 deletions src/Factors/factors_dims.jl
@@ -22,7 +22,7 @@ Normalize the factor so all instances of dims have (or the entire factors has)
p-norm of 1
"""
function LinearAlgebra.normalize!(ϕ::Factor, dims::NodeNameUnion; p::Int=1)
dims = unique(convert(NodeNames, dims))
dims = unique(nodeconvert(NodeNames, dims))
_check_dims_valid(dims, ϕ)

inds = indexin(dims, ϕ)
@@ -69,7 +69,7 @@ Reduce dimensions `dims` in `ϕ` using function `op`.
"""
function reducedim(op, ϕ::Factor, dims::NodeNameUnion, v0=nothing)
# a (possibly?) more efficient version than reducedim!(deepcopy(ϕ))
dims = convert(NodeNames, dims)
dims = nodeconvert(NodeNames, dims)
_check_dims_valid(dims, ϕ)

# needs to be a tuple for squeeze
@@ -85,7 +85,7 @@ function reducedim(op, ϕ::Factor, dims::NodeNameUnion, v0=nothing)
end

function reducedim!(op, ϕ::Factor, dims::NodeNameUnion, v0=nothing)
dims = convert(NodeNames, dims)
dims = nodeconvert(NodeNames, dims)
_check_dims_valid(dims, ϕ)

# needs to be a tuple for squeeze
@@ -270,4 +270,3 @@ end
/(ϕ1::Factor, ϕ2::Factor) = join(/, ϕ1, ϕ2)
+(ϕ1::Factor, ϕ2::Factor) = join(+, ϕ1, ϕ2)
-(ϕ1::Factor, ϕ2::Factor) = join(-, ϕ1, ϕ2)

8 changes: 6 additions & 2 deletions src/Factors/factors_main.jl
@@ -5,6 +5,11 @@
# THE MOST BASIC ASSUMPTION IS THAT ALL VARIABLES ARE CATEGORICAL AND THEREFORE
# Base.OneTo WORTHY. IF THAT IS VIOLATED, NOTHING WILL WORK

nodeconvert(::Type{NodeNames}, names::NodeNameUnion) = names

nodeconvert(::Type{NodeNames}, name::NodeName) = [name]


"""
Factor(dims, potential)
@@ -16,7 +21,7 @@ mutable struct Factor
# In most cases this will be a probability

function Factor(dims::NodeNameUnion, potential::Array{Float64})
dims = convert(NodeNames, dims)
dims = nodeconvert(NodeNames, dims)
_ckeck_dims_unique(dims)

(length(dims) != ndims(potential)) &&
@@ -192,4 +197,3 @@ function pattern(ϕ::Factor)
hcat([repeat(collect(1:l), inner=i, outer=o) for (l, i, o) in
zip(lens, inners, outers)]...)
end
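
The two `nodeconvert` methods added in this file replace the earlier `convert(NodeNames, dims)` calls. The sketch below shows how dispatch would pick between them, assuming the aliases `NodeName`, `NodeNames` and `NodeNameUnion` are defined roughly as written here. Those alias definitions are an assumption and are not shown in this diff.

```julia
# Assumed aliases, written out so the sketch runs without the package.
const NodeName = Symbol
const NodeNames = AbstractVector{NodeName}
const NodeNameUnion = Union{NodeName, NodeNames}

# Fallback method: a collection of names is returned unchanged.
nodeconvert(::Type{NodeNames}, names::NodeNameUnion) = names

# More specific method: a single name is wrapped in a one-element vector.
nodeconvert(::Type{NodeNames}, name::NodeName) = [name]

single = nodeconvert(NodeNames, :a)        # dispatches to the NodeName method
many = nodeconvert(NodeNames, [:a, :b])    # dispatches to the fallback
```

Because `NodeName` is a subtype of `NodeNameUnion`, the second method is strictly more specific, so the two definitions do not conflict.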

3 changes: 1 addition & 2 deletions src/Inference/exact.jl
@@ -9,7 +9,7 @@ function infer(im::ExactInference, inf::InferenceState{BN}) where {BN<:DiscreteB
nodes = names(bn)
query = inf.query
evidence = inf.evidence
hidden = setdiff(nodes, vcat(query, names(evidence)))
hidden = setdiff(nodes, vcat(query, keys(evidence)))

factors = map(n -> Factor(bn, n, evidence), nodes)

@@ -31,4 +31,3 @@ function infer(im::ExactInference, inf::InferenceState{BN}) where {BN<:DiscreteB
end
infer(inf::InferenceState{BN}) where {BN<:DiscreteBayesNet} = infer(ExactInference(), inf)
infer(bn::BN, query::NodeNameUnion; evidence::Assignment=Assignment()) where {BN<:DiscreteBayesNet} = infer(ExactInference(), InferenceState(bn, query, evidence))
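
The first hunk in this file computes the hidden nodes, meaning those that are neither queried nor observed. Here is a standalone sketch of that set computation, assuming the evidence is a plain `Dict` keyed by node name. The node names and values are made up for illustration, and the sketch collects the keys into a vector before concatenating, which is a choice made here rather than something taken from the diff.

```julia
# Illustrative inputs; names and values are hypothetical.
nodes = [:a, :b, :c, :d]
query = [:a]
evidence = Dict{Symbol,Any}(:c => 1)

# keys(evidence) is an iterator, so collect it into a vector first;
# vcat then joins two vectors of node names.
observed = collect(keys(evidence))

# Hidden nodes: everything that is neither queried nor observed.
hidden = setdiff(nodes, vcat(query, observed))
```

`setdiff` keeps the order of its first argument, so `hidden` lists the remaining nodes in their original order.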
