Invalid weights under repeated data #21

tawheeler · 2016-09-13T20:27:26Z

If you have repeated data you can end up with NaNs in your weights.

using GaussianMixtures

data = Array(Float64, 110, 1) # one variable
data[1:100] = randn(100)
data[101:110] = randn(10) + 10.0

model = GaussianMixtures.GMM(2, data)
println(model)

data[101:110] = fill(10.0, 10)
model = GaussianMixtures.GMM(2, data)
println(model)

This results in:

GaussianMixtures.GMM{Float64,Array{Float64,2}}(2,1,[0.09090909090969347,0.9090909090903065],[9.463489092287428
 0.19884123133198583],[0.9811041259003161
 0.9898456887418752],[,,,,,,,,,,,,],110)
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
WARNING: 4 pathological elements normalized
GaussianMixtures.GMM{Float64,Array{Float64,2}}(2,1,[NaN,NaN],[0.0
 0.0],[1.0
 1.0],[,,,,,,,,,,,,,,,,,,,,,,],110)

It looks like varfloor is a parameter of em!, but it is not exposed through the GMM constructor.
One problem is that, in em!, the line tooSmall = any(gmm.Σ .< varfloor, 2) will not find the offending NaN values, because any comparison against NaN evaluates to false. Also, it looks like N and F from stats() are all NaN as well, so the mean is also NaN.
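
To illustrate why the comparison misses NaN, here is a minimal sketch in current Julia 1.x syntax (the report above uses the older 0.5-era any(A, 2) form). It assumes Σ holds diagonal variances with one row per Gaussian, and find_bad_variances is a hypothetical helper written for this example, not a function in GaussianMixtures.

```julia
# Hypothetical helper, not part of GaussianMixtures: flag every Gaussian (row)
# whose variance is below the floor OR is NaN.
function find_bad_variances(Σ::AbstractMatrix{<:Real}, varfloor::Real)
    # `NaN < varfloor` is false, so NaN has to be tested for explicitly.
    return vec(any((Σ .< varfloor) .| isnan.(Σ), dims=2))
end

# Three Gaussians, one dimension: a NaN variance, a healthy one, a tiny one.
Σ = reshape([NaN, 0.5, 1e-9], 3, 1)
varfloor = 1e-3

# The comparison alone flags only the tiny variance and skips the NaN row.
naive = vec(any(Σ .< varfloor, dims=2))

# The combined check flags both the NaN row and the tiny-variance row.
robust = find_bad_variances(Σ, varfloor)

println("naive:  ", naive)
println("robust: ", robust)
```

Whether a check like this belongs in em! or earlier, before the statistics turn into NaN, is a design question this sketch does not settle.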

davidavdav · 2016-09-14T07:49:04Z

Thanks. Yes, it is likely that the repeated data ends up in a Gaussian of its own, which leads to a vanishing variance. I suppose the relevant code could be revised on that point. I've never really been charmed by the logic in the code at that point.
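
The vanishing variance can be seen in isolation with a small sketch in Julia 1.x. It only shows the arithmetic for a component that owns nothing but the ten repeated points. That this is the exact route by which NaN reaches the weights inside the library is an assumption, not something confirmed in this thread.

```julia
using Statistics  # standard library: mean and var

x = fill(10.0, 10)               # the ten repeated points from the report
μ = mean(x)                      # 10.0
σ2 = var(x, corrected=false)     # 0.0, since every deviation from the mean is zero

# Scaled squared distance of a repeated point under this component:
# 0.0 / 0.0 is NaN in IEEE floating-point arithmetic.
z2 = (x[1] - μ)^2 / σ2

println("variance = ", σ2, ", scaled distance = ", z2)
```

Once a NaN like this enters the per-point likelihoods, it propagates through any sums that include it, which would be consistent with N and F being all NaN.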
