
solution dependence on regularization path #71

Open
hugoguh opened this issue May 11, 2016 · 14 comments
@hugoguh (Collaborator) commented May 11, 2016

I got negative pseudo-R2s on the V4 data, so I checked.

Running with the default reg_lambdas seems to work fine:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

# create an instance of the GLM class
model = GLM(distr='poissonexp', verbose=False, alpha=0.05)

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), the fitted parameters are stored; here we grab the set for the last reg_lambda
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [ 0.5         0.3237394   0.2096144   0.13572088  0.08787639  0.0568981
  0.03684031  0.02385332  0.01544452  0.01      ]
pseudo_R2s: [0.43841608601335824, 0.49823894207455854, 0.54805964244862948, 
0.5411179410997351, 0.56589624430495222, 0.52914657441486135, 0.57277441499592874, 
0.52690973730494284, 0.57043414426245309, 0.53886579676913726]

However, if I choose another range, even one that is not all that different, here's what happens:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

# create an instance of the GLM class
model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[0,0.0001,0.001,0.01,0.1,0.2,0.5])

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), the fitted parameters are stored; here we grab the set for the last reg_lambda
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [0, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5]
pseudo_R2s: [0.10370201748019414, -0.024446271196854941, 0.10307940713282349, 
-0.019807866066520852, 0.040822101620644258, 0.087468883241032636, 0.17386339833109343]
@jasmainak jasmainak added the bug label May 11, 2016
@hugoguh (Collaborator, Author) commented May 11, 2016

If I switch the order of the reg_lambdas, it works again: i.e. if I give it
reg_lambda=[0.5, 0.2, 0.1, 0.01, 0.001, 0.0001, 0] instead of reg_lambda=[0, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5], then it works.

So my guess is that, in this particular example, it converges well for the higher values of reg_lambda and then takes advantage of the warm restart. This raises two issues:

  • it should converge for any individual lambda on its own;
  • if it doesn't, then warm restarts might be biasing results and hiding bugs: we should always compare warm restarts vs. cold starts to check for biases (see the sketch after this list), and we should have an option to switch off warm restarts while testing/developing pyglmnet, since they might be hiding bugs.
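A minimal sketch of that comparison, assuming the variables (Xr, yr, Xt, yt, scaler) and GLM behavior from the snippet above, i.e. that predict returns one prediction per reg_lambda:

lambdas = [0.5, 0.2, 0.1, 0.01, 0.001, 0.0001]

# warm: one model fit along the full path, so warm restarts apply
warm = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=lambdas)
warm.fit(scaler.transform(Xr), yr)
r2_warm = [warm.pseudo_R2(yt, yh, np.mean(yr))
           for yh in warm.predict(scaler.transform(Xt))]

# cold: a fresh model per lambda, so no warm restart is possible
r2_cold = []
for rl in lambdas:
    cold = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[rl])
    cold.fit(scaler.transform(Xr), yr)
    yh = cold.predict(scaler.transform(Xt))[0]
    r2_cold.append(cold.pseudo_R2(yt, yh, np.mean(yr)))

print('warm:', r2_warm)
print('cold:', r2_cold)

Large gaps between the two lists would indicate that the path fits lean on warm restarts rather than converging per lambda.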

@jasmainak (Member)

> it should converge for any individual lambda on its own

maybe this could be a test? (a sketch of one is below)

> if it doesn't, then warm restarts might be biasing results and hiding bugs: we should always compare warm restarts vs. cold starts to check for biases, and we should have an option to switch off warm restarts while testing/developing pyglmnet

okay, we can do that too, but the option should not be in the public API. Maybe in a private function / method ...
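A sketch of such a test, assuming the simulate/fit/predict API used in the snippets above; the threshold is arbitrary (note that, per the single-lambda run later in this thread, this test would currently fail):

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

def test_single_lambda_convergence():
    # simulate Poisson data and check that a fit with a single small
    # reg_lambda recovers a positive test-set pseudo-R2
    np.random.seed(42)
    n_samples, n_features = 10000, 100
    beta0 = np.random.normal(0.0, 1.0, 1)
    beta = np.array(sps.rand(n_features, 1, 0.1).todense())
    model = GLM(distr='poissonexp', verbose=False, alpha=0.05,
                reg_lambda=[0.001])
    Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
    yr = model.simulate(beta0, beta, Xr)
    Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
    yt = model.simulate(beta0, beta, Xt)
    scaler = StandardScaler().fit(Xr)
    model.fit(scaler.transform(Xr), yr)
    yhat = model.predict(scaler.transform(Xt))[0]
    assert model.pseudo_R2(yt, yhat, np.mean(yr)) > 0.0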

@hugoguh (Collaborator, Author) commented May 11, 2016

An alternative to switching off warm restarts is to run each individual lambda on its own and compare against running them together along the path.

I haven't managed to get positive pseudo-R2s on the V4 neuron with the canonical link function in pyglmnet (distr='poissonexp'); I have tried several reg_lambdas.
Using R with the canonical link I get pseudo-R2 = 0.0383 (+/- 0.0059).
Using pyglmnet with the (non-canonical) default link function I get 0.0260 (+/- 0.0070).

(These values are with reg_lambda=0.)
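For reference, the pseudo-R2 reported here is the deviance-based measure (in the style of Cameron & Windmeijer); a minimal sketch for the Poisson case, not necessarily pyglmnet's exact implementation:

import numpy as np

def pseudo_r2_poisson(y, yhat, ynull):
    # deviance-based pseudo-R2:
    # 1 - (LL_saturated - LL_model) / (LL_saturated - LL_null).
    # Negative values mean the model predicts worse than the constant
    # mean-rate (null) model.
    eps = np.spacing(1)  # guard against log(0)
    L1 = np.sum(y * np.log(eps + yhat) - yhat)    # fitted model
    L0 = np.sum(y * np.log(eps + ynull) - ynull)  # null model
    LS = np.sum(y * np.log(eps + y) - y)          # saturated model
    return 1.0 - (LS - L1) / (LS - L0)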

@pavanramkumar (Collaborator)

According to the paper, the logic of warm restarts doesn't work if we go from low to high lambda; they recommend fitting the largest lambda first and initializing each subsequent fit from the previous solution (sketched below).

Fix: the documentation should state this, so that users know to specify lambdas in decreasing order.
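A self-contained sketch of that convention; fit_single is a hypothetical stand-in for the inner solver (a simple proximal-gradient step for a Poisson GLM with exp link and L1 penalty), not pyglmnet's actual internals:

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(0.0, 1.0, (1000, 10))
y = rng.poisson(np.exp(X @ rng.normal(0.0, 0.3, 10)))

def fit_single(X, y, reg_lambda, beta_init, lr=1e-3, n_iter=2000):
    # hypothetical inner solver: proximal gradient (ISTA) for a Poisson
    # GLM with canonical exp link and an L1 penalty
    beta = beta_init.copy()
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                    # predicted rate
        beta -= lr * (X.T @ (mu - y)) / len(y)   # gradient step on the NLL
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * reg_lambda, 0.0)
    return beta

# largest lambda first; each subsequent fit is warm-started from the
# previous solution, which is what breaks when the order is reversed
path = {}
beta = np.zeros(X.shape[1])  # cold start only for the first lambda
for rl in sorted([0.5, 0.01, 0.0001], reverse=True):
    beta = fit_single(X, y, rl, beta_init=beta)
    path[rl] = beta.copy()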

@hugoguh shall I close this issue and create a new issue for documentation-related changes?

@pavanramkumar pavanramkumar added docs and removed bug labels May 11, 2016
@hugoguh (Collaborator, Author) commented May 11, 2016

Still, it should converge for a single reg_lambda:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[0.001])

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), the fitted parameters are stored; here we grab the set for the last reg_lambda
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [0.001]
pseudo_R2s: [-0.024434697354696278]

@hugoguh (Collaborator, Author) commented May 11, 2016

I don’t think this issue should be closed.

It’s not about documenting the order of the lambdas when using warm starts (though that should be documented).

This issue is about convergence, which still fails in this case, while R doesn't.
So far I have tried pyglmnet on two real datasets, and it has failed to converge on both:

  • it failed on V4 (canonical Poisson), while R works;
  • it failed on the Electro imaging data (Gaussian with L1 regularization), for which sklearn's lasso worked. It did better once we changed the learning rate, but still with poor results; it needs a better optimization algorithm.

@jasmainak (Member)

@hugoguh I agree we should iron out the convergence issues urgently. Did you manage to trace the source of the bug?

@pavanramkumar (Collaborator)

@hugoguh agreed that we should make the outputs conform to sklearn and R's glmnet package for whichever use cases exist in those tools. I appreciate all the testing!

@pavanramkumar (Collaborator)

Let's aim to get to the bottom of the bug.

In your bug report above, I wouldn't call it a convergence issue so much as a poor choice of regularization parameter. Note that the absolute value of the regularization parameter lambda cannot be directly compared between scikit-learn and pyglmnet: the cost function (negative log-likelihood) is normalized slightly differently, so the relative weight of the penalty term will differ between scikit-learn and pyglmnet for the same choice of lambda.
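For reference, scikit-learn's ElasticNet minimizes (per its documentation)

    1 / (2 * n_samples) * ||y - Xw||^2_2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

so whether the data term is averaged over n_samples or summed changes the effective weight of the same lambda.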

In your bug report above, the negative test-set pseudo-R2 simply indicates that the model has overfit the training set. This can be verified by printing both the training-set (R2r) and test-set (R2t) pseudo-R2s for the above example, for both scikit-learn and pyglmnet (see the sketch below).

I would expect to see comparable R2r and R2t for scikit-learn, but a large positive R2r and a negative R2t for pyglmnet.
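A sketch of that check on the pyglmnet side, assuming the variables from the snippets earlier in this thread:

yhat_r = model.predict(scaler.transform(Xr))  # training-set predictions
yhat_t = model.predict(scaler.transform(Xt))  # test-set predictions
for rl, yhr, yht in zip(model.reg_lambda, yhat_r, yhat_t):
    R2r = model.pseudo_R2(yr, yhr, np.mean(yr))
    R2t = model.pseudo_R2(yt, yht, np.mean(yr))
    print('lambda=%g  R2r=%.4f  R2t=%.4f' % (rl, R2r, R2t))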

@pavanramkumar (Collaborator)

As for the linear case with Roozbeh's data, we figured out that the fit improves a lot with a better choice of learning rate. We can only get to the bottom of this issue after the learning-rate-free Newton-CG solver has been implemented (issue #39). My goal is to solve #39 this weekend.

@pavanramkumar pavanramkumar changed the title convergence problems with poissonexp mismatch between pyglmnet poissonexp and legacy tools with example V4 data May 11, 2016
@hugoguh (Collaborator, Author) commented May 11, 2016

@pavanramkumar
We can compare them directly by setting reg_lambda to zero: R on the V4 data and sklearn on Roozbeh's data both worked fine with reg_lambda=0, while pyglmnet didn't. So I think it is a convergence issue.
I agree that we need the adaptive-learning-rate optimizer, and while it is not implemented (and also once it is), we should run an established Newton-type solver on our cost function to check that the bug is purely an optimization issue; a sketch is below.
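A minimal sketch of that cross-check using scipy's Newton-CG (a stand-in for whichever established solver we pick) on the unregularized Poisson (exp link) negative log-likelihood; the synthetic X, y can be swapped for the standardized design and counts from the snippets above:

import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(0)
X = rng.normal(0.0, 1.0, (1000, 10))
y = rng.poisson(np.exp(X @ rng.normal(0.0, 0.3, 10)))

def poisson_nll(beta, X, y):
    # Poisson (canonical exp link) negative log-likelihood, dropping the
    # log(y!) term, which is constant in beta
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def poisson_grad(beta, X, y):
    eta = X @ beta
    return X.T @ (np.exp(eta) - y)

# unregularized fit (reg_lambda=0) with a learning-rate-free solver,
# as a reference to compare pyglmnet's solution against
res = minimize(poisson_nll, np.zeros(X.shape[1]), args=(X, y),
               jac=poisson_grad, method='Newton-CG')
print('converged:', res.success, 'NLL at optimum:', res.fun)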

@pavanramkumar (Collaborator)

From here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html

alpha : float
Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter. alpha = 0 is equivalent to an ordinary least square, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised and you should prefer the LinearRegression object.
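In other words, for the reg_lambda=0 sanity check on the Gaussian case, the sklearn-recommended pattern would be (a sketch, assuming X, y as above):

from sklearn.linear_model import LinearRegression

# for alpha = 0, sklearn recommends plain least squares rather than
# ElasticNet/Lasso with a zero penalty
ols = LinearRegression().fit(X, y)
print('train R^2:', ols.score(X, y))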

@pavanramkumar (Collaborator)

It turns out this is mainly an issue of the regularization path; see the conversation in #76.
Should I create a new issue for "solution dependence on regularization path"?

@jasmainak (Member)

Let's just rename this issue so that all the conversation is in the same place?

@pavanramkumar pavanramkumar changed the title mismatch between pyglmnet poissonexp and legacy tools with example V4 data solution dependence on regularization path Jun 8, 2016