
solution dependence on regularization path #71

Open
hugoguh opened this issue May 11, 2016 · 14 comments
@hugoguh (Collaborator) commented May 11, 2016

I got negative pseudo-R2s on the V4 data, so I checked.

Running with the default reg_lambdas seems to work fine:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

# create an instance of the GLM class
model = GLM(distr='poissonexp', verbose=False, alpha=0.05)

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), the fitted parameters are stored; here we grab the set for the last reg_lambda
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [ 0.5         0.3237394   0.2096144   0.13572088  0.08787639  0.0568981
  0.03684031  0.02385332  0.01544452  0.01      ]
pseudo_R2s: [0.43841608601335824, 0.49823894207455854, 0.54805964244862948, 
0.5411179410997351, 0.56589624430495222, 0.52914657441486135, 0.57277441499592874, 
0.52690973730494284, 0.57043414426245309, 0.53886579676913726]

However, if I choose another range, even one that is not all that different, here's what happens:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

# create an instance of the GLM class
model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[0,0.0001,0.001,0.01,0.1,0.2,0.5])

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), the fitted parameters are stored; here we grab the set for the last reg_lambda
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [0, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5]
pseudo_R2s: [0.10370201748019414, -0.024446271196854941, 0.10307940713282349, 
-0.019807866066520852, 0.040822101620644258, 0.087468883241032636, 0.17386339833109343]
@jasmainak jasmainak added the bug label May 11, 2016
@hugoguh (Collaborator, Author) commented May 11, 2016

If I switch the order of the reg_lambdas, it works again: i.e. if I give it
reg_lambda=[0.5, 0.2, 0.1, 0.01, 0.001, 0.0001, 0] instead of reg_lambda=[0, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5], then it works.

So my guess is that, in this particular example, it converges well for the higher values of reg_lambda and then takes advantage of the warm restart. This raises two issues:

  • it should converge for any individual lambda on its own;
  • if it doesn't, then warm restarts might be biasing results and hiding bugs: we should always compare warm restarts vs. cold starts to check for biases (see the sketch after this list), and we should have an option to switch off warm restarts while testing/developing pyglmnet, since they might be hiding bugs.
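A minimal sketch of that comparison, assuming the variables (Xr, yr, Xt, yt, scaler) and GLM behavior from the snippet above, i.e. that predict returns one prediction per reg_lambda:

lambdas = [0.5, 0.2, 0.1, 0.01, 0.001, 0.0001]

# warm: one model fit along the full path, so warm restarts apply
warm = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=lambdas)
warm.fit(scaler.transform(Xr), yr)
r2_warm = [warm.pseudo_R2(yt, yh, np.mean(yr))
           for yh in warm.predict(scaler.transform(Xt))]

# cold: a fresh model per lambda, so no warm restart is possible
r2_cold = []
for rl in lambdas:
    cold = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[rl])
    cold.fit(scaler.transform(Xr), yr)
    yh = cold.predict(scaler.transform(Xt))[0]
    r2_cold.append(cold.pseudo_R2(yt, yh, np.mean(yr)))

print('warm:', r2_warm)
print('cold:', r2_cold)

Large gaps between the two lists would indicate that the path fits lean on warm restarts rather than converging per lambda.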

@jasmainak (Member)

> it should converge for any individual lambda on its own

maybe this could be a test? (a sketch of one is below)

> if it doesn't, then warm restarts might be biasing results and hiding bugs: we should always compare warm restarts vs. cold starts to check for biases, and we should have an option to switch off warm restarts while testing/developing pyglmnet

okay, we can do that too, but the option should not be in the public API. Maybe in a private function / method ...
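A sketch of such a test, assuming the simulate/fit/predict API used in the snippets above; the threshold is arbitrary (note that, per the single-lambda run later in this thread, this test would currently fail):

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

def test_single_lambda_convergence():
    # simulate Poisson data and check that a fit with a single small
    # reg_lambda recovers a positive test-set pseudo-R2
    np.random.seed(42)
    n_samples, n_features = 10000, 100
    beta0 = np.random.normal(0.0, 1.0, 1)
    beta = np.array(sps.rand(n_features, 1, 0.1).todense())
    model = GLM(distr='poissonexp', verbose=False, alpha=0.05,
                reg_lambda=[0.001])
    Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
    yr = model.simulate(beta0, beta, Xr)
    Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
    yt = model.simulate(beta0, beta, Xt)
    scaler = StandardScaler().fit(Xr)
    model.fit(scaler.transform(Xr), yr)
    yhat = model.predict(scaler.transform(Xt))[0]
    assert model.pseudo_R2(yt, yhat, np.mean(yr)) > 0.0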

@hugoguh (Collaborator, Author) commented May 11, 2016

An alternative to switching off warm restarts is to run each individual lambda on its own and compare against running them together along the path.

I haven't managed to get positive pseudo-R2s on the V4 neuron with the canonical link function in pyglmnet (distr='poissonexp'); I have tried several reg_lambdas.
Using R with the canonical link I get pseudo-R2 = 0.0383 (+/- 0.0059).
Using pyglmnet with the (non-canonical) default link function I get 0.0260 (+/- 0.0070).

(These values are with reg_lambda=0.)
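For reference, the pseudo-R2 reported here is the deviance-based measure (in the style of Cameron & Windmeijer); a minimal sketch for the Poisson case, not necessarily pyglmnet's exact implementation:

import numpy as np

def pseudo_r2_poisson(y, yhat, ynull):
    # deviance-based pseudo-R2:
    # 1 - (LL_saturated - LL_model) / (LL_saturated - LL_null).
    # Negative values mean the model predicts worse than the constant
    # mean-rate (null) model.
    eps = np.spacing(1)  # guard against log(0)
    L1 = np.sum(y * np.log(eps + yhat) - yhat)    # fitted model
    L0 = np.sum(y * np.log(eps + ynull) - ynull)  # null model
    LS = np.sum(y * np.log(eps + y) - y)          # saturated model
    return 1.0 - (LS - L1) / (LS - L0)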

@pavanramkumar (Collaborator)

According to the paper, the logic of warm restarts doesn't work if we go from low to high lambda; they recommend fitting the largest lambda first and initializing each subsequent fit from the previous solution (sketched below).

Fix: the documentation should state this, so that users know to specify lambdas in decreasing order.
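A self-contained sketch of that convention; fit_single is a hypothetical stand-in for the inner solver (a simple proximal-gradient step for a Poisson GLM with exp link and L1 penalty), not pyglmnet's actual internals:

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(0.0, 1.0, (1000, 10))
y = rng.poisson(np.exp(X @ rng.normal(0.0, 0.3, 10)))

def fit_single(X, y, reg_lambda, beta_init, lr=1e-3, n_iter=2000):
    # hypothetical inner solver: proximal gradient (ISTA) for a Poisson
    # GLM with canonical exp link and an L1 penalty
    beta = beta_init.copy()
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                    # predicted rate
        beta -= lr * (X.T @ (mu - y)) / len(y)   # gradient step on the NLL
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * reg_lambda, 0.0)
    return beta

# largest lambda first; each subsequent fit is warm-started from the
# previous solution, which is what breaks when the order is reversed
path = {}
beta = np.zeros(X.shape[1])  # cold start only for the first lambda
for rl in sorted([0.5, 0.01, 0.0001], reverse=True):
    beta = fit_single(X, y, rl, beta_init=beta)
    path[rl] = beta.copy()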

@hugoguh shall I close this issue and create a new issue for documentation-related changes?

@pavanramkumar pavanramkumar added docs and removed bug labels May 11, 2016
@hugoguh (Collaborator, Author) commented May 11, 2016

Still, it should converge for a single reg_lambda:

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

model = GLM(distr='poissonexp', verbose=False, alpha=0.05, reg_lambda=[0.001])

n_samples, n_features = 10000, 100

# coefficients
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

# training data
Xr = np.random.normal(0.0, 1.0, [n_samples, n_features])
yr = model.simulate(beta0, beta, Xr)

# testing data
Xt = np.random.normal(0.0, 1.0, [n_samples, n_features])
yt = model.simulate(beta0, beta, Xt)

# fit Generalized Linear Model
scaler = StandardScaler().fit(Xr)
model.fit(scaler.transform(Xr), yr)

# after .fit(), the fitted parameters are stored; here we grab the set for the last reg_lambda
fit_param = model[-1].fit_

# we can use fitted parameters to predict
yhat = model.predict(scaler.transform(Xt))
print('reg_lambdas:', model.reg_lambda)
print('pseudo_R2s:', [model.pseudo_R2(yt, yh, np.mean(yr)) for yh in yhat])

Output:

reg_lambdas: [0.001]
pseudo_R2s: [-0.024434697354696278]

@hugoguh (Collaborator, Author) commented May 11, 2016

I don’t think this issue should be closed.

It’s not about documenting the order of the lambdas when using warm starts (though that should be documented).

This issue is about convergence, which still fails in this case, while R doesn't.
So far I have tried pyglmnet on two real datasets, and it has failed to converge on both:

  • it failed on V4 (canonical Poisson), while R works;
  • it failed on the Electro imaging data (Gaussian with L1 regularization), for which sklearn's lasso worked. It did better once we changed the learning rate, but still with poor results; it needs a better optimization algorithm.

@jasmainak (Member)

@hugoguh I agree we should iron out the convergence issues urgently. Did you manage to trace the source of the bug?

@pavanramkumar (Collaborator)

@hugoguh agreed that we should make the outputs conform to sklearn and R's glmnet package for whichever use cases exist in those tools. I appreciate all the testing!

@pavanramkumar (Collaborator)

Let's aim to get to the bottom of the bug.

In your bug report above, I wouldn't call it a convergence issue so much as a poor choice of regularization parameter. Note that the absolute value of the regularization parameter lambda cannot be directly compared between scikit-learn and pyglmnet: the cost function (negative log-likelihood) is normalized slightly differently, so the relative weight of the penalty term will differ between scikit-learn and pyglmnet for the same choice of lambda.
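For reference, scikit-learn's ElasticNet minimizes (per its documentation)

    1 / (2 * n_samples) * ||y - Xw||^2_2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

so whether the data term is averaged over n_samples or summed changes the effective weight of the same lambda.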

In your bug report above, the negative test-set pseudo-R2 simply indicates that the model has overfit the training set. This can be verified by printing both the training-set (R2r) and test-set (R2t) pseudo-R2s for the above example, for both scikit-learn and pyglmnet (see the sketch below).

I would expect to see comparable R2r and R2t for scikit-learn, but a large positive R2r and a negative R2t for pyglmnet.
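A sketch of that check on the pyglmnet side, assuming the variables from the snippets earlier in this thread:

yhat_r = model.predict(scaler.transform(Xr))  # training-set predictions
yhat_t = model.predict(scaler.transform(Xt))  # test-set predictions
for rl, yhr, yht in zip(model.reg_lambda, yhat_r, yhat_t):
    R2r = model.pseudo_R2(yr, yhr, np.mean(yr))
    R2t = model.pseudo_R2(yt, yht, np.mean(yr))
    print('lambda=%g  R2r=%.4f  R2t=%.4f' % (rl, R2r, R2t))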

@pavanramkumar (Collaborator)

As for the linear case with Roozbeh's data, we figured out that the fit improves a lot with a better choice of learning rate. We can only get to the bottom of this issue after the learning-rate-free Newton-CG solver has been implemented (issue #39). My goal is to solve #39 this weekend.

@pavanramkumar pavanramkumar changed the title convergence problems with poissonexp mismatch between pyglmnet poissonexp and legacy tools with example V4 data May 11, 2016
@hugoguh (Collaborator, Author) commented May 11, 2016

@pavanramkumar
We can compare them directly by setting reg_lambda to zero: R on the V4 data and sklearn on Roozbeh's data both worked fine with reg_lambda=0, while pyglmnet didn't. So I think it is a convergence issue.
I agree that we need the adaptive-learning-rate optimizer, and while it is not implemented (and also once it is), we should run an established Newton-type solver on our cost function to check that the bug is purely an optimization issue; a sketch is below.
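A minimal sketch of that cross-check using scipy's Newton-CG (a stand-in for whichever established solver we pick) on the unregularized Poisson (exp link) negative log-likelihood; the synthetic X, y can be swapped for the standardized design and counts from the snippets above:

import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(0)
X = rng.normal(0.0, 1.0, (1000, 10))
y = rng.poisson(np.exp(X @ rng.normal(0.0, 0.3, 10)))

def poisson_nll(beta, X, y):
    # Poisson (canonical exp link) negative log-likelihood, dropping the
    # log(y!) term, which is constant in beta
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def poisson_grad(beta, X, y):
    eta = X @ beta
    return X.T @ (np.exp(eta) - y)

# unregularized fit (reg_lambda=0) with a learning-rate-free solver,
# as a reference to compare pyglmnet's solution against
res = minimize(poisson_nll, np.zeros(X.shape[1]), args=(X, y),
               jac=poisson_grad, method='Newton-CG')
print('converged:', res.success, 'NLL at optimum:', res.fun)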

@pavanramkumar (Collaborator)

From here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html

alpha : float
Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter. alpha = 0 is equivalent to an ordinary least square, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised and you should prefer the LinearRegression object.
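In other words, for the reg_lambda=0 sanity check on the Gaussian case, the sklearn-recommended pattern would be (a sketch, assuming X, y as above):

from sklearn.linear_model import LinearRegression

# for alpha = 0, sklearn recommends plain least squares rather than
# ElasticNet/Lasso with a zero penalty
ols = LinearRegression().fit(X, y)
print('train R^2:', ols.score(X, y))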

@pavanramkumar (Collaborator)

It turns out this is mainly an issue of the regularization path; see the conversation in #76.
Should I create a new issue for "solution dependence on regularization path"?

@jasmainak (Member)

Let's just rename this issue so that all the conversation is in the same place?

@pavanramkumar pavanramkumar changed the title mismatch between pyglmnet poissonexp and legacy tools with example V4 data solution dependence on regularization path Jun 8, 2016