GLMCV results do not match GLM with same parameters and optimal lambda #377
Comments
The differences are really tiny if you reduce [...]

```python
from pyglmnet import GLMCV
from pyglmnet import GLM
from pyglmnet.datasets import fetch_group_lasso_datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

df, group_idxs = fetch_group_lasso_datasets()
X = df[df.columns.difference(["Label"])].values
y = df.loc[:, "Label"].values
Xtrain, Xtest, ytrain, ytest = \
    train_test_split(X, y, test_size=0.2, random_state=42)

# Set up the lasso CV model
gl_glm = GLMCV(distr="binomial", tol=1e-7,
               score_metric="pseudo_R2",
               alpha=1.0, learning_rate=1, max_iter=200, cv=3, verbose=True)

# Fit the model
gl_glm.fit(Xtrain, ytrain)

# Save the optimal lambda (the one with the highest score)
opt_lambda = gl_glm.reg_lambda[gl_glm.scores_.index(max(gl_glm.scores_))]
print(opt_lambda)  # 0.010000000000000007

# Set up a lasso model using the optimal lambda found above,
# with all other relevant parameters kept the same
glm = GLM(distr="binomial", tol=1e-7, reg_lambda=opt_lambda,
          score_metric="pseudo_R2",
          alpha=1.0, learning_rate=1., max_iter=200, verbose=True)

# Fit the model
glm.fit(Xtrain, ytrain)

# Compare the beta coefficients
print(gl_glm.beta_ - glm.beta_)
plt.plot(gl_glm.beta_)
plt.plot(glm.beta_, 'r')
plt.show()
```

This is especially the case for the second model, which is not helped by a warm start.
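To put a number on "really tiny" rather than eyeballing the plot, the two coefficient vectors can be compared numerically. A minimal sketch with numpy; the `beta_cv` and `beta_single` arrays below are made-up stand-ins for `gl_glm.beta_` and `glm.beta_`, and `coef_mismatch` is a hypothetical helper, not part of pyglmnet:

```python
import numpy as np

def coef_mismatch(beta_a, beta_b):
    """Summarize how far apart two coefficient vectors are."""
    beta_a = np.asarray(beta_a, dtype=float)
    beta_b = np.asarray(beta_b, dtype=float)
    diff = beta_a - beta_b
    return {
        "max_abs_diff": float(np.max(np.abs(diff))),
        "rel_l2_diff": float(np.linalg.norm(diff) /
                             max(np.linalg.norm(beta_a), 1e-12)),
    }

# Stand-in vectors; in the issue these would be gl_glm.beta_ and glm.beta_
beta_cv = np.array([0.0, 0.51, -0.20, 0.0, 0.33])
beta_single = np.array([0.0, 0.50, -0.21, 0.01, 0.33])
print(coef_mismatch(beta_cv, beta_single))
```

Reporting a max-absolute and a relative difference like this makes it easier to judge whether a residual gap is solver noise or a real discrepancy.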
Thanks, I noticed this too - that changing the [...]. That said, there are a few instances in your example where [...]
Well, you need to push the convergence even further. You'll see that there is currently this warning:

```
/Users/mainak/Documents/github_repos/pyglmnet/pyglmnet/pyglmnet.py:900: UserWarning: Reached max number of iterations without convergence.
  "Reached max number of iterations without convergence.")
```

Use [...]. I agree it's a bit hard to debug this. Wouldn't be opposed to adding a method [...]
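That non-convergence warning can also be caught programmatically rather than watched for in console output. A minimal sketch using Python's standard `warnings` machinery; `fit_model` here is a stand-in that just emits the same `UserWarning` text, not a pyglmnet call:

```python
import warnings

def fit_model():
    # Stand-in for glm.fit(Xtrain, ytrain): pyglmnet warns like this
    # when max_iter is reached before the tolerance is met.
    warnings.warn("Reached max number of iterations without convergence.",
                  UserWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit_model()

converged = not any("without convergence" in str(w.message)
                    for w in caught)
print("converged:", converged)  # converged: False
```

Checking this flag after each fit would make the GLMCV-vs-GLM comparison in the example above meaningful only when both fits actually converged.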
Thanks for the help. Yes, a [...]
Problem
The optimal penalization parameter (lambda) found via GLMCV does not yield the same results when plugged into GLM with otherwise identical parameters.
Example
The script above mostly follows the group lasso example code in the docs, modified for regular lasso instead of group lasso.
Results
You can tweak the learning rate and iterations of the second model, but the results will never match those of GLMCV even with many iterations and a low learning rate.
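This behavior is consistent with two runs of the same solver stopping early at different points: loosely converged runs disagree, tightly converged runs coincide. A toy illustration with plain gradient descent on noiseless least squares (deliberately not pyglmnet's solver, just the general phenomenon; all data and settings here are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ beta_true  # noiseless, so the exact solution is beta_true

def gd_lstsq(X, y, lr, max_iter):
    """Plain gradient descent on 0.5 * ||y - X b||^2 / n."""
    b = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(max_iter):
        grad = -X.T @ (y - X @ b) / n
        b -= lr * grad
    return b

# Two solver configurations stopped early disagree noticeably...
loose_a = gd_lstsq(X, y, lr=0.1, max_iter=20)
loose_b = gd_lstsq(X, y, lr=0.05, max_iter=20)
# ...but pushed to tight convergence they become essentially identical.
tight_a = gd_lstsq(X, y, lr=0.1, max_iter=5000)
tight_b = gd_lstsq(X, y, lr=0.05, max_iter=5000)

print(np.max(np.abs(loose_a - loose_b)))  # noticeable gap
print(np.max(np.abs(tight_a - tight_b)))  # essentially zero
```

The same logic suggests that GLM and GLMCV would only be expected to agree once both are run well past the point where the max-iteration warning stops firing.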
I understand there may be some inherent instability in how convergence is reached, but this feels like too much. An important purpose of lasso is feature selection: if certain variables have non-zero coefficients in one model but not the other, that purpose is somewhat defeated.
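On the feature-selection point, one way to make the comparison robust to solver noise is to compare supports under a small threshold instead of exact zeros. A sketch below; the threshold value and the stand-in coefficient vectors are arbitrary choices for illustration, not a pyglmnet recommendation:

```python
import numpy as np

def support(beta, tol=1e-4):
    """Indices of coefficients that are meaningfully non-zero."""
    return set(np.flatnonzero(np.abs(np.asarray(beta)) > tol))

# Stand-in coefficient vectors; tiny residual values like 3e-6 are
# solver noise, not genuinely selected features.
beta_cv = [0.0, 0.51, -0.20, 3e-6, 0.33]
beta_single = [1e-6, 0.50, -0.21, 0.0, 0.33]

s1, s2 = support(beta_cv), support(beta_single)
print(sorted(s1), sorted(s2))       # both select features 1, 2, 4
print("supports agree:", s1 == s2)  # supports agree: True
```

With a thresholded support, two fits that differ only by incomplete convergence can still be checked for selecting the same variables.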
Thank you for looking into this.