convergence confusion #311

Open
inferentialist opened this issue Aug 5, 2019 · 8 comments

@inferentialist

Hey there. I'm really excited about this package, but I've been seeing some unexpected convergence behavior and was hoping this might be a good place to ask for clarity. I'm happy to write this up in more detail, or dig into the code myself if necessary; let me know what would be most helpful.

Here's the short version of my scenario. I'm running a logistic group lasso with the deviance loss. I'm ramping the penalty parameter (lambda) across several orders of magnitude and, in the high-lambda regimes where the non-constant coefficients are forced to zero, I'm not seeing any convergence of the constant-term coefficient.

Tinkering with the tol parameter gets me the numerical behavior I would have expected, but it is orders of magnitude slower and (with the verbose parameter turned on) still fails to report convergence.

I would have thought that with the non-constant terms forced to zero, estimating the constant-term coefficient would be trivial. So, even with a loose tolerance, I'm surprised that my options are either (a) the constant-term coefficient doesn't converge or (b) everything slows down dramatically.
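
For reference, here's the arithmetic behind that expectation: once every penalized group is exactly zero, the model reduces to an intercept-only logistic regression, whose MLE has a closed form, beta0 = log(p / (1 - p)) with p the observed event rate. A quick sanity-check sketch (this assumes the Churn column in the data linked below is the 0/1 response):

import numpy as np
import pandas as pd

# Intercept-only logistic regression: the MLE of beta0 is the logit of the
# sample mean of the response.  (Assumes the linked CSV is still available
# and that 'Churn' is coded 0/1, as in the repro script further down.)
df = pd.read_csv(r"https://dlennon.org/assets/repro/bddae97a.gz")
p_hat = df['Churn'].mean()
beta0_expected = np.log(p_hat / (1.0 - p_hat))
print(beta0_expected)   # compare with the plateau value the low-tol run settles at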

As a point of comparison, the R glmnet package, while less fully featured, doesn't seem to suffer from the issue.

Let me know if you have any thoughts about how I might help or where I should start digging around in the code. Thanks!

@jasmainak
Member

Can I ask if you are using the development version? And which solver are you using?

It's a known issue for the cdfast solver. See: #275 and #289

@inferentialist
Author

This reproduces the issue. I'm using mostly defaults; in particular, the "batch-gradient" solver. I assume I'm using the development version as well: git log dates this as a Jun 3 2019 commit.

# $ git log

# commit c962c840c4669ae69e623a2595eba40ae50e460d 
#                         (HEAD -> master, origin/master, origin/HEAD)
# Author: Mainak Jas <[email protected]>
# Date:   Mon Jun 3 23:56:36 2019 -0400

#     FIX test and tweak to whats_new

import pandas as pd
import numpy as np
from numpy.linalg import norm
import itertools
from pyglmnet import GLM

# read the data

df = pd.read_csv(r"https://dlennon.org/assets/repro/bddae97a.gz")

y = df['Churn']
X = df.drop('Churn', axis=1)

# group ids
pref    = [c.split('_')[0] for c in X.columns]
grp_id  = [j for j,r in enumerate(itertools.groupby(pref), start=1) for k in r[1]]


# fit models as a function of lambda
n_mod   = 13
models  = []
lam     = np.logspace(-6,6,num=n_mod)

model_params = dict(
  distr   = 'binomial', 
  group   = grp_id,
  tol     = 0.5,   # 0.5 for the first set of results below; 0.1 for the second
  verbose = 3
)

for i,l in enumerate(lam):
  clf = GLM(reg_lambda = l, **model_params)

  # warm start, if possible
  try:
    clf.beta0_  = models[i-1].beta0_
    clf.beta_   = models[i-1].beta_
  except IndexError:
    pass

  clf.fit(X.values, y.values)
  models.append(clf)


# ------------------------------------------------------------------------------
# These are the tol=0.5 results.  For large lambda, beta is zero, but beta0 is
# not constant across lambda and converges only very slowly toward -1.016114.

# In [68]: %paste                                                                                                                     
# pd.DataFrame({
#   'lambda'    : lam,
#   'beta0'     : [m.beta0_ for m in models], 
#   'norm_beta' : [norm(m.beta_) for m in models]
#   })
# ## -- End pasted text --
# Out[68]: 
#             lambda     beta0  norm_beta
# 0         0.000001 -0.042286   0.265220
# 1         0.000010 -0.075504   0.370958
# 2         0.000100 -0.092875   0.452565
# 3         0.001000 -0.104283   0.518814
# 4         0.010000 -0.114575   0.541784
# 5         0.100000 -0.139812   0.238829
# 6         1.000000 -0.372410   0.000000
# 7        10.000000 -0.427910   0.000000
# 8       100.000000 -0.478218   0.000000
# 9      1000.000000 -0.523872   0.000000
# 10    10000.000000 -0.565348   0.000000
# 11   100000.000000 -0.603068   0.000000
# 12  1000000.000000 -0.637410   0.000000


# ------------------------------------------------------------------------------

# Taking tol = 0.1 gives the expected behavior, but it is much slower and
# no convergence is reported (per the verbose output) for lambda > 0.1.

# In [70]: pd.DataFrame({ 
#     ...:   'lambda'    : lam, 
#     ...:   'beta0'     : [m.beta0_ for m in models],  
#     ...:   'norm_beta' : [norm(m.beta_) for m in models] 
#     ...:   }) 
#     ...:                                                                                                                            
# Out[70]: 
#             lambda     beta0  norm_beta
# 0         0.000001 -0.139408   0.798327
# 1         0.000010 -0.143974   0.834305
# 2         0.000100 -0.148182   0.866839
# 3         0.001000 -0.152151   0.892953
# 4         0.010000 -0.156752   0.879674
# 5         0.100000 -1.016114   0.000000
# 6         1.000000 -1.016114   0.000000
# 7        10.000000 -1.016114   0.000000
# 8       100.000000 -1.016114   0.000000
# 9      1000.000000 -1.016114   0.000000
# 10    10000.000000 -1.016114   0.000000
# 11   100000.000000 -1.016114   0.000000
# 12  1000000.000000 -1.016114   0.000000

@jasmainak
Member

hi @inferentialist, sorry for the delay in getting back to you. I finally had a moment to look at this, but I can't reproduce your problem. See my code here:

import pandas as pd
import numpy as np
from numpy.linalg import norm
import itertools
from pyglmnet import GLM

# read the data

df = pd.read_csv(r"https://dlennon.org/assets/repro/bddae97a.gz")

y = df['Churn']
X = df.drop('Churn', axis=1)

# group ids
pref    = [c.split('_')[0] for c in X.columns]
grp_id  = [j for j,r in enumerate(itertools.groupby(pref), start=1) for k in r[1]]


# fit models as a function of lambda
n_mod   = 13
models  = []
lam     = np.logspace(-6,6,num=n_mod)

model_params = dict(
  distr   = 'binomial', 
  group   = grp_id,
  tol     = 0.1,
  verbose = 3,
  max_iter = 1000
)

for i,l in enumerate(lam):
  clf = GLM(reg_lambda = l, **model_params)

  # warm start, if possible
  try:
    clf.beta0_  = models[i-1].beta0_
    clf.beta_   = models[i-1].beta_
  except IndexError:
    pass

  clf.fit(X.values, y.values)
  models.append(clf)
  print(clf.beta0_)

It seems that convergence is just fine even for lambda > 1. I tried both with the latest master and with the commit you are on. For tol=0.5 I would expect it not to converge, so nothing unexpected there. It seems a bit slow for tol=0.1, but not too bad for the dataset size you have. I have never used the R package, so I don't have a point of reference. But if you want to help write a faster solver, I would be very happy to review and merge that code into pyglmnet.

@inferentialist
Author

@jasmainak, thanks for taking a look. I was hoping you'd have a quick fix; sorry to have handed you a difficult-to-reproduce result. For what it's worth, I love the idea of what you're doing here with pyglmnet, and I'd like to get involved. In terms of references, I'd probably be inclined to start with the Lukas Meier JRSSB 2008 paper, if only because I seem to recall that the R version of glmnet is based on it. If you have pointers, beyond what's on the documentation webpage, on how to get from Meier's paper to the current codebase, definitely feel free to share them.
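
For concreteness, the workhorse in that family of block coordinate descent methods is a group-wise soft threshold applied after a gradient step on the smooth part of the loss. A minimal sketch of the operator (illustrative only; grad_g, step, and lam below are placeholder names, not pyglmnet's API):

import numpy as np

def group_soft_threshold(z, thresh):
    # Block soft-thresholding: shrink the whole group toward zero and
    # zero it out entirely when its norm falls below `thresh`.
    znorm = np.linalg.norm(z)
    if znorm <= thresh:
        return np.zeros_like(z)
    return (1.0 - thresh / znorm) * z

# One block update for group g (coefficients beta_g, gradient grad_g of the
# smooth loss, step size `step`, penalty lam scaled by sqrt of the group size):
#   z_g    = beta_g - step * grad_g
#   beta_g = group_soft_threshold(z_g, step * lam * np.sqrt(len(z_g)))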

@jasmainak
Member

If the R version of glmnet is based on it, I think it sounds like a great idea. What solver does it use?

By the way, if you are working with group lasso, it would be great if you could also suggest an example dataset. Our current example takes ages to finish ... not sure how much a better solver would help.

@inferentialist
Author

inferentialist commented Aug 26, 2019 via email

@jasmainak
Member

hi @inferentialist, I had a brief look at the paper you mention and it sounds promising. But it will take some work to integrate into the current codebase. I'd be happy to add you as a contributing author, of course; the criteria are quite liberal anyway, and if you help integrate these block coordinate descent solvers, it would certainly be a sizable contribution.

I would start small with a WIP pull request so you don't spend too much time heading in the wrong direction and we can iterate together. The current group lasso example (originally implemented by @evadyer) takes over an hour to run! I think we use the same data as in the paper you cite, so comparison between packages should be easy.

And before you dig any further, let me also point you to this open PR: https://github.com/glm-tools/pyglmnet/pull/226/files. Feel free to review and/or try it. I see that the paper uses line search; I wonder if that PR already solves some of the problems.
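
In case it is useful context for reviewing that PR: "line search" here just means shrinking the step size until the objective actually decreases. A minimal backtracking (Armijo-style) sketch, not tied to pyglmnet's internals; loss, grad, and beta are hypothetical callables/arrays:

import numpy as np

def backtracking_step(loss, grad, beta, step=1.0, shrink=0.5, c=1e-4):
    # Halve the step until the Armijo sufficient-decrease condition holds.
    g = grad(beta)
    f0 = loss(beta)
    while loss(beta - step * g) > f0 - c * step * np.dot(g, g):
        step *= shrink
        if step < 1e-12:   # bail out rather than loop forever
            break
    return beta - step * g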

@pavanramkumar
Collaborator

Hey @inferentialist, thanks for your thoughtful suggestions regarding alternate solvers. We've recently improved our convergence criteria and made a new release on PyPI with various other enhancements. You can pip install the latest version. Let us know if these changes solve your convergence problems.

As far as contributions go for future releases, we'd love to make our coordinate-descent-based solver faster: in theory it has better convergence properties than batch gradient, but that isn't the case in practice, simply because our current implementation is pure Python. Translating the solver to Cython is a potential enhancement. We're also happy to explore the block-coordinate direction if that's your interest.
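
If it helps to decide where the effort pays off, here is a rough way to time the two existing solvers against each other. This is only a sketch; the simulated data and settings are arbitrary, though solver, reg_lambda, and tol are the same GLM parameters used elsewhere in this thread:

import time
import numpy as np
from pyglmnet import GLM

# Simulate a small binomial problem.
rng = np.random.RandomState(0)
n, p = 1000, 50
X = rng.randn(n, p)
beta0, beta = 0.5, rng.randn(p) * 0.1
mu = 1.0 / (1.0 + np.exp(-(beta0 + X.dot(beta))))
y = rng.binomial(1, mu)

# Time each solver on the same problem.
for solver in ['batch-gradient', 'cdfast']:
    clf = GLM(distr='binomial', reg_lambda=0.01, solver=solver, tol=1e-4)
    t0 = time.time()
    clf.fit(X, y)
    print(solver, round(time.time() - t0, 3))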
