
offset and/or weights for survival analysis #221

Open
shearerpmm opened this issue Sep 29, 2017 · 4 comments
@shearerpmm

Poisson regression is very commonly used for survival analysis. In this context, it is necessary to include the exposure time as a log-offset or via weighting. It appears that currently pyglmnet has neither option; the package would be much more widely useful for Poisson regression if it included one or both of these options.

@shearerpmm shearerpmm changed the title offset and/or weights for event rate analysis offset and/or weights for survival analysis Sep 29, 2017
@jasmainak
Member

Hmm, interesting. So if we provide an offset parameter that is subtracted during fit and added back during predict, would that cover it? Do you have time to help us?
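The scheme described here can be sketched in plain NumPy: for a Poisson GLM with log link, the offset is carried in the linear predictor during the IRLS fit (equivalently, subtracted from the working response before the weighted least-squares step) and added back at predict time. This is a minimal illustration of the idea, not pyglmnet's actual solver, and the function names are hypothetical.

```python
import numpy as np

def fit_poisson_offset(X, y, offset, n_iter=50):
    """Poisson regression with a log-offset via IRLS (minimal sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset           # offset enters the linear predictor
        mu = np.exp(eta)
        W = mu                            # Poisson working weights
        z = eta + (y - mu) / mu - offset  # working response, offset removed
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))
    return beta

def predict_poisson(X, beta, offset):
    """Predicted counts: offset is added back at predict time."""
    return np.exp(X @ beta + offset)
```

The same pattern extends to a penalized solver: only the working response changes, while the penalty applies to `beta` as usual.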

@shearerpmm
Author

I'm interested, but I wish I had a better sense of how this algorithm scales. There are a couple dozen half-implemented group lasso solvers that sort of work on small datasets; I'm looking for the one really good one that can scale to my problem (millions of rows and thousands of predictors).

@pavanramkumar
Collaborator

@shearerp it's always great to hear about concrete use cases in the context of feature requests.

Have a look at our README, where a basic set of benchmarks is published comparing runtimes for 1000 samples x 100 features against scikit-learn, statsmodels, and R. We're slightly slower than scikit-learn and faster than statsmodels, primarily because we didn't want to prematurely optimize (Cythonize) our solvers.

If you'd like to run benchmarks against larger datasets, have a look at BenchmarkGLM() in pyglmnet.datasets, which we use internally for benchmarking. You'd have to take care of the dependencies yourself, though. If you do run benchmarks on your dataset, we'd love to know what you think.

As far as scalability goes, we may be able to support in-memory computation, but we currently may not have enough resources to extend to distributed or streaming use cases.

@jasmainak
Member

Feel free to open a pull request. We are open to contributions.

@jasmainak jasmainak added this to the version 0.3 milestone Aug 26, 2018