GLM tests of scikit-learn #723

Open
lorentzenchr opened this issue Oct 31, 2023 · 3 comments

@lorentzenchr
Contributor

Scikit-learn has some very strict tests for GLMs in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/linear_model/_glm/tests/test_glm.py. I modified the file to test glum.GeneralizedLinearRegressor instead, see https://gist.github.com/lorentzenchr/2e319bcfd4aadfbea64c6330e5b33521. Running pytest test_glm.py results in 76 failed, 212 passed, 104 warnings.

It might be interesting to include those tests in glum.

@jtilly
Member

jtilly commented Oct 31, 2023

Thanks a lot! For future reference, these are the failing tests:

FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - assert 0.690107820640591 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - assert 0.8533955861703721 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - assert 0.8051836315439316 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - assert 0.6292876313885498 == 1.2837501395684472 ± 6.4e-05
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[long-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - assert 1.3802141997400277 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - assert 1.706489240970734 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - assert 2.1915879526750373 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - assert 1.2585688960366177 == 1.2837501395684472 ± 6.4e-06
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_hstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-True-irls-cd] - assert 0.6901078206405913 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='binomial')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-True-irls-cd] - assert 0.8533955861703684 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='poisson')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-True-irls-cd] - assert 0.8051836315439317 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='gamma')-False-irls-cd] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-True-irls-cd] - assert 0.6292876313579576 == 1.2837501395684472 ± 1.3e-04
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-lbfgs] - AssertionError: 
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-ls] - numpy.linalg.LinAlgError: Matrix is singular.
FAILED tests_glum.py::test_glm_regression_unpenalized_vstacked_X[wide-GeneralizedLinearRegressor(family='tweedie')-False-irls-cd] - AssertionError: 
================================================================== 76 failed, 212 passed, 107 warnings in 197.18s (0:03:17) ==================================================================

I can get the 12 failing L-BFGS related tests to pass by not standardizing the design matrix here.

64 failing tests to go.

@MarcAntoineSchmidtQC
Member

All the failing tests seem to be for unpenalized regression with a singular design matrix (either the wide problem: p=12, n=4, or the stacked problem where we duplicate all columns). Is that correct? Maybe this is a dumb question but what is the expected result in this case? I'm not surprised to see the tests failing in this case for glum, but in case we want to support this the tests are great!
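To make the two singular cases concrete, here is a minimal sketch using NumPy only. The random data, the helper name `gram_rank`, and the shape of the "long" matrix are assumptions for illustration and are not taken from the scikit-learn test file. It checks the rank of the unweighted Gram matrix X^T X, standing in for the weighted matrix an IRLS-type solver would have to invert.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide problem: fewer samples than features (n=4, p=12, as in the comment above).
n, p = 4, 12
X_wide = rng.standard_normal((n, p))

# Stacked problem: duplicate all columns of a long (n > p) design matrix.
# The shape (20, 3) is an arbitrary choice for this illustration.
X_long = rng.standard_normal((20, 3))
X_hstacked = np.hstack([X_long, X_long])


def gram_rank(X):
    """Return the numerical rank of the Gram matrix X^T X (hypothetical helper)."""
    return np.linalg.matrix_rank(X.T @ X)


# The Gram matrix is p x p, but its rank cannot exceed the rank of X itself.
rank_wide = gram_rank(X_wide)          # bounded by n = 4, matrix is 12 x 12
rank_stacked = gram_rank(X_hstacked)   # bounded by 3, matrix is 6 x 6

print(f"wide:    rank {rank_wide} of {X_wide.shape[1]}")
print(f"stacked: rank {rank_stacked} of {X_hstacked.shape[1]}")
```

In both cases the Gram matrix is rank deficient, which is consistent with the `LinAlgError: Matrix is singular` failures listed above for the `irls-ls` parametrizations.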

@lorentzenchr
Contributor Author

It is often said that singular design matrices don't allow for a solution, but this is wrong: there are infinitely many solutions. For OLS, there is a particularly nice one called the minimum-norm solution, i.e. the coefficients with minimal L2 norm among all solutions.
It may be that this is of little practical value, but in light of the discovered interpolation regime, it is at least interesting.

I have at least one PR for the line search in mind that could help at least with a few of those test failures.
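The minimum-norm idea for OLS can be sketched with NumPy as follows. This is an illustration on random data, not the scikit-learn test fixture, and all variable names are assumptions. It computes the minimum-norm solution via the pseudoinverse, then builds a second exact solution by adding a null-space vector, which fits equally well but has a larger L2 norm.

```python
import numpy as np

rng = np.random.default_rng(42)

# Wide OLS problem: n=4 samples, p=12 features, so X has a non-trivial null space.
n, p = 4, 12
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse.
coef_min_norm = np.linalg.pinv(X) @ y

# np.linalg.lstsq also returns the minimum-norm solution for rank-deficient problems.
coef_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Build another exact solution by adding a vector from the null space of X.
# With full_matrices=True (the default), the last p - n rows of Vt span the
# null space, assuming X has full row rank n.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]
coef_other = coef_min_norm + 3.0 * null_vec

# Both coefficient vectors interpolate y; only their norms differ.
residual_min = np.linalg.norm(X @ coef_min_norm - y)
residual_other = np.linalg.norm(X @ coef_other - y)
norm_min = np.linalg.norm(coef_min_norm)
norm_other = np.linalg.norm(coef_other)

print(f"residuals: {residual_min:.2e} vs {residual_other:.2e}")
print(f"L2 norms:  {norm_min:.4f} vs {norm_other:.4f}")
```

For non-Gaussian GLMs there is no closed form like this, so which of the infinitely many solutions a solver reaches depends on the algorithm and its starting point.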
