Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Attempt to make GLM compatible with scikit-learn check_estimator #364

Merged
merged 25 commits into from
Feb 20, 2020

Conversation

titipata
Copy link
Collaborator

@titipata titipata commented Feb 18, 2020

So far, there are 3 main parts which can be fixed in GLM class to be compatible with scikit-learn's check_estimator

  1. Initial attributes
  2. fit_predict function
  3. scipy's expit

closes #363

README.rst Outdated Show resolved Hide resolved
@jasmainak
Copy link
Member

how are you testing? can you add the check_estimator line somewhere in the diff?

@titipata
Copy link
Collaborator Author

@jasmainak I did add a test script where you can check by running py.test --cov=pyglmnet tests/. Note that this will fail a bunch of original tests since I commented out fit_predict from GLM

@jasmainak
Copy link
Member

Cool, FYI, I use this command:

$ pytest ./tests/test_pyglmnet.py::test_glm_estimator --pdb

that way you drop right into the console when there is an error :)

@jasmainak
Copy link
Member

For whatever reason X is an array of complex numbers, that's why it is failing

@titipata
Copy link
Collaborator Author

titipata commented Feb 18, 2020

@jasmainak Thanks for the handy test script. Oh yeah, I see that: z is ndarray of size (10, ) with a complex number in there...

@jasmainak
Copy link
Member

I don't think it's easy to make all the checks happy without adding an sklearn dependency. Basically you have to follow what is in: https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py. I'd like to believe it can be done though :)

@titipata
Copy link
Collaborator Author

@jasmainak alright, I tried adding sklearn dependency so that it follows their template style. Not sure if we also want BaseEstimator to be from sklearn also?

@jasmainak
Copy link
Member

Let's try to first make it work. Use sklearn or whatever is necessary. Then we can make it nice and see if the dependency can be dropped easily :)

@titipata
Copy link
Collaborator Author

@jasmainak Now the error is in predict_proba, I'm not sure how to fix it.

@titipata
Copy link
Collaborator Author

titipata commented Feb 18, 2020

@jasmainak alright the current commits should work with check_estimator now! The part that I cannot keep track is the _allow_refit. Does scikit-learn dependencies keep track of this fitting/re-fitting?

@jasmainak
Copy link
Member

jasmainak commented Feb 18, 2020 via email

@titipata
Copy link
Collaborator Author

titipata commented Feb 18, 2020

@jasmainak There are 2 failed (test_random_state_consistency and test_api_input), 56 passed, 32 warning for the current commits. The error is related to Failed: DID NOT RAISE <class 'ValueError'>.

@titipata
Copy link
Collaborator Author

titipata commented Feb 18, 2020

The test is now not raising errors because of the additional check_X_y. It returns nd.array back even you put the list in. So now, the function test_api_input in test_pyglmnet.py below won't raise an error:

    with pytest.raises(ValueError):
        glm.fit(X, list(y))

And the fail in test_random_state_consistency is coming from the commented out of self._allow_refit = False after fit method. Specifically, it comes from

    match = "This glm object has already been fit"
    with pytest.raises(ValueError, match=match):
        ypred_c = glm_b.fit_predict(Xtrain, ytrain)

@titipata
Copy link
Collaborator Author

Alright @jasmainak, I think this should be the push using sklearn dependency. If you can help a bit with _allow_refit logic, that should be it! I will leave it to you for now.

@jasmainak
Copy link
Member

This is great!! awesome work @titipata :-) Happy to take over from here. Can you update whats_new.rst so we can acknowledge this in the next release?

doc/whats_new.rst Outdated Show resolved Hide resolved
@jasmainak
Copy link
Member

@titipata you are on fire. I don't think I need to do a pass really. If you just address my last couple of comments (remove the commented test), move the line in whats_new.rst and update the title from WIP to MRG, I am happy to merge this PR once both CIs become green. Thank you so much for the efforts!

@@ -530,7 +534,7 @@ class GLM(BaseEstimator):
https://core.ac.uk/download/files/153/6287975.pdf
"""

def __init__(self, distr='poisson', alpha=0.5,
def __init__(self, distr='binomial', alpha=0.5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, the default value changed here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, somehow it fails sklearn with check_estimator due to predict_proba. I didn't resolve it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

humm ... I see. Let's bring it back to the default and see if we can fix it? Is that the only thing left that didn't work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do bring it back to the default!

pyglmnet/pyglmnet.py Outdated Show resolved Hide resolved
@titipata
Copy link
Collaborator Author

Awesome!! That sounds great to me! We might have to write some more test scripts later on :)

@jasmainak
Copy link
Member

Yes I am looking forward to more contributions from you in the future!

@titipata
Copy link
Collaborator Author

@jasmainak, so I added the workaround for predict_proba error when using check_estimator. This is similar to the logic in sklearn's LogisticRegression.

tests/test_pyglmnet.py Outdated Show resolved Hide resolved
@jasmainak
Copy link
Member

Good to go from your end @titipata ? Everything resolved and works? Please set the PR title to MRG if so!

@titipata
Copy link
Collaborator Author

@jasmainak yes, I guess you can help polish a bit and that should be good to go!

@titipata titipata changed the title [WIP] Attempt to make GLM compatible with scikit-learn check_estimator [MRG] Attempt to make GLM compatible with scikit-learn check_estimator Feb 20, 2020
@jasmainak
Copy link
Member

I'll merge @titipata when the CIs are green. We're good to go!

@titipata
Copy link
Collaborator Author

Perfect 👍 @jasmainak. We can also poke JOSS editor right after. This PR should take care their concerns!

@jasmainak
Copy link
Member

yep that's the plan! :)

@jasmainak jasmainak merged commit d1dc16f into glm-tools:master Feb 20, 2020
@jasmainak
Copy link
Member

All good, thanks a ton @titipata !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test GLM with scikit-learn check_estimator (setting attributes during init )
3 participants