[MRG] Attempt to make `GLM` compatible with scikit-learn `check_estimator` #364

titipata · 2020-02-18T17:01:48Z

So far, there are three main parts of the `GLM` class that can be fixed to make it compatible with scikit-learn's `check_estimator`:

Initial attributes
`fit_predict` function
SciPy's `expit`
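For the first item: `check_estimator` includes a check that `__init__` sets no attributes other than its own parameters (the topic of #363). The sketch below is a minimal, dependency-free approximation of that rule. `attributes_set_in_init`, `BadGLM` and `GoodGLM` are hypothetical names used only for illustration, and the real scikit-learn check does more than this.

```python
import inspect


def attributes_set_in_init(estimator):
    """Return attribute names set by __init__ that are not constructor parameters.

    Hypothetical helper approximating the "no attributes set in __init__" rule
    enforced by scikit-learn's check_estimator.
    """
    params = set(inspect.signature(type(estimator).__init__).parameters) - {"self"}
    return sorted(set(vars(estimator)) - params)


class BadGLM:
    # Violates the convention: creates learned state inside __init__.
    def __init__(self, distr="poisson", alpha=0.5):
        self.distr = distr
        self.alpha = alpha
        self.beta_ = None  # learned attribute created too early


class GoodGLM:
    # Follows the convention: store constructor arguments unchanged, nothing else.
    def __init__(self, distr="poisson", alpha=0.5):
        self.distr = distr
        self.alpha = alpha


print(attributes_set_in_init(BadGLM()))   # ['beta_']
print(attributes_set_in_init(GoodGLM()))  # []
```

Under this convention, anything learned from data (coefficients, resolved random generators, fitted flags) is created in `fit` instead.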

closes #363

README.rst

jasmainak · 2020-02-18T17:11:37Z

How are you testing? Can you add the `check_estimator` line somewhere in the diff?

titipata · 2020-02-18T17:24:08Z

@jasmainak I added a test script; you can check it by running `py.test --cov=pyglmnet tests/`. Note that a number of the original tests will fail, since I commented out `fit_predict` in `GLM`.

jasmainak · 2020-02-18T17:31:53Z

Cool, FYI, I use this command:

$ pytest ./tests/test_pyglmnet.py::test_glm_estimator --pdb

That way, you drop right into the debugger console when there is an error :)

jasmainak · 2020-02-18T17:34:47Z

For whatever reason, `X` is an array of complex numbers; that's why it is failing.

titipata · 2020-02-18T17:38:20Z

@jasmainak Thanks for the handy test command. Oh yeah, I see it: `z` is an ndarray of shape `(10,)` with a complex number in it...
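The complex `X` most likely comes from scikit-learn's complex-data check, which fits the estimator on complex input and expects a `ValueError` instead of a silent computation. `check_array` already raises that error. Without scikit-learn, a minimal NumPy sketch of the same guard could look like this (`as_real_2d` is a hypothetical name, not a pyglmnet function):

```python
import numpy as np


def as_real_2d(X):
    """Hypothetical validator: convert X to a 2-D float array, rejecting complex input.

    Mimics scikit-learn's check_array, which raises ValueError for complex data.
    """
    X = np.asarray(X)
    if np.iscomplexobj(X):
        raise ValueError("Complex data not supported")
    if X.ndim != 2:
        raise ValueError(f"Expected 2D array, got {X.ndim}D array instead")
    return X.astype(np.float64)


X_complex = np.array([[1 + 1j], [2 + 0j]])
try:
    as_real_2d(X_complex)
except ValueError as err:
    print(err)  # Complex data not supported
```

Validating at the top of `fit` means the complex values never reach the linear predictor `z` or the link function.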

jasmainak · 2020-02-18T17:43:20Z

I don't think it's easy to make all the checks happy without adding an sklearn dependency. Basically, you have to follow what is in https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py. I'd like to believe it can be done though :)
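The template's layout can be sketched without scikit-learn. The following is a minimal sketch, assuming a plain least-squares model stands in for the GLM; `TemplateStyleRegressor` and the local `NotFittedError` are hypothetical (the latter only mirrors the base classes of scikit-learn's exception of the same name).

```python
import numpy as np


class NotFittedError(ValueError, AttributeError):
    """Raised when predict is called before fit (same bases as sklearn's NotFittedError)."""


class TemplateStyleRegressor:
    """Hypothetical least-squares model laid out like the project template.

    * __init__ only stores hyperparameters, unchanged.
    * fit validates inputs, stores learned state in trailing-underscore
      attributes, and returns self.
    * predict refuses to run before fit and re-validates its input.
    """

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        # Prepend a column of ones when an intercept is requested.
        if self.fit_intercept:
            return np.column_stack([np.ones(X.shape[0]), X])
        return X

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        if X.ndim != 2 or y.ndim != 1 or X.shape[0] != y.shape[0]:
            raise ValueError("X must be 2-D and y 1-D with the same number of samples")
        self.coef_, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        if not hasattr(self, "coef_"):
            raise NotFittedError("Call fit before predict.")
        X = np.asarray(X, dtype=np.float64)
        if X.ndim != 2 or X.shape[1] != self.n_features_in_:
            raise ValueError("X has the wrong shape for this fitted model")
        return self._design(X) @ self.coef_


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X[:, 0]
model = TemplateStyleRegressor().fit(X, y)
print(np.round(model.predict([[4.0]]), 6))  # [9.]
```

The template itself uses `check_X_y`, `check_array` and `check_is_fitted` from scikit-learn for these steps; the hand-written checks here only show where each one goes.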

titipata · 2020-02-18T18:04:39Z

@jasmainak Alright, I tried adding the sklearn dependency so that it follows their template style. Do we also want `BaseEstimator` to come from sklearn?

jasmainak · 2020-02-18T18:08:08Z

Let's try to first make it work. Use sklearn or whatever is necessary. Then we can make it nice and see if the dependency can be dropped easily :)

titipata · 2020-02-18T18:20:26Z

@jasmainak Now the error is in `predict_proba`; I'm not sure how to fix it.

titipata · 2020-02-18T19:05:04Z

@jasmainak Alright, the current commits should work with `check_estimator` now! The part I can't keep track of is `_allow_refit`. Does scikit-learn keep track of this fitting/re-fitting?

jasmainak · 2020-02-18T19:22:54Z

You can remove the refit logic for now. Do all tests pass if you do that? I'll see what to do with the refit logic when I get a chance in a couple of hours.

titipata · 2020-02-18T19:28:32Z

@jasmainak With the current commits, 2 tests fail (`test_random_state_consistency` and `test_api_input`), 56 pass, and there are 32 warnings. Both failures are `Failed: DID NOT RAISE <class 'ValueError'>`.

titipata · 2020-02-18T20:46:33Z

The test no longer raises an error because of the added `check_X_y`, which returns an ndarray even if you pass in a list. So the following check in `test_api_input` in `test_pyglmnet.py` no longer raises:

    with pytest.raises(ValueError):
        glm.fit(X, list(y))

And the failure in `test_random_state_consistency` comes from commenting out `self._allow_refit = False` at the end of the `fit` method. Specifically, it comes from:

    match = "This glm object has already been fit"
    with pytest.raises(ValueError, match=match):
        ypred_c = glm_b.fit_predict(Xtrain, ytrain)
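The conflict can be illustrated with a small sketch of a one-shot refit guard. It is named after the `_allow_refit` attribute discussed here, but `RefitGuardedGLM` is hypothetical and not pyglmnet's implementation. `check_estimator` fits the same instance more than once (for example in its idempotency check), so a guard like this makes those checks fail, while removing it makes the `pytest.raises` block above fail.

```python
import numpy as np


class RefitGuardedGLM:
    """Hypothetical one-shot estimator illustrating an _allow_refit-style guard."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # only hyperparameters are set in __init__

    def fit(self, X, y):
        # The flag is a private attribute created by fit, so a second fit on
        # the same instance is rejected.
        if not getattr(self, "_allow_refit", True):
            raise ValueError("This glm object has already been fit")
        self.intercept_ = float(np.mean(y))
        self._allow_refit = False
        return self

    def predict(self, X):
        return np.full(len(X), self.intercept_)

    def fit_predict(self, X, y):
        return self.fit(X, y).predict(X)


glm = RefitGuardedGLM()
glm.fit([[0.0], [1.0]], [1.0, 3.0])
try:
    glm.fit_predict([[0.0], [1.0]], [1.0, 3.0])
except ValueError as err:
    print(err)  # This glm object has already been fit
```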

titipata · 2020-02-18T21:33:40Z

Alright @jasmainak, I think this push with the sklearn dependency should do it. If you can help a bit with the `_allow_refit` logic, we should be done! I'll leave it to you for now.

jasmainak · 2020-02-20T04:21:30Z

This is great!! Awesome work @titipata :-) Happy to take over from here. Can you update `whats_new.rst` so we can acknowledge this in the next release?

tests/test_pyglmnet.py

pyglmnet/externals/sklearn/utils/config.py

doc/whats_new.rst

jasmainak · 2020-02-20T04:37:26Z

@titipata you are on fire. I don't think I really need to do a pass. If you address my last couple of comments (remove the commented-out test, move the line in `whats_new.rst`) and update the title from WIP to MRG, I'm happy to merge this PR once both CIs are green. Thank you so much for the effort!

jasmainak · 2020-02-20T04:45:57Z

pyglmnet/pyglmnet.py

@@ -530,7 +534,7 @@ class GLM(BaseEstimator):
 https://core.ac.uk/download/files/153/6287975.pdf
 """

- def __init__(self, distr='poisson', alpha=0.5,
+ def __init__(self, distr='binomial', alpha=0.5,


Oops, the default value changed here.

Yes, with the original default it somehow fails scikit-learn's `check_estimator` because of `predict_proba`. I didn't resolve that here.

Hmm... I see. Let's restore the original default and see if we can fix it. Is that the only thing left that doesn't work?

Yes, I'll restore the default!

pyglmnet/pyglmnet.py

titipata · 2020-02-20T04:48:55Z

Awesome!! That sounds great to me! We might have to write some more test scripts later on :)

jasmainak · 2020-02-20T04:50:17Z

Yes I am looking forward to more contributions from you in the future!

…e other cases

titipata · 2020-02-20T11:27:45Z

@jasmainak, so I added a workaround for the `predict_proba` error raised by `check_estimator`. It is similar to the logic in sklearn's `LogisticRegression`.
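For two classes, `LogisticRegression.predict_proba` returns an `(n_samples, 2)` array whose rows sum to one, which is what `check_estimator` expects from a classifier's `predict_proba`. A sketch of that convention follows, assuming the model exposes a 1-D linear predictor `z`. `binary_predict_proba` is a hypothetical helper, not the code added in this PR.

```python
import numpy as np


def binary_predict_proba(z):
    """Hypothetical helper: turn a 1-D linear predictor into an (n_samples, 2) array.

    Column 0 is P(y=0), column 1 is P(y=1), and each row sums to 1.
    """
    z = np.asarray(z, dtype=np.float64)
    # Numerically stable logistic sigmoid (the same function as scipy.special.expit).
    p1 = np.empty_like(z)
    pos = z >= 0
    p1[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    p1[~pos] = ez / (1.0 + ez)
    return np.column_stack([1.0 - p1, p1])


proba = binary_predict_proba([-2.0, 0.0, 3.0])
print(proba.shape)                          # (3, 2)
print(np.allclose(proba.sum(axis=1), 1.0))  # True
```

Splitting on the sign of `z` keeps `np.exp` from overflowing for large-magnitude inputs.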

tests/test_pyglmnet.py

pyglmnet/pyglmnet.py

jasmainak · 2020-02-20T15:54:18Z

Good to go from your end, @titipata? Is everything resolved and working? Please set the PR title to MRG if so!

titipata · 2020-02-20T16:04:08Z

@jasmainak Yes, if you can help polish it a bit, it should be good to go!

jasmainak · 2020-02-20T16:45:39Z

I'll merge when the CIs are green, @titipata. We're good to go!

titipata · 2020-02-20T16:48:39Z

Perfect 👍 @jasmainak. We can also ping the JOSS editor right after. This PR should take care of their concerns!

jasmainak · 2020-02-20T16:49:19Z

yep that's the plan! :)

jasmainak · 2020-02-20T18:45:02Z

All good, thanks a ton @titipata !

titipata added 2 commits February 18, 2020 10:00

Minor update README example

fe67bec

Attempt to make GLM with check_estimator

a885f13

titipata requested a review from jasmainak February 18, 2020 17:02

jasmainak reviewed Feb 18, 2020

README.rst

Adding check_estimator in test script

440ec63

Follow scikit-learn template for estimator

42e9bc1

Try using BaseEstimator from scikit-learn

f8d910f

titipata added 2 commits February 18, 2020 13:56

Check X in predict_proba

7f041d2

Add _ to rng according to scikit-learn

779b2fb

Adding back fit_predict method

240cf05

Uncomment _allow_refit

53e5c87

titipata added 3 commits February 18, 2020 15:57

Comment back _allow_refit

1c7d244

Further use check_random_state from scikit-learn

1ff6513

Further use scikit-learn random state convention

c59b9b8

titipata mentioned this pull request Feb 18, 2020

Test GLM with scikit-learn check_estimator (setting attributes during init ) #363

Closed

Polishing test and pyglmnet class

eb43e92

titipata added 3 commits February 19, 2020 22:50

Minor externals folder fix

d6cfac9

Flake8 corrections

1ee1cdb

Removing unused lines

e6b3136