replace Boston dataset with diabetes dataset in examples #357

Merged: 1 commit into master from boston on Jan 8, 2022

StrikerRUS
Member

The Boston dataset has been deprecated since scikit-learn 1.0 and will be removed in version 1.2 for ethical reasons. Details: scikit-learn/scikit-learn#16155.

scoring=make_scorer(mean_squared_error),
cv=n_folds)

rgf_score = sum(rgf_scores)/n_folds
print('RGF Regressor MSE: {0:.5f}'.format(rgf_score))
- # >>>RGF Regressor MSE: 11.79409
+ # >>> RGF Regressor MSE: 3377.46076
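
For readers unfamiliar with the pattern in this hunk: `rgf_scores` appears to hold one MSE per cross-validation fold, and the reported figure is their plain average. A minimal standard-library sketch of that computation follows. It is not the example's actual code: the mean-predicting baseline, the helper names (`k_fold_indices`, `cross_val_mse`), and the target values are all made up for illustration, and scikit-learn's own fold splitting may differ in detail.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_mse(y, n_folds):
    """Per-fold MSE of a hypothetical baseline that predicts the training mean."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), n_folds):
        train_mean = sum(y[i] for i in train_idx) / len(train_idx)
        y_test = [y[i] for i in test_idx]
        scores.append(mean_squared_error(y_test, [train_mean] * len(y_test)))
    return scores


# Made-up targets, NOT taken from the diabetes dataset.
y = [25.0, 60.0, 110.0, 150.0, 200.0, 346.0]
n_folds = 3

scores = cross_val_mse(y, n_folds)
score = sum(scores) / n_folds  # same averaging as in the hunk above
print('Baseline MSE: {0:.5f}'.format(score))
```

Because MSE is expressed in squared target units, averaged scores are only comparable between models evaluated on the same target, which is why the printed value changes so much when the dataset is swapped.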
Member Author

@StrikerRUS StrikerRUS Jan 7, 2022

@fukatani Thanks a lot for your feedback!

Yeah, I tried increasing those hyperparameters, of course. Unfortunately, increasing them only makes the results worse.

I believe the purpose of this example is to show that RGF can outperform RF in some cases, and that goal is already achieved:

RGF Regressor MSE: 3377.46076
Random Forest Regressor MSE: 3441.01988

Also, I seriously doubt that the article you mentioned refers to the same dataset as the one used in scikit-learn. An MSE below 0.55 for a target y ranging from 25 to 346 looks unbelievable.
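
A quick back-of-the-envelope check of that claim, as a sketch: taking the square root of the MSE gives an error in the same units as y, which can then be compared with the width of the target range. The 0.55 bound, the 25 to 346 range, and the RGF MSE are the figures quoted in this thread; using RMSE divided by range as a plausibility measure is an assumption of this sketch, not something the article or scikit-learn prescribes.

```python
import math

# Figures quoted in this thread.
y_min, y_max = 25.0, 346.0   # range of the scikit-learn diabetes target
claimed_mse = 0.55           # upper bound on MSE attributed to the article
rgf_mse = 3377.46076         # RGF MSE reported in this PR

target_range = y_max - y_min

# RMSE is in the same units as y, so it can be compared with the range.
claimed_rmse = math.sqrt(claimed_mse)
rgf_rmse = math.sqrt(rgf_mse)

# Assumed plausibility measure: typical error as a fraction of the range.
claimed_fraction = claimed_rmse / target_range
rgf_fraction = rgf_rmse / target_range

print('Claimed RMSE: {0:.3f} ({1:.2%} of range)'.format(claimed_rmse, claimed_fraction))
print('RGF RMSE:     {0:.3f} ({1:.2%} of range)'.format(rgf_rmse, rgf_fraction))
```

On these numbers the claimed error is well under 1% of the target range, while the RGF result in this PR is a sizable fraction of it, which is consistent with the two sources describing different datasets.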

To back this up, here are some examples of MSE values for this dataset from around GitHub:
https://github.com/Akshaykumarcp/ML-ensemble-learning/blob/b142d7d2ac32554fe8dba545bfcb0a2fb31e02da/0.2_generative_ensemble_methods/0.2.2_boosting_method/0.2.2.3_adaboost_scikit_regression.py#L37-L40

https://github.com/Mean518/kingminji/blob/5a3cd3ad7ace4e18fbbb07f1c8977b377e42e6b3/keras/keras80_diabetes_lstm.py#L61

https://github.com/Mean518/kingminji/blob/5a3cd3ad7ace4e18fbbb07f1c8977b377e42e6b3/keras/keras79_diabetes_dnn.py#L76

https://github.com/Mean518/kingminji/blob/5a3cd3ad7ace4e18fbbb07f1c8977b377e42e6b3/keras/keras81_diabetes_cnn.py#L79

https://github.com/kevinpiger/ML-100Days/blob/master/Day036-040/Day_040_lasso_ridge_regression.ipynb

Member Author

OK, now I'm sure these datasets are different:

Source URL: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset

Size: 11x442.

The diabetes dataset containing multivariate characteristics that are extracted from the University of California machine learning repository contained 768 specimens of adult females. This depicts chemical changes occurs from initial to peak stages in the female body that result in diabetes (Smith et al., 1988).

Member

Thx!
LGTM.

Sorry for very late response 🙏

Member Author

No problem at all! Thank you very much for the thoughtful review!

@StrikerRUS StrikerRUS merged commit 272afb8 into master Jan 8, 2022
@StrikerRUS StrikerRUS deleted the boston branch January 8, 2022 13:46