Possibility to access Ranger object for Shap values #54

Closed
calogerobra opened this issue Oct 25, 2023 · 8 comments

Comments

@calogerobra

calogerobra commented Oct 25, 2023

The project could be extended to produce SHAP values and other metrics that the underlying ranger package supports.
Is there a way to accomplish that in the current version already?

@mayer79
Owner

mayer79 commented Oct 25, 2023

Interesting idea, thanks.

The output of missRanger() is simply a data.frame, optionally with some out-of-bag (OOB) performance results attached, so this is not possible at the moment.

A complete API change is not possible (too many dependencies). I am considering the following idea:

mr <- missRanger(data, ..., output = c("data.frame", "missRanger"))
  • output = "data.frame": current behavior
  • output = "missRanger": a "missRanger" object is returned. Basically a list with the imputed data, the random forests and other information, with a print() method and maybe summary().

This would not break current code, while offering necessary flexibility for further analysis.

What do you think?
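The backward-compatible switch proposed above can be illustrated with a small base-R sketch. Everything in it is hypothetical: `toy_impute()`, the `"toyImpute"` class and its print method are not part of missRanger, and simple mean imputation stands in for the random forests. It only shows the pattern: because `match.arg()` picks the first choice by default, existing calls keep getting a plain data.frame, while an explicit opt-in returns a classed list that S3 methods can dispatch on.

```r
# Hypothetical sketch (not missRanger code): a return-type switch that keeps
# the old default. Mean imputation stands in for the random forests.
toy_impute <- function(data, output = c("data.frame", "toyImpute")) {
  output <- match.arg(output)  # default = first choice = old behavior
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  means <- vapply(data[num_cols], mean, numeric(1), na.rm = TRUE)
  for (v in num_cols) {
    data[[v]][is.na(data[[v]])] <- means[[v]]
  }
  if (output == "data.frame") {
    return(data)  # unchanged behavior: a plain data.frame
  }
  # New behavior: a classed list holding the data plus the fitted "models"
  structure(list(data = data, models = as.list(means)), class = "toyImpute")
}

# S3 print method, analogous to the print() method proposed for "missRanger"
print.toyImpute <- function(x, ...) {
  cat("Imputed data:", nrow(x$data), "rows;",
      length(x$models), "models kept\n")
  invisible(x)
}

df <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 4))
toy_impute(df)                               # a data.frame, as before
obj <- toy_impute(df, output = "toyImpute")
obj                                          # dispatches to print.toyImpute()
```

The version that was eventually merged uses a `data_only` argument rather than `output`, but the compatibility idea is the same: the default keeps the old return type.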

@calogerobra
Author

calogerobra commented Oct 25, 2023 via email

@mayer79
Owner

mayer79 commented Oct 25, 2023

I will ping you when a Pull Request is ready to be installed for a quick cross-check.

@mayer79 mayer79 mentioned this issue Oct 27, 2023
@mayer79
Owner

mayer79 commented Oct 27, 2023

Implemented in #55

You can use the new version via

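# Install the development version from GitHub
devtools::install_github("mayer79/missRanger")

library(missRanger)

# iris with randomly injected missing values
irisWithNA <- generateNA(iris, seed = 34)

# data_only = FALSE returns a "missRanger" object instead of a data.frame;
# keep_forests = TRUE additionally stores the fitted random forests
imp <- missRanger(
  irisWithNA, pmm.k = 3, num.trees = 100, data_only = FALSE, keep_forests = TRUE
)
imp           # print() method

summary(imp)  # summary() method

imp$forests$Species  # ranger forest used to impute Species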

@mayer79 mayer79 closed this as completed Oct 27, 2023
@piebel

piebel commented Oct 27, 2023

Thanks @mayer79, this implementation is great.
I was wondering: would it be possible, as with the ranger package, to apply functions like importance_pvalues() to the missRanger object?
With this implementation, the missRanger object carries more information, but is it possible to investigate the object itself further with other functions, as with ranger?

Thank you!

@mayer79
Owner

mayer79 commented Oct 27, 2023

With the new, extended data_only = FALSE logic, adding methods is now much more natural. I don't have specific plans yet, but your issue was clearly the first step towards more functionality!

@calogerobra
Author

Thanks a million, @mayer79. Looks great. Wouldn't a simple solution to @piebel's idea be to return the full ranger object in the above-mentioned list?

@mayer79
Owner

mayer79 commented Oct 27, 2023

In the above example, all ranger objects are attached in the $forests slot. But I think importance_pvalues() was just one example he mentioned.
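Since each element of `$forests` is a regular ranger object, ranger's own functions can be applied to it directly. A minimal sketch, assuming that extra arguments given to missRanger() via `...` (here `importance = "permutation"`) are forwarded to ranger::ranger() for every per-variable forest, so that variable importance is stored in the kept forests. The `seed` and `verbose` values are illustrative choices, not taken from the thread.

```r
library(missRanger)

irisWithNA <- generateNA(iris, seed = 34)

# Assumption: `importance` is not used by missRanger() itself and is
# passed through `...` to ranger::ranger() for each imputation forest.
imp <- missRanger(
  irisWithNA, pmm.k = 3, num.trees = 100, data_only = FALSE,
  keep_forests = TRUE, importance = "permutation", verbose = 0, seed = 1
)

# The forest used to impute Species is a "ranger" object,
# so ranger's accessor functions work on it as usual.
rf_species <- imp$forests$Species
vi <- ranger::importance(rf_species)
sort(vi, decreasing = TRUE)
```

Other ranger functions, such as importance_pvalues(), have their own requirements (for example a particular importance mode), so check the ranger documentation before calling them on these forests.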
