How to test the accuracy of predictions? #33

Closed
lime-n opened this issue Jul 12, 2021 · 1 comment

lime-n commented Jul 12, 2021

I am rather new to random forests and especially to imputation, and I would like to know how I can get an estimate of the accuracy of these predictions.

For example, suppose I have a matrix with species in columns and abundances in rows, and I use generateNA to create NA values for imputation. How can I test the accuracy of the imputed values against the actual observed values?

For example, I have the following dataset and want to test how well the imputation recovers the abundance of these species over the selected years, relative to the actual values:

#data
# A tibble: 12 x 7
   eventDate   year Hirundorustica Himantopushimantopus Gallinulachloropus Fulicaatra Spilopeliachinensis
   <date>     <dbl>          <int>                <int>              <int>      <int>               <int>
 1 2019-01-01  2019         375087               275213             337709    1638522               81054
 2 2019-02-01  2019         245500               174385             230240     864141               72817
 3 2019-03-01  2019         478287               169552             207516     509113               59389
 4 2019-04-01  2019        1149118               146255             162036     371454               58119
 5 2019-05-01  2019        1777995                84937             132554     290331               43462
 6 2019-06-01  2019         674044                52308             101186     255249               24479
 7 2019-07-01  2019        1114053                75779             107368     377148               23558
 8 2019-08-01  2019        2091425                81571             133904     535402               31321
 9 2019-09-01  2019        1834696               105622             141551     659778               46775
10 2019-10-01  2019         676342               111289             174135     737695               76354
11 2019-11-01  2019         322766               143620             302165     869143               63237
12 2019-12-01  2019         359126               193387             281926    1299738               66995

library(missRanger)

# mimNA holds the tibble shown above
# Generate NAs in the species columns (columns 1-2 are eventDate and year)
mimNA[, -c(1, 2)] <- generateNA(mimNA[, -c(1, 2)], seed = 5)

# Then impute the NAs
mRan <- missRanger(mimNA[, -c(1, 2)], pmm.k = 3, num.trees = 100)
lime-n changed the title from "Creating a confusion matrix?" to "How to test the accuracy of predictions?" on Jul 12, 2021
mayer79 (Owner) commented Jul 12, 2021

In practice, one does not have access to the true values, so this question pops up only for simulation studies.

Since missRanger ends up with a predictive model for each column, you can evaluate its "accuracy" column by column on the missing values, using a suitable scoring function (e.g., RMSE for continuous columns).
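
A minimal sketch of that column-by-column check, assuming the complete tibble above is stored in an object called mim (a hypothetical name) and using root mean squared error computed in base R as the score:

library(missRanger)

# Keep the complete species columns so the truth is available for scoring
truth <- mim[, -c(1, 2)]

# Mask some cells, then impute them
masked  <- generateNA(truth, seed = 5)
imputed <- missRanger(masked, pmm.k = 3, num.trees = 100)

# RMSE per column, computed only on the cells that were set to NA
rmse_by_col <- sapply(names(truth), function(col) {
  na_cells <- is.na(masked[[col]])
  sqrt(mean((truth[[col]][na_cells] - imputed[[col]][na_cells])^2))
})
rmse_by_col

The key point is to keep the complete data in a separate object rather than overwriting it, so the masked cells can be compared against their known true values.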

mayer79 closed this as completed on Jul 12, 2021