Question #3

thierrygosselin · 2016-12-21T19:11:53Z

Quick question Michael...

Scenario where you have more than 1 response variable missing:

e.g. with the iris dataset
let's say Sepal.Length and Sepal.Width are missing.
We know that these two variables are correlated with each other and with Species.

Your implementation imputes column by column. Is the correlation between columns still accounted for in the model? We don't want imputed values that, taken together after imputation, don't "fit" the species...

Best,
Thierry

mayer79 · 2016-12-22T08:11:21Z

Hello Thierry

The algorithm tries to take into account all statistical associations between all variables. So, at least in theory, the answer is yes. In practice, it generally does not work too well if, for example, you have too little data or the values are not missing at random.
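To make "taking all associations into account" a bit more concrete, here is a minimal sketch of chained imputation: each incomplete column is predicted from all the other columns, and the cycle is repeated a few times. It is an illustration only, not the package's code. Linear models stand in for the random forests, the mean is used as a starting value, and `chained_impute()` and `n_iter` are hypothetical names.

```r
# Sketch of chained imputation (NOT missRanger's implementation):
# linear models replace random forests, and the start values are column means.
chained_impute <- function(df, n_iter = 5) {
  miss <- lapply(df, is.na)                  # remember which cells were missing
  to_fill <- names(df)[sapply(miss, any)]    # columns that need imputation

  # Initial fill with the column mean (this sketch handles numeric columns only)
  for (v in to_fill) df[[v]][miss[[v]]] <- mean(df[[v]], na.rm = TRUE)

  for (i in seq_len(n_iter)) {
    for (v in to_fill) {
      # Fit v ~ all other columns on the rows where v was actually observed
      fit <- lm(reformulate(setdiff(names(df), v), response = v),
                data = df[!miss[[v]], , drop = FALSE])
      # Overwrite only the originally missing cells with fresh predictions
      df[[v]][miss[[v]]] <- predict(fit, newdata = df[miss[[v]], , drop = FALSE])
    }
  }
  df
}

set.seed(1)
demo <- iris
demo$Sepal.Length[sample(150, 20)] <- NA
demo$Sepal.Width[sample(150, 40)] <- NA
filled <- chained_impute(demo)
anyNA(filled)
```

Because Species and the other measurements sit on the right-hand side of every model, each imputed value is conditioned on them, which is why the relationship between columns tends to survive.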

Let us see what happens to our iris data:

set.seed(398745)
# Replace some values by NA
iris2 <- iris
iris2$Sepal.Length[sample(150, 20)] <- NA
iris2$Sepal.Width[sample(150, 40)] <- NA
table(is.na(iris2$Sepal.Length), is.na(iris2$Sepal.Width))

# Output (rows: Sepal.Length missing, columns: Sepal.Width missing)
        FALSE TRUE
  FALSE    94   36
  TRUE     16    4

So there are 20 missing values in Sepal.Length and 40 in Sepal.Width, and 4 rows are missing both.
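The marginal counts can also be checked directly. The snippet below is a small sketch using base R only. It recreates the data, so it runs on its own, and the names `na_counts`, `both_missing` and `complete_rows` are just illustrative. The per-column totals are always 20 and 40, but the individual cells of the cross table may differ between R versions, because the default sampling algorithm behind `sample()` changed in R 3.6.0.

```r
set.seed(398745)
iris2 <- iris
iris2$Sepal.Length[sample(150, 20)] <- NA
iris2$Sepal.Width[sample(150, 40)] <- NA

# Number of missing values per column
na_counts <- colSums(is.na(iris2))
print(na_counts)

# Rows where both sepal measurements are missing, and fully observed rows
both_missing <- sum(is.na(iris2$Sepal.Length) & is.na(iris2$Sepal.Width))
complete_rows <- sum(complete.cases(iris2))

# Missing sepal values broken down by species
print(table(iris2$Species, is.na(iris2$Sepal.Length)))
print(table(iris2$Species, is.na(iris2$Sepal.Width)))
```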

Now let's fill those values again by running

  library(missRanger)
  # pmm.k = 3: predictive mean matching with 3 candidate donors
  iris3 <- missRanger(iris2, pmm.k = 3, seed = 3483)

and compare the joint distribution of the two variables stratified by Species (= color) in the original data set (left) and after imputation (right).

par(mfrow = 1:2)
plot(Sepal.Length ~ Sepal.Width, data = iris, col = Species, main = "original")
plot(Sepal.Length ~ Sepal.Width, data = iris3, col = Species, main = "imputed")

Of course, the pictures are not identical, but the structure seems to be retained.
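For a numeric check to go with the plots, one could compare the within-species correlation of the two sepal measurements before and after imputation. The following is a sketch under a few assumptions: `cor_by_species()` is a hypothetical helper (not part of missRanger), the imputation step only runs if the package is installed, and the exact numbers depend on the package and R version.

```r
# Correlation between Sepal.Length and Sepal.Width within each species.
# cor_by_species() is a hypothetical helper, not a missRanger function.
cor_by_species <- function(df) {
  sapply(split(df, df$Species),
         function(d) cor(d$Sepal.Length, d$Sepal.Width))
}

cor_original <- cor_by_species(iris)

if (requireNamespace("missRanger", quietly = TRUE)) {
  # Recreate the data with missing values and impute as above
  set.seed(398745)
  iris2 <- iris
  iris2$Sepal.Length[sample(150, 20)] <- NA
  iris2$Sepal.Width[sample(150, 40)] <- NA
  iris3 <- missRanger::missRanger(iris2, pmm.k = 3, seed = 3483)

  # One row per data set, one column per species
  print(round(rbind(original = cor_original,
                    imputed  = cor_by_species(iris3)), 2))
} else {
  print(round(cor_original, 2))
}
```

If the imputation respects the species structure, the two rows should be close to each other.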

thierrygosselin · 2017-01-07T15:56:37Z

Related to this, check out what this guy does for the iris dataset...
https://www.markvanderloo.eu/yaRb/2016/09/13/announcing-the-simputation-package-make-imputation-simple/

thierrygosselin closed this as completed Jan 7, 2017
