Question #3
Hello Thierry,

The algorithm tries to take into account all statistical associations between all variables, so, at least in theory, the answer is yes. In practice, imputation generally works less well if you have, for example, too little data or if the values are not missing at random. Let us see what happens with the iris data:
So there are 20 missing values in the data. Now let's fill those values again by running the imputation
and compare the joint distribution of the two variables stratified by Species (= color) in the original data set (left) and after imputation (right).
Of course, the pictures are not identical, but the structure seems to be retained.
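
To make the idea concrete, here is a minimal sketch of column-wise ("chained") imputation in which each column is predicted from the others, so that associations between columns carry over into the filled values. It is not the package's actual algorithm: the learner is replaced by a plain per-group linear regression, the data are a synthetic stand-in for iris, and all names (`make_data`, `punch_holes`, `impute_chained`, columns `x`, `y`, `group`) are hypothetical.

```python
import random
import statistics


def make_data(seed=0, n_per_group=30):
    """Toy stand-in for iris: two correlated numeric columns within each group."""
    rng = random.Random(seed)
    centers = {"a": (5.0, 3.4), "b": (5.9, 2.8), "c": (6.6, 3.0)}  # made-up values
    rows = []
    for g, (cx, cy) in centers.items():
        for _ in range(n_per_group):
            x = rng.gauss(cx, 0.3)
            y = cy + 0.6 * (x - cx) + rng.gauss(0, 0.1)
            rows.append({"group": g, "x": x, "y": y})
    return rows


def punch_holes(rows, cols, n_missing, seed=1):
    """Return a copy of rows with n_missing random values per column set to None."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]
    for col in cols:
        for i in rng.sample(range(len(out)), n_missing):
            out[i][col] = None
    return out


def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def impute_chained(rows, cols=("x", "y"), n_iter=5):
    """Impute column by column, each time predicting from the other column and the group."""
    filled = [dict(r) for r in rows]
    missing = {c: [i for i, r in enumerate(rows) if r[c] is None] for c in cols}
    groups = sorted({r["group"] for r in rows})

    # Initial guess: group mean of the observed values.
    for c in cols:
        for i in missing[c]:
            obs = [r[c] for r in rows
                   if r["group"] == rows[i]["group"] and r[c] is not None]
            filled[i][c] = statistics.fmean(obs)

    # Refine: cycle over the columns, refitting on the current state of the other column.
    for _ in range(n_iter):
        for target in cols:
            other = cols[1] if target == cols[0] else cols[0]
            miss = set(missing[target])
            for g in groups:
                train = [r for i, r in enumerate(filled)
                         if r["group"] == g and i not in miss]
                slope, icpt = fit_line([r[other] for r in train],
                                       [r[target] for r in train])
                for i in miss:
                    if filled[i]["group"] == g:
                        filled[i][target] = icpt + slope * filled[i][other]
    return filled


original = make_data()
with_gaps = punch_holes(original, ("x", "y"), n_missing=10)
imputed = impute_chained(with_gaps)
```

Although the loop visits one column at a time, every prediction is conditioned on the group and on the current values of the other column, which is why the imputed pairs tend to stay consistent with their group rather than drifting towards the overall mean.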
Related to this, check out what this guy does for the iris dataset...
Quick question, Michael...

Consider a scenario where more than one response variable is missing. For example, with the iris dataset, let's say `Sepal.Length` and `Sepal.Width` are missing. We know that both of these values are correlated with each other and with `Species`.

Your implementation imputes by column. Is the correlation between columns still accounted for in the model? We don't want imputed values that, taken together after imputation, don't "fit" the species...
Best,
Thierry