More granular control over which cells get imputed #40
Comments
Thanks for pinging me. I think your situation occurs quite frequently, and you are definitely using the right data shape (wide, not long) to do the imputation.
Thanks! I agree approach 1 is optimal. Just to add a further tip for those who have similar use-cases:
If anyone else has faced the issue of imputing missing data in unbalanced panel data, I would love to hear more about the strategies used.
One quick follow-up issue is that when one "widens" the data, the donor pool for pmm is reduced a lot. This is not a big issue for my case, but it is something to consider.
I would love to see a feature whereby I can feed a logical matrix of the same dimensions as the underlying data.frame into the function, which controls which cells in the data.frame get imputed, and which do not.
I currently have a data.frame which has two types of NAs: (i) data which I need to impute, and (ii) data which I know should never exist.
This situation arises when trying to impute unbalanced panel data (e.g. annual income of a population of individuals). Since I reshape this data to be "wide" (one row per person), I end up with a number of columns (e.g. income_2010, income_2011, ... etc). This is essential to capture time-dynamics (i.e. my income this year and next year are strongly correlated).
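To make the reshaping step concrete, here is a minimal base-R sketch. The data frame `long` and its columns `id`, `year` and `income` are hypothetical and only for illustration; they are not taken from the package or from my real data.

```r
# Hypothetical long-format panel: one row per person-year (unbalanced).
long <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3),
  year   = c(2010, 2011, 2012, 2010, 2012, 2011),
  income = c(100, 110, 120, 80, 95, 60)
)

# One row per person and one income column per year.
# Person-years that are absent from `long` become NA in `wide`.
wide <- reshape(long, idvar = "id", timevar = "year",
                direction = "wide", sep = "_")
```

After this step, an NA in `wide` can mean either "observed person-year with missing income" or "person-year that never existed", which is exactly the ambiguity described above.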
So, for a person who died in 2005, I do not wish to impute income_2006, income_2007, etc. But for someone whose income is missing during their lifetime, I would like to impute it.
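As a sketch of the behaviour I have in mind, the logical matrix can be emulated outside the package: impute everything, then set the cells that must never exist back to NA. Everything below is an assumption for illustration: the column `death_year`, the name `impute_mask`, and the column-mean fill, which is only a placeholder for the real imputation call.

```r
# Hypothetical wide panel; person 2 died in 2011, so later years must stay NA.
wide <- data.frame(
  id          = 1:3,
  death_year  = c(NA, 2011, NA),
  income_2010 = c(100, 80, NA),
  income_2011 = c(110, NA, 60),
  income_2012 = c(NA, NA, 65)
)
income_cols <- grep("^income_", names(wide), value = TRUE)
col_years   <- as.integer(sub("^income_", "", income_cols))

# TRUE where the person was alive in that year (no death year, or year <= death year).
alive <- outer(wide$death_year, col_years,
               function(d, y) is.na(d) | y <= d)

# TRUE = missing cell that may be imputed; FALSE = observed, or must never be imputed.
was_na      <- is.na(as.matrix(wide[income_cols]))
impute_mask <- was_na & alive

# Placeholder imputer (column means), standing in for the real imputation call.
imputed <- wide
for (j in income_cols) {
  imputed[[j]][is.na(imputed[[j]])] <- mean(wide[[j]], na.rm = TRUE)
}

# Restore NA wherever a cell was filled but was not allowed to be.
restore <- was_na & !impute_mask
imputed[income_cols][restore] <- NA
```

The drawback of this workaround is that the structurally missing cells still take part in the imputation itself before they are blanked out again, which is why an argument inside the function would be cleaner.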
All the best - and thanks for a great package!