Question on missRanger and BRMS #30

Closed
GabriellaS-K opened this issue May 13, 2021 · 8 comments

@GabriellaS-K

Hi,

Thank you for a brilliant package. I'm using missRanger to impute missing values and then fitting brms models to the imputed data. The brms documentation describes how to use the mice package, but the output of missRanger is structured quite differently.

Ideally I would impute the data, pool the imputed datasets, run my models, and then run model comparisons. But pooling with mice doesn't work on the missRanger output, so instead I fit the model to each imputed dataset like this:

models_imputed <- brm_multiple(formula = score ~ 1 + cs(group), data = imputed, family = acat("cloglog"), combine=TRUE, chains=1)
But this is pretty clunky, and if I try to run LOO on my models (I have 5) I get this message:
Using only the first imputed data set. Please interpret the results with caution until a more principled approach has been implemented.

This isn't an issue with missRanger as such, more that I'm caught in the space between missRanger and BRMS and am not sure how to get them to work together...hoping someone might have advice!

Thanks

@mayer79
Owner

mayer79 commented May 13, 2021

I think brm_multiple just expects a list of datasets, so you can basically go along the lines of the missRanger multiple imputation vignette on https://cran.r-project.org/web/packages/missRanger/vignettes/multiple_imputation.html

Let me know if the results look (un-)reasonable.

# Via mice
library(mice)
library(brms)

imp <- mice(nhanes, m = 5, print = FALSE)

fit_imp1 <- brm_multiple(bmi ~ age*chl, data = imp, chains = 2)

# With missRanger
library(missRanger)

# Generate 5 complete data sets
imp <- replicate(5, missRanger(nhanes, verbose = 0, num.trees = 50, pmm.k = 5),
                 simplify = FALSE)

# Fit model
fit_imp2 <- brm_multiple(bmi ~ age*chl, data = imp, chains = 2)

@GabriellaS-K
Author

GabriellaS-K commented May 15, 2021

Hi,

Thank you so much for the answer. That's actually what I tried to do: my imputed dataset (called imputed) was fed straight into brm_multiple, just like in your example with fit_imp2. The model runs; the problem comes afterwards. I'd like to compare different models using the loo function, but because the fits aren't pooled, it only uses the first imputed dataset.

@mayer79
Owner

mayer79 commented May 16, 2021

Hmm. If you could adapt my examples (both mice and missRanger) accordingly, that would be fantastic.

@GabriellaS-K
Author

I'm not sure what you mean by adapt your examples, sorry!!

@mayer79
Owner

mayer79 commented May 17, 2021

I would need a fully reproducible example to see what works and what does not.

@GabriellaS-K
Author

Ah ok, great!

Please find below:

Here is a subset of my data:

data <- structure(list(agequartiles = structure(c(1L, 3L, 2L, 1L, 2L, 
4L, 3L, 1L, 3L, 4L, 1L, 2L, 2L, 2L, 4L, 1L, 3L, 3L, 4L, 4L, 4L, 
3L, 4L, 1L, 4L, 3L, 1L, 4L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 3L, 2L, 
2L, 3L, 4L, 4L, 3L, 2L, 3L, NA, 1L, 1L, 1L, 2L, 2L), .Label = c("[18,23]", 
"(23,27]", "(27,32]", "(32,54]"), class = "factor"), sentiment = c(1, 
1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 
1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 3, 2, 1, 
1, 2, 1, 1, 3, 1, 3), group = structure(c(2L, 3L, 3L, 2L, 2L, 
1L, 2L, 1L, 2L, 2L, 2L, 3L, 3L, 1L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 
1L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 3L, 1L, 2L, 3L, 
3L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("prime1", 
"prime2", "prime3"), class = "factor"), continent = c("UK", "Australia and New Zealand", 
"Northern America", "UK", "Northern America", "Australia and New Zealand", 
"Asia and the Pacific", "UK", "Southern and Central America", 
"Australia and New Zealand", "UK", "Northern America", "Northern America", 
"UK", "Northern America", "UK", "UK", "Northern America", "UK", 
"Northern America", "Northern America", "Southern and Central America", 
"Northern America", "UK", "Europe", "Northern America", "UK", 
"Northern America", NA, "UK", "UK", "Australia and New Zealand", 
"Australia and New Zealand", "UK", "UK", "UK", "Australia and New Zealand", 
"Northern America", "UK", "Northern America", "UK", "Asia and the Pacific", 
"Northern America", "Northern America", NA, NA, "UK", "Europe", 
"UK", "Northern America"), ID = 1:50, medication = c("FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", 
"FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", 
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE")), row.names = c(NA, 
50L), class = "data.frame")

Then I imputed:


library(missRanger)
library(brms)

# One imputed copy of the data per seed
data <- lapply(3456:3460, function(x)
  missRanger(
    data,
    . ~ . - ID,       # impute all columns, using all columns except ID as predictors
    maxiter = 10,     # maximum number of iterations
    pmm.k = 3,        # predictive mean matching, for more natural imputations
    verbose = 1,      # how much information is printed to the screen
    seed = x,         # integer seed to initialize the random generator
    num.trees = 200,
    returnOOB = TRUE,
    case.weights = NULL
  )
)
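
Before fitting, it can help to confirm that the imputation step produced what brm_multiple() expects: a plain list of completed data frames. The following is a minimal sketch on toy data, not the data from this issue. The names iris_na and imputed are hypothetical, and the missing values are generated artificially with missRanger's generateNA().

```r
library(missRanger)

# Hypothetical toy data: iris with roughly 10% of values set to NA
iris_na <- generateNA(iris, p = 0.1, seed = 1)

# One completed data frame per seed, mirroring the lapply() pattern above
imputed <- lapply(3456:3460, function(s)
  missRanger(iris_na, pmm.k = 3, num.trees = 50, verbose = 0, seed = s)
)

# brm_multiple() takes a list of data frames, so check exactly that,
# and that no missing values remain after imputation
stopifnot(
  is.list(imputed),
  all(vapply(imputed, is.data.frame, logical(1))),
  !any(vapply(imputed, function(d) any(is.na(d)), logical(1)))
)
```

If these checks pass, the list can be handed to brm_multiple() through its data argument.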

Then I ran 5 models

models_group <- brm_multiple(formula = sentiment  ~ 1 + cs(group),  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_meds <- brm_multiple(formula = sentiment  ~ 1 + cs(group)+ medication,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_age <- brm_multiple(formula = sentiment  ~ 1 + cs(group) + agequartiles,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_continent <- brm_multiple(formula = sentiment  ~ 1 + cs(group)+continent,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_all <- brm_multiple(formula = sentiment  ~ 1 + cs(group) + agequartiles + medication + continent,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

And finally the LOO

modelcomparison <- loo(models_all, models_group, models_meds, models_continent, models_age)

@mayer79
Owner

mayer79 commented May 17, 2021

Okay, thanks a lot for that example. I visited

My first thought:

  1. use combine = FALSE in brm_multiple(), then
  2. pool result of brm_multiple() doing some Bayesian magic, then
  3. run loo

I would actually suggest asking the brms team how they would approach the problem. I think it would be quite cool if loo worked on the output of brm_multiple(), independent of whether missRanger or another algorithm was used.
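
Steps 1 and 3 above can be sketched as follows, reusing the nhanes setup from the earlier example. Step 2 (pooling) is deliberately left out, since no principled pooling method has been settled on in this thread. Comparing the models separately within each imputed dataset is only an assumption about what might be informative, not an established procedure. The object names (fits_small, loo_small, comparisons, and so on) are hypothetical.

```r
library(mice)        # only used here for the nhanes example data
library(missRanger)
library(brms)

# Five completed copies of nhanes, as in the missRanger example above
imp <- replicate(5, missRanger(nhanes, verbose = 0, num.trees = 50, pmm.k = 5),
                 simplify = FALSE)

# combine = FALSE returns a list with one brmsfit per imputed dataset
fits_small <- brm_multiple(bmi ~ age, data = imp, chains = 2, combine = FALSE)
fits_large <- brm_multiple(bmi ~ age * chl, data = imp, chains = 2, combine = FALSE)

# One loo object per imputed dataset and model
loo_small <- lapply(fits_small, loo)
loo_large <- lapply(fits_large, loo)

# Compare the two models within each imputed dataset (no pooling)
comparisons <- Map(loo_compare, loo_small, loo_large)
```

Each element of comparisons then refers to a single imputed dataset, so it is possible to see whether the model ranking is stable across imputations before deciding how, or whether, to pool.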

@mayer79 mayer79 closed this as completed May 19, 2021
@GabriellaS-K
Author

OK great thank you for that, I will do!
