Question on missRanger and BRMS #30

GabriellaS-K · 2021-05-13T09:34:10Z

Hi,

Thank you for a brilliant package. I'm using missRanger to impute missing values and then fit brms models to the imputed data. The brms documentation describes how to do this with the mice package, but the output of missRanger has quite a different structure.

Ideally, I would impute the data, pool the imputed data sets, fit my models, and then run model comparisons. However, pooling the missRanger output with mice doesn't work, so instead I fit each model on the list of imputed data sets like this:

models_imputed <- brm_multiple(formula = score ~ 1 + cs(group), data = imputed, family = acat("cloglog"), combine = TRUE, chains = 1)
This is rather clunky, though, and if I try to run LOO on my models (I have 5), I get the warning:
Using only the first imputed data set. Please interpret the results with caution until a more principled approach has been implemented.

This isn't an issue with missRanger as such; rather, I'm caught between missRanger and brms and am not sure how to make them work together. I'm hoping someone might have advice!

Thanks


mayer79 · 2021-05-13T10:20:31Z

I think brm_multiple() just expects a list of data sets, so you can basically follow the missRanger multiple imputation vignette at https://cran.r-project.org/web/packages/missRanger/vignettes/multiple_imputation.html

Let me know if the results look (un-)reasonable.

# Via mice
library(mice)
library(brms)

imp <- mice(nhanes, m = 5, print = FALSE)

fit_imp1 <- brm_multiple(bmi ~ age*chl, data = imp, chains = 2)

# With missRanger
library(missRanger)

# Generate 5 complete data sets
imp <- replicate(5, missRanger(nhanes, verbose = 0, num.trees = 50, pmm.k = 5),
                 simplify = FALSE)

# Fit model
fit_imp2 <- brm_multiple(bmi ~ age*chl, data = imp, chains = 2)

GabriellaS-K · 2021-05-15T14:55:08Z

Hi,

Thank you so much for the answer. That's actually what I tried to do: my imputed data set (called imputed) went straight into brm_multiple(), just like in your fit_imp2 example. The model runs; the problem comes afterwards. I'd like to compare different models using the loo() function, but because the fits aren't pooled, it only uses the first imputed data set.

mayer79 · 2021-05-16T13:34:26Z

Hmm. If you could adapt my examples (both mice and missRanger) accordingly, that would be fantastic.

GabriellaS-K · 2021-05-16T20:23:09Z

I'm not sure what you mean by adapting your examples, sorry!

mayer79 · 2021-05-17T05:56:30Z

I would need a fully reproducible example to see what works and what not.

GabriellaS-K · 2021-05-17T09:40:17Z

Ah ok, great!

Please find below:

Here is a subset of my data:

data <- structure(list(agequartiles = structure(c(1L, 3L, 2L, 1L, 2L, 
4L, 3L, 1L, 3L, 4L, 1L, 2L, 2L, 2L, 4L, 1L, 3L, 3L, 4L, 4L, 4L, 
3L, 4L, 1L, 4L, 3L, 1L, 4L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 3L, 2L, 
2L, 3L, 4L, 4L, 3L, 2L, 3L, NA, 1L, 1L, 1L, 2L, 2L), .Label = c("[18,23]", 
"(23,27]", "(27,32]", "(32,54]"), class = "factor"), sentiment = c(1, 
1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 
1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 3, 2, 1, 
1, 2, 1, 1, 3, 1, 3), group = structure(c(2L, 3L, 3L, 2L, 2L, 
1L, 2L, 1L, 2L, 2L, 2L, 3L, 3L, 1L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 
1L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 3L, 1L, 2L, 3L, 
3L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("prime1", 
"prime2", "prime3"), class = "factor"), continent = c("UK", "Australia and New Zealand", 
"Northern America", "UK", "Northern America", "Australia and New Zealand", 
"Asia and the Pacific", "UK", "Southern and Central America", 
"Australia and New Zealand", "UK", "Northern America", "Northern America", 
"UK", "Northern America", "UK", "UK", "Northern America", "UK", 
"Northern America", "Northern America", "Southern and Central America", 
"Northern America", "UK", "Europe", "Northern America", "UK", 
"Northern America", NA, "UK", "UK", "Australia and New Zealand", 
"Australia and New Zealand", "UK", "UK", "UK", "Australia and New Zealand", 
"Northern America", "UK", "Northern America", "UK", "Asia and the Pacific", 
"Northern America", "Northern America", NA, NA, "UK", "Europe", 
"UK", "Northern America"), ID = 1:50, medication = c("FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", 
"FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", 
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE")), row.names = c(NA, 
50L), class = "data.frame")

Then I imputed:


library(missRanger)
# Five imputations with different seeds. lapply() reads `data` (the original
# data frame) before the assignment, which then replaces `data` by the list
# of 5 imputed data frames.
data <- lapply(3456:3460, function(x)
  missRanger(
    data,
    . ~ . - ID,         # impute all columns, using all columns except ID as predictors
    maxiter = 10,       # maximum number of chained iterations
    pmm.k = 3,          # predictive mean matching: more natural imputed values, better distributional properties
    verbose = 1,        # how much information is printed to screen
    seed = x,           # integer seed to initialize the random generator
    num.trees = 200,
    returnOOB = TRUE,   # attach the final out-of-bag prediction errors as attribute "oob"
    case.weights = NULL
  )
)
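Before passing the result to brm_multiple(), it can be worth checking that every element is a complete data frame. A minimal sketch, assuming iris with artificially generated missing values as a stand-in for the real data (the names iris_na, imputed and n_missing are illustrative only):

```r
library(missRanger)

# Hypothetical stand-in for the real data: iris with about 10% of values
# set to NA by missRanger's generateNA()
iris_na <- generateNA(iris, p = 0.1, seed = 1)

# Five imputations with different seeds, mirroring the lapply() call above
imputed <- lapply(1:5, function(s)
  missRanger(iris_na, . ~ ., pmm.k = 3, num.trees = 50, verbose = 0, seed = s)
)

# brm_multiple() expects a plain list of data frames without missing values
stopifnot(is.list(imputed),
          all(vapply(imputed, is.data.frame, logical(1))))
n_missing <- vapply(imputed, function(d) sum(is.na(d)), integer(1))
print(n_missing)  # one count per imputed data set; all should be 0
```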

Then I fitted five models:

models_group <- brm_multiple(formula = sentiment ~ 1 + cs(group), data = data, family = acat("cloglog"), combine = TRUE, chains = 4)

models_meds <- brm_multiple(formula = sentiment ~ 1 + cs(group) + medication, data = data, family = acat("cloglog"), combine = TRUE, chains = 4)

models_age <- brm_multiple(formula = sentiment ~ 1 + cs(group) + agequartiles, data = data, family = acat("cloglog"), combine = TRUE, chains = 4)

models_continent <- brm_multiple(formula = sentiment ~ 1 + cs(group) + continent, data = data, family = acat("cloglog"), combine = TRUE, chains = 4)

models_all <- brm_multiple(formula = sentiment ~ 1 + cs(group) + agequartiles + medication + continent, data = data, family = acat("cloglog"), combine = TRUE, chains = 4)

And finally, the LOO comparison:

modelcomparison <- loo(models_all, models_group, models_meds, models_continent, models_age)

mayer79 · 2021-05-17T18:35:58Z

Okay, thanks a lot for that example. I visited

My first thought:

1. Use combine = FALSE in brm_multiple(), then
2. pool the results of brm_multiple() (some Bayesian magic needed here), then
3. run loo.

I would actually suggest asking the brms team how they would approach the problem. It would be quite useful if loo() worked on the output of brm_multiple(), regardless of whether missRanger or another imputation algorithm was used.
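One pragmatic reading of step 1 above, sketched under assumptions: fit each model with combine = FALSE, run loo() separately on every imputed data set, and compare the models within each data set. This skips the pooling step entirely. As far as I know there is no established rule for pooling LOO results across imputations, so checking whether the model ranking agrees across imputations is only a heuristic. The data (iris with generated NAs, a Gaussian model) and all object names are hypothetical stand-ins for the ordinal models in this thread.

```r
library(brms)
library(missRanger)

# Hypothetical stand-in data: numeric iris columns with missing values,
# imputed three times with different seeds
iris_na <- generateNA(iris[, 1:4], p = 0.1, seed = 1)
imputed <- lapply(1:3, function(s)
  missRanger(iris_na, . ~ ., num.trees = 50, verbose = 0, seed = s))

# combine = FALSE returns a list with one brmsfit per imputed data set
# instead of a single combined brmsfit_multiple object
fits_small <- brm_multiple(Sepal.Length ~ Sepal.Width, data = imputed,
                           combine = FALSE, chains = 2, refresh = 0)
fits_large <- brm_multiple(Sepal.Length ~ Sepal.Width + Petal.Length,
                           data = imputed, combine = FALSE, chains = 2,
                           refresh = 0)

# Run LOO on each imputed data set and compare the two models within it
comparisons <- lapply(seq_along(imputed), function(i)
  loo::loo_compare(loo(fits_small[[i]]), loo(fits_large[[i]])))

# Inspect whether the preferred model is the same for every imputation
print(comparisons)
```

If the ranking and the elpd differences are similar for every imputed data set, the missing-data uncertainty is unlikely to change the conclusion; if they disagree, that disagreement is itself worth reporting.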

GabriellaS-K · 2021-05-19T19:21:42Z

OK, great, thank you for that. I will do that!

mayer79 closed this as completed May 19, 2021
