Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simulate_gls requires the nlme data argument to be in the global environment #4

Closed
cryanking opened this issue Dec 22, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@cryanking
Copy link

This is perhaps properly a bug for nlme, but simulate_gls fails if the data used to construct the gls object does not live in the global environment. Supplying newdata gets further but doesn't cure it during model construction. I suspect it is related to this issue: several of nlme's functions like getData seem to parse arguments in an way such that they look in the global rather than calling environment. This comes up when the data is created inside a function or other calling environment, such as when simulating or bootstrapping.

simulatation_summary_fixed <- function( 
duration_per_period = 6L ) {
 c( 1, 0 , 0 ,
   1, 1 , 1 ,
   1, 2 , 1 ,
   2, 0 , 0 ,
   2, 1 , 0 ,
   2, 2 , 1 ,
   3, 0 , 0 ,
   3, 1 , 0 ,
   3, 2 , 0 ) %>% matrix(ncol=3L, byrow=T) %>% data.frame -> SWdesign

   SWdesign %<>% set_colnames(c("cluster" , "period", "treatment"))
   SWdesign$period %<>% factor 
   SWdesign$cluster %<>% factor 
   
   base_num <- nrow(SWdesign)
   SWdesign  <- bind_rows( rep(list(SWdesign ) , duration_per_period ) )
   
   SWdesign %<>% arrange(cluster, period)
   SWdesign$time <- rep( seq_len(duration_per_period*n_distinct(SWdesign$period) ) , n_distinct(SWdesign$cluster) ) 
   
 
   cor_init <- corAR1(form= ~ time|cluster )
   cor_init <- Initialize(cor_init, data = SWdesign )
   
        SWdesign$outcome <- rnorm( nrow(SWdesign) )
        local.glm <- gls( outcome ~ treatment+ cluster + time   ,   correlation=cor_init, data=SWdesign, method="ML") 
        
        nlraa::simulate_gls(local.glm, psim=2)
        
        
}

simulatation_summary_fixed()
Error in eval(if ("data" %in% names(object)) object$data else mCall$data) : 
  object 'SWdesign' not found
@femiguez
Copy link
Owner

femiguez commented Dec 23, 2020

Thanks for the report! The issue is with a nlme::getData call buried inside 'simulate_gls'. It's a scoping issue, which I'm not sure how to solve at the moment. nlme::getData will look in the parent frame and the Global environment, but not inside the function environment you created. I don't know if this works for you, but the simplest way I can think of is by separating the experimental design step from the model fit step. For example,

Experimental design step 1)

exp_design <- function(duration_per_period = 6L){
  c( 1, 0 , 0 ,
     1, 1 , 1 ,
     1, 2 , 1 ,
     2, 0 , 0 ,
     2, 1 , 0 ,
     2, 2 , 1 ,
     3, 0 , 0 ,
     3, 1 , 0 ,
     3, 2 , 0 ) %>% matrix(ncol=3L, byrow=T) %>% data.frame -> SWdesign
  
  SWdesign %<>% set_colnames(c("cluster" , "period", "treatment"))
  SWdesign$period %<>% factor 
  SWdesign$cluster %<>% factor 
  
  base_num <- nrow(SWdesign)
  SWdesign  <- bind_rows( rep(list(SWdesign ) , duration_per_period ) )
  
  SWdesign %<>% arrange(cluster, period)
  SWdesign$time <- rep( seq_len(duration_per_period*n_distinct(SWdesign$period) ) , n_distinct(SWdesign$cluster) ) 
  SWdesign$outcome <- rnorm( nrow(SWdesign) )  
  SWdesign
}

SWdesign <- exp_design()

Model fitting step

## Simulate data according to the experimental design
SWdesign <- exp_design()
## Fit model
cor_init <- corAR1(form= ~ time|cluster )
cor_init <- Initialize(cor_init, data = SWdesign )
local.glm <- gls( outcome ~ treatment+ cluster + time, correlation=cor_init, data=SWdesign, method="ML") 
ans <- nlraa::simulate_lme(local.glm, psim=2)

Also, note that by using simulate_lme instead of simulate_gls you can change the number of simulations you want to perform by doing, for example, 100 simulations

ans <- nlraa::simulate_lme(local.glm, nsim = 1e2, psim=2)

Hope this helps. In the meantime, I'll think about whether there is anything I can do to work around the scoping issues in nlme::getData.

Best

@cryanking
Copy link
Author

Thanks, is there any reason to use simulate_gls instead of simulate_lme? Didn't realize it took a gls object (many lme methods blow up where the grouping factor is null).

In my application I did an assign("SWdesign", SWdesign, envir = .GlobalEnv). I am exploring a range of design matrices for a study design; the easy answer was just fork each design off to a new process to avoid issues with parallel safe code.

It looks like the underlying nlme bug is submitted to bugzilla in april with the answer that a fix is hard (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15892) (there are a couple of related bugs).

@femiguez
Copy link
Owner

Thanks for pointing me to that bug. So far the potential workarounds I'm thinking about have downsides and other negative side effects, but I'll keep thinking about it and see if there is a better solution. Your method is what is basically done in the nlraa::boot_* functions in the package and also, for example, in car::Boot.

I think of gls objects as special cases of lme in the statistical sense, so I wrote simulate_lme thinking it could handle both types of objects. I decided to export simulate_gls in case someone wants to use it directly, but most users would probably not need it. It does have the limitation that it produces just one simulation.

If you have access to Pinheiro and Bates see chapter 5. I believe they have some discussion about modeling the random vs. residual structure. As an example,

## Design experiment
treatment <- as.factor(c(0,1))
time <- 1:20
cluster <- as.factor(c("a", "b", "c"))
SWdesign <- expand.grid(treatment = treatment, time = time, cluster = cluster)
SWdesign$id <- with(SWdesign, paste0(cluster, "_", treatment))
SWdesign$outcome <- with(SWdesign, as.numeric(treatment) + 
                           (1 + as.numeric(treatment)) * time + 
                           scale(as.numeric(cluster)) * 0.5 + 
                           rnorm(nrow(SWdesign)))

ggplot(data = SWdesign, aes(x = time, y = outcome, color = treatment)) + 
  facet_wrap(~ cluster) + 
  geom_point()

fit.gls <- gls(outcome ~ treatment * time, 
               correlation = corCAR1(form = ~ time | id), 
               data = SWdesign)

fit.lme <- lme(outcome ~ treatment * time,
               random = ~ 1 | cluster,
               data = SWdesign)

anova(fit.gls, fit.lme)
## LME model is better in this case

Another note: to prevent functions in nlme to blow up make sure all the categorical are factors. I would think that in your example you want treatment to also be modeled as a factor. (Or am I assuming too much?)

@femiguez femiguez added the bug Something isn't working label Dec 24, 2020
@femiguez
Copy link
Owner

@cryanking I consider this a bug in my package and now know how to address it. I'll fix it for the next version of the package. Thank you for the report!

@femiguez
Copy link
Owner

femiguez commented Jan 5, 2021

This is now fixed and the new version in CRAN (0.83) fixes this problem. The approach I have taken is to pass the data as an optional argument. This seems to only be needed when using the function inside other functions as in your case. Thanks again for the report.

@femiguez femiguez closed this as completed Jan 5, 2021
@billdenney
Copy link
Contributor

FYI, I had a similar issue to this one that I just posted to r-sig-mixed-models. If there is a notable reply, I'll link it back here.

@femiguez
Copy link
Owner

femiguez commented Aug 2, 2021

@billdenney Did you have a problem using the nlraa package? In the current version of the package, it is required to pass the data as an argument if there data can't be found. I think this is a common approach in many other packages. Thanks for the note!

@billdenney
Copy link
Contributor

Thanks for the quick response! Yes, my issue was with the boot_* functions (described in #7).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants