
simulate_gnls with psim=2 and data that are a new length does not assign residuals correctly #8

Closed
billdenney opened this issue Aug 10, 2021 · 10 comments

Comments

@billdenney
Contributor

There is a warning that "longer object length is not a multiple of shorter object length", indicating that there is an issue with assigning residuals in `simulate_gnls(..., psim=2)`.

I believe that the error is related to this line of code:

rsds <- rsds.std * attr(residuals(object), "std") ## This last term is 'sigma'

library(nlraa)
library(nlme)

fm1 <-
  gnls(
    weight ~ SSlogis(Time, Asym, xmid, scal),
    Soybean,
    weights = varPower()
  )

new_data <- data.frame(Time=14:84)

try0 <- simulate_gnls(fm1, data=new_data, psim=0)
try1 <- simulate_gnls(fm1, data=new_data, psim=1)
# the warnings here indicate that there is a problem assigning residuals to the
# correct length vector
try2 <- simulate_gnls(fm1, data=new_data, psim=2)
#> Warning in rsds.std * attr(residuals(object), "std"): longer object length is
#> not a multiple of shorter object length
#> Warning in val + rsds: longer object length is not a multiple of shorter object
#> length

Created on 2021-08-10 by the reprex package (v2.0.0)
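A quick check (refitting the reprex model locally, so this is a self-contained sketch rather than part of the original reprex) shows where the length mismatch comes from:

```r
library(nlme)

# Refit the model from the reprex above
fm1 <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
            weights = varPower())
new_data <- data.frame(Time = 14:84)

# The per-observation residual SDs are attached to the *fitting* data
# (one per row of Soybean)...
length(attr(residuals(fm1), "std"))
# ...but psim = 2 draws one residual per row of the data passed in:
nrow(new_data)
# The first length is not a multiple of the second, hence R's
# vector-recycling warnings in rsds.std * attr(residuals(object), "std").
```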

@femiguez
Owner

Thanks for the report, Bill! I am aware of this issue and should probably include a better error message. The 'data' argument should really only be used when the data are out of scope, as in the examples you were running before. It should be identical to the data used to fit the model, and I do not have a good way of checking that the data passed in this way are indeed identical to those used for fitting the model (maybe I need to think more about this). I could check that they have the same dimensions and throw an error if they do not match. There is a 'hidden' argument 'newdata' which works, but it is only compatible with psim 0 or 1, not 2 - I did not intend to make it available to the user. The issue is that I cannot create a variance-covariance matrix for the new data, because I do not know how to extend its error structure (I'm not sure this is possible).

@billdenney
Contributor Author

I do think that if psim=2 is only supported on the original data, then the above should be an error.

I think that the function nlme::varWeights() may be able to help (or maybe it could provide an instructive example of how to help). For simple additive residual error structures, I would think that rnorm(n, mean=0, sd=[somehow extract the residual standard deviation]) should usually work. For other variance structures, I hope that maybe nlme::varWeights() can provide the idea for how to do it.
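To sketch the idea (a minimal example, not nlraa's implementation; it assumes `varWeights()` on the fitted variance structure returns the inverse of the per-observation SD scaling, so the implied residual SD is sigma divided by the weight):

```r
library(nlme)

fm <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
           weights = varPower())

# varWeights() returns the inverse SD weights w_i; the implied
# per-observation residual SD is then sigma / w_i
w <- varWeights(fm$modelStruct$varStruct)
sds <- fm$sigma / w

# Observation-level residuals for the original (fitting) data:
sim_resid <- rnorm(length(sds), mean = 0, sd = sds)
```

The open question in the thread is how to get `sds` for rows that were not in the fitting data, which is what the interpolation/extrapolation discussion below is about.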

@femiguez
Owner

femiguez commented Aug 10, 2021

Yes, I just changed the code so that better error messages are reported in some of these cases. Thanks for the report!

This is why the 'newdata' argument is hidden. :) For existing data, the model does give me a std for each observation, but when newdata are used it is tricky, because in reality I should be propagating the uncertainty, considering that these observations were not part of the data used to fit the model. I could ignore this for now and interpolate the stds for the new observations, but it could be unreliable if we are extrapolating. I have not had the need to implement this, and I also have no idea how to extend it to a model with correlated errors.

If you really need this feature, I can try to implement it for the case of unequal variance, but not for correlated errors.

@billdenney
Contributor Author

I don't really need it right now (my main need was psim=1 which is working for me), but I do think that it would be very helpful for some of my projects.

@femiguez
Owner

I thought about this a bit more and while I will try to implement it, it won't be trivial.

Also, notice that the stats::simulate.lm function does not have a newdata argument; for that purpose the stats::predict.lm function should be used. In my package, however, I need the nlraa::simulate_* functions in order to implement the predict_* and boot_* functions, so I think the better design is that if you really want predictions on newdata, you should use the nlraa::predict_nlme function. I do realize that this might not be what you need, since it reports intervals and not observation-level data.

Right now in the package I have this:

library(nlme)
library(nlraa)
fm1 <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
             weights = varPower())
dat <- data.frame(Time = 15:90)
## This does not work
prd <- predict_nlme(fm1, interval = "pred", newdata = dat)
## Error: At this point newdata is not compatible with observation-level simulation
## This works fine
prd2 <- predict_nlme(fm1, interval = "conf", newdata = dat)

In any case, I will think more about the best way to implement it.

@billdenney
Contributor Author

I was in the process of using simulate_gnls() to make confidence intervals; that was my underlying goal! Thanks for simplifying that and pointing out that the predict_nlme() function can also work for gnls models.

@femiguez
Owner

femiguez commented Aug 11, 2021

If you are interested in confidence intervals, take a look at the vignette 'Confidence-bands' which is here in github, but not in the CRAN version of the package. Also here:
https://femiguez.github.io/nlraa-docs/Confidence-Bands.html

@femiguez
Owner

I have now implemented the case of having newdata and psim = 2 (observation-level simulation, and thus prediction intervals) for some variance structures. I wrote specific code for 'varIdent', 'varExp', 'varPower' and 'varFixed'. I also wrote some tests and it seems to be working, but I will keep testing. I'm still thinking about how to deal with correlated errors and do not have a good approach yet. It does seem to make the predict_ functions much slower, though, so there is room for improvement.
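As a rough sketch of what this takes for 'varPower' (hypothetical code, not the package's actual implementation; it assumes the variance covariate is the fitted value, which is the default for varPower):

```r
library(nlme)

fm <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
           weights = varPower())
new_data <- data.frame(Time = 14:84)

# varPower: Var(e_i) = sigma^2 * |v_i|^(2*delta), with v_i the fitted value
delta <- coef(fm$modelStruct$varStruct)
prd <- predict(fm, newdata = new_data)
sds_new <- fm$sigma * abs(prd)^delta

# Observation-level simulation on the new grid:
sim <- prd + rnorm(length(prd), mean = 0, sd = sds_new)
```

Because the SDs are computed from the model formula rather than looked up from the fitting data, this extends naturally to extrapolation, with the caveats about reliability mentioned above.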

@billdenney
Contributor Author

Thanks! I'll give it a try. (I will try the next time I need something like it, but it may be a while.)

@femiguez
Owner

For the most part this is implemented. Dealing with correlated errors is a potential enhancement down the road - I don't have a good approach yet.
