17-poisson-negbin-glmm.Rmd

# Count-data GLMMs

# Goals

- Learn to fit Poisson and negative binomial GLMMs with various packages in R
- Learn how to criticize GLMM fit
- Learn how to check for overdispersion in GLMMs
- Learn how to interpret and plot output from count GLMMs

# Data

We are going to work with one of the data sets from:
Artelle, K. A., S. C. Anderson, J. D. Reynolds, A. B. Cooper, P. C. Paquet, and C. T. Darimont. 2016. Ecology of conflict: marine food supply affects human-wildlife interactions on land. Scientific Reports 6:25936.

The data set represents the number of grizzly bears that were killed because of conflict with humans (`late_kills`) across various Grizzly Bear Population Units (`gbpu`). There are a number of possible predictors. First, the annual predictors: an index of salmon biomass, the number of conflict kills in the previous 3 years, the numeric year, and the mean annual temperature. Then a number of group-level predictors: mean temperature, mean annual precipitation, human population, and the number of grizzly bears in the unit. There is an offset term for usable area for grizzly bears in the population unit so that the number of kills is scaled to be per available area. These predictors have been transformed (e.g. log), centered, and scaled by 2 times their standard deviations to make the magnitude of their effects approximately comparable. This is a slightly simplified version of the data set and model used in the paper.

Let's read in the data and look at it:

```{r}
library(tidyverse)
d <- readRDS("data/raw/conflict-salmon-for-modelling.rds") %>%
  select(late_kills,
    gbpu,
    salmon_biomass_geo_mean, 
    prev_3yr_conflict_scaled,
    year_centered,
    log_humanpop_scaled,
    log_grizzly_pop_est_scaled,
    gbpu_usable_area) %>%
  mutate(
    year_scaled = year_centered / (2 * sd(year_centered)),
    gbpu_usable_area = gbpu_usable_area / 1e9) %>%
  rename(salmon_scaled = salmon_biomass_geo_mean)
head(d)
```

We are going to use the below formula a number of times. So let's store it in a variable we will call `f`.

```{r}
f <- late_kills ~ 
  salmon_scaled + 
  prev_3yr_conflict_scaled +
  year_scaled +
  log_humanpop_scaled +
  log_grizzly_pop_est_scaled +
  offset(log(gbpu_usable_area))
```

Let's look at the data in a few ways. 

```{r}
ggplot(d, aes(year_centered, late_kills, color = salmon_scaled)) +
  geom_point() +
  facet_wrap(~gbpu)
```

```{r}
ggplot(d, aes(salmon_scaled, log(late_kills+0.5), color = gbpu)) +
  geom_point(position = position_jitter(height = 0.4, width = 0))
```

```{r}
ggplot(d, aes(salmon_scaled, log((late_kills+0.5)/gbpu_usable_area), 
  color = gbpu)) +
  geom_point(position = position_jitter(height = 0.4, width = 0)) +
  geom_smooth(method = "lm", se = FALSE)
```

```{r}
ggplot(d, aes(salmon_scaled, log(late_kills+0.5), color = gbpu)) +
  geom_point(position = position_jitter(height = 0.4, width = 0), alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE, colour = "black")
```

```{r}
ggplot(d, aes(salmon_scaled, log(late_kills+0.5), 
  color = gbpu, group = gbpu)) +
  geom_point(position = position_jitter(height = 0.4, width = 0), 
    alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE, alpha = 0.7, colour = "black")
```

```{r}
ggplot(d, aes(salmon_scaled, late_kills, color = gbpu, group = gbpu)) +
  geom_point(position = position_jitter(height = 0.4, width = 0), 
    alpha = 0.3) +
  geom_smooth(method = "glm", method.args = c(family = poisson), se = FALSE, colour = "black")
```

```{r}
ggplot(d, aes(salmon_scaled, late_kills, color = gbpu)) +
  geom_point(position = position_jitter(height = 0.4, width = 0), 
    alpha = 0.3) +
  geom_smooth(method = "glm", method.args = c(family = poisson), se = FALSE, 
    colour = "black")
```

Are there other useful ways you can think of looking at the data?

# Considering various response distributions

We can get a feeling for what distribution might work best by plotting the mean and variance for each group level:

```{r}
dg <- group_by(d, gbpu) %>%
  summarise(m = mean(late_kills), v = var(late_kills))
ggplot(dg, aes(m, v)) +
  geom_smooth(method = "lm", formula = y ~ x - 1, se = F, color = "blue") +
  geom_smooth(method = "lm", 
    formula = y ~ I(x^2) + offset(x) - 1, colour = "red", se = F) +
  geom_abline(intercept = 0, slope = 1, lty = 2) +
  geom_point()
```

In the above plot, the dashed line represents the Poisson (one-to-one), the blue line represents the NB1 negative binomial parameterization, and the red line represents the NB2 negative binomial parameterization (the more usual one we have been working with previously).

The NB2 looks like it fits the smaller values the best, although the NB1 better represents some of the larger values. We can tell that the Poisson doesn't represent the data very well. The variance is growing faster as the mean increases than it allows. Nonetheless, we will start off with the simpler Poisson distribution as a learning exercise and build from there.

# Modeling 

Let's start by using the lme4 package as we have before. Instead of using the `lmer` function we will use the `glmer` function. (Although `lmer` will work too.)

We have count data. So one obvious place to start is with the Poisson distribution and a log link. 

```{r}
library(lme4)
f_lme4 <- paste(c(f, "+ (1 | gbpu)"), collapse = " ") %>% as.formula()
f_lme4
m_glmer <- glmer(f_lme4, data = d, family = poisson(link = "log"))
summary(m_glmer, correlation = FALSE)
```

We can look at these default plots, but the default residual plot is not that helpful here. 

```{r}
plot(m_glmer)
lattice::dotplot(ranef(m_glmer, condVar = TRUE))
```

We can also extract fixed effects, random effects, and their standard errors with:

```{r}
fixef(m_glmer)
arm::se.fixef(m_glmer)

ranef(m_glmer)
arm::se.ranef(m_glmer)
```

There are 3 kinds of confidence intervals we can get out of lme4, but only one that will run quickly right now. 

```{r}
confint(m_glmer, method = "Wald")
# confint(m_glmer, method = "profile")
# confint(m_glmer, method = "boot")
```

One useful summary function is 

```{r}
coef(summary(m_glmer))
```

# Checking for overdispersion 

Now we should check for overdispersion. We can do that by comparing the residual deviance and the sum of the pearson residuals squared with the residual degrees of freedom. These should be approximately the same. If the degrees of freedom are considerably smaller then we have evidence of over dispersion. If the residual deviance and sum of Pearson residuals squared  disagree, then use the Pearson residuals. See Ben Bolker's worked examples here for more details <https://rpubs.com/bbolker/glmmchapter>.

We can see the degrees of freedom from the output of `summary()`. And we can calculate the sum of the Pearson residuals squared:

```{r}
sum(residuals(m_glmer, "pearson")^2)
aods3::gof(m_glmer) # alternative shortcut
```

So it looks like we have some overdispersion here.

We can try dealing with this by adding an observation-level random intercept or by switching to a negative binomial distribution. 

Theoretically, we could fit a negative binomial GLMM with the function `lme4::glmer.nb`, but the following fails to converge for me.

```{r, eval=FALSE}
m_glmer_nb <- glmer.nb(f_lme4, data = d)
```

We could try removing the offset term (or simplified by removing any other terms). Removing the offset term did not help with convergence for me. 

So instead, let's start by adding an observation-level random intercept. This simply requires making a dummy variable with a level for each row of data:

```{r}
d$obs <- seq_len(nrow(d))
```

We then fit the model:

```{r}
f_lme4_obs <- paste(c(f, "+ (1 | gbpu) + (1 | obs)"), collapse = " ") %>% 
  as.formula()
m_lme4_obs <- glmer(f_lme4_obs, data = d, family = poisson(link = "log"))
summary(m_lme4_obs)
```

# MASS package penalized quasi-likelihood version

We can try the function `glmmPQL` in the package MASS, which lets us fit the model with something called penalized quasi-likelihood. "Penalized quasi-likelihood" doubles as a great conservation starter at cocktail parties.

```{r}
m_pql <- MASS::glmmPQL(f, random = ~ 1 | gbpu/obs, data = d, 
  family = poisson(link = "log"))
summary(m_pql)
```

This one converges. In the background the function is using nlme. Therefore, we have to work with the object as if it was from nlme. For example:

```{r}
nlme::intervals(m_pql, which = "fixed")
```

# glmmTMB with negative binomial distributions

It's always a good idea to fit GLMMs with multiple methods if possible to ensure the conclusions are not sensitive to the optimization method or that your model has not converged. These are not nearly as simple as GLMs or linear-mixed effect models!

One of the greatest things since sliced bread is the TMB package for R. TMB normally requires you to write your own model template, but the package glmmTMB comes with a number of pre-written and tested GLMMs.

TMB, and therefore glmmTMB, is *fast* and very robust. <https://github.com/glmmTMB>

```{r}
library(glmmTMB)

m_tmb_pois <- glmmTMB(f_lme4, data = d, family = poisson(link = "log"))
summary(m_tmb_pois)

m_tmb_obs <- glmmTMB(f_lme4_obs, data = d, family = poisson(link = "log"))
summary(m_tmb_obs)

m_tmb_nb <- glmmTMB(f_lme4, data = d, family = nbinom2(link = "log"))
summary(m_tmb_nb)

m_tmb_nb1 <- glmmTMB(f_lme4, data = d, family = nbinom1(link = "log"))
summary(m_tmb_nb1)

bbmle::AICtab(m_tmb_pois, m_tmb_nb, m_tmb_nb1)
```

So the NB2 parameterization has the lowest AIC.

The residuals are hard to interpret still: 

```{r}
plot(log(fitted(m_tmb_nb)), residuals(m_tmb_nb, type = "response"))
plot(log(fitted(m_tmb_nb)), residuals(m_tmb_nb, type = "pearson"))
plot(log(fitted(m_tmb_nb)), sqrt(abs(residuals(m_tmb_nb, type = "pearson"))))
```

Let's plot the residuals by group:

```{r}
d$res_nb <- residuals(m_tmb_nb, type = "response")
ggplot(d, aes(gbpu, res_nb)) + geom_boxplot() + coord_flip()
```

# Brief introduction to a Bayesian Stan version

If we want to accurately explore the full uncertainty on all parameters, including what we have been calling random effects, then we can fit a Bayesian version. The new package rstanarm makes this relatively simple with a function that mimics the function `lme4::lmer` but fits a Stan model in the background. Stan is a programming language and software implementation of the Bayesian Hamiltonian Monte Carlo No-U-Turn Sampler (not a good cocktail party topic). Any more than that is well beyond the focus of this workshop.

Warning: If you are considering using a similar model yourself you will have to do some considerable reading first or find somebody to work with. Note that because this is a Bayesian model you have to choose prior distributions on your parameters. rstanarm has carefully thought out priors that apply to most conditions, but it's important that you understand what these are and whether they make sense for your data.

```{r stan, warning=FALSE}
library(rstanarm)
options(mc.cores = parallel::detectCores()) # setup parallel processing

mstan_nb <- rstanarm::stan_glmer.nb(f_lme4, data = d, chains = 4, iter = 400)

summary(mstan_nb)
rstan::stan_plot(mstan_nb)
rstan::stan_trace(mstan_nb)
rstan::stan_hist(mstan_nb)
ggplot2::theme_set(theme_grey()) # reset theme
save(mstan_nb, file = "data/generated/stan-models.rda")
```

```{r}
load("data/generated/stan-models.rda")

stan_est <- broom.mixed::tidyMCMC(mstan_nb, conf.int = TRUE)

stan_est %>% filter(grepl("gbpu", term)) %>%
  ggplot(aes(term, estimate, ymin = conf.low, ymax = conf.high)) +
  geom_pointrange() +
  coord_flip()

stan_re_int <- stan_est %>% filter(grepl("gbpu", term))
qqnorm(stan_re_int$estimate)
qqline(stan_re_int$estimate)
```

# Comparing estimates across methods 

Let's compare confidence intervals:

```{r compare}
get_tmb_cis <- function(x, name) {
  ci <- confint(x)
  ci <- ci[!grepl("Std.Dev|sigma", row.names(ci)), ]
  ci <- data.frame(ci)
  names(ci) <- c("conf.low", "conf.high", "estimate")
  ci$model <- name
  ci$term <- names(fixef(x)$cond)
  ci
}
cis <- list()
cis[[1]] <- get_tmb_cis(m_tmb_pois, "TMB Poisson")
cis[[2]] <- get_tmb_cis(m_tmb_obs, "TMB Poisson w obs RE")
cis[[3]] <- get_tmb_cis(m_tmb_nb, "TMB Negative binomial (NB2)")
cis[[4]] <- get_tmb_cis(m_tmb_nb1, "TMB Negative binomial (NB1)")
cis[[5]] <- broom.mixed::tidy(m_glmer, conf.int = TRUE) %>%
  dplyr::select(conf.low, conf.high, estimate, term) %>%
  mutate(model = "lme4::glmer Poisson") %>%
  filter(term != "sd__(Intercept)")
cis[[6]] <- stan_est %>%
  dplyr::select(conf.low, conf.high, estimate, term) %>%
  filter(!grepl("Intercept", term), term != "reciprocal_dispersion") %>%
  mutate(model = "Stan Negative binomial (NB2)")

cis_df <- bind_rows(cis) %>% filter(term != "(Intercept)")
```

```{r plot-comparison}
ggplot(cis_df, aes(term, estimate,
  ymin = conf.low, ymax = conf.high, colour = model)) +
  geom_pointrange(position = position_dodge(width = 0.5)) +
  coord_flip() +
  geom_hline(yintercept = 0, linetype = 2)

filter(cis_df, term == "salmon_scaled") %>%
  ggplot(aes(term, estimate, ymin = conf.low, ymax = conf.high, 
    colour = model)) +
  geom_pointrange(position = position_dodge(width = 0.5)) +
  coord_flip() +
  geom_hline(yintercept = 0, linetype = 2)
```

What do you notice about the various models?

# Additional information 

Very useful worked examples of GLMMs by Ben Bolker:
<https://rpubs.com/bbolker/glmmchapter>

Using observation-level random effects for overdispersion:
Harrison, X. A. 2014. Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2:e616. https://peerj.com/articles/616/

My favourite introduction to Bayesian modelling book:
Hobbs, N.T., Hooten, M.B. 2015. Bayesian Models: A Statistical Primer for Ecologists.

A good introduction to Bayesian data analysis, although the current version uses WinBUGS/JAGS:
Gelman, A., and J. Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK.

The Bayesian data analysis bible, but not an easy read:
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. Chapman & Hall, Boca Raton, FL.

A great recent and modern introduction to Bayesian modeling with Stan:
McElreath, R. 2016. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.

Stan: <http:https://mc-stan.org/>