Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERMANOVA and the Behrens-Fisher F statistic #344

Open
davismj87 opened this issue Feb 10, 2020 · 7 comments
Open

PERMANOVA and the Behrens-Fisher F statistic #344

davismj87 opened this issue Feb 10, 2020 · 7 comments

Comments

@davismj87
Copy link

I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you!

@MarkReader
Copy link

MarkReader commented Feb 10, 2020 via email

@leholman
Copy link

You can examine the code for adonis here - https://github.com/jarioksa/vegan/blob/master/R/adonis.R

I agree that the solution provided in Anderson et al. 2017 should be implemented in the adonis function. It is currently provided in the PERMANOVA+ module for PRIMER-e.

Currently there is no R implementation of this solution that I am aware of.

🙏 any statistical gurus out there want to help out? 🧞‍♂️

@ecologyjh
Copy link

If still relevant - I recently emailed PRIMER about this and got a reply from Anderson who said the adjusted F value in the 2017 paper is currently only available in the PRIMER software (as far as she is aware!). If anyone can implement this into the adonis function you're amazing!

@ailich
Copy link

ailich commented Nov 12, 2020

@ecologyjh, it has also been implemented in the Fathom Toolbox in MATLAB (functions f_permanova and f_permanovaPW).

@ailich
Copy link

ailich commented Jun 7, 2021

For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis function and then added a block of code to calculate F.Mod and for now commented out the standard way F.Mod is calculated. That'd need to be added into where f.perm is determined in adonis for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).

Modified_F<- function(formula, data,
permutations = 999,
method = "bray", 
strata = NULL,
contr.unordered = "contr.sum", 
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
  EPS <- sqrt(.Machine$double.eps)
  TOL <- 1e-07
  Terms <- terms(formula, data = data)
  lhs <- formula[[2]]
  lhs <- eval(lhs, data, parent.frame())
  formula[[2]] <- NULL
  rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
  op.c <- options()$contrasts
  options(contrasts = c(contr.unordered, contr.ordered))
  rhs <- model.matrix(formula, rhs.frame)
  options(contrasts = op.c)
  grps <- attr(rhs, "assign")
  qrhs <- qr(rhs)
  rhs <- rhs[, qrhs$pivot, drop = FALSE]
  rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
  grps <- grps[qrhs$pivot][1:qrhs$rank]
  u.grps <- unique(grps)
  nterms <- length(u.grps) - 1
  if (nterms < 1) 
    stop("right-hand-side of formula has no usable terms")
  H.s <- lapply(2:length(u.grps), function(j) {
    Xj <- rhs[, grps %in% u.grps[1:j]]
    qrX <- qr(Xj, tol = TOL)
    Q <- qr.Q(qrX)
    tcrossprod(Q[, 1:qrX$rank])
  })
  if (inherits(lhs, "dist")) {
    if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
    dmat <- as.matrix(lhs^2)
  } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
    dmat <- as.matrix(lhs^2)
    lhs <- as.dist(lhs)
  } else {
    dist.lhs <- as.matrix(vegdist(lhs, method = method))
    dmat <- dist.lhs^2
  }
  n <- nrow(dmat)
  G <- -sweep(dmat, 1, rowMeans(dmat))/2
  SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
  SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
  H.snterm <- H.s[[nterms]]
  tIH.snterm <- t(diag(n) - H.snterm)
  if (length(H.s) > 1) 
    for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 
                                                           1]]
  SS.Res <- sum(G * tIH.snterm)
  df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
  df.Res <- n - qrhs$rank
  if (inherits(lhs, "dist")) {
    beta.sites <- qr.coef(qrhs, as.matrix(lhs))
    beta.spp <- NULL
  } else {
    beta.sites <- qr.coef(qrhs, dist.lhs)
    beta.spp <- qr.coef(qrhs, as.matrix(lhs))
  }
  colnames(beta.spp) <- colnames(lhs)
  colnames(beta.sites) <- rownames(lhs)
  #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
  
  ###########NEW CODE STARTS HERE########################################
  grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
  n_per_group<- table(grp_ID) #Sample size of each group
  VL<- rep(0, length = length(unique(grp_ID))) #Initialize
  names(VL)<- names(n_per_group)
  for (i in 1:(n-1)){
    for (j in (i+1):n) {
      d_ij2<- dmat[i,j]
      grp_i<-grp_ID[i]
      grp_j<- grp_ID[j]
      if(grp_i==grp_j){
        Eij_L<- 1 
      } else{
        Eij_L<- 0
      } #Indicator whether observations i and j are in the same group
      nL<- n_per_group[grp_i] #number of observations in group L
      VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
    }
  }
  
  F.Mod_denominator<- 0 
  for (i in 1:length(VL)) {
    F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
  }
  names(F.Mod_denominator)<- NULL
  F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
  F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
  return(F.Mod)
  ###########NEW CODE ENDS HERE########################################
  }

Example of Calculating modified F on the dune dataset

library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212

@pkmnsandy
Copy link

For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis function and then added a block of code to calculate F.Mod and for now commented out the standard way F.Mod is calculated. That'd need to be added into where f.perm is determined in adonis for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).

Modified_F<- function(formula, data,
permutations = 999,
method = "bray", 
strata = NULL,
contr.unordered = "contr.sum", 
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
  EPS <- sqrt(.Machine$double.eps)
  TOL <- 1e-07
  Terms <- terms(formula, data = data)
  lhs <- formula[[2]]
  lhs <- eval(lhs, data, parent.frame())
  formula[[2]] <- NULL
  rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
  op.c <- options()$contrasts
  options(contrasts = c(contr.unordered, contr.ordered))
  rhs <- model.matrix(formula, rhs.frame)
  options(contrasts = op.c)
  grps <- attr(rhs, "assign")
  qrhs <- qr(rhs)
  rhs <- rhs[, qrhs$pivot, drop = FALSE]
  rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
  grps <- grps[qrhs$pivot][1:qrhs$rank]
  u.grps <- unique(grps)
  nterms <- length(u.grps) - 1
  if (nterms < 1) 
    stop("right-hand-side of formula has no usable terms")
  H.s <- lapply(2:length(u.grps), function(j) {
    Xj <- rhs[, grps %in% u.grps[1:j]]
    qrX <- qr(Xj, tol = TOL)
    Q <- qr.Q(qrX)
    tcrossprod(Q[, 1:qrX$rank])
  })
  if (inherits(lhs, "dist")) {
    if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
    dmat <- as.matrix(lhs^2)
  } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
    dmat <- as.matrix(lhs^2)
    lhs <- as.dist(lhs)
  } else {
    dist.lhs <- as.matrix(vegdist(lhs, method = method))
    dmat <- dist.lhs^2
  }
  n <- nrow(dmat)
  G <- -sweep(dmat, 1, rowMeans(dmat))/2
  SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
  SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
  H.snterm <- H.s[[nterms]]
  tIH.snterm <- t(diag(n) - H.snterm)
  if (length(H.s) > 1) 
    for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 
                                                           1]]
  SS.Res <- sum(G * tIH.snterm)
  df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
  df.Res <- n - qrhs$rank
  if (inherits(lhs, "dist")) {
    beta.sites <- qr.coef(qrhs, as.matrix(lhs))
    beta.spp <- NULL
  } else {
    beta.sites <- qr.coef(qrhs, dist.lhs)
    beta.spp <- qr.coef(qrhs, as.matrix(lhs))
  }
  colnames(beta.spp) <- colnames(lhs)
  colnames(beta.sites) <- rownames(lhs)
  #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
  
  ###########NEW CODE STARTS HERE########################################
  grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
  n_per_group<- table(grp_ID) #Sample size of each group
  VL<- rep(0, length = length(unique(grp_ID))) #Initialize
  names(VL)<- names(n_per_group)
  for (i in 1:(n-1)){
    for (j in (i+1):n) {
      d_ij2<- dmat[i,j]
      grp_i<-grp_ID[i]
      grp_j<- grp_ID[j]
      if(grp_i==grp_j){
        Eij_L<- 1 
      } else{
        Eij_L<- 0
      } #Indicator whether observations i and j are in the same group
      nL<- n_per_group[grp_i] #number of observations in group L
      VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
    }
  }
  
  F.Mod_denominator<- 0 
  for (i in 1:length(VL)) {
    F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
  }
  names(F.Mod_denominator)<- NULL
  F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
  F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
  return(F.Mod)
  ###########NEW CODE ENDS HERE########################################
  }

Example of Calculating modified F on the dune dataset

library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212

Hi, will I able to use this code for a 4-way PERMANOVA or just for one-way?

@ailich
Copy link

ailich commented Jul 23, 2022

@pkmnsandy, this is unfortunately just for a one-way PERMANOVA. The Fathom implementation states that this is a "one-way (modified) PERMANOVA" and the original paper states "The F2 test statistic can be extended to allow for differences in dispersions of replicates within sites, and differences in dispersions of site centroids within regions, for relevant tests of individual factors at each spatial scale. Similar extensions can be formulated and derived for tests of individual terms in fixed, random or mixed multi-way ANOVA models including interactions. We shall leave these extensions (beyond the scope of the current contribution) for a future endeavour."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants