PERMANOVA and the Behrens-Fisher F statistic #344

davismj87 · 2020-02-10T18:34:37Z

I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you!

MarkReader · 2020-02-10T23:49:32Z

Thanks for the query. Fascinating / interesting paper. Maybe drop Anderson an email, if you have no luck here. Much appreciate having it highlighted in the mail. Best wishes, Mark

…

On Mon, 10 Feb 2020, 18:34 Melanie, ***@***.***> wrote: I am working with a data set with an unbalanced sampling design that violates the assumption of homogeneity of group dispersions. I was informed that one can modify the source code for the adonis() function to calculate a modified Behrens-Fisher F-statistic per Anderson et al. 2017 (doi: 10.1111/anzs.12176). Has anyone tried this? Is there a version of the source code already available that can accomplish this task? I was able to brute force calculate the BFS for a simple, one-variable PERMANOVA based on the mathematical descriptions provided in the article; however, my abilities with matrix algebra aren't super sophisticated, so it's been hard for me to wrap my head around how to calculate the BFS for a PERMANOVA with multiple variables and interaction effects. Any information the devs or the community might have about how to code this in R and/or modify the adonis() function would be super helpful. Thank you! — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#344?email_source=notifications&email_token=AELPYXBZRXBEW345JJAN7ZTRCGM37A5CNFSM4KSTK6NKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IMK35VQ>, or

leholman · 2020-02-19T10:07:03Z

You can examine the code for adonis here - https://github.com/jarioksa/vegan/blob/master/R/adonis.R

I agree that the solution provided in Anderson et al. 2017 should be implemented in the adonis function. It is currently provided in the PERMANOVA+ module for PRIMER-e.

Currently there is no R implementation of this solution that I am aware of.

🙏 any statistical gurus out there want to help out? 🧞‍♂️

ecologyjh · 2020-06-29T23:48:48Z

If still relevant - I recently emailed PRIMER about this and got a reply from Anderson who said the adjusted F value in the 2017 paper is currently only available in the PRIMER software (as far as she is aware!). If anyone can implement this into the adonis function you're amazing!

ailich · 2020-11-12T19:50:49Z

@ecologyjh, it has also been implemented in the Fathom Toolbox in MATLAB (functions f_permanova and f_permanovaPW).

ailich · 2021-06-07T22:21:49Z

For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis function and then added a block of code to calculate F.Mod and for now commented out the standard way F.Mod is calculated. That'd need to be added into where f.perm is determined in adonis for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).

Modified_F<- function(formula, data,
permutations = 999,
method = "bray", 
strata = NULL,
contr.unordered = "contr.sum", 
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
  EPS <- sqrt(.Machine$double.eps)
  TOL <- 1e-07
  Terms <- terms(formula, data = data)
  lhs <- formula[[2]]
  lhs <- eval(lhs, data, parent.frame())
  formula[[2]] <- NULL
  rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
  op.c <- options()$contrasts
  options(contrasts = c(contr.unordered, contr.ordered))
  rhs <- model.matrix(formula, rhs.frame)
  options(contrasts = op.c)
  grps <- attr(rhs, "assign")
  qrhs <- qr(rhs)
  rhs <- rhs[, qrhs$pivot, drop = FALSE]
  rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
  grps <- grps[qrhs$pivot][1:qrhs$rank]
  u.grps <- unique(grps)
  nterms <- length(u.grps) - 1
  if (nterms < 1) 
    stop("right-hand-side of formula has no usable terms")
  H.s <- lapply(2:length(u.grps), function(j) {
    Xj <- rhs[, grps %in% u.grps[1:j]]
    qrX <- qr(Xj, tol = TOL)
    Q <- qr.Q(qrX)
    tcrossprod(Q[, 1:qrX$rank])
  })
  if (inherits(lhs, "dist")) {
    if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
    dmat <- as.matrix(lhs^2)
  } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
    dmat <- as.matrix(lhs^2)
    lhs <- as.dist(lhs)
  } else {
    dist.lhs <- as.matrix(vegdist(lhs, method = method))
    dmat <- dist.lhs^2
  }
  n <- nrow(dmat)
  G <- -sweep(dmat, 1, rowMeans(dmat))/2
  SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
  SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
  H.snterm <- H.s[[nterms]]
  tIH.snterm <- t(diag(n) - H.snterm)
  if (length(H.s) > 1) 
    for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 
                                                           1]]
  SS.Res <- sum(G * tIH.snterm)
  df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
  df.Res <- n - qrhs$rank
  if (inherits(lhs, "dist")) {
    beta.sites <- qr.coef(qrhs, as.matrix(lhs))
    beta.spp <- NULL
  } else {
    beta.sites <- qr.coef(qrhs, dist.lhs)
    beta.spp <- qr.coef(qrhs, as.matrix(lhs))
  }
  colnames(beta.spp) <- colnames(lhs)
  colnames(beta.sites) <- rownames(lhs)
  #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
  
  ###########NEW CODE STARTS HERE########################################
  grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
  n_per_group<- table(grp_ID) #Sample size of each group
  VL<- rep(0, length = length(unique(grp_ID))) #Initialize
  names(VL)<- names(n_per_group)
  for (i in 1:(n-1)){
    for (j in (i+1):n) {
      d_ij2<- dmat[i,j]
      grp_i<-grp_ID[i]
      grp_j<- grp_ID[j]
      if(grp_i==grp_j){
        Eij_L<- 1 
      } else{
        Eij_L<- 0
      } #Indicator whether observations i and j are in the same group
      nL<- n_per_group[grp_i] #number of observations in group L
      VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
    }
  }
  
  F.Mod_denominator<- 0 
  for (i in 1:length(VL)) {
    F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
  }
  names(F.Mod_denominator)<- NULL
  F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
  F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
  return(F.Mod)
  ###########NEW CODE ENDS HERE########################################
  }

Example of Calculating modified F on the dune dataset

library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212

pkmnsandy · 2022-07-23T10:33:45Z

For a one-way ANOVA with just a standard categorical factor I believe this would calculate the modified F (I'm getting the same modified F as the Fathom Toolbox implementation). I took the beginning of the adonis function and then added a block of code to calculate F.Mod and for now commented out the standard way F.Mod is calculated. That'd need to be added into where f.perm is determined in adonis for significance tests though and I'm not sure exactly how to do that. Also, I'm not sure how this would expand to cases more than just a standard one-way anova (though according to the paper, that is doable).

Modified_F<- function(formula, data,
permutations = 999,
method = "bray", 
strata = NULL,
contr.unordered = "contr.sum", 
contr.ordered = "contr.poly",
parallel = getOption("mc.cores")){
  EPS <- sqrt(.Machine$double.eps)
  TOL <- 1e-07
  Terms <- terms(formula, data = data)
  lhs <- formula[[2]]
  lhs <- eval(lhs, data, parent.frame())
  formula[[2]] <- NULL
  rhs.frame <- model.frame(formula, data, drop.unused.levels = TRUE)
  op.c <- options()$contrasts
  options(contrasts = c(contr.unordered, contr.ordered))
  rhs <- model.matrix(formula, rhs.frame)
  options(contrasts = op.c)
  grps <- attr(rhs, "assign")
  qrhs <- qr(rhs)
  rhs <- rhs[, qrhs$pivot, drop = FALSE]
  rhs <- rhs[, 1:qrhs$rank, drop = FALSE]
  grps <- grps[qrhs$pivot][1:qrhs$rank]
  u.grps <- unique(grps)
  nterms <- length(u.grps) - 1
  if (nterms < 1) 
    stop("right-hand-side of formula has no usable terms")
  H.s <- lapply(2:length(u.grps), function(j) {
    Xj <- rhs[, grps %in% u.grps[1:j]]
    qrX <- qr(Xj, tol = TOL)
    Q <- qr.Q(qrX)
    tcrossprod(Q[, 1:qrX$rank])
  })
  if (inherits(lhs, "dist")) {
    if (any(lhs < -TOL)) {stop("dissimilarities must be non-negative")}
    dmat <- as.matrix(lhs^2)
  } else if ((is.matrix(lhs) || is.data.frame(lhs)) && isSymmetric(unname(as.matrix(lhs)))) {
    dmat <- as.matrix(lhs^2)
    lhs <- as.dist(lhs)
  } else {
    dist.lhs <- as.matrix(vegdist(lhs, method = method))
    dmat <- dist.lhs^2
  }
  n <- nrow(dmat)
  G <- -sweep(dmat, 1, rowMeans(dmat))/2
  SS.Exp.comb <- sapply(H.s, function(hat) sum(G * t(hat)))
  SS.Exp.each <- c(SS.Exp.comb - c(0, SS.Exp.comb[-nterms]))
  H.snterm <- H.s[[nterms]]
  tIH.snterm <- t(diag(n) - H.snterm)
  if (length(H.s) > 1) 
    for (i in length(H.s):2) H.s[[i]] <- H.s[[i]] - H.s[[i - 
                                                           1]]
  SS.Res <- sum(G * tIH.snterm)
  df.Exp <- sapply(u.grps[-1], function(i) sum(grps == i))
  df.Res <- n - qrhs$rank
  if (inherits(lhs, "dist")) {
    beta.sites <- qr.coef(qrhs, as.matrix(lhs))
    beta.spp <- NULL
  } else {
    beta.sites <- qr.coef(qrhs, dist.lhs)
    beta.spp <- qr.coef(qrhs, as.matrix(lhs))
  }
  colnames(beta.spp) <- colnames(lhs)
  colnames(beta.sites) <- rownames(lhs)
  #F.Mod <- (SS.Exp.each/df.Exp)/(SS.Res/df.Res)
  
  ###########NEW CODE STARTS HERE########################################
  grp_ID<- as.character(interaction(as.data.frame(rhs))) #Reference to group membership of each observation
  n_per_group<- table(grp_ID) #Sample size of each group
  VL<- rep(0, length = length(unique(grp_ID))) #Initialize
  names(VL)<- names(n_per_group)
  for (i in 1:(n-1)){
    for (j in (i+1):n) {
      d_ij2<- dmat[i,j]
      grp_i<-grp_ID[i]
      grp_j<- grp_ID[j]
      if(grp_i==grp_j){
        Eij_L<- 1 
      } else{
        Eij_L<- 0
      } #Indicator whether observations i and j are in the same group
      nL<- n_per_group[grp_i] #number of observations in group L
      VL[grp_i]<- VL[grp_i]+ (Eij_L*d_ij2/(nL*(nL-1))) #Within Group Dispersion for each group
    }
  }
  
  F.Mod_denominator<- 0 
  for (i in 1:length(VL)) {
    F.Mod_denominator<- F.Mod_denominator + ((1 - (n_per_group[i]/n))*VL[i])
  }
  names(F.Mod_denominator)<- NULL
  F.Mod_numerator<- SS.Exp.comb #Among Group Sum of Squares is equal to tr(HG), SS.Exp.comb seems to be that
  F.Mod<- F.Mod_numerator/F.Mod_denominator #Modified F statistic accounting for different dispersions
  return(F.Mod)
  ###########NEW CODE ENDS HERE########################################
  }

Example of Calculating modified F on the dune dataset

library(vegan)
data(dune)
data(dune.env)
Modified_F(dune~ Management, data = dune.env)
[1] 3.130212

Hi, will I able to use this code for a 4-way PERMANOVA or just for one-way?

ailich · 2022-07-23T20:04:56Z

@pkmnsandy, this is unfortunately just for a one-way PERMANOVA. The Fathom implementation states that this is a "one-way (modified) PERMANOVA" and the original paper states "The F2 test statistic can be extended to allow for differences in dispersions of replicates within sites, and differences in dispersions of site centroids within regions, for relevant tests of individual factors at each spatial scale. Similar extensions can be formulated and derived for tests of individual terms in fixed, random or mixed multi-way ANOVA models including interactions. We shall leave these extensions (beyond the scope of the current contribution) for a future endeavour."

jarioksa added the feature-request label Jul 1, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERMANOVA and the Behrens-Fisher F statistic #344

PERMANOVA and the Behrens-Fisher F statistic #344

davismj87 commented Feb 10, 2020

MarkReader commented Feb 10, 2020 via email

leholman commented Feb 19, 2020

ecologyjh commented Jun 29, 2020

ailich commented Nov 12, 2020

ailich commented Jun 7, 2021

pkmnsandy commented Jul 23, 2022

ailich commented Jul 23, 2022

PERMANOVA and the Behrens-Fisher F statistic #344

PERMANOVA and the Behrens-Fisher F statistic #344

Comments

davismj87 commented Feb 10, 2020

MarkReader commented Feb 10, 2020 via email

leholman commented Feb 19, 2020

ecologyjh commented Jun 29, 2020

ailich commented Nov 12, 2020

ailich commented Jun 7, 2021

pkmnsandy commented Jul 23, 2022

ailich commented Jul 23, 2022