Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading summary in capscale (db-RDA)? #636

Closed
fvlampe opened this issue Apr 2, 2024 · 4 comments
Closed

Misleading summary in capscale (db-RDA)? #636

fvlampe opened this issue Apr 2, 2024 · 4 comments
Assignees

Comments

@fvlampe
Copy link

fvlampe commented Apr 2, 2024

Dear devs,

I recently calculated db-RDA with capscale and realised one thing in the model outputs:

The reported results differ when calling the model object directly with model and when calling summary(model). When calling the model object negative Eigenvalues are reported as seperate item of sum of negative eigenvalues (Imaginary; as stated in the documentary). When calling summary(model) the item of sum of negative eigenvalues is entirely missing and all proportions are calculated only based on the axes with non-negative Eigenvalues. This may be very misleading, especially when one is only using summary() to call the full results?

Thanks for all and all the best,
Friedemann

Here an example output:

> species.dbrda    
Call: capscale(formula = species.log ~ AP + MAT + t_cover + twi_alos_posgar_10 +
   heat_load_alos_posgar_10 + Condition(transect), data = env, distance = "bray", sqrt.dist = F, add = F)

               Inertia Proportion Rank
Total         50.68149    1.00000     
Conditional    1.24226    0.02451    1
Constrained   13.43771    0.26514    5
Unconstrained 42.15945    0.83185   80
Imaginary     -6.15793   -0.12150   60
Inertia is squared Bray distance 
Species scores projected from ‘species.log’ 


> summary(species.dbrda)
Call:
capscale(formula = species.log ~ AP + MAT + t_cover + twi_alos_posgar_10 +      
  heat_load_alos_posgar_10 + Condition(transect), data = env,  distance = "bray", sqrt.dist = F, add = F) 

Partitioning of squared Bray distance:
              Inertia Proportion
Total          56.839    1.00000
Conditioned     1.242    0.02186
Constrained    13.438    0.23642
Unconstrained  42.159    0.74173

Eigenvalues, and their contribution to the squared Bray distance 
after removing the contribution of conditiniong variables
@fvlampe fvlampe changed the title Misleading/wrong summary in capscale (db-RDA) Misleading summary in capscale (db-RDA)? Apr 2, 2024
@jarioksa
Copy link
Contributor

jarioksa commented Apr 2, 2024

Please do not call capscale to db-RDA: we do have a function for db-RDA, and it is called dbrda. capscale is something related, but not the real thing.

I really do not know what is right and wrong in handling negative eigenvalues in capscale. You can see this scanning commit history of capscale where alternative ways have been alternating. It is clear that now we have two inconsistent ways of reporting negative eigenvalues. In old times we did not report negative eigenvalues at all, and their handling was fixed in release 1.17-6 (2011-01-10). In vegan 2.4-0 (2016-06-15) we introduced real dbrda and in this connection we also added reporting on negative eigenvalues in capscale. However, we did not notice that we should change the summary as well – mainly because nobody of us was using summary any longer – and @gavinsimpson had promised to rewrite summary anyway (issue #203). We also discussed deprecating and ultimately removing capscale function (issue #217) in favour of dbrda, but I made a mistake and let it linger with minimal maintenance.

I'll make the summary consistent with the modern way.

@jarioksa
Copy link
Contributor

jarioksa commented Apr 4, 2024

First a WARNING: I made a wrong assumption about weights in wcmdscale and this precipitates in underestimating the magnitude of imaginary components in capscale. For correct results and to replicate the results in this message, you need to use either the CRAN release (version 2.6-4) or a current development version. Github version of vegan from November 2022 underestimated the imaginary components.

Then to capscale.

I have now gone through handling the imaginary components in various vegan functions. We seem to have three different ways of handling those, but I am not sure which of these are misleading (or at least two may be misleading, but perhaps nobody wants to have the third one).

We have two related functions: capscale and the real db-RDA dbrda. capscale was released in vegan 1.6-0 (Oct 2003), and dbrda in 2.4-0 (Jun 2016). capscale works by analysing Euclidean mapping input dissimilarities. With non-Euclidean dissimilarities this cannot be done exactly but there are some negative eigenvalues with associated imaginary dimensions that correct the errors of Euclidean mapping. These are ignored in capscale. dbrda works with original dissimilarities also taking into account the imaginary dimensions (per component). I considered dropping capscale from vegan after introducing dbrda and we briefly discussed this (issue #217), and while the majority voted for removing capscale I still kept the function (which now again begins to look like a mistake).

Here a comparison of printed output of unconstrained analysis of Dune meadow data with capscale and dbrda:

Call: capscale(formula = d ~ 1)
              Inertia Rank
Total          4.2990     
Unconstrained  4.5939   14
Imaginary     -0.2949    5
Inertia is squared Bray distance 

Call: dbrda(formula = d ~ 1)
              Inertia Rank RealDims
Total           4.299              
Unconstrained   4.299   19       14
Inertia is squared Bray distance 

The total inertia – or the sum of all eigenvalues – is estimated directly from the input dissimilarities prior to ordination, and it is the same in both analyses. This inertia is the same as the total variation in the data. In dbrda this total inertia is directly decomposed into constrained and residual variation and both of these may have negative eigenvalues (though usually only unconstrained component has those apart from overfitted models). In capscale the unconstrained component is larger than the total inertia, meaning that they display higher variance than the data. That extra variance would be handled by the imaginary component, and Total = Unconstrained + Imaginary (with minus sign for Imaginary component). capscale ignores the imaginary component and only analyses the real component. For capscale we have two different ways of expression the variation of these components, like you reported:

### print
Call: capscale(formula = d ~ Management + Moisture, data = dune.env)
              Inertia Proportion Rank
Total          4.2990     1.0000     
Constrained    2.6771     0.6227    6
Unconstrained  1.9169     0.4459   13
Imaginary     -0.2949    -0.0686    5
Inertia is squared Bray distance 

### summary
Partitioning of squared Bray distance:
              Inertia Proportion
Total           4.594     1.0000
Constrained     2.677     0.5827
Unconstrained   1.917     0.4173

The first default print compares proportions to the total inertia of the input matrix (4.299), but instead we analysed its Euclidean mapping which had higher inertia and ignored the imaginary component. This means that for inertiae, constrained + unconstrained > total, and for proportion constrained + unconstrained > 1, or in this case 1.07 (107%). The summary (which uses eigenvals function) compares the components to the analyses Euclidean mapping and for them 2.677 + 1.917 = 4.594, and for proportions 0.5827 + 0.4173 = 1. Surely one of these ways is misleading, but I'm not sure which one.

It is clear that we have been inconsistent, and this inconsistency must be cleared. We have used three different strategies:

  1. Only consider the real component and ignore imaginary. This has been the most common strategy. It is used in summary as well as in eigenvals and its summary that gives the cumulative proportions. It is also used in permutation tests in anova & permutest.
  2. Take into account both the imaginary and real components. About the only place where we use this is the printed brief result of the analysis that I have displayed above. In addition, there are several support functions which can handle imaginary dimensions although this is not visible to the user.
  3. Refuse to analyse data if there are negative components because it is not clear if this should be done after point 1 or point 2. In addition there are some functions which were not implemented for capscale for similar reasons. For instance bstick (broken sticks) refuse results with negative eigenvalues, either capscale or dbrda. An amusing detail is that summary of eigenvals for dbrda refuses to give cumulative proportions explained, although it gives these proportions for each axis (and with positive sign for negative eigenvalues – this clearly needs to be fixed.

Something must be done, but I'm not sure what. Option one is to follow the majority of methods that compare the proportions to real components and ignores imaginary. The only change needed is to make the brief result look like:

              Inertia Proportion Rank
Total          4.2990
Real           4.5939     1.0000     
Constrained    2.6771     0.5827    6
Unconstrained  1.9169     0.4173   13
Imaginary     -0.2949               5

Option 2 is to change other functions to compare proportions to the original total inertia, which means that people must learn to live with >100% proportions. Obviously this concerns eigenvals (where dbrda needs fixing anyway), but also some other functions.

In both cases those functions that refuse to handle negative eigenvalues could be made consistent with this decision.

Finally, I'm still more attracted to deprecating capscale and pointing people to use dbrda instead.

Opinions are appreciated.

@jarioksa jarioksa self-assigned this Apr 4, 2024
@fvlampe
Copy link
Author

fvlampe commented Apr 4, 2024

Thanks a lot for taking care of this topic so quickly and for the detailed analysis and explanation!

I am not expert enough to have a clear opinion, but it seems to me that Option 2 is very unintuitive and explained proportions seem overestimated and not meaningful.
Therefore consistent integration of Option 1 would allow to always recognize the presence of the imaginary component and get consistent proportions. Maybe even a note could be printed, indicating that the imaginary component is ignored in calculating the proportions. In that case, I guess the user should then go for Option 3 and apply one of the available corrections to avoid negative components at all.
However, dbrda seems to follow Option 2, but is simply substracting the imaginary component from the unconstrained component? I wonder if explained proportions for the constrained variation are meaningful at all here (suggesting Option 3)...

I have remade my analysis with dbrda and don't see any reason to keep capscale right now.

jarioksa added a commit that referenced this issue Apr 5, 2024
Earlier we used the observed total from input dissimilarities that
could have negative eigenvalues. Analysis discards these negative
axes and now we assess proportions with the Euclidean mapping
discarding negative eigenvalues and hence higher total.
@jarioksa
Copy link
Contributor

jarioksa commented Apr 5, 2024

I think I have now changed most things to be consistent following option 1: ignore imaginary components and only base proportions on real components:

Call: capscale(formula = dune ~ Condition(Moisture) + Management, data
= dune.env, distance = "bray")

              Inertia Proportion Rank
Total          4.2990                
RealTotal      4.5939     1.0000     
Conditional    1.7440     0.3796    3
Constrained    0.9331     0.2031    3
Unconstrained  1.9169     0.4173   13
Imaginary     -0.2949                
Inertia is squared Bray distance 

The corresponding part in summary reads:

Partitioning of squared Bray distance:
              Inertia Proportion
Total          4.5939     1.0000
Conditioned    1.7440     0.3796
Constrained    0.9331     0.2031
Unconstrained  1.9169     0.4173

RsquareAdj was also changed to return:

$r.squared
[1] 0.2031098

$adj.r.squared
[1] 0.133984

Many other functions changed on the way. See commit history if you want to see all changes. Some of these changes do not directly relate to these capscale changes, but things I found while inspecting this issue.

Some bigger associated changes are:

  • summary was made less verbose per wishes in issue Provide less verbose output in summary.foo for ordinations #203.
  • dbrda: summary of eigenvals shows now proportions of eigenvalues against sum of all eigenvalues, or the original inertia of the input dissimilarities. This means that proportions add up to >1 for positive eigenvalues and goes down to 1 with negative eigenvalues. Here I'm open to opinion. I haven't yet decided how to handle broken sticks of eigenvalues: now bstick refuses to touch dbrda because she doesn't know what is the total inertia.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants