-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Methods to identify the number of nontrivial axes in wcmdscale #78
Comments
The methods that Paul & Anderson discuss look pretty simple and could be implemented. I'm not fond of these methods because I don't think they work. With "not working" I mean that I don't think that they usually give the correct number of "non-trivial" axes even when such a number exists. This also applies to An ecological classic on the number of axes is Jackson, DA (1993) Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74, 2204-2214. Paul & Anderson cite this paper and say they are implementing one of his heuristic methods. Then about the aspect of "not working" and why I think so. You can use > library(vegan)
> data(mite)
> m <- capscale(vegdist(wisconsin(sqrt(mite))) ~ 1)
> m2 <- capscale(vegdist(wisconsin(sqrt(mite))) ~ scores(m, dis="sites", choice=1:2))
> signbstick <-
function (model, rank="full")
{
mod <- capscale(simulate(model, rank=rank) ~ 1)
max(min(which(bstick(mod) > eigenvals(mod))) - 1, 0)
}
> table(replicate(100, signbstick(model=m2)))
2 3 4 5 6
12 24 44 17 3 This generates data with two axes plus independent random normal error and estimates the residual variance from the residuals of the model (there are other choices). In > m5 <- capscale(vegdist(wisconsin(sqrt(mite))) ~ scores(m, dis="sites", choice=1:5))
> table(replicate(100, signbstick(model=m5, rank=2)))
2 3
94 6 It is not only the number of systematic axes, but the signal/noise ratio: the non-trivial axis must be strong relative to residual variation. Unfortunately, if that is not the case, |
Thanks Jari for your thoughts (and the ref to jackson 1993)!
also discusses a lot of methods. Most of them should be easy to implement. Your examples are about constrained ordinations (capscale()) - do your critics also apply to unconstrained ordinations? |
Here is another paper that I found in last spring (northern hemisphere) and found then useful, mainly in explaining why things like broken stick should underestimate first eigenvalues, and for underlying geometry in particular: Johnstone IM (2007) High dimensional statistical inference and random matrices. Proceedings of the International Congress of Mathematicians, Madrid, Spain, pages 307-333. This is a conference paper, but I found this on the Internet so that it should be available. My example above actually studied unconstrained ordination: it only constrained the ordination by first unconstrained axes to get the desired dimensionality to the systematic part, and to get matching random variation. |
I don't think differently; I consider these at best a guide to the potential/likely dimensionality, not some infallible, all-knowing method that can discern the will of <insert favourite deity>. Bootstrapping eigenvalues seems at first blush a great solution until you realise that all this gives you is say a confidence interval on the eigenvalues and some measure of bias in the values you got for your data. You still have the problem of saying whether the eigenvalues +/- the bootstrap CI are meaningful or not. I need to revisit Pedro's paper to recall what they did. If we think these are useful as guides and perhaps may perform better than |
I would add that the more formal approaches to doing unconstrained ordination, such as using latent variable models, would allow a real statistical (as opposed to an heuristical) solution to the problem as then we'd have likelihoods to generate an LRT, or use information criteria, to choose between models with different numbers of latent variables (axes). The longer-term usage of vegan will either decline as these new-fangled methods gain traction (which undoubtedly they will, eventually), or not, if we choose to add vegan-esque implementations. Thoughts? |
This issue was raised in StackOverflow. |
Paul and Anderson (2013) discuss 4 methods to assess how many numbers of axis might be usefull/interpretable/[putwhatyouwanthere]
a) broken-stick [implemented in vegan]
b) bootstrap CI for eigenvalues
c) bootstrap from null model
d) variation of c)
I wondered if b)-c) are already implemented in vegan? And if not, if it's worth implementing them?
I have somewhere on my machine a more veganized version of their code...
The text was updated successfully, but these errors were encountered: