Canned dissimilarities? #196
jarioksa pushed a commit to jarioksa/natto that referenced this issue on Sep 5, 2018: "this version is similar as outlined in github issue vegandevs/vegan#196 and lacks indices and lacks tools of documentation."
Function `designdist` is currently faster than `vegdist`. With `"binary"` and `"quadratic"` terms it is much faster than `vegdist`. With `"minimum"` terms (used by most dissimilarity functions in `vegdist`) it used to be slower than `vegdist`, but I wrote C code with a `.Call()` interface to find those minimum terms (5fb205d), and now even these are faster than in `vegdist`.
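To make the three kinds of terms concrete, the sketch below computes the per-pair quantities A, B and J in base R, following the definitions used by `designdist()`: numbers of species and shared species for `"binary"` terms, sums of squares and the cross product for `"quadratic"` terms, and site totals and the summed pairwise minima for `"minimum"` terms. The helper `abj_terms()` is hypothetical and is not vegan's C code. The last lines illustrate the precision concern with quadratic terms: with large, nearly equal abundances, A + B - 2J subtracts big numbers that almost cancel.

```r
# Hypothetical helper (not part of vegan): the per-pair terms A, B and J
# for the three kinds of terms that designdist() supports.
abj_terms <- function(x, y, terms = c("binary", "quadratic", "minimum")) {
  terms <- match.arg(terms)
  switch(terms,
    binary = {
      # presence/absence: species on each site and shared species
      px <- x > 0
      py <- y > 0
      c(A = sum(px), B = sum(py), J = sum(px & py))
    },
    # sums of squares and the cross product
    quadratic = c(A = sum(x^2), B = sum(y^2), J = sum(x * y)),
    # site totals and the sum of pairwise minima
    minimum = c(A = sum(x), B = sum(y), J = sum(pmin(x, y)))
  )
}

x <- c(5, 0, 3, 2)
y <- c(1, 4, 3, 0)

# Bray-Curtis written with minimum terms, (A+B-2J)/(A+B) ...
tm <- abj_terms(x, y, "minimum")
bray_terms <- unname((tm["A"] + tm["B"] - 2 * tm["J"]) / (tm["A"] + tm["B"]))
# ... agrees with the direct definition based on absolute differences
bray_direct <- sum(abs(x - y)) / sum(x + y)
all.equal(bray_terms, bray_direct)

# Quadratic terms: A + B - 2J is algebraically the squared Euclidean
# distance, but it subtracts large, nearly equal numbers.
x_big <- c(1e8, 1e8 + 1)
y_big <- c(1e8 + 1, 1e8)
tq <- abj_terms(x_big, y_big, "quadratic")
sq_terms <- unname(tq["A"] + tq["B"] - 2 * tq["J"])
sq_direct <- sum((x_big - y_big)^2)   # exact: 1 + 1 = 2
c(terms = sq_terms, direct = sq_direct)
```

With these inputs the term-based value does not reproduce the exact answer 2, because the squares of `1e8 + 1` cannot be stored exactly in double precision, whereas the direct form works on the small differences and stays exact.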
The speed comes with some cost:

- Memory use of `designdist` is higher. I made `vegdist` use a `.Call()` interface, which further reduced the memory footprint of `vegdist` and makes the difference even larger in 2.5-0 than it used to be (and still is in 2.4-1). In the same process I also made `vegdist` faster, and it now matches `stats::dist()`, which used to be much faster earlier. However, this does not close the gap to `designdist` (major changes in 8125d43).
- Missing values (`NA`) cannot be handled in `designdist`, but in `vegdist` we can use *pairwise deletion*. For `"minimum"` terms this is the main reason for the faster `designdist`.
- Euclidean-type distances can be written with quadratic terms, e.g. `designdist(x, "A+B-2*J", terms="quadratic")` gives the squared Euclidean distance. However, they are not numerically equivalent: quadratic terms can lose precision and give erratic results. This concerns most other indices, and it is safer to use compiled code that was designed to be numerically more stable.
- `designdist` coefficients must be designed and written by the user, which may be tricky for some users.

The last point could be solved by providing a function of canned dissimilarities. We could have a long list of dissimilarity indices defined in `designdist` terms, and these could be selected with an index name. The following function demonstrates the concept:

The list of indices could grow to any desired size. For instance, an article by Z. Hubalek lists 86 binary indices, and there are many more.
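A minimal base-R sketch of such a canned lookup, written only for illustration (it is not the function from the issue, and the names `canned_indices` and `canned_dist()` are hypothetical): each index is stored as a formula in the A, B, J notation of `designdist()` together with the kind of terms it needs, and an index is chosen by (partial) name.

```r
# Hypothetical table of canned indices: each entry keeps a designdist()-style
# formula in A, B and J together with the kind of terms it needs.
canned_indices <- list(
  bray      = list(method = "(A+B-2*J)/(A+B)",   terms = "minimum"),
  manhattan = list(method = "A+B-2*J",           terms = "minimum"),
  euclidean = list(method = "sqrt(A+B-2*J)",     terms = "quadratic"),
  jaccard   = list(method = "(A+B-2*J)/(A+B-J)", terms = "binary"),
  sorensen  = list(method = "(A+B-2*J)/(A+B)",   terms = "binary")
)

# Hypothetical canned_dist(): select an index by (partial) name and evaluate
# its formula for all pairs of rows (sites) in x. Returns a "dist" object.
canned_dist <- function(x, index) {
  index <- match.arg(index, names(canned_indices))
  def <- canned_indices[[index]]
  x <- as.matrix(x)
  if (def$terms == "binary") x <- (x > 0) * 1   # presence/absence
  n <- nrow(x)
  if (def$terms == "minimum") {
    tot <- rowSums(x)
    J <- matrix(0, n, n)
    for (i in seq_len(n)) for (j in seq_len(n))
      J[i, j] <- sum(pmin(x[i, ], x[j, ]))
  } else {
    # for 0/1 data these are species counts and shared species;
    # otherwise sums of squares and cross products
    tot <- rowSums(x^2)
    J <- tcrossprod(x)
  }
  A <- matrix(tot, n, n)   # A[i, j]: term of site i
  B <- t(A)                # B[i, j]: term of site j
  d <- eval(parse(text = def$method), list(A = A, B = B, J = J))
  dimnames(d) <- list(rownames(x), rownames(x))
  as.dist(d)
}

x <- matrix(c(5, 0, 3, 2,
              1, 4, 3, 0,
              0, 2, 0, 6), nrow = 3, byrow = TRUE,
            dimnames = list(c("s1", "s2", "s3"), NULL))
canned_dist(x, "bray")
canned_dist(x, "euc")   # partial matching selects "euclidean"
```

With vegan installed, the lookup could instead pass `def$method` and `def$terms` straight to `designdist()`, which already takes a method string and a `terms` argument; evaluating the formula in base R here only keeps the sketch self-contained. Adding an index is then a one-line change to the table.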
The function is simple, but the real challenge is documentation. The list of indices is dynamic, and when it reaches something like 200 alternatives, we also need ways of paging the output, filtering the results, finding synonyms (there are synonyms even in the list above), etc. Currently I have a simple `help` argument in `betadiver` which lists the seventeen indices available there, but this would not be sufficient for this choice of canned dissimilarities.

Probably we would also want to have optional fields like `synonym` and `note`, which could print a `message()` of canonical names or implementation specifics for certain indices. Perhaps an entry on `source` could also be useful to give a reference to the literature on each index (not usually the original but a textbook or similar), but this would call for a more complicated design, as the same sources are shared by many indices and we do not want to write them in full for each index.

What do you think of this idea? Should we have a function like this?
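The optional fields discussed above can be prototyped with an ordinary data frame. In this hypothetical sketch (`index_registry`, `index_sources`, `find_index()` and `list_indices()` are illustrative names, and the reference text is a placeholder), a synonym resolves to its canonical name with a `message()`, a `note` is reported the same way, literature sources are stored once and referred to by key, and `grepl()` gives simple filtering.

```r
# Hypothetical registry with optional documentation fields. Literature
# sources are stored once in index_sources and referred to by key.
index_sources <- c(src1 = "<textbook reference>")   # placeholder text

index_registry <- data.frame(
  name    = c("bray", "jaccard", "sorensen", "euclidean"),
  method  = c("(A+B-2*J)/(A+B)", "(A+B-2*J)/(A+B-J)",
              "(A+B-2*J)/(A+B)", "sqrt(A+B-2*J)"),
  terms   = c("minimum", "binary", "binary", "quadratic"),
  synonym = c(NA, NA, "dice", NA),
  note    = c(NA, NA, NA, "quadratic terms can lose precision"),
  source  = c("src1", "src1", "src1", NA),
  stringsAsFactors = FALSE
)

# Look up one index by canonical name or synonym; synonyms and notes
# are reported with message(), and the source key is expanded.
find_index <- function(index) {
  hit <- which(index_registry$name == index)
  if (length(hit) == 0) {
    hit <- which(index_registry$synonym == index)
    if (length(hit) == 0) stop("unknown index: ", index)
    message("'", index, "' is a synonym of '", index_registry$name[hit[1]], "'")
  }
  row <- index_registry[hit[1], ]
  if (!is.na(row$note)) message("Note: ", row$note)
  row$reference <- unname(index_sources[row$source])
  row
}

# Filter the list by a regular expression on names and synonyms,
# optionally restricted to one kind of terms.
list_indices <- function(pattern = ".", terms = NULL) {
  keep <- grepl(pattern, index_registry$name) |
    grepl(pattern, index_registry$synonym)
  if (!is.null(terms)) keep <- keep & index_registry$terms == terms
  index_registry[keep, c("name", "synonym", "method", "terms")]
}

find_index("dice")   # reports that 'dice' is a synonym of 'sorensen'
list_indices(terms = "binary")
```

Keeping the source as a key means a textbook reference shared by many indices is written only once; paging a very long listing is not covered by this sketch.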
This popped up in issue #182 but I decided to make this a separate issue.