Why only 4/8 methods? #1

Open
vsandul opened this issue Feb 2, 2021 · 1 comment

Comments

vsandul commented Feb 2, 2021

Hi! First of all, thanks a lot for the library. It seems it will be very helpful for my job! But I have a question: why are there only 4 methods instead of the 8 that can be found in Müllner's package?

cdalitz (Owner) commented Feb 3, 2021

The problem with the remaining methods ("ward" and "centroid") is that they do not work with the matrix of distances as input, but with its square, i.e. the squared distances (for the "ward" method this is even more complicated, which is why there are three (!) "ward" methods). This means that an end user will most likely get incorrect results when using these methods, unless they are very knowledgeable about the internal implementation of these methods.
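To illustrate the squared-distance point, here is a minimal sketch using the textbook Lance-Williams update for centroid linkage. It is not code from this library or from Müllner's package; the function name `centroid_update` and the three sample points are made up for the example. The update reproduces the true distance to the merged centroid only when it is fed squared Euclidean distances.

```python
import math

def centroid_update(d_ki, d_kj, d_ij, n_i, n_j):
    """Textbook Lance-Williams update for centroid linkage (UPGMC).

    Geometrically valid only if all arguments are SQUARED Euclidean distances.
    """
    n = n_i + n_j
    return (n_i / n) * d_ki + (n_j / n) * d_kj - (n_i * n_j / n**2) * d_ij

def sq_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Hypothetical sample points: clusters i and j are merged, k is a third cluster.
i, j, k = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
centroid = ((i[0] + j[0]) / 2, (i[1] + j[1]) / 2)   # (1.0, 0.0)
true_dist = math.dist(k, centroid)                   # 3.0

# Correct usage: squared distances in, squared distance out.
from_squared = math.sqrt(
    centroid_update(sq_euclidean(k, i), sq_euclidean(k, j), sq_euclidean(i, j), 1, 1)
)

# Naive usage: plain distances in. The result is not a meaningful distance.
from_plain = centroid_update(math.dist(k, i), math.dist(k, j), math.dist(i, j), 1, 1)

print(true_dist, from_squared, from_plain)
```

With these points the squared-distance path returns 3.0, matching the true distance, while the plain-distance path returns roughly 2.66.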

Actually, I was even reluctant to expose "median", because the implementation only works with Euclidean distances, which means that some users will scratch their heads over the surprising/incorrect results of the "median" method. Nevertheless, it might be useful in some instances, so I included it.
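A small sketch of why the Euclidean assumption matters for "median": the textbook Lance-Williams update for median linkage (WPGMC) yields the distance to the midpoint of the two merged clusters only for Euclidean distances. This is not the library's implementation; `median_update`, the helper functions, and the sample points are assumptions chosen for illustration, with Manhattan distance as the non-Euclidean counterexample.

```python
import math

def median_update(d_ki, d_kj, d_ij):
    """Textbook Lance-Williams update for median linkage (WPGMC).

    Expects squared distances; the geometric meaning holds only for Euclidean ones.
    """
    return 0.5 * d_ki + 0.5 * d_kj - 0.25 * d_ij

def sq_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def manhattan(p, q):
    """Manhattan (city block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points: clusters i and j are merged, k is a third cluster.
i, j, k = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
midpoint = (1.0, 0.0)

# Euclidean input: the update matches the true distance to the midpoint.
euclid_true = math.dist(k, midpoint)                                   # 3.0
euclid_est = math.sqrt(
    median_update(sq_euclidean(k, i), sq_euclidean(k, j), sq_euclidean(i, j))
)                                                                      # 3.0

# Manhattan input: the same update no longer matches the true distance.
manh_true = manhattan(k, midpoint)                                     # 3.0
manh_est = math.sqrt(
    median_update(manhattan(k, i) ** 2, manhattan(k, j) ** 2, manhattan(i, j) ** 2)
)                                                                      # about 3.873

print(euclid_true, euclid_est, manh_true, manh_est)
```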

Concerning the confusion about the "ward" method, see

Murtagh, Legendre: "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?" Journal of Classification 31:274-295 (2014)
