How to draw Dendrogram in clustering analysis

rcv911/dendrogram

Dendrogram. How to draw.

Description

A dendrogram is a tree diagram that shows the result of hierarchical clustering. We will use the SciPy library for Python, which provides ready-made functions for cluster analysis and saves time. SciPy is available on GitHub.

We will use the two test data sets from this project, with some parameters changed.

Algorithm

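  • First, we need the distances between each pair of points
	from scipy.spatial.distance import pdist
	d = pdist(X)  # condensed vector of pairwise Euclidean distances

or manually, as a full N x N matrix (using Euclidean distance)

	N = X.shape[0]
	d = np.zeros((N, N))  # requires: import numpy as np
	for i in range(N):
		for j in range(i+1, N):
			d[j, i] = d[i, j] = np.sqrt(np.sum((X[i, :] - X[j, :])**2))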

Important: pdist() uses the Euclidean metric by default, but you can choose any other metric it supports through the metric argument of scipy.spatial.distance.pdist().
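As a small illustration of choosing a metric, the sketch below computes distances for a hypothetical toy data set (four 2-D points invented for this example, not the project's test data) with three of the metrics that pdist() accepts. It also shows squareform(), which converts the condensed vector returned by pdist() into the full symmetric matrix that the manual loop builds.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: 4 points in 2-D (not the project's test data).
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])

d_euclid = pdist(X)                     # default metric is "euclidean"
d_city = pdist(X, metric="cityblock")   # Manhattan distance
d_cheb = pdist(X, metric="chebyshev")   # largest coordinate difference

# pdist returns a condensed vector of length N*(N-1)/2;
# squareform expands it into the full symmetric N x N matrix.
D = squareform(d_euclid)

print(D.shape)       # (4, 4)
print(d_euclid[0])   # points 0 and 1, Euclidean: 5.0
print(d_city[0])     # points 0 and 1, Manhattan: 7.0
print(d_cheb[0])     # points 0 and 1, Chebyshev: 4.0
```

The choice of metric changes the distances and therefore can change which clusters are merged first.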

  • Now we know the distance between each pair of points. We treat each point as its own cluster and start combining them.

Important: at each step we combine exactly two clusters, not individual points. One cluster joins another as a whole.
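The merge-two-clusters-per-step behaviour can be seen in the linkage matrix that SciPy returns. This is a minimal sketch on hypothetical toy data; the "single" linkage method is an assumption made for the example, since the text does not name one.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical toy data: two tight pairs and one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 2.0], [30.0, 0.0]])

# "single" linkage is an assumption; other methods ("complete",
# "average", "ward", ...) follow the same merge-by-merge scheme.
Z = sch.linkage(X, method="single")

# Each row of Z describes one merge: [cluster_a, cluster_b, distance, new_size].
# The original points are clusters 0..N-1; the merge in row k creates cluster N+k.
N = len(X)
for k, (a, b, dist, size) in enumerate(Z):
    print(f"step {k}: merge {int(a)} and {int(b)} at distance {dist:.1f} "
          f"-> cluster {N + k} with {int(size)} points")
```

With N points there are always N-1 rows, because every step reduces the number of clusters by one.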

  • There are two stopping criteria:
    • the critical distance has been reached
    • the required number of clusters has been reached
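Both stopping criteria map onto scipy.cluster.hierarchy.fcluster(), which cuts an already computed hierarchy. The sketch below uses hypothetical toy data, an assumed "single" linkage method, and arbitrary illustrative values for the critical distance (5.0) and the number of clusters (2).

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical toy data: two tight pairs and one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 2.0], [30.0, 0.0]])
Z = sch.linkage(X, method="single")  # linkage method is an assumption

# Criterion 1: stop once merging would exceed a critical distance.
labels_by_distance = sch.fcluster(Z, t=5.0, criterion="distance")

# Criterion 2: stop once the required number of clusters is reached.
labels_by_count = sch.fcluster(Z, t=2, criterion="maxclust")

print(len(set(labels_by_distance)))  # 3
print(len(set(labels_by_count)))     # 2
```

Each result is an array with one cluster label per point, so the two criteria can be compared directly on the same data.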

Results

Dendrogram for the first test data, with 1 cluster
Dendrogram for the second test data, with 3 clusters
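A dendrogram like these can be drawn with scipy.cluster.hierarchy.dendrogram(). The following is a sketch under stated assumptions rather than the project's own plotting code: the toy data, the "single" linkage method, the threshold of 5.0 and the helper name save_dendrogram are all hypothetical.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical toy data: two tight pairs and one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 2.0], [30.0, 0.0]])
Z = sch.linkage(X, method="single")  # linkage method is an assumption

# no_plot=True computes the tree layout without needing matplotlib.
layout = sch.dendrogram(Z, no_plot=True, color_threshold=5.0)
print(layout["leaves"])  # order of the points along the horizontal axis


def save_dendrogram(Z, threshold, filename):
    """Draw the dendrogram with matplotlib and save it to a file."""
    import matplotlib
    matplotlib.use("Agg")  # off-screen backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Links below the threshold are coloured per cluster.
    sch.dendrogram(Z, ax=ax, color_threshold=threshold)
    ax.axhline(threshold, linestyle="--")  # mark the critical distance
    ax.set_xlabel("point index")
    ax.set_ylabel("distance")
    fig.savefig(filename)
    plt.close(fig)
```

Calling save_dendrogram(Z, 5.0, "dendrogram.png") would write the picture to a file; the file name is only an example.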

Learn more

Installation

You can use a Python distribution that bundles data packages, such as Anaconda or Miniconda. Another option is Portable Python. Any Python IDE will work.

License

Free