Agglomerative based clustering on gene expression dataset

havelhakimi/gene-expression

Agglomerative Clustering Analysis on gene expression dataset

This is a solution notebook for an assignment question from a graduate Data Mining course. Each code block is accompanied by analysis where required.
Dataset link: https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq
Broadly, the following steps have been performed in this solution notebook:

  • Minimal preprocessing of the dataset
  • Explained why agglomerative clustering is more widely used than divisive clustering
  • Visualized the given class labels using t-SNE
  • Ran agglomerative clustering with four linkages: single, complete, group average, and minimum variance (Ward).
    • Compared the clustering performance on the dataset both visually and empirically.
    • Reported the best results on various cluster validity indices.
  • The assumptions above and the order of the work follow the questions asked in the assignment.
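
To make the linkage comparison concrete, here is a minimal pure-Python sketch of agglomerative clustering that supports the four linkages listed above through the Lance-Williams distance update. It is not the notebook's code: the function name `agglomerative`, the toy 2-D `points`, and the choice of three clusters are all assumptions made for illustration, and "group average" and "minimum variance" are mapped to the names `average` and `ward`. In practice a library implementation such as scikit-learn's `AgglomerativeClustering` would normally be used instead of this naive O(n^3) loop.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters, linkage="average"):
    """Naive agglomerative clustering; returns one cluster label per point.

    Illustrative sketch only: `linkage` is one of
    "single", "complete", "average" (group average) or "ward" (minimum variance).
    """
    n = len(points)
    members = {i: [i] for i in range(n)}  # cluster id -> indices of its points
    dist = {}  # (smaller id, larger id) -> current inter-cluster distance
    for i, j in combinations(range(n), 2):
        e = math.dist(points[i], points[j])
        # The Ward update below operates on squared Euclidean distances.
        dist[(i, j)] = e * e if linkage == "ward" else e

    def key(a, b):
        return (a, b) if a < b else (b, a)

    while len(members) > n_clusters:
        # Find the closest pair of live clusters (i < j) and merge j into i.
        i, j = min(combinations(sorted(members), 2), key=lambda p: dist[p])
        ni, nj, dij = len(members[i]), len(members[j]), dist[(i, j)]
        for m in members:
            if m in (i, j):
                continue
            dim, djm, nm = dist[key(i, m)], dist[key(j, m)], len(members[m])
            if linkage == "single":
                new = min(dim, djm)
            elif linkage == "complete":
                new = max(dim, djm)
            elif linkage == "average":
                new = (ni * dim + nj * djm) / (ni + nj)
            elif linkage == "ward":
                new = ((ni + nm) * dim + (nj + nm) * djm - nm * dij) / (ni + nj + nm)
            else:
                raise ValueError(f"unknown linkage: {linkage}")
            dist[key(i, m)] = new
        members[i].extend(members.pop(j))

    labels = [0] * n
    for label, indices in enumerate(members.values()):
        for idx in indices:
            labels[idx] = label
    return labels


# Hypothetical toy data: three well-separated groups of 2-D points
# (a stand-in for the gene expression samples, which are not bundled here).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

results = {name: agglomerative(points, 3, name)
           for name in ("single", "complete", "average", "ward")}
for name, labels in results.items():
    print(name, labels)
```

On data this cleanly separated all four linkages agree. The differences the notebook compares only show up on harder data, where single linkage tends to chain clusters together and Ward tends to favor compact, similarly sized ones.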