Research project to perform feature selection in a fully unsupervised scenario

marcosd3souza/MUI

Framework

Unsupervised Feature Selection Methodology for Clustering in High Dimensionality Datasets

Research project to perform feature selection in unsupervised learning

Abstract

Feature selection is an important research area that seeks to eliminate unwanted features from datasets. Many feature selection methods have been proposed in the literature, but the best set of features is usually evaluated with supervised metrics, which require labels. In this work we propose a methodology that helps data specialists answer simple but important questions, such as: (1) do current feature selection methods give similar results? (2) is there a consistently better method? (3) how should the m best features be selected? (4) since the methods are not parameter-free, how should the best parameters be chosen in the unsupervised scenario? and (5) given different selection options, could we get better results by fusing the results of the methods, and if so, how should they be combined? We analyze these issues and propose a methodology that, building on existing unsupervised methods, performs feature selection on high-dimensional datasets using strategies that make the process fully automatic and unsupervised. We then evaluate the results and find that they are better than those obtained by running the selection methods with their standard configurations. Finally, we list some improvements that can be made in future work.
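Question (5) asks how the outputs of several selection methods might be combined. This README does not describe the fusion strategy used in the paper, so the sketch below swaps in a plain Borda count purely as an illustration of rank fusion. The function name `borda_fuse` and the toy rankings are hypothetical and are not taken from this repository.

```python
from collections import defaultdict


def borda_fuse(rankings, m):
    """Fuse several best-first feature rankings and return the top-m features.

    Illustrative Borda count only; NOT the fusion method of the paper.
    Each ranking is a list of feature indices ordered from best to worst.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The best feature in a ranking earns n points, the worst earns 1.
            scores[feature] += n - position
    # Sort by descending score; break ties by feature index for determinism.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:m]


# Hypothetical rankings produced by three different selection methods.
rankings = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [1, 2, 0, 3],
]
print(borda_fuse(rankings, m=2))  # prints [1, 0]
```

A Borda count needs no labels and no tuning parameters, which is why it is a convenient stand-in here; other aggregation rules could be substituted in the same place.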

Prerequisites

The following packages and software must be installed before running the project:

  • GNU Octave (on Linux)
  • Scikit-learn (package to execute K-Means)
  • oct2py (package to execute Octave code)
  • Anaconda 1.6+
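
Before running the project, it may help to confirm that the requirements listed above are visible to the Python interpreter in use. The following is a minimal sketch, not part of this repository. It assumes the Octave executable is named `octave` and is on `PATH`, and the helper name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False."""
    status = {}
    # find_spec returns None when a top-level module cannot be located.
    for label, module in (("scikit-learn", "sklearn"), ("oct2py", "oct2py")):
        status[label] = importlib.util.find_spec(module) is not None
    # Assumption: the Octave binary is called "octave" and is on PATH.
    status["octave"] = shutil.which("octave") is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only reports whether each item can be located; it does not verify versions or start an Octave session.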

Then run the main script:

python /src/main/execution/main.py

Reference

de Souza Oliveira, M., & Queiroz, S. (2020). Unsupervised feature selection methodology for clustering in high dimensionality datasets. Revista de Informática Teórica e Aplicada, 27(2), 30-41.

@article{de2020unsupervised,
  title={Unsupervised feature selection methodology for clustering in high dimensionality datasets},
  author={de Souza Oliveira, Marcos and Queiroz, Sergio},
  journal={Revista de Inform{\'a}tica Te{\'o}rica e Aplicada},
  volume={27},
  number={2},
  pages={30--41},
  year={2020}
}

PDF

https://www.seer.ufrgs.br/rita/article/download/RITA_VOL27_NR2_30/pdf
