K-Means Clustering

Classifying the famous iris dataset using k-means clustering in C++ with SSE SIMD instructions

Motivation

The Iris dataset is probably one of the most popular datasets for testing clustering and other unsupervised machine learning algorithms.
With 4 features per observation (sepal length, sepal width, petal length, petal width), it seemed like the perfect dataset for building a model with Intel x86 SSE instructions: each observation is 4 floats, which maps exactly onto the intrinsic type __m128.
The actual SSE code is implemented in the vec4 class, so that no intrinsic calls need to be made in the actual k-means implementation.
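
To illustrate the idea of hiding the intrinsics behind a small class, here is a minimal sketch of what such a wrapper could look like. The name `Vec4Sketch`, its members, and `squared_distance` are hypothetical and chosen for illustration; the repository's actual vec4 class may be organized differently. Only SSE1 intrinsics from `<xmmintrin.h>` are used.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_shuffle_ps, ...

// Hypothetical 4-float wrapper; not the repository's actual vec4 class.
struct Vec4Sketch {
    __m128 v;  // one observation: four packed 32-bit floats

    // _mm_setr_ps stores its arguments in memory order, so `a` is lane 0.
    Vec4Sketch(float a, float b, float c, float d) : v(_mm_setr_ps(a, b, c, d)) {}
    explicit Vec4Sketch(__m128 m) : v(m) {}

    // Each operator processes all four features in a single instruction.
    Vec4Sketch operator+(const Vec4Sketch& o) const { return Vec4Sketch(_mm_add_ps(v, o.v)); }
    Vec4Sketch operator-(const Vec4Sketch& o) const { return Vec4Sketch(_mm_sub_ps(v, o.v)); }
    Vec4Sketch operator/(float s) const { return Vec4Sketch(_mm_div_ps(v, _mm_set1_ps(s))); }

    // Read back a single lane (0..3) through an aligned temporary.
    float lane(int i) const {
        alignas(16) float out[4];
        _mm_store_ps(out, v);
        return out[i];
    }
};

// Squared Euclidean distance between two observations.
inline float squared_distance(const Vec4Sketch& a, const Vec4Sketch& b) {
    __m128 d = _mm_sub_ps(a.v, b.v);
    __m128 sq = _mm_mul_ps(d, d);                                   // (x0, x1, x2, x3)
    __m128 shuf = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 3, 0, 1));  // (x1, x0, x3, x2)
    __m128 sums = _mm_add_ps(sq, shuf);                             // (x0+x1, x0+x1, x2+x3, x2+x3)
    shuf = _mm_movehl_ps(shuf, sums);                               // lane 0 becomes x2+x3
    sums = _mm_add_ss(sums, shuf);                                  // lane 0 = x0+x1+x2+x3
    return _mm_cvtss_f32(sums);
}
```

With a wrapper along these lines, the clustering code can be written in terms of `a + b`, `sum / n`, and `squared_distance(a, b)` without touching intrinsics directly.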

The model does not classify every data point correctly, but it performs reasonably well. A possible future improvement would be to change the centroid initialization algorithm, using a technique such as k-means++.
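
As a rough idea of what k-means++ seeding involves, the sketch below picks the first centroid uniformly at random and each subsequent one with probability proportional to its squared distance from the nearest centroid chosen so far. It is not part of the repository: `Point`, `sq_dist`, and `kmeanspp_init` are hypothetical names, and a plain `std::array<float, 4>` stands in for the vec4 class to keep the example self-contained.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::array<float, 4>;  // hypothetical stand-in for one observation

inline float sq_dist(const Point& a, const Point& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < 4; ++i) {
        const float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// k-means++ seeding: returns up to k initial centroids drawn from `data`.
inline std::vector<Point> kmeanspp_init(const std::vector<Point>& data,
                                        std::size_t k, std::mt19937& rng) {
    std::vector<Point> centroids;
    if (data.empty() || k == 0) return centroids;

    // First centroid: uniform random choice.
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    centroids.push_back(data[pick(rng)]);

    // d2[i] tracks the squared distance from point i to its nearest centroid.
    std::vector<double> d2(data.size(), std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        double total = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            d2[i] = std::min(d2[i], static_cast<double>(sq_dist(data[i], centroids.back())));
            total += d2[i];
        }
        if (total <= 0.0) {
            // Every point coincides with a chosen centroid; fall back to uniform.
            centroids.push_back(data[pick(rng)]);
            continue;
        }
        // Sample the next centroid with probability proportional to d2[i].
        std::discrete_distribution<std::size_t> weighted(d2.begin(), d2.end());
        centroids.push_back(data[weighted(rng)]);
    }
    return centroids;
}
```

Because points that are already centroids get weight zero, the seeds tend to be spread across the data, which usually reduces the chance of two initial centroids landing in the same cluster.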

aotodev/iris_kmeans
