K-Means clustering algorithm from scratch in C++ with SSE SIMD instructions

aotodev/iris_kmeans

K-Means Clustering

Classifying the famous Iris dataset using k-means clustering in C++ with SSE SIMD instructions


Motivation

The Iris dataset is one of the most popular datasets for testing clustering and other unsupervised machine learning algorithms.
With 4 features per observation (sepal length, sepal width, petal length, and petal width), it seemed like the perfect dataset for building a model with Intel x86 SSE instructions: each observation is 4 floats, which maps exactly onto the intrinsic type __m128.
The SSE code lives in the vec4 class, so the k-means implementation itself makes no intrinsic calls.
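
To illustrate the idea, here is a minimal sketch of what such a wrapper could look like. It is not the repository's actual vec4 class: the member names, `hsum`, `distance_sq`, and `nearest_centroid` are all assumptions made for this example. It only shows how one 4-feature observation can sit in a single __m128 while the clustering code works with ordinary operators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics: __m128 and the _mm_* functions

// Hypothetical wrapper: the interface of the real vec4 class may differ.
struct vec4 {
    __m128 v;

    vec4() : v(_mm_setzero_ps()) {}
    explicit vec4(__m128 m) : v(m) {}
    // _mm_setr_ps stores its arguments in memory order (a is lane 0).
    vec4(float a, float b, float c, float d) : v(_mm_setr_ps(a, b, c, d)) {}

    vec4 operator+(const vec4& o) const { return vec4(_mm_add_ps(v, o.v)); }
    vec4 operator-(const vec4& o) const { return vec4(_mm_sub_ps(v, o.v)); }
    vec4 operator*(const vec4& o) const { return vec4(_mm_mul_ps(v, o.v)); }
    vec4 operator*(float s) const { return vec4(_mm_mul_ps(v, _mm_set1_ps(s))); }

    // Horizontal sum of the four lanes, using SSE1 shuffles only.
    float hsum() const {
        __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));  // swap lanes pairwise
        __m128 sums = _mm_add_ps(v, shuf);   // lanes: (0+1, 0+1, 2+3, 2+3)
        shuf = _mm_movehl_ps(shuf, sums);    // move the high pair sum into lane 0
        sums = _mm_add_ss(sums, shuf);       // lane 0 now holds the total
        return _mm_cvtss_f32(sums);
    }
};

// Squared Euclidean distance between two observations.
inline float distance_sq(const vec4& a, const vec4& b) {
    const vec4 d = a - b;
    return (d * d).hsum();
}

// Index of the centroid closest to p: the assignment step of k-means.
inline std::size_t nearest_centroid(const vec4& p, const std::vector<vec4>& centroids) {
    assert(!centroids.empty());
    std::size_t best = 0;
    float best_d = distance_sq(p, centroids[0]);
    for (std::size_t i = 1; i < centroids.size(); ++i) {
        const float d = distance_sq(p, centroids[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

With a wrapper along these lines, the assignment step reduces to calling `nearest_centroid(observation, centroids)` for each observation, and no intrinsic appears outside the struct.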

The model does not assign every data point to the correct cluster, but it performs reasonably well. A possible future improvement would be to change the centroid initialization algorithm, using techniques such as k-means++.
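
For reference, k-means++ seeding could be sketched roughly as follows. This is not part of the repository: `Point`, `squared_distance`, and `kmeans_pp_init` are hypothetical names, and plain arrays are used instead of vec4 to keep the example self-contained. The first centroid is drawn uniformly at random, and each subsequent one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

using Point = std::array<float, 4>;  // hypothetical stand-in for one observation

inline double squared_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum;
}

// k-means++ seeding: returns k initial centroids picked from data.
inline std::vector<Point> kmeans_pp_init(const std::vector<Point>& data,
                                         std::size_t k, std::mt19937& rng) {
    assert(k > 0 && k <= data.size());
    std::vector<Point> centroids;
    centroids.reserve(k);

    // The first centroid is chosen uniformly at random.
    std::uniform_int_distribution<std::size_t> uniform(0, data.size() - 1);
    centroids.push_back(data[uniform(rng)]);

    // d2[i] is the squared distance from data[i] to its nearest chosen centroid.
    std::vector<double> d2(data.size(), std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        for (std::size_t i = 0; i < data.size(); ++i) {
            d2[i] = std::min(d2[i], squared_distance(data[i], centroids.back()));
        }
        const double total = std::accumulate(d2.begin(), d2.end(), 0.0);
        if (total <= 0.0) {
            // Every point coincides with a centroid: fall back to a uniform pick.
            centroids.push_back(data[uniform(rng)]);
            continue;
        }
        // Sample the next centroid with probability proportional to d2.
        std::discrete_distribution<std::size_t> weighted(d2.begin(), d2.end());
        centroids.push_back(data[weighted(rng)]);
    }
    return centroids;
}
```

Because points far from every existing centroid are more likely to be picked, the initial centroids tend to be spread out, which usually helps the subsequent iterations avoid poor local optima.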

