Fast fuzzy clustering C (MEX API) implementation for MATLAB (FCM, Gustafson-Kessel, clustering validity, extrapolation with presumed cluster centers)
The only planned changes at this time are to improve the documentation, i.e. this README file.
This repository provides efficient C implementation of two types of Fuzzy c-Means Clustering algorithms and several related tools that can be used in clustering evaluation and validation. Once installed, the provided functions are directly callable from MATLAB and will run considerably faster compared to similar fuzzy clustering packages that are implemented purely in MATLAB language. To use these algorithms outside of MATLAB, one would need to modify the existing C code. Additional implementation details that may be useful are included in the implementation section below.
The following functionality is supported:
- Fuzzy c-Means Clustering
(based on the work of James C. Bezdek and Enrique Ruspini.)
- Euclidean, diagonal, and Mahalanobis distance metric
- Gustafson-Kessel fuzzy c-means clustering
- Initial fuzzy partition matrix generation
- 12 Clustering validity functionals:
- partition coefficient, partition entropy, nonfuzzy index, minimum and mean hard tendencies, minimum and maximum relative fuzziness, the minimum nearest maximum membership cardinality, compactness and separation index, fuzzy hypervolume, average partition density, and the partition density of a resulting fuzzy partition
- Fuzzy partition matrix extrapolation that uses the presumed cluster centers
- Fuzzy scatter and covariance matrices calculation for fuzzy clusters
- Test code (in MATLAB) to verify the proper installation of software
- Detailed help text describing how to use the provided functions
The primary motivation for this work was to estimate membership functions of the Temporal Fuzzy Sets which span the Fuzzy Information Space. Nevertheless, the provided software package may be used for solving any problem that requires fuzzy c-means clustering, e.g. tracking of speech and noise power changes, echo path estimation, analysis, and characterization, etc. Fuzzy clustering has been applied within a wide range of engineering and research disciplines including geology, agriculture, social sciense, etc.
The installation may be done either by downloading the latest release which includes generated binaries for the Windows x64 MEX files (MATLAB 2022b used), or by cloaning or downloading the source code. The process of generating your own binaries from the source code is straighforward.
To use this software one must have MATLAB installed and if you plan to generate the MEX binaries you must be able to run the mex command. This command is used for generating the accelerated versions of the three main algorithms from their C source code.
Once you verified you have MATLAB installed you can download
the latest release.
All the
necessary files are contained within the fcmclt_M_m_p_w64.zip
file. When you unpack the
zip archive, the MATLAB scripts and the binaries will be contained in the
./src
folder. You may optionally copy the contents of ./src
directory to a folder where you
want MATLAB
to access it. As an example, let us assume that you copied all the files
from ./src
to D:\MyMatlab\fcmclt
on a Windows system.
You may now proceed to testing your installation as described below.
Once you verified you have MATLAB installed and you can successfully run mex command to build MATLAB functions implemented in C language, you may proceed with the following steps:
- Clone or download this repository to the system where you have MATLAB installed
- Copy the contents of
./src
directory to a folder where you want MATLAB to access it- As an example, let us assume you copied all the files from
./src
toD:\MyMatlab\fcmclt
on a Windows system.
- As an example, let us assume you copied all the files from
- Start MATLAB and within MATLAB change directory to
D:\MyMatlab\fcmclt
>> cd D:\MyMatlab\fcmclt
- Run the following three
mex
commands:mex extfpm.c mex fcmc.c mex gkfcmc.c
- You should see three new files created:
extfpm.<mexext>
,fcmc.<mexext>
,gkfcmc.<mexext>
, where<mexext>
reflects the mex file extension on your system. For 64-bit Windows system, it should bemexw64
.
- You can test your installation by running test routines that were provided in
./src
folder and that you copied toD:\MyMatlab\fcmclt
, e.g. try running>> testfcmc
from that folder. - Try all other test routines. They have prefix
test
ortst
.- NOTE: Some may display a warning indicating the dataset may be too small for reliable clustering. That is normal and expected behavior for those examples.
If you want to make the fcmclt
package available to you no matter which folder you are in,
you may
add the D:\MyMatlab\fcmclt
to your MATLAB path. You may consult MATLAB documentation
if you do not know how to do it.
The MEX files provided with the release have been generated using MATLAB R2022b on Windows x64 OS. The C code did not undergo any changes since MATLAB R13. The generation of MEX binaries has been tested most recently with 2018b, 2019b, and 2022b MATLAB releases.
The best way to learn how to use the functions provided in this package is to read their
documentation. You may run >> help <function-name>
to get the function description.
In addition to the help text for the three main algorithms (fcmc, gkfcmc, and extfpm),
you should consult the MATLAB source code for the test routines provided in .m
files. That
will provide you with examples of how to integrate these routines into your programs
and how to initialize fuzzy partition matrix or evaluate the results.
The MATLAB source code may also provide references to additional reading material. The notation used in the source code is based primarily on this book:
- J.C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms," Plenum Press, New York, 1981.
The support is not provided for this package.
The initial implementation of this package was done between 1992-1995. There were only a handful of minor changes done since that time. Those were limited to accomodate the MEX API changes introduced by MathWorks, Inc. over a long period of time (25+ years).
The code in ./src
folder contains C and MATLAB sources. The C files are:
extfpm.c
: Extrapolates the fuzzy partition matrix using the presumed cluster centersfcmc.c
: Implements the Fuzzy c-Means Clustering (FCMC) algorithmgkfcmc.c
: Implements Gustafson-Kessel (GK) variant of the FCMC algorithm
The relevant MATLAB files are:
Contents.m
: Contains help text for thefcmclt
package.fcmcinit.m
: Very important routine that is used to generate the initial fuzzy partition matrix U0.cltvalid.m
: Calculates clustering validity functionalsfscat.m
: Calculates fuzzy scatter and covariance matrices for a fuzzy clustertestextfpm.m
: Example of how to extrapolate fuzzy partition matrix from the presumed cluster centerstestfcmc.m
: Example of how to use FCMC algorithmtestgk.m
: Example of GK variant for two Gaussian classestestgk2.m
: Example of GK for Gustafson's crosststvalid.m
: Example of how to use validity functionals for the FCMC algorithm
The C functions include only two header files. The math.h
and mex.h
. The only
two core MATLAB algorithms that are invoked within the C code are the matrix inverse and
determinant.
For brevity, only the fcmc.c
code structure is outlined here:
- The main function called by MATLAB is
mexFunction()
- Checks the input arguments and applies the defaults for optional ones as needed
- Fetches pointers to vector data
- Allocates some memory
- Prepares for the FCMC routine
- Selects the distance metric (may allocate additional memory and may invoke MATLAB inverse function in case of Mahalanobis metric)
- Executes the FCMC algorithm (
do_fcm()
) - Checks and creates output variables
- All other functions are written in plain C code
In essense, the mexFunction()
is a MATLAB wrapper for the algorithms that are to be
accelerated.
NOTE: If you would like to use the C code outside of MATLAB, you may need to do a few things:
- Make it re-entrant. That is, you would need to instantiate the variables that are currently global or static. That was not a problem for the MATLAB implementation, but may create problems if you try to integrate this code somewhere else.
- You would need to provide compatible implementations for matrix inverse and determinant MATLAB functions in case you have to use the parts of code that depend on these.
- Finally, you would need to replace the
mexFunction()
implementation with your own wrapper that would handle memory management, optional input arguments, etc. The memory management you will need to implement will likely differ from the way MATLAB manages memory. Especially when it comes to allocating and deallocating memory.
The code in gkfcmc.c
and extfpm.c
files is structured in a similar way.
There are no plans to expand this package with additional features.
Pull requests will only be considered for the following contributions:
- Bug fixes (if you can find any)
- Interesting examples that show how to use the provided functions
- Please try to provide a single self-contained MATLAB file that does not require any data files. If additional MATLAB toolboxes are required, please list them all.
- Document your code! If your code cannot be easily reviewed it will be swiftly rejected no matter how great you think it is.
The software provided here has been developed by Bogdan Kosanovic in the early 1990s during his Ph.D. work at the University of Pittsburgh.
The author would like to acknowledge Dr. Brahim Hamadicharef
who was at the time
(August, 2003) with the
University of Plymouth for submitting a fix to make the fcmclt
package work for
MATLAB 6.5 R13. If we are to trust the LinkedIn, Dr. Hamadicharef is now (2022) with
the Institute of High Performance Computing (IHPC) in Singapore.
For general information on fuzzy clustering, partition coefficient (F), and partition entropy (H), the best starting point is
J.C. Bezdek, "Pattern Recognition with Fuzzy Objective
Function Algorithms", Plenum Press, New York, 1981.
For more information on nonfuzzy index (NFI), please refer to
M. Roubens, "Pattern Classification Problems and Fuzzy Sets",
Fuzzy Sets and Systems, 1:239-253, 1978.
For more information on minimum and mean hard tendencies (MinHT and MeanHT), please refer to
F.F. Rivera, E.L. Zapata, and J.M. Carazo, "Cluster validity
based on the hard tendency of the fuzzy classification",
Pattern Recognition Letters, 11:7-12, 1990.
For more information on minimum and maximum relative fuzziness (MinRF and MaxRF), please refer to
H.L. Gordon and R.L. Somorjai, "Fuzzy Cluster Analysis of
Molecular Dynamics Trajectories", Proteins: Structure,
Function, and Genetics, 14:249-264, 1992.
For more information on minimum nearest maximum membership cardinality (MinNMMcard) and related functionals including a way to go about the selection of fuzzy exponent m, please refer to
B.R. Kosanovic, "Signal and System Analysis in Fuzzy Information Space",
Ph.D. Dissertation, University of Pittsburgh, 1995.
For more information on compactness and separation index (S), please refer to
X.L. Xie and G. Beni, "A Validity Measure for Fuzzy Clustering",
IEEE Trans. PAMI, 13(8):841-847, 1991.
For more information on fuzzy hypervolume (Fhv), average partition density (Dpa), and partition density (Pd), please refer to
I. Gath and A.B. Geva, "Fuzzy clustering for the estimation of
the parameters of the components of mixtures of normal
distributions", Pattern Recognition Letters, 9:77-86, 1989.