Uniform Manifold Approximation with Two-phase Optimization

Uniform Manifold Approximation with Two-phase Optimization (UMATO) is a dimensionality reduction technique that can preserve both the global and the local structure of high-dimensional data. Most existing dimensionality reduction algorithms focus on only one of these two aspects, which can lead to overlooking or misinterpreting important patterns in the data. To address this, we propose a two-phase optimization: global optimization followed by local optimization. First, we obtain the global structure by selecting and optimizing the hub points. Next, we initialize and optimize the remaining points using the nearest neighbor graph. Our experiments with one synthetic and three real-world datasets show that UMATO can outperform baseline algorithms such as PCA, t-SNE, Isomap, UMAP, Topological Autoencoders and Anchor t-SNE in terms of global measures and qualitative projection results.
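To make the second phase more concrete, here is a minimal sketch of how non-hub points might be initialized once the hubs already have 2D positions. It is an illustration under stated assumptions, not UMATO's implementation: the function name `init_from_hubs`, the parameter `k`, and the plain averaging rule are hypothetical, and the local optimization that would follow is omitted.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_from_hubs(data, hub_emb, k=2):
    """Place every non-hub point at the mean 2D position of its k nearest hubs.

    data    : list of high-dimensional points
    hub_emb : dict mapping hub index -> (x, y) position from the global phase
    Hypothetical helper: the averaging rule is an assumption for illustration.
    """
    emb = dict(hub_emb)  # hubs keep the positions found in the global phase
    for i, point in enumerate(data):
        if i in hub_emb:
            continue
        # Rank hubs by their distance to this point in the original space
        nearest = sorted(hub_emb, key=lambda h: euclid(point, data[h]))[:k]
        xs = [hub_emb[h][0] for h in nearest]
        ys = [hub_emb[h][1] for h in nearest]
        emb[i] = (sum(xs) / len(nearest), sum(ys) / len(nearest))
    return emb

# Toy data: points 0-2 act as hubs, point 3 is initialized from them
data = [[0, 0, 0], [10, 0, 0], [0, 10, 0], [1, 0, 0]]
hub_emb = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (0.0, 5.0)}
emb = init_from_hubs(data, hub_emb, k=2)
```

In this toy case point 3 lies closest to hubs 0 and 1, so it starts midway between their 2D positions.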

System Requirements

Python 3.6 or greater
scikit-learn
numpy
scipy
numba
pandas (to read csv files)

Installation

UMATO is available via pip.

pip install umato

import umato
from sklearn.datasets import load_iris

# Load the Iris dataset (150 samples, 4 features)
X, y = load_iris(return_X_y=True)
# Embed the data with UMATO, using 20 hub points
emb = umato.UMATO(hub_num=20).fit_transform(X)

Evaluation

Training models & Generating embedding result

We generate an embedding with each algorithm so that the results can be compared.

We can run each method separately, or all of them at once.

# run all datasets
bash run-benchmark.sh

# run specific dataset (e.g., MNIST dataset)
bash run-benchmark.sh mnist

This covers PCA, t-SNE, UMAP and Topological Autoencoders. Running Anchor t-SNE requires CUDA and a GPU. Please refer to the Anchor t-SNE project for the specification.

Qualitative evaluation

For the qualitative evaluation, we compare the 2D visualizations produced by each algorithm. We used the Svelte web framework and D3 for the visualization.

# see visualization
cd visualization

# install requirements
npm install

# run svelte app
npm run dev

Embedding results of the Spheres dataset for each algorithm

Figure: 2D visualization

Quantitative evaluation

Likewise, we compare the embedding results quantitatively, using measures such as Distance to a Measure (DTM) and the KL divergence between density distributions.
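As a rough illustration of the second measure, the sketch below computes a KL divergence between kernel density estimates of the input data and of its embedding. It assumes a Gaussian-kernel density in the style of the Topological Autoencoders paper, with pairwise distances scaled to [0, 1]. The function names are hypothetical, and the code behind evaluation.comparison may differ.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_dists(points):
    """Pairwise distance matrix, divided by its largest entry."""
    d = [[euclid(p, q) for q in points] for p in points]
    dmax = max(max(row) for row in d) or 1.0  # avoid dividing by zero
    return [[v / dmax for v in row] for row in d]

def density(points, sigma):
    """Gaussian-kernel density per point, normalized to sum to 1."""
    d = normalized_dists(points)
    raw = [sum(math.exp(-(v * v) / sigma) for v in row) for row in d]
    total = sum(raw)
    return [r / total for r in raw]

def kl_density(high, low, sigma=0.1):
    """KL divergence between the input-space and embedding-space densities."""
    p = density(high, sigma)
    q = density(low, sigma)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

high = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
faithful = [(0, 0), (3, 0), (0, 6)]       # same shape, uniformly scaled
distorted = [(0, 0), (1, 0), (1.1, 0)]    # two points pushed together

kl_faithful = kl_density(high, faithful)
kl_distorted = kl_density(high, distorted)
```

Because distances are normalized, a uniformly scaled copy of the data gives a divergence of essentially zero, while an embedding that changes relative densities gives a positive value.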

To print the quantitative result:

# print table result
python -m evaluation.comparison --algo=all --data=spheres --measure=all

Result for the Spheres dataset

Measure	PCA	Isomap	t-SNE	UMAP	TopoAE	At-SNE	UMATO (ours)
DTM	0.9950	0.7784	0.9116	0.9209	0.6619	0.9448	0.3849
KL-Div (sigma=0.01)	0.7568	0.4492	0.6070	0.6100	0.1865	0.6584	0.1569
KL-Div (sigma=0.1)	0.6525	0.4267	0.5365	0.5383	0.3007	0.5712	0.1333
KL-Div (sigma=1.)	0.0153	0.0095	0.0128	0.0134	0.0057	0.0138	0.0008
Cont	0.7983	0.9041	0.8903	0.8760	0.8317	0.8721	0.7884
Trust	0.6088	0.6266	0.7073	0.6499	0.6339	0.6433	0.6558
MRRE_X	0.7985	0.9039	0.9032	0.8805	0.8317	0.8768	0.7887
MRRE_Z	0.6078	0.6268	0.7261	0.6494	0.6326	0.6424	0.6557

DTM & KL divergence: lower is better.
The winner and runner-up are in bold.
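For the measures where lower is better, the ranking can be checked directly from the numbers in the table. The snippet below is a small sketch that copies the DTM and KL divergence rows and picks the two lowest scores per row; the helper name `top_two` is hypothetical.

```python
algos = ["PCA", "Isomap", "t-SNE", "UMAP", "TopoAE", "At-SNE", "UMATO"]

# Rows copied from the Spheres table (lower is better for all four)
rows = {
    "DTM": [0.9950, 0.7784, 0.9116, 0.9209, 0.6619, 0.9448, 0.3849],
    "KL-Div (sigma=0.01)": [0.7568, 0.4492, 0.6070, 0.6100, 0.1865, 0.6584, 0.1569],
    "KL-Div (sigma=0.1)": [0.6525, 0.4267, 0.5365, 0.5383, 0.3007, 0.5712, 0.1333],
    "KL-Div (sigma=1.)": [0.0153, 0.0095, 0.0128, 0.0134, 0.0057, 0.0138, 0.0008],
}

def top_two(scores):
    """Return (winner, runner-up) for a row where lower is better."""
    ranked = sorted(zip(scores, algos))
    return ranked[0][1], ranked[1][1]

ranking = {measure: top_two(scores) for measure, scores in rows.items()}

for measure, (winner, runner_up) in ranking.items():
    print(f"{measure}: winner={winner}, runner-up={runner_up}")
```

On these four rows UMATO has the lowest score and TopoAE the second lowest.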

References

Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. JMLR, 9(Nov), 2579-2605.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
Moor, M., Horn, M., Rieck, B., & Borgwardt, K. (2020). Topological autoencoders. ICML.
Fu, C., Zhang, Y., Cai, D., & Ren, X. (2019, July). AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 176-186).
