You can also find all 40 answers here: Devinterview.io - Curse of Dimensionality
The Curse of Dimensionality refers to challenges and limitations that arise when working with data in high-dimensional spaces. Although this concept originated in mathematics and data management, it is of particular relevance in the domains of machine learning and data mining.
- Data Sparsity: As the number of dimensions increases, the available data becomes sparse, potentially leading to overfitting in machine learning models.
- Metric Space Issues: Even simple measures such as the Euclidean distance become less effective in high-dimensional spaces; all points become "far" from one another, resulting in a lack of neighborhood distinction.
- Computational Complexity: Many algorithms slow down as data dimensionality increases, with implications for both training and inference.
- Increased Noise Sensitivity: High-dimensional datasets are prone to containing more noise, potentially leading to suboptimal models.
- Feature Selection and Dimensionality Reduction: These tasks become important for addressing the issues associated with high dimensionality.
- Hyperparameter Tuning: As the number of dimensions increases, the search space grows exponentially, making it more difficult to find the optimal set of hyperparameters.
- Object Recognition: When dealing with high-resolution images, traditional methods may struggle due to the sheer volume of pixel information.
- Computational Chemistry: The equations used to model chemical behavior can handle only up to a certain number of atoms, which creates the need for dimensionality reduction in such calculations.
- Feature Engineering: Domain knowledge can help identify and construct meaningful features, reducing dependence on raw data.
- Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) aid in projecting high-dimensional data into a lower-dimensional space.
- Model-Based Selection: Some algorithms, such as decision trees, are inherently less sensitive to dimensionality, making them more favorable choices for high-dimensional data.
2. Explain how the Curse of Dimensionality affects distance measurements in high-dimensional spaces.
The Curse of Dimensionality has significant implications for measuring distances in high-dimensional spaces. As the number of dimensions increases, the "neighborhood" of points expands, making it more challenging to accurately capture relationships between them.
In high-dimensional spaces:
- Sparsity: Data points end up far apart from one another, making density estimation and model fitting difficult.
- Volume Expansion: The space that contains the data grows exponentially, and most of it lies in the corners or near the edges of the space, making it harder to generalize.
- Concentration: Most of the data mass concentrates in a thin shell at a certain distance from the origin, leading to an uneven distribution.
- Saturation: Beyond a certain dimensionality, adding more features doesn't significantly change the distribution of distances.
The problem isn't with specific distance metrics, but rather with the perception of similarity or dissimilarity from the multidimensional data.
- Euclidean distance: Familiar in lower dimensions, but it becomes less meaningful as the number of dimensions increases; points become increasingly equidistant from one another, reducing its discriminative power.
- Cosine similarity: Commonly used for comparing text data in high-dimensional spaces. It captures the angle between vectors, providing more stable measurements even in high dimensions.
- Mahalanobis distance: Adjusts for the covariance structure of the data, making it suitable when features are correlated; however, estimating that covariance structure becomes difficult in very high dimensions.
- Minkowski metric: A generalized metric that reduces to the Euclidean, Manhattan, or Chebyshev distance depending on its parameter.
Despite these efforts, none of the methods fully resolve the diminishing effectiveness of distance measures beyond a certain number of dimensions.
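To see this effect empirically, here is a minimal Python sketch (random uniform data; the function name and parameter values are illustrative) showing how the relative contrast between the nearest and farthest neighbor of a query point shrinks as dimensionality grows:

import numpy as np

def distance_contrast(n_points=500, dims=(2, 10, 100, 1000), seed=0):
    # For each dimensionality, measure how different the nearest and
    # farthest Euclidean distances from a query point are (relative contrast).
    rng = np.random.default_rng(seed)
    for d in dims:
        X = rng.random((n_points, d))
        query = rng.random(d)
        dists = np.linalg.norm(X - query, axis=1)
        contrast = (dists.max() - dists.min()) / dists.min()
        print(f"d={d:5d}  relative contrast={contrast:.3f}")

distance_contrast()

As the dimensionality increases, the printed contrast shrinks toward zero, which is exactly the loss of neighborhood distinction described above.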
While the Curse of Dimensionality remains an open challenge in high-dimensional spaces, several strategies help address its impact:
1. Dimensionality Reduction: Methods like Principal Component Analysis (PCA) optimize feature sets, cutting dimensionality while preserving variance and information content.
2. Feature Selection: Identifying the most influential variables can streamline datasets and alleviate sparsity.
3. Local Distance Measures: Focusing on immediate neighbors or data clusters can offer more accurate proximity estimations.
4. Localized Representations: Embedding high-dimensional data into a lower-dimensional space using methods like t-distributed Stochastic Neighbor Embedding (t-SNE) is effective for visualization and exploratory analysis tasks.
5. Algorithmic Considerations: Certain models, such as k-nearest neighbors, are more sensitive to the Curse of Dimensionality; choosing algorithms robust to this challenge is crucial.
The curse of dimensionality highlights numerous issues when dealing with high-dimensional datasets. Such datasets can cause problems with computational resources, algorithmic performance, storage requirements, and interpretability.
As the number of dimensions increases, computational demands can grow exponentially or combinatorially.
Example: In Euclidean space, computing pairwise distances among $n$ points in $d$ dimensions costs $O(n^2 d)$ operations:

import numpy as np

def pairwise_distances(X):
    # Naive O(n^2 * d) computation of all pairwise Euclidean distances
    n = X.shape[0]
    distances = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            distances[i, j] = np.linalg.norm(X[i] - X[j])
    return distances
In high-dimensional spaces, noise might have a stronger effect, making it more challenging to discern useful patterns.
Example: In 2D space, two points that differ only by a small amount of measurement noise remain close; as more noisy dimensions are added, the noise accumulates in the distance computation and can swamp the true signal separating the points.
With very high-dimensional data, models may fit the noise in the training set as opposed to the actual underlying patterns, leading to poor generalization.
Example: In a linear regression setting, if you have 1000 features for just 10 training examples, the model can produce perfect predictions on the training data by adjusting to noise, resulting in poor predictions on new data.
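A hedged sketch of that scenario on synthetic data (names and sizes are illustrative): ordinary least squares fits the 10 training points perfectly even though every feature is pure noise, and then fails on held-out data.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# 10 training examples, 1000 pure-noise features, random targets
X_train = rng.normal(size=(10, 1000))
y_train = rng.normal(size=10)
X_test = rng.normal(size=(100, 1000))
y_test = rng.normal(size=100)

model = LinearRegression().fit(X_train, y_train)
print("Train R^2:", model.score(X_train, y_train))  # ~1.0 (perfect fit to noise)
print("Test  R^2:", model.score(X_test, y_test))    # ~0 or negative (no generalization)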
High-dimensional datasets are often sparse, meaning that most data points have zero values in many of their dimensions.
Example: In a bag-of-words representation of text data, a document might have thousands of features, each representing the presence or absence of a distinct word. However, a vast majority of these will be zeros for most documents.
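As a rough illustration (assuming scikit-learn is available; the sample documents are made up), a bag-of-words matrix is naturally stored as a sparse matrix because most entries are zero:

from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the curse of dimensionality affects distance measures",
    "sparse high dimensional data is hard to model",
    "feature selection reduces dimensionality",
]

# Bag-of-words matrix: one column per distinct word, stored sparsely
X = CountVectorizer().fit_transform(docs)
print(X.shape)                     # (3, number_of_distinct_words)
print("Stored non-zeros:", X.nnz)  # far fewer than rows * columns for real corpora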
Directly visualizing data in more than three dimensions is impossible for humans. While projection techniques can reduce dimensionality for visualization, such methods can introduce distortions.
Visual and interpretative shortcomings in multidimensional datasets can give rise to biases in machine learning models.
- Feature Selection: Choose only the most relevant features to reduce redundancy and overfitting risk.
- Manifold Learning: Approaches like t-distributed Stochastic Neighbor Embedding (t-SNE) can plot high-dimensional data in 2D or 3D, making it more visually understandable.
- Regularization: Techniques like Lasso Regression can help automatically select important features, minimizing the impact of irrelevant ones.
Sparsity describes how data points are distributed within a high-dimensional space, and exploiting it plays a significant role in mitigating the Curse of Dimensionality.
High-dimensional datasets are often sparse, meaning that most of the variation is concentrated within a subset of the dimensions. This suggests the existence of a low-dimensional structure, allowing for more manageable data representations and better-performing algorithms.
Identifying and keeping only the most relevant features decreases both the dimensionality and the effects of the curse. For instance, the LASSO algorithm minimizes the absolute sum of feature coefficients, often resulting in a subset of nonzero coefficients, effectively selecting features.
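A minimal sketch of this behaviour on synthetic data (the coefficient values and regularization strength are illustrative): LASSO drives most of the 50 coefficients to exactly zero.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
# Only the first 3 of 50 features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print("Non-zero coefficients:", np.sum(lasso.coef_ != 0))  # typically close to 3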
Methods like Principal Component Analysis (PCA) use transformations to create new sets of uncorrelated variables, or principal components, while preserving as much variance as possible. This technique can help in reducing dimensions.
Applying the kernel trick in Support Vector Machines allows for implicit high-dimensional mapping. This means that optimization in a kernel space might circumvent the need to handle the true high dimensionality of data.
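A brief, hedged illustration of the kernel trick (dataset and hyperparameters are illustrative): an RBF-kernel SVM separates data that is not linearly separable in its original two dimensions, without ever constructing the high-dimensional feature map explicitly.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Data that is not linearly separable in its original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the data into a very high-dimensional
# feature space without ever materializing those coordinates
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print("Training accuracy:", clf.score(X, y))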
Random projection methods, motivated by the Johnson-Lindenstrauss lemma, transform a dataset with a random projection matrix, reducing dimensionality while approximately preserving pairwise distances.
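A minimal random-projection sketch (the dimensions are chosen for illustration): pairwise distances are approximately preserved after projecting 10,000-dimensional data down to 500 dimensions.

import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10000))

# Project 10,000-dimensional data down to 500 dimensions
proj = GaussianRandomProjection(n_components=500, random_state=0)
X_low = proj.fit_transform(X)

# Pairwise distances are approximately preserved (ratio close to 1)
d_high = np.linalg.norm(X[0] - X[1])
d_low = np.linalg.norm(X_low[0] - X_low[1])
print(d_high, d_low, d_low / d_high)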
Autoencoders use neural networks to reconstruct input through a lower-dimensional representation, often called the bottleneck layer. This process effectively learns a compressed representation of the input data.
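A hedged PyTorch sketch of an autoencoder with a 5-unit bottleneck (assuming PyTorch is installed; layer sizes and training settings are illustrative):

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=50, bottleneck_dim=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 20), nn.ReLU(), nn.Linear(20, bottleneck_dim))
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 20), nn.ReLU(), nn.Linear(20, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

X = torch.rand(1000, 50)            # toy high-dimensional data
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):            # minimize reconstruction error
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    codes = model.encoder(X)        # 5-dimensional compressed representation
print(codes.shape)                  # torch.Size([1000, 5])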
The Curse of Dimensionality characterizes the challenges that arise when dealing with high-dimensional datasets. These challenges have significant ramifications for machine learning algorithms.
- Increased Data Complexity: With higher dimensions, data points become deceptively "far apart," making it harder for algorithms to discern relationships.
- Increased Data Sparsity: Datasets in high-dimensional spaces can become extremely sparse, often leading to overfitting.
- Computational Demands: As the number of dimensions grows, algorithms such as k-Nearest Neighbors become more computationally intensive.
- Redundant Information: High dimensions can introduce data redundancies, which may confuse the learning algorithm.
- Challenges in Visualization: Beyond three dimensions, human comprehension becomes nearly impossible, making it difficult to interpret or validate results.
- Feature Selection and Engineering: Opt for a subset of features relevant to the task at hand, which reduces dimensionality.
- Dimensionality Reduction Techniques: Methods like Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and autoencoders transform high-dimensional data into a more manageable form.
- Experimentation: Consider dimensional effects when designing machine learning systems and test algorithms across different dimensionalities to gauge performance.
- Domain Knowledge: Understanding the underlying problem helps identify the most crucial features, potentially reducing the curse's impact.
6. Can you provide a simple example illustrating the Curse of Dimensionality using the volume of a hypercube?
Let's use the concept of a hypercube to intuitively understand the "Curse of Dimensionality."
We can start with a 2D square, extend to higher dimensions, and observe how the geometry of the hypercube changes as dimensionality grows.
Consider an $n$-dimensional hypercube with side length $s$:
- Its volume is $V_n = s^n$.
- The distance between two opposite corners (the main diagonal), by the Pythagorean theorem, is $d_n = \sqrt{n} \times s$.
Key observations:
- Volume vs. Diagonal: With a unit side length ($s = 1$), the volume stays at 1 in every dimension, yet the diagonal $\sqrt{n}$ keeps growing, so the corners move ever farther from the center even though the total volume does not change. With $s > 1$, the volume $s^n$ grows exponentially with $n$.
- Residual Volume: The fraction of the hypercube's volume occupied by an inscribed hypersphere shrinks rapidly toward 0 as $n$ becomes large; almost all of the volume ends up in the corners, near the surface.
- Data Sparsity: As a result, points sampled uniformly in the hypercube concentrate near its boundary and corners, and any fixed-size sample covers a vanishing share of the space.
Here is the Python code:
import numpy as np

# Define function to calculate hypercube volume
def calculate_volume(s, n):
    return s ** n

# Define function to calculate hypercube diameter (main diagonal)
def calculate_diameter(s, n):
    return np.sqrt(n) * s

# Evaluate volume and diameter for a 2D hypercube (side length 1)
s_2d, n_2d = 1, 2
volume_2d = calculate_volume(s_2d, n_2d)
diameter_2d = calculate_diameter(s_2d, n_2d)

# Evaluate volume and diameter for a 3D hypercube (side length 1)
s_3d, n_3d = 1, 3
volume_3d = calculate_volume(s_3d, n_3d)
diameter_3d = calculate_diameter(s_3d, n_3d)

print("2D Hypercube Volume: ", volume_2d)
print("3D Hypercube Volume: ", volume_3d)
print("2D Hypercube Diameter: ", diameter_2d)
print("3D Hypercube Diameter: ", diameter_3d)
Feature selection is a crucial step in machine learning that not only boosts model performance and interpretability, but also combats the curse of dimensionality, particularly in high-dimensional datasets.
- Computational Efficiency: Reducing the number of features lowers the computational load. This is especially beneficial for algorithms that struggle with high dimensions, such as $k$-means clustering.
- Improved Generalization: By focusing on the most relevant features, the risk of overfitting due to noise from irrelevant or redundant features is minimized.
- Enhanced Model Interpretability: Simplifying models makes it easier to understand how and why predictions are made.
Let's see a Python example of using PCA to reduce dimensionality. Here is the code:
import numpy as np
from sklearn.decomposition import PCA
# Simulated data with 1000 samples and 50 features
X = np.random.rand(1000, 50)
# Apply PCA to reduce dimensionality to 10 components
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape) # Output: (1000, 10)
8. How does the curse of dimensionality affect the performance of K-nearest neighbors (KNN) algorithm?
Curse of Dimensionality describes the various challenges that come with high-dimensional data, adversely impacting machine learning algorithms like k-Nearest Neighbors (KNN).
KNN's efficacy deteriorates in high dimensions due to the following factors:
- Loss of Neighborhood Contrast: As dimensions increase, the average distance between points grows and distances concentrate, so the $k$ nearest neighbors are barely closer than any other points, diluting their discriminatory power.
- Inflated Distance Measures: In higher dimensions, Euclidean distances can misrepresent the true spatial relationships in the data.
- Exploding Data Requirements: Each additional dimension requires more data to capture the same density in a point's vicinity. The regions enclosed by a $k$-NN classifier's decision boundaries become increasingly sparse, needing a disproportionately large amount of data to estimate accurately.
Here is the Python code. It estimates how many randomly sampled points fall inside the unit hypersphere in 50 dimensions; essentially none do, so a Monte Carlo volume estimate based on them fails badly:
import numpy as np
from math import gamma

# Sample points uniformly from the cube [-1, 1]^D
np.random.seed(1)
N = 1000
D = 50
X = np.random.uniform(-1, 1, size=(N, D))

# Estimate the proportion of points that fall inside the unit hypersphere
num_inside = np.sum(np.linalg.norm(X, axis=1) < 1)
prop_inside = num_inside / N

# Monte Carlo estimate of the hypersphere volume (the cube has volume 2**D)
estimated_volume = (2.0 ** D) * prop_inside

# True volume of the unit D-ball: pi^(D/2) / Gamma(D/2 + 1)
true_volume = np.pi ** (D / 2) / gamma(D / 2 + 1)

# Show results
print('Estimated Volume: %e' % estimated_volume)
print('True Volume: %e' % true_volume)
In this example, we compare a Monte Carlo estimate of the volume of the unit hypersphere in 50 dimensions, based on a random sample from the enclosing cube, with the true volume from the closed-form expression. None of the sampled points land inside the sphere, so the estimate is zero even though the true volume is not. This is an example of how data sparsity emerges in high-dimensional spaces.
These phenomena adversely affect models like KNN, emphasizing the importance of dimensionality reduction and feature selection to mitigate such challenges.
The Curse of Dimensionality refers to the challenges and limitations that arise when dealing with datasets containing a large number of features or dimensions, particularly in the context of machine learning.
These challenges include increased computational demands, statistical sparsity, and diminishing discrimination, all of which can lead to poorer model performance.
- Computational Complexity: As the number of features increases, computations become more demanding.
- Increased Data Demand: Higher dimensions require exponentially more data to maintain statistical significance.
- Model Overfitting: With many dimensions, the risk of learning noise instead of signal increases.
- Enhanced Model Performance: By focusing on key attributes, models can better capture the underlying structure, leading to improved predictions.
- Improved Model Interpretability: Reduced data dimensions often align more closely with real-world factors, making models easier to understand.
- Computational Efficiency: Operating in a reduced feature space accelerates model training and predictions.
- Data Visualization: Techniques like Principal Component Analysis (PCA) enable visual exploration of high-dimensional datasets.
- Feature Selection: Picks the most informative attributes based on domain knowledge or statistical measures.
- Feature Extraction: Generates new attributes that are combinations of the original ones.
  - PCA: Linear transformation to a reduced feature space based on feature covariance.
  - Linear Discriminant Analysis (LDA): Maximizes class separation.
  - Non-negative Matrix Factorization (NMF): Suitable for non-negative data.
- Manifold Learning: Designed for non-linear data structures and generally preserves intrinsic dimensionality.
  - Locally Linear Embedding (LLE): Focuses on maintaining local relationships.
  - t-Distributed Stochastic Neighbor Embedding (t-SNE): Optimized for data visualization; it best preserves local neighborhoods.
Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction.
It simplifies datasets by identifying and preserving the most influential characteristics, called principal components (PCs), while dropping the aspects that contribute little to the variance.
- Centralized Data: The mean is subtracted from each feature, centering the dataset.
- Covariance Matrix: This matrix, reflecting the relationships between features, is constructed.
- Eigenvectors and Eigenvalues: The eigenvectors of the covariance matrix give the directions of greatest variance, and the eigenvalues give the amount of variance along each of those directions.
- Component Selection: Based on the eigenvalues, the most important principal components are retained.
- Standardize the Data: This brings all features to the same scale, which is essential for PCA.

  from sklearn.preprocessing import StandardScaler
  standardized_data = StandardScaler().fit_transform(data)

- Covariance Matrix Calculation: Computed from the standardized data:

  import numpy as np
  covariance_matrix = np.cov(standardized_data.T)

- Eigenvector and Eigenvalue Computation: np.linalg.eig can be used.
- Eigenvector Sorting: The key eigenvectors are selected, usually by discarding those with the lowest eigenvalues.
- Feature Space Transformation: Data is projected onto the space described by the key eigenvectors:

  transformed_data = standardized_data.dot(eigenvectors)

- Information Loss Estimation: By calculating the percentage of variance captured, one can assess the efficacy of the dimensionality reduction:

  variance_capture = (abs(selected_eigenvalues) / sum(abs(eigenvalues))) * 100
- Scatter Plot: Two principal components can be plotted to visualize data separability in lower dimensions.
- Variance Explained: A scree plot, displaying eigenvalues, provides insight into variance explained by each PC.
Here is the Python code:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

data = ...  # your dataset as a NumPy array

# Step 1: Standardize the Data
standardized_data = StandardScaler().fit_transform(data)

# Step 2: Calculate the Covariance Matrix
covariance_matrix = np.cov(standardized_data.T)

# Step 3 & 4: Compute Eigenvalues and Eigenvectors, and Select Principal Components
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
order = np.argsort(eigenvalues)[::-1]            # indices sorted by decreasing eigenvalue
k = 2                                            # number of components to keep
selected_eigenvalues = eigenvalues[order[:k]]
selected_eigenvectors = eigenvectors[:, order[:k]]

# Step 5: Transform the Data to Feature Space
transformed_data = standardized_data.dot(selected_eigenvectors)

# Step 6: Estimate Information Loss (variance captured by each kept component, in %)
variance_capture = (abs(selected_eigenvalues) / sum(abs(eigenvalues))) * 100

# Using scikit-learn for PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(standardized_data)
In the scikit-learn example, pca.explained_variance_ratio_
provides the percentage of variance captured by each PC, aiding in the assessment of information loss.
- Variance Captured: The percentage of variance explained, akin to R-squared in linear regression.
- Modelling Performance: For supervised learning tasks, it's crucial to evaluate model performance after dimensionality reduction, as sketched below.
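A hedged sketch of that evaluation on synthetic data (all names and parameter choices are illustrative), comparing cross-validated accuracy with and without a PCA step:

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=100, n_informative=10, random_state=0)

# Baseline: logistic regression on all 100 features
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Reduced: project to 10 principal components first
reduced = make_pipeline(StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=1000))

print("All features :", cross_val_score(baseline, X, y, cv=5).mean())
print("10 components:", cross_val_score(reduced, X, y, cv=5).mean())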
11. Discuss the differences between feature extraction and feature selection in the context of high-dimensional data.
Feature extraction and feature selection are techniques that address the challenges posed by high-dimensional data. They reduce the dimensionality of datasets, making them more manageable and improving the overall quality of analysis and model performance.
Feature extraction involves transforming the original input data into a new, reduced feature space. The goal is to retain as much information as possible within a smaller number of features while reducing redundancy and noise.
- Principal Component Analysis (PCA): Identifies orthogonal axes (principal components) that capture the most variance in the data. These components can be used as new features, and less important ones are discarded.
- Linear Discriminant Analysis (LDA): Similar to PCA, but LDA aims to maximize the separability between different classes, which makes it particularly useful in supervised learning.
- Non-Negative Matrix Factorization (NMF): Suitable for datasets with non-negative values; NMF decomposes the data matrix into two lower-dimensional matrices, which can be seen as representing the features and their coefficients.
Here is the Python code:
from sklearn.decomposition import PCA
# Let's assume X is your feature matrix
# Specify the number of components to keep
# (you can also specify an explained variance ratio)
pca = PCA(n_components=2)
# Apply the transformation to your data
X_pca = pca.fit_transform(X)
Feature selection, on the other hand, is the process of identifying and keeping only the most informative features from the original dataset.
- Univariate Methods: Select features based on their individual performance (e.g., using statistical tests).
- Model-Based Strategies: Use a model to identify the most important features and discard the others.
- Greedy Methods: More computationally intensive procedures that iteratively add or remove features to optimize some criterion (see the recursive feature elimination sketch after the code below).
Here is the Python code:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
# Let's assume X and y are your feature and target arrays
# Select the k best features using an F-test
selector = SelectKBest(score_func=f_classif, k=3)
X_new = selector.fit_transform(X, y)
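For the model-based and greedy strategies, here is a hedged sketch using recursive feature elimination (reusing the assumed X and y arrays from above; the number of features to keep is illustrative):

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Recursively drop the weakest features according to the model's coefficients
estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=3)
X_new = selector.fit_transform(X, y)
print(selector.support_)   # boolean mask of the selected features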
12. Briefly describe the idea behind t-Distributed Stochastic Neighbor Embedding (t-SNE) and its application to high-dimensional data.
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction technique designed for the visual exploration and analysis of high-dimensional data. It is especially adept at capturing structure and relationships that linear methods, such as PCA, can fail to represent effectively. t-SNE is not a predictor, classification model, or featurizer; it is a tool for data exploration and visualization.
t-SNE operates on the principle of preserving local neighborhoods in both high-dimensional and low-dimensional spaces. This is established using a Gaussian distribution for conditional probabilities in the high-dimensional space and a Student's t-distribution for low dimensions.
The t-SNE cost function entails minimizing the divergence between two distributions: one for pairwise similarities in the original space, and another for pairwise similarities in the embedded space. The minimized cost is the Kullback-Leibler (KL) divergence.
The Kullback-Leibler divergence, defined as

$KL(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$,

quantifies the difference between the two discrete probability distributions: $P$, the pairwise similarities in the original space, and $Q$, the pairwise similarities in the embedding.

The core contribution of the t-distribution is that it is heavier-tailed than the Gaussian. The low-dimensional similarities are defined as

$q_{ij} = \frac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}}$,

where $y_i$ and $y_j$ are the embedded points.

This heavier tail reduces the chances of dissimilar pairs "crowding out" similar pairs in the low-dimensional space, preserving the structure of the data more faithfully.
- Perplexity: Balances emphasis on local vs. global structure. Recommended values fall between 5 and 50, with lower values emphasizing local structure.
- Learning Rate: Serves as step size during optimization. Commonly set between 100 and 1000.
t-SNE visualizations reveal hidden structures in high-dimensional data by identifying clusters, outliers, and local structures that are tougher to discern in the original space. However, it's important to remember that these visual representations are for exploration only and not suitable for quantitative analyses or modeling.
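A minimal usage sketch (random data; parameter values are illustrative) of producing a 2D t-SNE embedding with scikit-learn:

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))    # toy high-dimensional data

# Embed into 2D for visualization; perplexity and learning_rate follow the ranges above
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200, random_state=0)
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)           # (500, 2)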
Random Forests are generally robust to the Curse of Dimensionality, making them effective even with high-dimensional data.
- Data Reduction: Individual decision trees within the forest focus on different subsets of features, effectively reducing the data's dimensionality.
- Random Feature Selection: Each tree is trained on a random subset of features. While older heuristics suggested $\sqrt{d}$ features, the more common modern approach is to tune this number.
- Feature Importance: Random Forests' ability to rank features by importance can be leveraged to select relevant dimensions, mitigating the Curse of Dimensionality (see the snippet after the example below).
- Inner Structure Learning: Beyond individual features, Random Forests can identify interaction patterns, making them valuable in high-dimensional data scenarios.
Here is the Python code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist.data, mnist.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Random Forest with 100 trees
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# Evaluate on test data
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy with Random Forest:", accuracy)
In the above code, we are using the Random Forest classifier to classify the MNIST dataset, which is a high-dimensional dataset with 784 dimensions (28x28 images).
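Building on the fitted model above, a hedged sketch of using the learned feature importances to keep only the most informative pixels (the cutoff of 100 pixels is illustrative):

import numpy as np

# Rank pixels by importance and keep the top 100
importances = clf.feature_importances_
top_idx = np.argsort(importances)[::-1][:100]

X_train_reduced = np.asarray(X_train)[:, top_idx]
X_test_reduced = np.asarray(X_test)[:, top_idx]

clf_reduced = RandomForestClassifier(n_estimators=100, random_state=42)
clf_reduced.fit(X_train_reduced, y_train)
print("Accuracy with 100 selected pixels:", clf_reduced.score(X_test_reduced, y_test))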
Regularization techniques, such as $L1$ (Lasso) and $L2$ (Ridge) penalties, help mitigate the Curse of Dimensionality in several ways:
- Multi-Variable Complexity: Regularization simplifies models by controlling the coefficients of many variables, which reduces sensitivity to noise and improves generalization in high-dimensional datasets.
- Penalty Functions: Regularization uses penalty-based cost functions. $L1$ adds the absolute sum of the coefficients, leading to sparsity, while $L2$ adds the sum of their squares, allowing graded shrinkage. This helps with variable selection and continuous feature reduction, which is essential in high dimensions.
- Bias-Variance Trade-off: Regularization influences the balance between bias and variance, vital for managing model complexity.
Here is the Python code:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create the L1-regularized logistic regression object
l1_logreg = LogisticRegression(penalty='l1', solver='liblinear')
# Train the model
l1_logreg.fit(X_train, y_train)
# Print the coefficients
print(l1_logreg.coef_)
Manifold learning is a dimensionality reduction technique that assumes high-dimensional data can be represented by a lower-dimensional manifold within that space. Manifold learning provides a way to better understand the structure of complex or high-dimensional data by visualizing it in a more manageable space.
- Mathematical Model: High-dimensional datasets are assumed to be embedded in a lower-dimensional manifold. The goal is to learn the properties of this manifold to perform an effective reduction.
- Curse of Dimensionality: Traditional distance metrics become less reliable as the number of dimensions increases. Manifold learning addresses this issue by focusing on local, rather than global, structures.
- Important Concepts:
  - Intrinsic Dimensionality: The number of dimensions required to represent data accurately within the manifold.
  - Local Linearity: The notion that data points are locally linearly related to their neighbors on the manifold.
- Advantages:
- Suited for complex, nonlinear datasets.
- More effective in preserving local structures.
- Useful for visualization.
- Limitations:
- No explicit transformation matrix, which can limit interpretability.
- Complexity in handling different manifold structures.
- Locally Linear Embedding (LLE): Seeks the lowest-dimensional representation of the data while preserving local relationships (see the sketch after this list).
- Isomap: Utilizes geodesic distances (distances measured along curved paths on the manifold) for mapping, which can be more accurate for non-linear structures.
- t-distributed Stochastic Neighbor Embedding (t-SNE): Optimizes a similarity measure with an emphasis on preserving local structures, making it a popular choice for high-dimensional visualization.
- Uniform Manifold Approximation and Projection (UMAP): Balances speed with preserving both global and local structures.
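A minimal sketch of two of these methods on a classic synthetic manifold (the swiss roll; the neighbor counts are illustrative):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# A classic non-linear manifold: a 2-D sheet rolled up in 3-D space
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Unroll it back to 2 dimensions with two different manifold learners
X_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12).fit_transform(X)
X_iso = Isomap(n_components=2, n_neighbors=12).fit_transform(X)

print(X_lle.shape, X_iso.shape)   # (1000, 2) (1000, 2)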