ISOMAP as a Dimensionality Reduction Technique

Betul Mescioglu
4 min read · Dec 17, 2022


The need for dimensionality reduction and the problems associated with it:

When we deal with data with numerous features, we cannot easily extract the meaningful structure that is hidden in the high dimensions. To solve this problem, we need to transform the data from its high-dimensional space to a lower-dimensional one, where the human mind is much more adept at understanding structure through visualizations or other tools. Unfortunately, not all data structures can be fully represented in the lower dimension. For example, Principal Component Analysis (PCA) and Multi-Dimensional Scaling (MDS) do not always carry the nonlinearity of the data into the lower dimensions. Even for algorithms that can, we run into a problem when the data lies on a manifold: the Euclidean distance between two data points does not reflect the true topology of the manifold.

What is Isomap?

A method called Isomap solves this problem by measuring the distance between two data points along a path that runs through the in-between points; this is called the geodesic distance. I would like to liken it to a children’s playground spiral slide. A child on top of the slide has two ways to reach the ground: jump off, or slide down. Jumping traverses the Euclidean distance, whereas sliding down traverses the geodesic path. This can be illustrated as follows:

In figure A, the dashed line shows the Euclidean distance and the blue solid line shows the geodesic path. When we unravel this roll (bringing it into two dimensions), we see that the data points are actually farther from each other than the Euclidean distance in three dimensions would make us believe. The geodesic distance reflects the true low-dimensional geometry of the manifold, but PCA and MDS use the Euclidean distance, thus failing to capture the intrinsic two-dimensionality.
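To make the comparison concrete, here is a minimal sketch (my own, not code from the paper) that approximates the geodesic distance as the shortest path through a nearest-neighbor graph on the Swiss roll; the indices 0 and 500 and the choice of 10 neighbors are arbitrary illustration values:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Connect each point to its 10 nearest neighbors; edge weights are distances.
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Geodesic distance is approximated by the shortest path along the graph.
geodesic = shortest_path(graph, method="D", directed=False)

i, j = 0, 500  # two arbitrary points on the roll
euclidean = np.linalg.norm(X[i] - X[j])
print(f"Euclidean: {euclidean:.2f}, geodesic: {geodesic[i, j]:.2f}")
```

For two points on opposite windings of the roll, the geodesic distance comes out noticeably larger than the straight-line Euclidean distance, which is exactly the gap the slide analogy describes.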

How does Isomap work?

Isomap combines the major algorithmic features of PCA and MDS: computational efficiency, global optimality, and asymptotic convergence guarantees. On top of these, it provides the ability to learn the structure of nonlinear manifolds. It is built on the MDS algorithm but, as mentioned above, it also estimates the geodesic distance between faraway points using only input-space distances.
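A ready-made implementation exists in scikit-learn; the following quick usage sketch (the parameter values are illustrative choices of mine) reduces the 3-D Swiss roll to two dimensions:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1000, random_state=0)

# n_neighbors controls the neighborhood graph; n_components is the
# target dimension of the embedding.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)
```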

The three steps of the algorithm achieving this are as follows:

1) Determine which points are neighbors on the manifold M, based on the distances between pairs of points in the input space. For neighboring points, the input-space distance is a good approximation to the geodesic distance. Two methods can be used to calculate this:

· Calculate the distances from a given data point to all data points within a fixed radius.

· Calculate the distances from a given data point to its K nearest neighbors.

Either way, a weighted graph is obtained in which the vertices represent the data points and the edge weights represent the distances between neighboring points. Both neighborhood rules are sketched below.
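Here is a rough sketch of both rules using scikit-learn’s graph builders; the radius of 2.0 and K = 10 are arbitrary choices of mine, not values from the paper:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Option 1: connect each point to all points within a fixed radius.
G_eps = radius_neighbors_graph(X, radius=2.0, mode="distance")

# Option 2: connect each point to its K nearest neighbors.
G_knn = kneighbors_graph(X, n_neighbors=10, mode="distance")
```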

2) Once the graph is obtained, it is updated as follows:

· If two points i and j are already connected, do not change anything.

· If not, go through all data points k = 1, 2, 3, …, N and, for each k, add the distance from i to k and the distance from k to j. Find the k that provides the shortest total distance from i to j, that is, d(i, j) = min over k of d(i, k) + d(k, j), and enter this distance into the graph (matrix). The final matrix will contain the shortest-path distances between all pairs of points; this is the classic Floyd-Warshall update, sketched after this list.
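Written out in code, the update looks as follows. This is a naive O(N³) version for clarity, assuming D is the N × N matrix from step 1 with np.inf wherever two points are not neighbors:

```python
import numpy as np

def floyd_warshall(D):
    """Turn a neighborhood-distance matrix into all-pairs shortest paths."""
    N = D.shape[0]
    D = D.copy()
    for k in range(N):
        for i in range(N):
            for j in range(N):
                # Keep the shorter of the current path and the detour via k.
                D[i, j] = min(D[i, j], D[i, k] + D[k, j])
    return D
```

In practice, scipy.sparse.csgraph.shortest_path computes the same matrix far more efficiently than this triple loop.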

3) The final step reduces the dimensions by applying classical MDS to the matrix of shortest-path distances obtained in step 2. In this case, however, the transformation of the data into d-dimensional Euclidean space preserves the intrinsic geometric features of the manifold.
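For completeness, a compact sketch of classical MDS applied to that matrix; this is the textbook double-centering recipe, not the paper’s own code:

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed an N x N shortest-path distance matrix into d dimensions."""
    N = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (D ** 2) @ J
    # The top-d eigenpairs of B give the embedding coordinates.
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```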

Conclusion

Isomap’s residual variance stabilizes at lower dimensions than PCA’s or MDS’s, which means the algorithm captures more of the statistical features of the data in low dimensions (2-D or 3-D) than PCA or MDS does. On top of that, it works for nonlinear data too. But I wonder whether this algorithm would work well if the data were too sparse. I also wonder, if the Swiss roll were coiled too tightly, whether the algorithm would start getting confused about which data point belongs to which surface. The creators of the algorithm say we would need an impractically large sample size to overcome these problems, but that in general it still converges.
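As a closing illustration, here is my reconstruction of that residual-variance metric: one minus the squared correlation between the geodesic distances and the pairwise distances in the embedding (the function name and argument shapes are my own):

```python
import numpy as np

def residual_variance(D_geodesic, embedding):
    """Residual variance: 1 - R^2 between geodesic and embedded distances."""
    # Pairwise Euclidean distances in the low-dimensional embedding.
    diff = embedding[:, None, :] - embedding[None, :, :]
    D_embed = np.sqrt((diff ** 2).sum(-1))
    r = np.corrcoef(D_geodesic.ravel(), D_embed.ravel())[0, 1]
    return 1.0 - r ** 2
```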

Note: This is a summary of the paper “A Global Geometric Framework for Nonlinear Dimensionality Reduction” by Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. The imagery was taken from that paper as well.
