Locally Linear Embedding (LLE) | Data Mining and Machine Learning

Mihir · Published in Analytics Vidhya · Oct 10, 2019

Locally Linear Embedding (LLE) is a method of non-linear dimensionality reduction proposed by Sam T. Roweis and Lawrence K. Saul in 2000 in their paper titled “Nonlinear Dimensionality Reduction by Locally Linear Embedding”. This article is based on multiple sources mentioned in the references section. The project by Jennifer Chu helped me understand LLE better.

Machine learning algorithms use the features they are trained on to predict the output. For example, in a house price prediction problem, there might be a number of features like the size of the house, the number of bedrooms, the number of bathrooms, etc., on which a model is trained to predict the house price as accurately as possible. One major problem many machine learning algorithms face while doing this is overfitting, where the model fits the training data so well that it is unable to predict real-life test data accurately. This is a problem since it makes the algorithm ineffective in practice.

Dimensionality reduction helps reduce the complexity of a machine learning model, which in turn helps reduce overfitting to an extent. The more features we use, the more complex the model becomes, and this may cause it to fit the training data too well, causing overfitting. Features that do not help decide the output label may also be included, which will not help in real life. For example, in the house price prediction problem, we may have a feature like the age of the seller, which may not affect the house price in any way. Dimensionality reduction helps us keep the more important features in the feature set, reducing the number of features required to predict the output.

Dimensionality Reduction. Source: https://www.geeksforgeeks.org/dimensionality-reduction/

Locally Linear Embedding (LLE)

Data sets can often be represented in an n-dimensional feature space, with each dimension used for a specific feature.

The LLE algorithm is an unsupervised method for dimensionality reduction. It tries to reduce these n dimensions while preserving the geometric features of the original non-linear feature structure. For example, in the illustration below, we cast the structure of the Swiss roll into a lower-dimensional plane while maintaining its geometric structure.

In short, if our data X1 has D dimensions, we try to reduce X1 to X2 in a feature space with d dimensions, where d < D.
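As a quick illustration of this reduction, here is a minimal sketch using scikit-learn's built-in LLE implementation; the sample size and parameter values below are illustrative choices, not values from the original paper.

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    # X1 lives in D = 3 dimensions; t encodes the position along the roll.
    X1, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

    # Reduce to d = 2 dimensions while preserving local neighbourhoods.
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
    X2 = lle.fit_transform(X1)
    print(X1.shape, "->", X2.shape)    # (1500, 3) -> (1500, 2)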

LLE first finds the k nearest neighbours of each point. Then, it approximates each data vector as a weighted linear combination of its k nearest neighbours. Finally, it computes the weights that best reconstruct each vector from its neighbours, and produces the low-dimensional vectors best reconstructed by these weights [6].

  1. Finding the K nearest neighbours.
    One advantage of the LLE algorithm is that there is only one parameter to tune: the value of K, the number of nearest neighbours to consider as part of a local neighbourhood. If K is chosen too small or too large, it will not be able to accommodate the geometry of the original data.
    Here, for each data point that we have, we compute the K nearest neighbours.
  2. We do a weighted aggregation of the neighbours of each point to reconstruct it. The weights W_ij are computed by minimizing the reconstruction cost

    E(W) = Σ_i |X_i − Σ_j W_ij X_ij|²

    where X_ij is the j’th nearest neighbour of point X_i, subject to the constraint that the weights for each point sum to one: Σ_j W_ij = 1.

  3. Now we define the new d-dimensional vectors Y_i, keeping the weights W_ij fixed, such that we minimize the embedding cost

    Φ(Y) = Σ_i |Y_i − Σ_j W_ij Y_ij|²

    for Y as the new points. This reduces to an eigenvalue problem, and the bottom d non-constant eigenvectors give the new coordinates. A minimal code sketch of all three steps follows this list.
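To make these steps concrete, here is a minimal NumPy sketch of standard LLE. It assumes Euclidean distances, a brute-force O(n²) neighbour search, and a small regularisation term; the function name and default parameters are illustrative, not taken from the original paper.

    import numpy as np

    def lle(X, K=10, d=2, reg=1e-3):
        # X: (n, D) data matrix. Returns an (n, d) embedding.
        n = X.shape[0]

        # Step 1: K nearest neighbours by Euclidean distance.
        dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(dist2, np.inf)            # exclude the point itself
        neighbours = np.argsort(dist2, axis=1)[:, :K]

        # Step 2: reconstruction weights minimizing E(W), rows summing to 1.
        W = np.zeros((n, n))
        for i in range(n):
            Z = X[neighbours[i]] - X[i]            # neighbours centred on X_i
            G = Z @ Z.T                            # local K x K Gram matrix
            G += reg * np.trace(G) * np.eye(K)     # regularise (G is singular if K > D)
            w = np.linalg.solve(G, np.ones(K))
            W[i, neighbours[i]] = w / w.sum()      # enforce the sum-to-one constraint

        # Step 3: bottom eigenvectors of M = (I - W)^T (I - W),
        # skipping the constant eigenvector whose eigenvalue is ~0.
        I = np.eye(n)
        M = (I - W).T @ (I - W)
        eigvals, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
        return eigvecs[:, 1:d + 1]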


Detailed pseudocode for this algorithm can be found here.

Advantages of LLE over other Dimensionality Reduction Algorithms

  1. Consideration of the non-linearity of the structure
    LLE goes beyond density modeling techniques such as local PCA or mixtures of factor analyzers. Density models do not provide a consistent set of global coordinates that embed the observations across the entire manifold; thus they cannot be used, for example, to visualize low-dimensional projections of the original dataset. They are able to detect only linear features, as shown in the images below: PCA does a poor job of detecting the curved pattern underneath, which LLE is able to detect.
    Similarly, other methods like Kernel PCA and Isomap can fail to detect features that LLE detects.
    In the images below we observe that the neighbourhoods of the local points have been preserved by LLE, but not by the other algorithms; a code sketch reproducing this comparison follows the images.
From left: Swiss roll features in 3 dimensions; Swiss roll features using PCA. Source
From left: Swiss roll features using Kernel PCA; Swiss roll features using LLE
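To reproduce a comparison along these lines, the following sketch (using scikit-learn and matplotlib, with illustrative parameter values) projects the same Swiss roll with PCA and with LLE:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import PCA
    from sklearn.manifold import LocallyLinearEmbedding

    X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

    # PCA is a linear projection, so the layers of the roll overlap.
    X_pca = PCA(n_components=2).fit_transform(X)

    # LLE preserves local neighbourhoods and unrolls the manifold.
    X_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=t, s=5)
    axes[0].set_title("PCA")
    axes[1].scatter(X_lle[:, 0], X_lle[:, 1], c=t, s=5)
    axes[1].set_title("LLE")
    plt.show()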

2. Better computational time
Since the weight matrix that LLE builds is sparse (each point is reconstructed only from its K nearest neighbours), the final eigenvalue problem involves sparse matrices, making LLE more efficient than many other algorithms in terms of computational space and time.
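As an illustration, here is a hedged sketch of how the final eigenvalue step can exploit that sparsity with SciPy's sparse solver (scikit-learn's implementation takes a similar shift-invert approach); it assumes W is the weight matrix from the NumPy sketch above:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    def embed_sparse(W, d=2):
        # W: (n, n) weight matrix with only K non-zero entries per row.
        n = W.shape[0]
        W = sp.csr_matrix(W)
        I = sp.identity(n, format="csr")
        M = (I - W).T @ (I - W)            # still sparse: ~K^2 non-zeros per row
        # Shift-invert around 0 targets the smallest eigenvalues without
        # ever forming a dense n x n matrix.
        eigvals, eigvecs = eigsh(M, k=d + 1, sigma=0.0)
        order = np.argsort(eigvals)        # make sure eigenvalues are ascending
        return eigvecs[:, order[1:]]       # drop the constant eigenvector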
