VC: SNE and t-SNE, Who Is My Neighbour?

Jeheonpark
Published in The Startup · 7 min read · Sep 10, 2020

Many dimensionality reduction techniques attempt to preserve the pairwise distances of the original data. For visualization, however, it can be more beneficial to preserve each point's nearest neighbours instead. t-SNE [van der Maaten & Hinton, 2008] deliberately abstracts away density and distance information; because it preserves neighbours, it often reveals cluster structure more clearly than distance-preserving techniques. t-SNE is very popular in many applications, including the life sciences.
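Neighbour preservation can actually be measured. As a minimal sketch (the dataset size and `n_neighbors` value are my own choices, not from the article), scikit-learn's `trustworthiness` score reports how well an embedding keeps each point's original k-nearest neighbours, so we can compare a distance-preserving method (PCA) against the neighbour-preserving t-SNE:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# A small MNIST-like dataset of 8x8 digit images (subsampled to keep it fast).
X = load_digits().data[:500]

# Project to 2-D with PCA (distance-preserving) and t-SNE (neighbour-preserving).
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

# trustworthiness lies in [0, 1]: 1.0 means the original k-nearest
# neighbourhoods are perfectly preserved in the embedding.
t_pca = trustworthiness(X, X_pca, n_neighbors=10)
t_tsne = trustworthiness(X, X_tsne, n_neighbors=10)
print("PCA   trustworthiness:", t_pca)
print("t-SNE trustworthiness:", t_tsne)
```

On digit data like this, t-SNE typically scores noticeably higher, which is exactly the "who is my neighbour" property the article is about.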

Comparison

We are going to apply several dimensionality reduction techniques to the same task: visualizing the clusters in the MNIST dataset.

MNIST Dataset
Left: PCA, Right: ISOMAP [scikit-learn, Manifold learning on handwritten digits]
Left: MDS, Right: t-SNE [scikit-learn, Manifold learning on handwritten digits]

These images come from the official scikit-learn guide ("Manifold learning on handwritten digits"), where you can find more results. As you can see, t-SNE performs overwhelmingly well. It also captures the semantics of distances: the small cluster of 1s with an underbar at the bottom lies closer to the 2s than to the other 1s, because the 2s share the same underbar. Now you know why you should learn t-SNE.
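The comparison above can be reproduced in a few lines with scikit-learn. This is a sketch under my own assumptions: it uses the small 8x8 `load_digits` dataset (as the scikit-learn example does) rather than full MNIST, subsampled so MDS stays fast, and the t-SNE `perplexity` is left near its default:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, Isomap

digits = load_digits()
# Subsample to keep the O(n^2) methods (MDS, Isomap) quick.
X, y = digits.data[:500], digits.target[:500]

# 2-D embeddings from the four techniques compared in the figures.
embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "ISOMAP": Isomap(n_components=2).fit_transform(X),
    "MDS": MDS(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}

for name, emb in embeddings.items():
    print(name, emb.shape)  # each embedding is (n_samples, 2)
```

Scattering each embedding coloured by `y` (e.g. with `matplotlib.pyplot.scatter`) reproduces the four panels: the t-SNE panel separates the ten digit classes into visibly distinct clusters, while PCA and MDS leave them heavily overlapped.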

SNE

Jeheon Park, Software Engineer at Kakao in South Korea