VC: SNE and t-SNE, Who Is My Neighbour?

Jeheonpark
Published in The Startup · 7 min read · Sep 10, 2020

Many dimensionality reduction techniques attempt to preserve the pairwise distances of the original data. For visualization, however, it can be more beneficial to preserve each point's nearest neighbours instead. t-SNE [van der Maaten & Hinton, 2008] deliberately abstracts away density and distance information; because it preserves neighbours, it often reveals cluster structure more clearly than distance-preserving techniques. t-SNE is very popular in many applications, including the life sciences.
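Neighbour preservation can actually be measured. As a minimal sketch (the dataset size and `n_neighbors` value are my own choices, not from the article), scikit-learn's `trustworthiness` score reports how well an embedding keeps each point's original k-nearest neighbours, so we can compare a distance-preserving method (PCA) against the neighbour-preserving t-SNE:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# A small MNIST-like dataset of 8x8 digit images (subsampled to keep it fast).
X = load_digits().data[:500]

# Project to 2-D with PCA (distance-preserving) and t-SNE (neighbour-preserving).
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

# trustworthiness lies in [0, 1]: 1.0 means the original k-nearest
# neighbourhoods are perfectly preserved in the embedding.
t_pca = trustworthiness(X, X_pca, n_neighbors=10)
t_tsne = trustworthiness(X, X_tsne, n_neighbors=10)
print("PCA   trustworthiness:", t_pca)
print("t-SNE trustworthiness:", t_tsne)
```

On digit data like this, t-SNE typically scores noticeably higher, which is exactly the "who is my neighbour" property the article is about.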

Comparison

We are going to apply several dimensionality reduction techniques to the same task: visualizing the clusters in the MNIST dataset.

MNIST Dataset
Left: PCA, Right: ISOMAP [scikit-learn, Manifold learning on handwritten digits]
Left: MDS, Right: t-SNE [scikit-learn, Manifold learning on handwritten digits]

These images come from the official scikit-learn guide ("Manifold learning on handwritten digits"), where you can find more results. As you can see, t-SNE performs overwhelmingly well. It also captures the semantics of distances: the small cluster of 1s with an underbar at the bottom lies closer to the 2s than to the other 1s, because the 2s share the same underbar. Now you know why you should learn t-SNE.
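The comparison above can be reproduced in a few lines with scikit-learn. This is a sketch under my own assumptions: it uses the small 8x8 `load_digits` dataset (as the scikit-learn example does) rather than full MNIST, subsampled so MDS stays fast, and the t-SNE `perplexity` is left near its default:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, Isomap

digits = load_digits()
# Subsample to keep the O(n^2) methods (MDS, Isomap) quick.
X, y = digits.data[:500], digits.target[:500]

# 2-D embeddings from the four techniques compared in the figures.
embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "ISOMAP": Isomap(n_components=2).fit_transform(X),
    "MDS": MDS(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}

for name, emb in embeddings.items():
    print(name, emb.shape)  # each embedding is (n_samples, 2)
```

Scattering each embedding coloured by `y` (e.g. with `matplotlib.pyplot.scatter`) reproduces the four panels: the t-SNE panel separates the ten digit classes into visibly distinct clusters, while PCA and MDS leave them heavily overlapped.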

SNE

Jeheon Park, Software Engineer at Kakao in South Korea