Published in The Startup
Review — SNE: Stochastic Neighbor Embedding (Data Visualization)

High-Dimensional Data Mapping to Low-Dimensional Space

  • A probabilistic approach is used to map a high-dimensional data distribution into a low-dimensional space, e.g. 2D, for data visualization: pairwise dissimilarities are converted into neighbor probabilities using Gaussian kernels.
  • The cost function, a sum of Kullback-Leibler divergences, is minimized using gradient descent.
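For intuition, here is a tiny numeric example (the values are ours, purely illustrative) of the Kullback-Leibler divergence that each term of this cost computes:

```python
import numpy as np

# Two distributions over the same three neighbors (illustrative values).
p = np.array([0.7, 0.2, 0.1])  # neighbor probabilities in the high-dimensional space
q = np.array([0.3, 0.4, 0.3])  # induced probabilities in the low-dimensional space

kl = np.sum(p * np.log(p / q))  # KL(p || q): zero only when q matches p exactly
```

KL divergence is non-negative and asymmetric in p and q, which is what gives SNE its preference for preserving local neighborhoods.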


  1. Basic Stochastic Neighbor Embedding (SNE)
  2. Experimental Results

1. Basic Stochastic Neighbor Embedding (SNE)

  • For each object, i, and each potential neighbor, j, we start by computing the asymmetric probability pij that i would pick j as its neighbor:

    pij = exp(−dij²) / Σk≠i exp(−dik²)
  • The dissimilarity, dij², can be computed as the scaled squared Euclidean distance between two high-dimensional points, xi and xj:

    dij² = ‖xi − xj‖² / (2σi²)
  • where σi is either set by hand or determined by a binary search, so that each point has a comparable effective number of neighbors.
  • In the low-dimensional space, Gaussian neighborhoods are also used but with a fixed variance:

    qij = exp(−‖yi − yj‖²) / Σk≠i exp(−‖yi − yk‖²)
  • The calculation of qij mirrors that of pij, but uses the low-dimensional images yi, yj, yk of the data points.
  • The aim of the embedding/mapping is to match these two distributions as well as possible.
  • This is achieved by minimizing a cost function which is a sum of Kullback-Leibler divergences between the original (pij) and induced (qij) distributions over neighbors for each object:

    C = Σi Σj pij log(pij / qij)
  • Because the KL divergence is asymmetric, there is a large cost for modeling a small distance in the high-dimensional space with a big distance in the low-dimensional space, but only a small cost for modeling a big distance with a small one.
  • Differentiating C, the result is:

    ∂C/∂yi = 2 Σj (yi − yj)(pij − qij + pji − qji)
  • which has the nice interpretation of a sum of forces pulling yi toward yj or pushing it away depending on whether j is observed to be a neighbor more or less often than desired.
  • The embedding is initialized by putting all the low-dimensional images in random locations very close to the origin.
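The steps above can be put together in a minimal NumPy sketch of basic SNE on toy data. The data, learning rate, and iteration count are our illustrative choices, not from the paper, and σi is fixed rather than set by search:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated clusters in 10-D (illustrative, not from the paper).
X = np.vstack([rng.normal(0.0, 0.1, (5, 10)),
               rng.normal(3.0, 0.1, (5, 10))])
n = X.shape[0]
sigma = 1.0  # sigma_i fixed for simplicity; the paper sets it by hand or by binary search

def neighbor_probs(D2):
    """Row-normalized exp(-D2) with self-pairs excluded: each row is a
    distribution over an object's potential neighbors."""
    W = np.exp(-D2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def kl_cost(P, Q, eps=1e-12):
    """Sum over objects of KL(Pi || Qi)."""
    return np.sum(P * np.log(np.maximum(P, eps) / np.maximum(Q, eps)))

# High-dimensional probabilities pij, with dij^2 = ||xi - xj||^2 / (2 sigma_i^2).
D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1) / (2.0 * sigma**2)
P = neighbor_probs(D2)

# Initialize the low-dimensional images very close to the origin, then run
# plain gradient descent on the KL cost.
Y = rng.normal(0.0, 1e-4, (n, 2))
initial_cost = kl_cost(P, neighbor_probs(np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)))
lr = 0.2
for _ in range(800):
    d2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)  # fixed variance in low-D
    Q = neighbor_probs(d2)
    # Gradient: dC/dyi = 2 * sum_j (yi - yj) * (pij - qij + pji - qji)
    coeff = (P - Q) + (P - Q).T
    grad = 2.0 * ((Y[:, None, :] - Y[None, :, :]) * coeff[:, :, None]).sum(axis=1)
    Y -= lr * grad

final_cost = kl_cost(P, neighbor_probs(np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)))
```

The paper additionally uses momentum and adds jitter noise to the gradient steps to escape poor local minima; those refinements are omitted here for brevity.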

2. Experimental Results

2.1. Images of Digits

Figure: The result of running the SNE algorithm on 3000 256-dimensional grayscale images of handwritten digits.
  • A set of 3000 digit bitmaps from the USPS database, with 600 examples from each of the five classes 0, 1, 2, 3, 4, is used.
  • It is important to note that SNE was given no information about class labels.
  • As shown in figure above, it quite cleanly separates the digit groups.

2.2. Word-Author Counts

Figure: Embedding of NIPS authors into two dimensions.
  • Each of the 676 authors who published more than one paper in NIPS volumes is represented by counts of the words they used and embedded into two dimensions.
  • We can see Prof. LeCun and Prof. Hinton at the bottom of the figure.
  • SNE seems to have grouped authors by broad NIPS field: generative models, support vector machines, neuroscience, reinforcement learning and VLSI all have distinguishable localized regions.


