Published in The Startup
Review — SNE: Stochastic Neighbor Embedding (Data Visualization)

High-Dimensional Data Mapping to Low-Dimensional Space

• A probabilistic approach is used to map a high-dimensional data distribution onto a low-dimensional space, e.g. 2D, for data visualization: pairwise dissimilarities are converted into neighbor probabilities using Gaussian kernels.
• The cost function, a sum of Kullback-Leibler divergences, is minimized using gradient descent.

Outline

1. Basic Stochastic Neighbor Embedding (SNE)
2. Experimental Results

1. Basic Stochastic Neighbor Embedding (SNE)

• For each object i and each potential neighbor j, we start by computing the asymmetric probability pij that i would pick j as its neighbor:
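The equation, lost in this copy of the article, is the softmax over dissimilarities from the original SNE paper:

```latex
p_{ij} = \frac{\exp(-d_{ij}^2)}{\sum_{k \neq i} \exp(-d_{ik}^2)}
```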
• The dissimilarities, dij², can be computed as the scaled squared Euclidean distance between two high-dimensional points, xi, xj:
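The missing dissimilarity equation, restored from the paper, scales each squared distance by a per-point variance:

```latex
d_{ij}^2 = \frac{\lVert x_i - x_j \rVert^2}{2\sigma_i^2}
```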
• where σi is either set by hand or found by a binary search (in the paper, so that the entropy of the distribution over neighbors matches a hand-chosen value).
• In the low-dimensional space, Gaussian neighborhoods are also used but with a fixed variance:
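The corresponding low-dimensional equation, restored from the paper, mirrors pij but with unit variance:

```latex
q_{ij} = \frac{\exp(-\lVert y_i - y_j \rVert^2)}{\sum_{k \neq i} \exp(-\lVert y_i - y_k \rVert^2)}
```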
• The calculation of qij mirrors that of pij, but in the low-dimensional space, with yi, yj, yk denoting the embedded points.
• The aim of the embedding/mapping is to match these two distributions as well as possible.
• This is achieved by minimizing a cost function which is a sum of Kullback-Leibler divergences between the original (pij) and induced (qij) distributions over neighbors for each object:
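The cost function itself, missing from this copy, is:

```latex
C = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
```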
• So there is a cost for modeling a big distance in the high-dimensional space with a small distance in the low-dimensional space, though it is smaller than the cost of the reverse: the KL divergence penalizes a large pij paired with a small qij most heavily.
• Differentiating C, the result is:
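The gradient, restored from the paper and consistent with the cost C above, is:

```latex
\frac{\partial C}{\partial y_i} = 2 \sum_j (y_i - y_j)\,(p_{ij} - q_{ij} + p_{ji} - q_{ji})
```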
• which has the nice interpretation of a sum of forces pulling yi toward yj or pushing it away depending on whether j is observed to be a neighbor more or less often than desired.
• The embedding is initialized by putting all the low-dimensional images in random locations very close to the origin.
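The steps above can be sketched in NumPy. This is a minimal illustration, not the paper's full implementation: it uses a single hand-set σ for all points (the paper tunes σi per point by binary search) and plain gradient descent without the momentum and jitter the paper adds.

```python
import numpy as np

def sne_embed(X, n_dims=2, sigma=1.0, lr=0.1, n_iters=500, seed=0):
    """Minimal SNE sketch: one fixed variance sigma for all points."""
    n = X.shape[0]

    # Pairwise squared Euclidean distances in the high-dimensional space.
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(D, np.inf)          # exclude self-pairs

    # p_ij: probability that i picks j as its neighbor (row-wise softmax).
    P = np.exp(-D / (2 * sigma**2))
    P /= P.sum(axis=1, keepdims=True)

    # Initialize the map points in random locations very close to the origin.
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_dims))

    for _ in range(n_iters):
        sqy = np.sum(Y**2, axis=1)
        Dy = sqy[:, None] + sqy[None, :] - 2 * Y @ Y.T
        np.fill_diagonal(Dy, np.inf)
        Q = np.exp(-Dy)                  # fixed unit variance in the map
        Q /= Q.sum(axis=1, keepdims=True)

        # Gradient: 2 * sum_j (y_i - y_j)(p_ij - q_ij + p_ji - q_ji),
        # vectorized as (diag(row-sums of M) - M) @ Y with M = P - Q + P' - Q'.
        M = P - Q + P.T - Q.T
        grad = 2 * (np.diag(M.sum(axis=1)) - M) @ Y
        Y -= lr * grad
    return Y
```

Attractive forces arise where pij exceeds qij (true neighbors mapped too far apart) and repulsive forces where qij exceeds pij, matching the force interpretation above.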

2. Experimental Results

2.1. Images of Digits

• A set of 3000 digit bitmaps from the USPS database, with 600 examples from each of the five classes 0, 1, 2, 3, 4, is used.
• It is important to note that SNE was given no information about class labels.
• As shown in the figure above, SNE quite cleanly separates the digit groups.

2.2. Word-Author Counts

• Each of the 676 authors who published more than one paper in the NIPS proceedings is represented by a vector of word counts over their papers and embedded by SNE.
• We can see Prof. LeCun and Prof. Hinton at the bottom of the figure.
• SNE seems to have grouped authors by broad NIPS field: generative models, support vector machines, neuroscience, reinforcement learning and VLSI all have distinguishable localized regions.

[SNE] G. Hinton and S. Roweis, "Stochastic Neighbor Embedding," NIPS 2002.